Deep dive into LLM reasoning models
In Progress · January 2026
As I wanted to better understand how to improve the reasoning of an LLM and how reasoning models differ from traditional foundation models, I decided to work through the book "Build a Reasoning Model (From Scratch)" by Sebastian Raschka.
Reasoning Models · LLM · Reinforcement Learning · Distillation · Model Evaluation · Inference Scaling
Key concepts
There are two main ways to improve reasoning: increasing training compute, or increasing inference compute (also known as inference-time scaling).
Inference scaling methods
- CoT: Chain-of-thought prompting extends the response by prompting the model to explain its reasoning step by step. Not all models benefit from CoT: in some cases it leads to "overthinking," where the model generates erroneous explanations and misleads itself. CoT does not provide the model with new knowledge; instead, it helps the model make better use of its existing knowledge.
- Self-consistency: Parallel sampling where the model generates multiple responses and selects the most frequent one. This technique comes from a Google Research paper (https://arxiv.org/abs/2203.11171). It is a form of simple majority voting: temperature scaling and top-p filtering are used to generate diverse answers, and the most frequent final answer is selected.
- Self-refinement: Iterative self-refinement where the model reviews and improves its own reasoning and answers across multiple steps.
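The self-consistency idea above can be sketched in a few lines of Python. This is a minimal sketch, not a real implementation: `sampler` is a hypothetical stand-in for an LLM call that would use temperature scaling and top-p filtering to produce diverse answers.

```python
from collections import Counter
import random


def self_consistency(prompt, sampler, n_samples=10):
    """Sample several answers and return the most frequent one (majority vote)."""
    answers = [sampler(prompt) for _ in range(n_samples)]
    # Counter.most_common(1) returns [(answer, count)] for the top answer.
    return Counter(answers).most_common(1)[0][0]


# Stub sampler for illustration only: a real one would call the model with
# temperature > 0 so that repeated samples differ.
def stub_sampler(prompt):
    return random.choice(["42", "42", "42", "41"])


print(self_consistency("What is 6 * 7?", stub_sampler))
```

In practice, the vote is taken over the *final answer* extracted from each sampled chain of thought, not over the full reasoning text, since the reasoning paths are expected to differ.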
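The self-refinement loop can likewise be sketched as generate–critique–refine. All three helper functions here are hypothetical stand-ins for LLM calls; only the control flow is the point.

```python
# Hypothetical LLM-call stand-ins, for illustration only.
def generate(prompt):
    return "draft answer to: " + prompt


def critique(prompt, answer):
    # A real implementation would ask the model to review its own answer.
    return "looks fine" if answer.startswith("revised") else "please revise"


def refine(prompt, answer, feedback):
    return "revised " + answer


def self_refine(prompt, max_rounds=3):
    """Iteratively review and improve an answer until the critique passes."""
    answer = generate(prompt)
    for _ in range(max_rounds):
        feedback = critique(prompt, answer)
        if feedback == "looks fine":
            break
        answer = refine(prompt, answer, feedback)
    return answer
```

The `max_rounds` cap matters: without it, a model that never approves its own answer would loop forever, and more rounds mean more inference compute, which is exactly the trade-off inference-time scaling makes.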