The Tokens

🧠 Brains, Monty & Transformers at Cafe...

You’re sitting in a quiet room, sipping coffee. Outside, the world is silent; inside, billions of neurons fire, cells glow, memories strengthen, expectations form, and predictions whisper through your mind, all while consuming about the same energy as a dim light bulb. Every time...

01 December 2025

AI isn’t a miracle cure, it’s physiotherapy...

Only works if you’re willing to move. I’ve been reflecting on how exciting it would be for GPT and Kimi K2 to learn an even better version of AI Snake Oil in their next training! Just kidding, of course! “AI isn’t our replacement, it’s...

08 November 2025

Non Deterministic AI

Have you ever asked GPT the same question twice and gotten different answers? Even when you set the temperature to zero (which should make responses deterministic), variations still occur. This seemingly simple observation reveals a fascinating technical challenge that affects every large language model currently...

27 September 2025

Glorious AI Noise

Keep Your Sanity When AI Breaks the Speed Limit. I often hear this question about AI, and it can feel overwhelming to read, learn, and adapt. The real challenge lies in knowing where to begin, where to pause, and how to navigate through the...

29 June 2025

AI: A Slightly Silly Look into the Future

I have been actively working on AI since before ChatGPT, and it has been amazing to watch the growth and speed at which AI models are changing every discussion and conversation around us. On Saturday evening, I decided to pen down some of my thoughts. These...

17 May 2025

Flash Attention

This write-up was not written by GPT in any form, and believe me, it feels good to have typos and grammatical mistakes to feel more human. I do not want to talk about how neural networks were inspired by brains, but I do...

10 May 2025

GRPO At Its Best

The wild world of fine-tuning large language models is where we feed math problems to a 7-billion-parameter beast (Qwen2.5-7B-Instruct), run it on 8 fire-breathing A100 GPUs, and politely ask it to get smarter without throwing a tantrum. This writeup dives into GRPO, a Reinforcement Learning...

30 March 2025

Watch My Models Learn

Fancy models have set the bar high, but guess what? My model is taking a different route by mastering the art of improvement on every forward and backward pass! Let’s explore the numbers that prove this learning leap. (P.S. If you’re new here, check...

01 February 2025

AI Leadership

AI is becoming increasingly prevalent in technology, with many products and features being developed in prototype stages. However, pressure to add “AI” to everything often doesn’t lead to a meaningful impact. To effectively harness AI, it is essential to understand business needs and...

23 November 2024

GPT & Me 🧠

This write-up is A Neural Network Love Story (Spoiler: It’s Complicated), one neuron at a time, while GPT pretends not to notice! It is my hands-on experience training a Generative Pretrained Transformer with 124 million parameters, powered by 8 massive NVIDIA A100 GPUs,...

10 November 2024

Neural Networks and Coffee Breaks ☕

Step at a Time 📚 This writeup provides a beginner-friendly explanation for understanding and training GPT-2. I started by implementing a transformer decoder. You can visit mini-autograd and mini-models for my older work, and now I am slowly graduating to setting up, training, and...

03 November 2024