During the OpenAI Gym era of RL, one of the great selling points was that RL was very approachable for a newcomer: the Gym environments were small and tractable enough that a hobbyist could learn a little bit of RL, try it out on CartPole, and see how it performed. Are there similarly tractable RL tasks/learning environments for LLMs? From the outside, my impression is that you need insane GPU access to even start messing around with these models. Is there something one can do on a normal MacBook Air, for instance, in this LLM x RL domain?
I'm absolutely not versed in RL, but I wanted to understand GRPO, the RL algorithm behind Deepseek's latest model.
I started from a very simple LLM, inspired by Andrej Karpathy's "GPT from scratch" video (https://www.youtube.com/watch?v=kCc8FmEb1nY). Then I added the GRPO algorithm on top of that, which in itself is very simple.
The GRPO project is neat. Would you be willing to do a Karpathy-style explainer, breaking down the algorithm from scratch? It's hard to understand on its own without prior background knowledge.
Find materials on PPO, which should be plentiful since it's the most popular RL algorithm. GRPO works on the same principles; it just makes certain estimates from samples rather than training an auxiliary neural network (the critic/value function) to make them.
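To make that concrete, here's a rough sketch (my own illustrative code, not from the DeepSeek paper) of the part that differs: instead of a critic network, GRPO samples a group of G completions per prompt, scores each with a reward function, and uses the reward's z-score within the group as the advantage. The surrogate objective is then the same PPO-style clipped ratio.

```python
import math

def group_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: z-score each reward within its group of G
    sampled completions -- this replaces PPO's learned value network."""
    g = len(rewards)
    mean = sum(rewards) / g
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / g)
    return [(r - mean) / (std + eps) for r in rewards]

def clipped_surrogate(logp_new, logp_old, advantage, clip_eps=0.2):
    """PPO-style clipped surrogate for one sample, reused unchanged by GRPO;
    only the advantage estimate above is different."""
    ratio = math.exp(logp_new - logp_old)
    clipped = max(min(ratio, 1 + clip_eps), 1 - clip_eps)
    return min(ratio * advantage, clipped * advantage)

# Toy example: 3 completions for one prompt, scored 1.0, 2.0, 3.0.
advs = group_advantages([1.0, 2.0, 3.0])
```

The advantages are mean-zero within the group, so above-average completions get pushed up and below-average ones pushed down, with no critic to train. (The real objective also averages over tokens and adds a KL penalty toward a reference model, which I've left out here.)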