You might find more information here helpful: https://sabareesh.com/posts/llm-intro/
But I am still in the process of evaluating post-training with RL. RLHF is almost a mirage: it shows what is possible, but not the full capability of what the model can do.