Align a pretrained model for real tasks: use SFT and RLHF to improve instruction following, reasoning, and safer behavior.
Fine-tuning & RL for LLMs: Intro to Post-training
Instructor: Sharon Zhou
Earn a certificate with PRO
- Intermediate
- 6 hours 10 mins
- 43 Video Lessons
- 11 Graded Assignments PRO
Turn pretrained LLMs into production-ready models through post-training
- Use evaluation to guide improvements: build evals that reveal problems, choose data and rewards accordingly, and iterate.
- Get models ready for production, cost-aware: plan promotion and serving, monitor reliably, and account for compute and budget.
Why Enroll
Large language models are powerful, but raw pretrained models aren’t ready for production applications. Post-training is what adapts an LLM to follow instructions, show reasoning, and behave more safely.
Many developers still assume “LLMs inherently hallucinate,” or “only experts can tune models.” Recent advances have changed what’s feasible. If you ship LLM features (e.g., developer copilots, customer support agents, internal assistants) or work on ML/AI platform teams, understanding post-training is becoming a must-have skill.
This course, consisting of 5 modules and taught by Sharon Zhou (VP of AI at AMD and instructor of popular DeepLearning.AI courses), guides you through the key aspects of post-training:
- Post-training in the LLM lifecycle: Learn where post-training fits, key ideas in fine-tuning and RL, how models gain reasoning, and how these methods power products.
- Core techniques: Understand fine-tuning, RLHF, reward modeling, and RL algorithms (PPO, GRPO). Use LoRA for efficient fine-tuning.
- Evaluation and error analysis: Design evals, detect reward hacking, diagnose failures, and red team to test model robustness.
- Data for post-training: Prepare fine-tuning/LoRA datasets, combine fine-tuning + RLHF, create synthetic data, and balance data and rewards.
- From post-training to production: Learn industry-leading production pipelines, set go/no-go rules, and run data feedback loops from your logs.
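To give a flavor of the efficient fine-tuning covered in the core-techniques module, here is a minimal NumPy sketch (illustrative only, not course lab code) of the LoRA idea: instead of updating a full weight matrix W, you train a small low-rank pair B·A and add its scaled product to the frozen W.

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r = 64, 64, 4    # rank r << d is the low-rank bottleneck
alpha = 8                     # LoRA scaling factor

W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable, small random init
B = np.zeros((d_out, r))               # trainable, zero init: delta starts at 0

def lora_forward(x):
    # y = W x + (alpha / r) * B (A x); only A and B would receive gradients
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=(d_in,))
# With B initialized to zero, the adapted model matches the base model exactly
assert np.allclose(lora_forward(x), W @ x)

# Trainable parameters: r*(d_in + d_out) for LoRA vs d_in*d_out for full fine-tuning
print(A.size + B.size, "LoRA params vs", W.size, "full params")
```

The zero-initialized B matrix is the standard trick that makes training start from the pretrained model's behavior; the parameter count comparison (512 vs 4096 here) is what makes LoRA cheap.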
In partnership with AMD
We built this course with AMD to bring post-training practices used in leading labs to working engineers. You’ll get hands-on labs powered by AMD GPUs, while the methods you learn remain hardware-agnostic.
Who should join?
This course is designed for developers, ML engineers, software engineers, data scientists, and students who want to apply post-training techniques to production LLM systems. It’s also valuable for product managers and technical leaders who need to make informed decisions about post-training strategies and lead cross-functional teams working on LLM products.
To make the most of this course, we recommend strong familiarity with Python and a basic understanding of how LLMs work.
Course Outline
- Conversation between Sharon Zhou and Andrew Ng (Video・10 mins)
- Background (Video・5 mins)
- Where post-training (fine-tuning and RL) fits into LLM training (Video・6 mins)
- Intuitions behind fine-tuning and RL (Video・4 mins)
- Key components to making fine-tuning and RL work (Video・10 mins)
- Post-training example: Reasoning (Video・5 mins)
- Post-training example: Safety and security (RLAIF) (Video・4 mins)
- Post-training in the wild (Video・4 mins)
- Module 1: Quiz (Graded・Quiz・30 mins)
- Module 1: Graded Lab (Graded・Code Assignment・1 hour)
- Join the DeepLearning.AI Forum to ask questions, get support, or share amazing ideas! (Reading・5 mins)
- Module 1 Lecture Notes (Reading・1 min)
- Data: What you need and how to prepare it (Video・7 mins)
- Data: Tokens for models to read/write data (Video・10 mins)
- Fine-tuning math: Loss, gradients, weight updates (Part 1) (Video・6 mins)
- Fine-tuning math: Loss, gradients, weight updates (Part 2) (Video・5 mins)
- Fine-tuning: Hyperparameters & hyperparameter tuning (Part 1) (Video・7 mins)
- Fine-tuning: Hyperparameters & hyperparameter tuning (Part 2) (Video・4 mins)
- Module 2 Graded Lab: 1 (Graded・Code Assignment・1 hour)
- Fine-tuning: Parameter efficient fine-tuning (PEFT) (Video・9 mins)
- RL: Rewards and preference learning (Video・9 mins)
- RL: Training objective and RLHF (Video・8 mins)
- RL: PPO and GRPO Algorithms (Video・8 mins)
- Module 2: Quiz (Graded・Quiz・30 mins)
- Module 2 Graded Lab: 2 (Graded・Code Assignment・1 hour)
- Module 2 Lecture Notes (Reading・1 min)
- Why evals are the north star (Video・2 mins)
- Evals for post-training: Test sets and metrics (Video・8 mins)
- RL test environments and monitoring RL updates (Video・5 mins)
- Reward hacking (Video・4 mins)
- Error analysis: Why it matters (Video・2 mins)
- Error analysis: Diagnosing errors & interventions (Video・4 mins)
- Error analysis: errors → causes → fixes (Reading・10 mins)
- How to invest in good evals (Video・5 mins)
- Red Teaming: Real world failures (Video・4 mins)
- Module 3: Quiz (Graded・Quiz・30 mins)
- Module 3: Graded Lab (Graded・Code Assignment・1 hour)
- Module 3 Lecture Notes (Reading・1 min)
- How much data you need for post-training (Video・7 mins)
- Data for fine-tuning (Video・6 mins)
- Data for RL (Part 1) (Video・7 mins)
- Data for RL (Part 2) (Video・4 mins)
- Putting it together (Video・2 mins)
- Synthetic data pipelines (Video・7 mins)
- Template engineering (Video・4 mins)
- Constitutional AI, revisited (Video・5 mins)
- Balancing data and rewards (Video・5 mins)
- Module 4: Quiz (Graded・Quiz・30 mins)
- Module 4: Graded Lab (Graded・Code Assignment・1 hour)
- Module 4 Lecture Notes (Reading・1 min)
- A production post-training pipeline (Video・7 mins)
- Agents (Video・9 mins)
- RL promotion rules (go/no-go) (Video・6 mins)
- Data-feedback flywheel (Video・5 mins)
- Monitoring and observability (Video・4 mins)
- Infrastructure (Part 1) (Video・4 mins)
- Infrastructure (Part 2) (Video・7 mins)
- Production-ready checklist (Video・3 mins)
- Module 5: Quiz (Graded・Quiz・30 mins)
- Module 5: Graded Lab (Graded・Code Assignment・1 hour)
- Acknowledgments (Reading・1 min)
- Module 5 Lecture Notes (Reading・1 min)
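As a small preview of the RL algorithms lesson in Module 2, here is an illustrative sketch (my own simplification, not course lab code) of the core idea behind GRPO: instead of learning a value function as in PPO, each sampled completion is scored relative to the mean and standard deviation of its group of samples for the same prompt.

```python
import statistics

def group_relative_advantages(rewards):
    """GRPO-style advantages: score each sampled completion for a prompt
    relative to the mean (and std) of its sample group."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # guard against divide-by-zero
    return [(r - mu) / sigma for r in rewards]

# Four completions for one prompt, scored by a reward model (made-up numbers)
rewards = [0.2, 0.9, 0.5, 0.4]
adv = group_relative_advantages(rewards)
# Above-average completions get positive advantage, below-average negative
print([round(a, 2) for a in adv])
```

Normalizing within the group means advantages always sum to (approximately) zero, so the policy is pushed toward the better completions in each batch without needing a separate critic network.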

Elevate your learning experience with Pro
Upgrade to Pro and add new accomplishments to your resume
Instructor
What Learners From Previous Courses Say About DeepLearning.AI
Jan Zawadzki
“Within a few minutes and a couple slides, I had the feeling that I could learn any concept. I felt like a superhero after this course. I didn’t know much about deep learning before, but I felt like I gained a strong foothold afterward.”
Kritika Jalan
“The whole specialization was like a one-stop-shop for me to decode neural networks and understand the math and logic behind every variation of it. I can say neural networks are less of a black box for a lot of us after taking the course.”
Chris Morrow – Deep Learning Specialization
“During my Amazon interview, I was able to describe, in detail, how a prediction model works, how to select the data, how to train the model, and the use cases in which this model could add value to the customer.”
Frequently Asked Questions
We recommend starting with a beginner course such as the Machine Learning Specialization.
Yes! This course is perfect for anyone with a background in Python ready to dive deeper into the post-training of large language models.
Please send an email to [email protected] to receive assistance.
The DeepLearning.AI Pro membership costs $25/mo billed annually and $30/mo billed monthly.
More pricing details are available on the membership page.
Important details:
- All prices are listed in USD
- Payments are processed securely via Stripe
- Taxes may apply depending on your location
Yes! You’ll earn a certificate upon completing the course, recognizing your skills in post-training large language models.
Join today and be on the forefront of the next generation of AI!
Want to learn more about Generative AI?
Keep learning with updates on curated AI news, courses, events, as well as Andrew’s thoughts from DeepLearning.AI!
