Fine-tuning & RL for LLMs: Intro to Post-training
Instructor: Sharon Zhou
Intermediate
5 Courses
43 Video Lessons
8 Reading Lessons
6 Graded Assignments
Instructor: Sharon Zhou
AMD
Turn pretrained LLMs into production-ready models through post-training
Align a pretrained model for real tasks: use SFT and RLHF to improve instruction following, reasoning, and safer behavior.
Use evaluation to guide improvements: build evals that reveal problems, choose data and rewards accordingly, and iterate.
Get models ready for production with cost in mind: plan promotion and serving, monitor reliably, and account for compute and budget.
Why Enroll
Large language models are powerful, but raw pretrained models aren’t ready for production applications. Post-training is what adapts an LLM to follow instructions, show reasoning, and behave more safely.
Many developers still assume “LLMs inherently hallucinate,” or “only experts can tune models.” Recent advances have changed what’s feasible. If you ship LLM features (e.g., developer copilots, customer support agents, internal assistants) or work on ML/AI platform teams, understanding post-training is becoming a must-have skill.
This course consists of 5 modules and is taught by Sharon Zhou (VP of AI at AMD and instructor of popular DeepLearning.AI courses). It will guide you through the following aspects of post-training:
- Post-training in the LLM lifecycle: Learn where post-training fits, key ideas in fine-tuning and RL, how models gain reasoning, and how these methods power products.
- Core techniques: Understand fine-tuning, RLHF, reward modeling, and RL algorithms (PPO, GRPO). Use LoRA for efficient fine-tuning.
- Evaluation and error analysis: Design evals, detect reward hacking, diagnose failures, and red team to test model robustness.
- Data for post-training: Prepare fine-tuning/LoRA datasets, combine fine-tuning + RLHF, create synthetic data, and balance data and rewards.
- From post-training to production: Learn industry-leading production pipelines, set go/no-go rules, and run data feedback loops from your logs.
In partnership with
We built this course with AMD to bring post-training practices used in leading labs to working engineers. You’ll get hands-on labs powered by AMD GPUs, while the methods you learn remain hardware-agnostic.
Who should join?
This course is designed for developers, ML engineers, software engineers, data scientists, and students who want to apply post-training techniques to production LLM systems. It’s also valuable for product managers and technical leaders who need to make informed decisions about post-training strategies and lead cross-functional teams working on LLM products.
To make the most of this course, we recommend strong familiarity with Python and a basic understanding of how LLMs work.
Course Outline
43 Video Lessons • 8 Reading Lessons • 6 Graded Assignments
Conversation between Sharon Zhou and Andrew Ng
Video • 10 mins
Background
Video • 5 mins
Where post-training (fine-tuning and RL) fits into LLM training
Video • 6 mins
Intuitions behind fine-tuning and RL
Video • 4 mins
Key components to making fine-tuning and RL work
Video • 10 mins
Post-training example: Reasoning
Video • 5 mins
Post-training example: Safety and security (RLAIF)
Video • 4 mins
Post-training in the wild
Video • 4 mins
Module 1: Quiz
Graded・Quiz • 30 mins
Module 1: Graded Lab
Graded・Code Assignment • 1 hour
Join the DeepLearning.AI Forum to ask questions, get support, or share amazing ideas!
Reading • 5 mins
Module 1 Lecture Notes
Reading • 1 min
