Transformers in Practice
Instructor: Sharon Zhou
- Intermediate
- 3 hours 7 mins
- 19 Video Lessons
- 8 Code Examples
- 6 Graded Assignments (PRO)
- Earn a certificate with PRO

Understand what's actually happening inside your LLMs
- Understand text generation: see how transformers produce output one token at a time, and why that explains so much about their behavior.
- Look inside the model: build intuition for what attention is really doing, how positional encoding works, and how layers combine to make predictions.
- Optimize for production: learn how quantization, KV caching, and flash attention help transformers run efficiently on GPUs.
Why Enroll
If you’ve worked with LLMs, you’ve probably run into slow inference, out-of-memory errors, or hallucinations you couldn’t explain. There’s no shortage of resources on how transformers work, but most of them either ask you to build one from scratch or get lost in theory that doesn’t connect to the problems you’re actually facing.
Transformers in Practice is different. Taught by Sharon Zhou, VP of Engineering & AI at AMD, this course gives you a complete practical view of how transformers work, from how they generate text to what’s happening inside the model to how it all gets optimized to run on real hardware. Interactive visualizations throughout let you see key concepts in action and build intuition that actually sticks.
Here’s what you’ll learn:
- Model Behavior: You’ll learn how LLMs generate text through an autoregressive loop, selecting one token at a time from a probability distribution. You’ll see how sampling parameters like temperature shape the output, why hallucinations happen, and how techniques like RAG, constrained generation, and chain-of-thought reasoning all work within this same loop.
- Model Architecture and Attention: You’ll look inside the transformer to understand what attention is really doing, how positional encoding tracks token order, and how multiple layers and attention heads work together to turn an input sequence into a next-token prediction.
- Scaling and Deploying: You’ll learn why GPUs are well-suited for transformer inference and where the real bottlenecks are. You’ll build practical intuition for quantization, KV caching, flash attention, and speculative decoding, including the tradeoffs each one introduces for cost, speed, and output quality.
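To make the first of these concrete, here is an illustrative sketch of the autoregressive loop and temperature sampling described above. It is not course code; the `toy_model` and function names are made up for the example, and a real LLM's forward pass would replace the stub.

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw logits into a probability distribution.

    Lower temperature sharpens the distribution (more deterministic);
    higher temperature flattens it (more random).
    """
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()                 # subtract max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()

def generate(next_logits_fn, prompt_tokens, max_new_tokens, temperature=1.0, rng=None):
    """Minimal autoregressive loop: sample one token at a time and
    feed it back in as context for the next step."""
    rng = rng or np.random.default_rng(0)
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        logits = next_logits_fn(tokens)    # the model's forward pass (stubbed below)
        probs = softmax_with_temperature(logits, temperature)
        tokens.append(int(rng.choice(len(probs), p=probs)))
    return tokens

# Toy stand-in for a model over a 4-token vocabulary: it always prefers token 2.
toy_model = lambda tokens: np.array([1.0, 0.0, 3.0, 0.0])

# At low temperature, sampling is nearly deterministic.
print(generate(toy_model, [5], max_new_tokens=4, temperature=0.1))
```

Raising `temperature` toward 1.0 and beyond would make the same loop pick the less likely tokens more often, which is exactly the behavior the sampling lessons explore.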
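The attention mechanism at the heart of the architecture module can likewise be sketched in a few lines. This is a single head of standard scaled dot-product attention, written with made-up variable names purely for illustration, not the course's own notebooks.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """One attention head: each query position mixes the value vectors,
    weighted by how well its query matches every key."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (seq, seq) similarity matrix
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over key positions
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
Q = rng.standard_normal((seq_len, d_model))
K = rng.standard_normal((seq_len, d_model))
V = rng.standard_normal((seq_len, d_model))

out, weights = scaled_dot_product_attention(Q, K, V)
print(out.shape, weights.shape)   # (4, 8) (4, 4)
```

Each row of `weights` sums to 1: every output position is a weighted average of the value vectors, which is the intuition the course's attention visualizations build on.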
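Finally, the cost/quality tradeoff behind quantization can be seen in a toy example. The sketch below is symmetric per-tensor int8 quantization, one simple scheme among many; it is meant only to show the memory-versus-precision tradeoff, not any particular production implementation.

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: store weights as int8 plus
    one float scale, trading a little precision for 4x less memory than float32."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 codes."""
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).standard_normal(1024).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print("bytes:", w.nbytes, "->", q.nbytes)
print("max abs error:", np.abs(w - w_hat).max())
```

The reconstruction error is bounded by half the quantization step, which is why quantized models usually stay close to full-precision quality while using a fraction of the memory.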
You’ll earn a certificate upon completing the course, recognizing your skills in transformer-based language models.
In partnership with AMD
We built this course with AMD to help engineers move beyond treating LLMs as black boxes. You’ll build practical intuition for how transformers generate text, process context, and run efficiently on GPUs, while learning techniques and concepts that apply across transformer-based models and hardware environments.
Who should join?
This course is designed for software engineers, ML engineers, and developers who work with LLMs and want to understand what’s actually happening under the hood.
You don’t need to have built a model from scratch, but you should be comfortable using LLMs through an API or chat interface and have a basic understanding of neural network concepts like weights, layers, and training.
Course Outline
Transformers in Practice
- Conversation between Sharon Zhou and Andrew Ng・Video・4 mins
- Transformers in practice・Video・2 mins
- The autoregressive loop・Video・4 mins
- Visualization tutorial・Video・3 mins
- Visualization: The Autoregressive Loop・Code Example・10 mins
- Token sampling・Video・5 mins
- Visualization: Selecting the Next Token・Code Example・10 mins
- Autoregressive dynamics・Video・3 mins
- Visualization: How Sampling Shapes Output・Code Example・10 mins
- Structured outputs・Video・4 mins
- Visualization: Constrained Generation with Finite State Machines (FSM)・Code Example・10 mins
- Grounding in context・Video・7 mins
- Thinking and reasoning・Video・5 mins
- Additional Readings for Module 1・Reading・10 mins
- Module 1: Graded Lab・Graded Code Assignment・10 mins
- Module 1: Quiz・Graded Quiz・30 mins
- Join the DeepLearning.AI Forum to ask questions, get support, or share amazing ideas!・Reading・1 min
- Attention・Video・8 mins
- Visualization: Decoder-Only Transformers・Code Example・10 mins
- Visualization: Interpretable Attention Heads・Code Example・10 mins
- Positional encoding・Video・4 mins
- Model layers・Video・5 mins
- Decoding・Video・4 mins
- Visualization: Decoding Intermediate Layers・Code Example・10 mins
- Ungraded Lab・Code Example・10 mins
- Additional Readings for Module 2・Reading・10 mins
- Module 2: Graded Lab・Graded Code Assignment・10 mins
- Module 2: Quiz・Graded Quiz・30 mins
- Putting the model in production・Video・4 mins
- Quantization・Video・5 mins
- KV cache・Video・3 mins
- Ungraded Lab: KV Cache・Code Example・10 mins
- Flash attention・Video・6 mins
- Visualization: Flash Attention・Code Example・10 mins
- Speculative decoding・Video・6 mins
- Visualization: Speculative Decoding・Code Example・10 mins
- Issues that arise in production・Video・5 mins
- Additional Readings for Module 3・Reading・10 mins
- Module 3: Graded Lab・Graded Code Assignment・10 mins
- Module 3: Quiz・Graded Quiz・30 mins
- Acknowledgements・Reading・1 min

What Learners From Previous Courses Say About DeepLearning.AI
Jan Zawadzki
“Within a few minutes and a couple slides, I had the feeling that I could learn any concept. I felt like a superhero after this course. I didn’t know much about deep learning before, but I felt like I gained a strong foothold afterward.”
Kritika Jalan
“The whole specialization was like a one-stop-shop for me to decode neural networks and understand the math and logic behind every variation of it. I can say neural networks are less of a black box for a lot of us after taking the course.”
Chris Morrow – Deep Learning Specialization
“During my Amazon interview, I was able to describe, in detail, how a prediction model works, how to select the data, how to train the model, and the use cases in which this model could add value to the customer.”
Frequently Asked Questions

How do I get help with the course?
Please send an email to [email protected] to receive assistance.

How much does membership cost?
The DeepLearning.AI Pro membership costs $25/mo billed annually or $30/mo billed monthly. More pricing details are available on the membership page.

Important details:
- All prices are listed in USD
- Payments are processed securely via Stripe
- Taxes may apply depending on your location

Will I earn a certificate?
Yes! If you're a DeepLearning.AI Pro member, you'll earn a certificate upon completing the course, recognizing your skills in transformer-based language models.
Join today and be at the forefront of the next generation of AI!

