Semantic Caching for AI Agents
Instructors: Tyler Hutcherson, Iliya Zhechev
- Intermediate
- 1 Hour 14 Minutes
- 7 Video Lessons
- 5 Code Examples
What you'll learn
Understand how semantic caching reduces inference cost and latency by reusing model responses based on meaning instead of exact text.
Evaluate cache performance with metrics like hit rate, precision, and latency, and enhance it using techniques such as cross-encoders and LLM validation.
Build a faster AI agent by integrating semantic caching to minimize redundant model calls and deliver quicker responses as the cache warms up.
About this course
Join our new short course, Semantic Caching for AI Agents! Learn from Tyler Hutcherson, Applied AI Engineering Lead, and Iliya Zhechev, Senior Research Engineer at Redis.
In this course, you’ll build a semantic cache that makes your AI agents faster and more cost-effective by recognizing when different questions mean the same thing. For example, when one user asks “How do I get a refund?” and another says “I want my money back,” your cache reuses the stored answer instead of making another model call.
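To make the idea concrete, here is a minimal sketch of a semantic cache lookup. It is not the course's or Redis's implementation: the names `SemanticCache`, `toy_embed`, and the 0.8 threshold are assumptions chosen for illustration. The `toy_embed` function only counts character trigrams, so it matches near-identical wording rather than true paraphrases; a real cache would swap in a sentence-embedding model to match “How do I get a refund?” with “I want my money back.”

```python
import math
from collections import Counter
from typing import Callable, Optional

Vector = dict[str, float]


def toy_embed(text: str) -> Vector:
    """Hypothetical stand-in embedding: character trigram counts.

    Captures surface overlap only; a real system would use an embedding model.
    """
    s = " ".join(text.lower().split())
    return dict(Counter(s[i:i + 3] for i in range(len(s) - 2)))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Illustrative in-memory cache keyed by query similarity, not exact text."""

    def __init__(self, embed: Callable[[str], Vector], threshold: float = 0.8):
        self.embed = embed
        self.threshold = threshold  # assumed value; tune on real data
        self.entries: list[tuple[Vector, str]] = []

    def store(self, query: str, response: str) -> None:
        self.entries.append((self.embed(query), response))

    def lookup(self, query: str) -> Optional[str]:
        """Return the most similar stored response, or None on a cache miss."""
        q = self.embed(query)
        best_score, best_response = 0.0, None
        for vec, response in self.entries:
            score = cosine(q, vec)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None


cache = SemanticCache(toy_embed)
cache.store("How do I get a refund?", "<cached refund answer>")
print(cache.lookup("how do i get a refund"))       # similar wording: cache hit
print(cache.lookup("What is the weather today?"))  # unrelated: cache miss
```

On a miss, the caller would fall through to the model and then `store` the new response so that later, similar queries can be served from the cache.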
In detail, you’ll learn to:
- Build your first semantic cache from scratch – Build a working cache to see how each component works, then implement it using Redis’ open source tools.
- Measure cache effectiveness with key metrics – Track cache hit rate, precision, recall, and latency to understand your cache’s real impact.
- Enhance cache accuracy with advanced techniques – Use threshold tuning, cross-encoders, LLM validation, and fuzzy matching to make your cache more effective.
- Build a fast AI agent with semantic caching – Integrate semantic caching into an AI agent that reuses results, skips redundant work, and gets faster over time.
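As a rough illustration of the metrics named above, the sketch below computes hit rate, precision, and recall from a list of labeled lookups. The `Lookup` record and `cache_metrics` function are hypothetical names, and the definitions follow the standard ones (precision over returned hits, recall over queries that had a correct cached answer); the course may define them slightly differently.

```python
from dataclasses import dataclass


@dataclass
class Lookup:
    hit: bool         # the cache returned a response
    correct: bool     # the returned response was the right one (only meaningful on a hit)
    answerable: bool  # a correct cached answer existed for this query


def cache_metrics(lookups: list[Lookup]) -> dict[str, float]:
    """Compute hit rate, precision, and recall for a batch of labeled lookups."""
    total = len(lookups)
    hits = sum(l.hit for l in lookups)
    true_hits = sum(l.hit and l.correct for l in lookups)
    answerable = sum(l.answerable for l in lookups)
    return {
        "hit_rate": hits / total if total else 0.0,
        "precision": true_hits / hits if hits else 0.0,
        "recall": true_hits / answerable if answerable else 0.0,
    }


# Illustrative labels only, not measured results.
sample = [
    Lookup(hit=True, correct=True, answerable=True),
    Lookup(hit=True, correct=True, answerable=True),
    Lookup(hit=True, correct=False, answerable=False),  # false hit
    Lookup(hit=False, correct=False, answerable=True),  # missed opportunity
    Lookup(hit=False, correct=False, answerable=False),  # correct miss
]
print(cache_metrics(sample))
```

A looser similarity threshold tends to raise hit rate and recall while lowering precision, which is why these metrics are usually read together rather than one at a time.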
Start building AI agents that respond faster and cost less to run.
Who should join?
Developers and ML engineers familiar with Python, embeddings, and basic LLM applications who want to optimize their AI systems’ latency and cost. Experience with basic caching concepts is helpful but not required.
Course Outline
7 Lessons・5 Code Examples
Introduction
Video・3 mins
Overview of Semantic Caching
Video・9 mins
Build Your First Semantic Cache
Video with code examples・10 mins
Measuring Cache Effectiveness
Video with code examples・13 mins
Enhancing Cache Effectiveness
Video with code examples・12 mins
Fast AI Agent with Semantic Cache
Video with code examples・16 mins
Conclusion
Video・1 min
Quiz
Reading・9 mins
OPTIONAL: Project
Code examples・1 min
Course access is free for a limited time during the DeepLearning.AI learning platform beta!