Short Course | Intermediate | 1 Hour 14 Minutes

Semantic Caching for AI Agents

Instructors: Tyler Hutcherson, Iliya Zhechev


7 Video Lessons
5 Code Examples
Redis

What you'll learn

Understand how semantic caching reduces inference cost and latency by reusing model responses based on meaning instead of exact text.
Evaluate cache performance with metrics like hit rate, precision, and latency, and enhance it using techniques such as cross-encoders and LLM validation.
Build a faster AI agent by integrating semantic caching to minimize redundant calls and deliver faster responses as the cache warms up.

About this course

Join our new short course, Semantic Caching for AI Agents! Learn from Tyler Hutcherson, Applied AI Engineering Lead, and Iliya Zhechev, Senior Research Engineer at Redis.

In this course, you’ll build a semantic cache that makes your AI agents faster and more cost-effective by recognizing when different questions mean the same thing. For example, when one user asks “How do I get a refund?” and another says “I want my money back,” the cache reuses the stored answer instead of making another model call.
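
To make the idea concrete, here is a minimal sketch of a semantic cache in Python. It is not the course's implementation: the names `SemanticCache`, `toy_embed`, `answer`, and the 0.7 threshold are all assumptions made for illustration. In particular, `toy_embed` is a bag-of-words stand-in that only matches queries sharing words; matching true paraphrases such as the refund example would require a real embedding model in its place.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: a word-count vector."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Stores (embedding, response) pairs and looks them up by similarity."""

    def __init__(self, embed=toy_embed, threshold=0.7):
        self.embed = embed
        self.threshold = threshold  # assumed value; tune for your data
        self.entries = []

    def get(self, query):
        """Return the best cached response, or None if nothing is close enough."""
        q = self.embed(query)
        best_score, best_response = 0.0, None
        for emb, response in self.entries:
            score = cosine(q, emb)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def put(self, query, response):
        self.entries.append((self.embed(query), response))


def answer(query, cache, call_model):
    """Serve from the cache when possible; otherwise call the model and store."""
    cached = cache.get(query)
    if cached is not None:
        return cached, True
    response = call_model(query)
    cache.put(query, response)
    return response, False
```

The `answer` helper returns the response together with a flag indicating whether it came from the cache, so a caller can count how many model calls were avoided. A linear scan over entries is fine for a sketch; a production cache would typically use a vector index instead.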

In detail, you’ll learn to:

Build your first semantic cache from scratch – Write a working cache by hand to see how each component works, then implement it using Redis’ open-source tools.
Measure cache effectiveness with key metrics – Track cache hit rate, precision, recall, and latency to understand your cache’s real impact.
Enhance cache accuracy with advanced techniques – Use threshold tuning, cross-encoders, LLM validation, and fuzzy matching to make your cache more effective.
Build a fast AI agent with semantic caching – Integrate semantic caching into an AI agent that reuses results, skips redundant work, and gets faster over time.
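
As a rough illustration of the metrics named above, the sketch below computes hit rate, precision, recall, and average latency from a list of labelled cache lookups. The `Lookup` record, the `cache_metrics` function, and the exact metric definitions are assumptions chosen for this example; the course may define them differently.

```python
from dataclasses import dataclass


@dataclass
class Lookup:
    """One labelled cache lookup (hypothetical record format)."""
    hit: bool                # the cache returned an answer
    should_hit: bool         # ground truth: an equivalent query was already cached
    correct: bool = False    # the returned answer was right (meaningful only on a hit)
    latency_ms: float = 0.0  # end-to-end time for this request


def cache_metrics(lookups):
    """Summarize lookups; each ratio falls back to 0.0 on an empty denominator."""
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    total = len(lookups)
    hits = [x for x in lookups if x.hit]
    correct_hits = [x for x in hits if x.correct]
    expected_hits = sum(1 for x in lookups if x.should_hit)

    return {
        # Share of all requests served from the cache.
        "hit_rate": ratio(len(hits), total),
        # Share of cache hits that returned the right answer.
        "precision": ratio(len(correct_hits), len(hits)),
        # Share of reusable requests that the cache actually served correctly.
        "recall": ratio(len(correct_hits), expected_hits),
        "avg_latency_ms": ratio(sum(x.latency_ms for x in lookups), total),
    }
```

Under these definitions, raising the similarity threshold tends to trade recall for precision, which is why the two are worth tracking together rather than hit rate alone.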

Start building AI agents that respond faster and cost less to run.

Who should join?

Developers and ML engineers familiar with Python, embeddings, and basic LLM applications who want to optimize their AI systems’ latency and cost. Experience with basic caching concepts is helpful but not required.