Short Course・Intermediate・1 Hour 33 Minutes

Multi-Vector Image Retrieval

Instructor: Kacper Łukawski


Intermediate
1 Hour 33 Minutes
7 Video Lessons
6 Code Examples
Instructor: Kacper Łukawski
Qdrant

What you'll learn

Understand multi-vector retrieval and how representing images as collections of patch-level embeddings enables more accurate search than single-vector methods.
Explore the ColPali model for image retrieval and apply optimization techniques like quantization and pooling to reduce memory usage while maintaining search quality.
Build a production-ready multi-modal RAG system that combines ColPali retrieval with MUVERA's efficient single-vector search to handle complex documents that mix text and images.
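
To give a feel for the memory savings mentioned above, here is a rough sketch using NumPy only. The 32 x 32 patch grid, the 128-dimensional float32 vectors, and the simple symmetric int8 scale are illustrative assumptions, not the course's actual settings; in practice a vector database would usually handle quantization for you.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical page embedding: a 32 x 32 patch grid of 128-dim float32 vectors.
grid_rows, grid_cols, dim = 32, 32, 128
page = rng.normal(size=(grid_rows, grid_cols, dim)).astype(np.float32)

# Scalar quantization: map each float to int8 with one symmetric scale factor.
scale = np.abs(page).max() / 127.0
scalar = np.round(page / scale).astype(np.int8)

# Binary quantization: keep only the sign of each dimension, packed 8 bits per byte.
binary = np.packbits(page.reshape(-1, dim) > 0, axis=1)

# Row pooling: average the patch vectors within each grid row (one vector per row).
row_pooled = page.mean(axis=1)

for name, array in [("float32", page), ("int8", scalar),
                    ("binary", binary), ("row-pooled", row_pooled)]:
    print(f"{name:>10}: shape={array.shape}, bytes={array.nbytes}")
```

Under these assumptions, the 512 KiB float32 page shrinks to 128 KiB with int8 values and to 16 KiB with either binary quantization or row pooling. How much search quality each option preserves depends on the data and has to be measured.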

About this course

Join our new short course, Multi-Vector Image Retrieval! Learn from Kacper Łukawski, Senior Developer Advocate at Qdrant.

Most retrieval systems represent each image with a single vector. Multi-vector techniques instead represent an image as a collection of smaller vectors, one for each patch. This finer-grained representation lets each text query token be matched against individual image patches, which delivers higher-quality search on complex documents that combine text, images, and diagrams.
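
The token-to-patch matching described above is commonly scored with a late-interaction "MaxSim" function. The following is a minimal sketch with random vectors standing in for real embeddings; the sizes (4 query tokens, 1024 patches, 128 dimensions) and the helper names are assumptions made for illustration.

```python
import numpy as np

def normalize(vectors: np.ndarray) -> np.ndarray:
    """L2-normalize each row so dot products become cosine similarities."""
    return vectors / np.linalg.norm(vectors, axis=1, keepdims=True)

def maxsim_score(query_vectors: np.ndarray, patch_vectors: np.ndarray) -> float:
    """Late-interaction score between one query and one image.

    query_vectors: (num_query_tokens, dim)
    patch_vectors: (num_patches, dim)
    """
    # Similarity of every query token against every image patch.
    similarities = query_vectors @ patch_vectors.T
    # Each query token keeps its best-matching patch; the maxima are summed.
    return float(similarities.max(axis=1).sum())

rng = np.random.default_rng(0)
query = normalize(rng.normal(size=(4, 128)))
image_a = normalize(rng.normal(size=(1024, 128)))

# image_b is image_a with four patches replaced by exact copies of the query tokens.
image_b = image_a.copy()
image_b[:4] = query

print("unrelated image:", maxsim_score(query, image_a))
print("matching image: ", maxsim_score(query, image_b))
```

Because every query token finds an identical patch in the second image, that image receives the maximum possible score (one per query token), while the unrelated image scores lower.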

In this course, you’ll learn multi-vector retrieval concepts, implement ColPali for image search, and apply optimization techniques to make these systems production-ready. You’ll work with real course materials to build a complete multi-modal RAG system.

In detail, you’ll:

Learn multi-vector text retrieval with ColBERT – Understand how multi-vector retrieval works by implementing ColBERT for text, then explore the computational and memory challenges of late interaction search.
Implement ColPali for multi-vector image retrieval – Apply multi-vector concepts to images using ColPali, which adapts vision language models to create patch-level embeddings for fine-grained visual search.
Optimize ColPali’s memory footprint – Apply scalar and binary quantization, plus row, column, and hierarchical pooling techniques to dramatically reduce memory usage while preserving search quality.
Enable fast search with MUVERA embeddings – Convert multi-vector representations into high-dimensional single vectors using MUVERA, unlocking HNSW search for significantly faster retrieval.
Build a multi-modal RAG system with ColPali – Combine ColPali retrieval, memory optimizations, and MUVERA search into a complete RAG pipeline that retrieves and reasons over visual documents.
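
As a rough illustration of the MUVERA step in the list above, the sketch below turns a multi-vector embedding into one fixed-size vector by bucketing vectors with random hyperplanes and aggregating each bucket. It is a heavily simplified version of the idea: the function name `fde`, the parameter values, and the omission of details such as filling empty buckets and the final projections are all assumptions, so treat it as a teaching aid rather than the course's implementation.

```python
import numpy as np

def fde(vectors: np.ndarray, planes: np.ndarray, is_query: bool) -> np.ndarray:
    """Simplified fixed-dimensional encoding of a multi-vector embedding.

    vectors: (n, dim) multi-vector embedding of a query or a document
    planes:  (reps, k, dim) random hyperplanes shared by queries and documents
    """
    reps, k, dim = planes.shape
    blocks = []
    for r in range(reps):
        # SimHash: the sign pattern against k hyperplanes picks one of 2**k buckets.
        bits = (vectors @ planes[r].T > 0).astype(np.int64)
        bucket_ids = bits @ (1 << np.arange(k))
        block = np.zeros((2 ** k, dim))
        for b in range(2 ** k):
            members = vectors[bucket_ids == b]
            if len(members) == 0:
                continue  # empty bucket stays zero in this simplified sketch
            # Queries sum the vectors in a bucket; documents average them.
            block[b] = members.sum(axis=0) if is_query else members.mean(axis=0)
        blocks.append(block.ravel())
    # Concatenate all buckets of all repetitions into one long vector.
    return np.concatenate(blocks)

rng = np.random.default_rng(0)
dim, k, reps = 128, 3, 4  # illustrative values only
planes = rng.normal(size=(reps, k, dim))

query_fde = fde(rng.normal(size=(4, dim)), planes, is_query=True)
page_fde = fde(rng.normal(size=(1024, dim)), planes, is_query=False)

print(query_fde.shape, page_fde.shape)
print("single-vector proxy score:", float(query_fde @ page_fde))
```

Both encodings have the same length (repetitions x buckets x dimensions), so a single dot product can stand in for the multi-vector comparison, which is what makes an HNSW index applicable. A common pattern is to use this fast search to collect candidates and then rescore them with the original multi-vector embeddings.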

Start building retrieval systems that understand images at the patch level and deliver accurate multi-modal search.

Who should join?

AI builders working with multi-modal data who want to implement advanced image retrieval. Familiarity with Python and vector embeddings is recommended.

Course Outline

7 Lessons・6 Code Examples

Introduction
Video・3 mins
Multi-vector Text Retrieval: ColBERT
Video with code examples・17 mins
Multi-vector Image Retrieval: ColPali
Video with code examples・15 mins
Optimizing retrieval with multi-vector representations
Video with code examples・15 mins
MUVERA Embeddings
Video with code examples・18 mins
Building multi-modal RAG with ColPali
Video with code examples・11 mins
Conclusion
Video・1 min
Optional: Hands-On Project
Code examples・1 min
Quiz
Reading・10 mins

Instructor

Kacper Łukawski

Developer Relations Lead at Qdrant