Large Multimodal Models

Inference time procedure for GILL

Text or Images, Input or Output: GILL, an innovative approach to multimodal model training

GPT-4V is a large multimodal model that generates text from images and, with help from DALL-E 3, generates images from text. However, OpenAI hasn’t fully explained how it built the system. A separate group of researchers described their own method, GILL.
