Large Multimodal Models (LMMs)

2 Posts

Figure: Aya Vision architecture diagram showing the vision encoder, multimodal merging, and LLM backbone.

Equally Fluent in Many Languages: Cohere’s Aya Vision beats multilingual rivals in text & image understanding

Multilingual AI models often suffer from uneven performance across languages, especially on multimodal tasks. A pair of lean models counters this trend with consistent understanding of text and images across major languages.
Figure: Phi-4 Mini multimodal architecture integrating vision, audio, and text with token merging and LoRA-adapted weights.

Microsoft Tackles Voice-In, Text-Out: Phi-4 Multimodal processes text, images, and speech simultaneously

Microsoft debuted its first official large language model that responds to spoken input.
