4-Bit Efficiency, 16-Bit Accuracy: Microsoft researchers show that heavily quantized versions of Llama can perform as well as near-full-precision versions
Using an 8-bit number format like FP8 during training saves computation compared to 16- or 32-bit formats, but it can yield less-accurate results. Researchers trained models using 4-bit numbers without sacrificing accuracy.
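For readers unfamiliar with low-bit number formats, here is a minimal sketch of symmetric 4-bit weight quantization in NumPy. It illustrates the general idea of mapping floating-point values onto a 16-value integer grid; it is not the researchers' method, which applies quantization during training rather than to a finished model.

```python
# Minimal sketch of symmetric 4-bit weight quantization (illustration only;
# not the paper's training recipe, which quantizes during training).
import numpy as np

def quantize_4bit(weights: np.ndarray):
    """Map float weights to integers in [-8, 7] with a per-tensor scale."""
    scale = np.max(np.abs(weights)) / 7.0          # 4-bit signed range is [-8, 7]
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_4bit(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the 4-bit integers."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_4bit(w)
w_hat = dequantize_4bit(q, scale)
print("max reconstruction error:", np.max(np.abs(w - w_hat)))
```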
Text-Only LLM Goes Multimodal: LLMs learn to caption images, video, and audio without further training
Large language models excel at processing text but can’t interpret images, video, or audio directly without further training on those media types. Researchers devised a way to overcome this limitation.
OpenAI Launches Cost-Effective Alternatives: OpenAI replaces GPT-4.5 with the GPT-4.1 family and introduces o3 and o4-mini, new models focused on reasoning and coding
OpenAI refreshed its roster of models and scheduled the largest, most costly one for removal.
Better Than Trees for Tabular Data: Transformers can outperform decision trees at predicting unlabeled spreadsheet cells
If you have a collection of variables that represent, say, a medical patient and you want to classify the patient’s illness as likely cancer or not, algorithms based on decision trees, such as gradient-boosted trees, typically perform better than neural networks.
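As a point of reference, here is a minimal sketch of the kind of gradient-boosted tree baseline such comparisons start from, using scikit-learn on a synthetic tabular dataset. The dataset and hyperparameters are placeholders, not taken from the article.

```python
# Minimal sketch: a gradient-boosted tree baseline for tabular classification.
# Synthetic data stands in for a real table of patient variables.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```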
Budget for Reasoning to the Token: Claude 3.7 Sonnet adds extended thinking mode
Anthropic’s Claude 3.7 Sonnet implements a hybrid reasoning approach that lets users decide how much thinking they want the model to do before it renders a response.
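A hedged sketch of what setting a thinking budget looks like in practice, using the Anthropic Python SDK. The `thinking` parameter and model ID below follow Anthropic's documented extended-thinking interface but are assumptions to verify against the current SDK.

```python
# Sketch of requesting a bounded amount of extended thinking from
# Claude 3.7 Sonnet. Parameter names and model ID are assumptions
# based on Anthropic's documentation; check the current SDK.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-7-sonnet-20250219",                    # assumed model ID
    max_tokens=2048,                                       # must exceed the thinking budget
    thinking={"type": "enabled", "budget_tokens": 1024},   # cap on reasoning tokens
    messages=[{"role": "user", "content": "How many primes are less than 100?"}],
)
print(response.content)
```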
Better Performance From Merged Models: Localize-and-Stitch improves on methods for merging multiple fine-tuned models
Merging multiple fine-tuned models is a less expensive alternative to hosting multiple specialized models. But while model merging can deliver higher average performance across several tasks, it often results in lower performance on specific tasks. New work addresses this issue.
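For context, here is a minimal sketch of the simplest merging baseline: parameter-wise averaging of fine-tuned checkpoints that share an architecture. This is not Localize-and-Stitch itself, which instead localizes sparse task-specific regions of each fine-tuned model and stitches only those into the merged weights.

```python
# Simplest merging baseline: average the parameters of fine-tuned models
# that share an architecture. Not the Localize-and-Stitch method itself.
import torch

def average_merge(state_dicts):
    """Average a list of state dicts with identical keys and shapes."""
    merged = {}
    for key in state_dicts[0]:
        merged[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
    return merged

# Two tiny placeholder "fine-tuned models" for illustration.
model_a = {"layer.weight": torch.randn(4, 4), "layer.bias": torch.randn(4)}
model_b = {"layer.weight": torch.randn(4, 4), "layer.bias": torch.randn(4)}
merged = average_merge([model_a, model_b])
print(merged["layer.weight"].shape)
```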
Higher Reasoning: OpenAI debuts o1 and pro mode for $200/month
OpenAI launched not only its highly anticipated o1 model but also an operating mode that enables the model to deliver higher performance — at a hefty price.
Benchmark Tests Are Meaningless: The problem with training data contamination in machine learning
The universe of web pages includes correct answers to common questions that are used to test large language models. How can we evaluate new models if they’ve studied the answers before we give them the test?
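One common heuristic for spotting such contamination is to flag a benchmark item whose n-grams overlap heavily with a training document. The sketch below illustrates that idea only; it is not the specific analysis discussed in the article.

```python
# Toy contamination check: does a test item's text appear (nearly verbatim)
# in a training document? Illustration only, not the article's analysis.
def ngrams(text: str, n: int = 8):
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def looks_contaminated(test_item: str, training_doc: str, n: int = 8, threshold: float = 0.5) -> bool:
    """Return True if most of the test item's n-grams appear in the training doc."""
    test_grams = ngrams(test_item, n)
    if not test_grams:
        return False
    overlap = len(test_grams & ngrams(training_doc, n)) / len(test_grams)
    return overlap >= threshold

question = "What is the capital of France? The capital of France is Paris."
web_page = "FAQ: What is the capital of France? The capital of France is Paris."
print(looks_contaminated(question, web_page, n=5))  # True: the answer was in the training data
```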
Models Ranked for Hallucinations: Measuring language model hallucinations during information retrieval
How often do large language models make up information when they generate text based on a retrieved document? A study evaluated the tendency of popular models to hallucinate while performing retrieval-augmented generation (RAG).
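To make the measurement concrete, here is a toy faithfulness check for RAG output: flag generated sentences whose content words are mostly absent from the retrieved document. The study used its own evaluation method; this heuristic only illustrates the kind of comparison involved.

```python
# Toy faithfulness check for RAG: flag answer sentences with little word
# overlap with the retrieved source. Illustration only, not the study's metric.
import re

def unsupported_sentences(answer: str, source_doc: str, threshold: float = 0.6):
    """Return answer sentences whose word overlap with the source falls below threshold."""
    source_words = set(re.findall(r"[a-z0-9]+", source_doc.lower()))
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        words = re.findall(r"[a-z0-9]+", sentence.lower())
        if not words:
            continue
        support = sum(w in source_words for w in words) / len(words)
        if support < threshold:
            flagged.append(sentence)
    return flagged

doc = "The report says revenue grew 12 percent in 2023."
answer = "Revenue grew 12 percent in 2023. The CEO resigned in March."
print(unsupported_sentences(answer, doc))  # ['The CEO resigned in March.']
```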