Large Language Models (LLMs)

99 Posts

Diagram of Localize-and-Stitch merging fine-tuned models by combining critical weights into one model.

Better Performance From Merged Models: Localize-and-Stitch improves methods for merging and fine-tuning multiple models

Merging multiple fine-tuned models is a less expensive alternative to hosting multiple specialized models. But while model merging can deliver higher average performance across several tasks, it often results in lower performance on specific tasks. New work addresses this issue.

Massively More Training Text: Harvard unveils a million-book corpus for AI training

Harvard University amassed a huge new text corpus for training machine learning models.
Claude 3 Opus performs the Self-Exfiltration task, balancing renewable goals and corporate priorities.

Models Can Use Tools in Deceptive Ways: Researchers expose AI models' deceptive behaviors

Large language models have been shown to be capable of lying when users unintentionally give them an incentive to do so. Further research shows that LLMs with access to tools can be incentivized to use them in deceptive ways.
Top use cases for Claude.ai, with percentages for tasks like app development and content creation.

What LLM Users Want: Anthropic reveals how users interact with Claude 3.5

Anthropic analyzed 1 million anonymized conversations between users and Claude 3.5 Sonnet. The study found that most people used the model for software development and also revealed malfunctions and jailbreaks.

Joseph Gonzalez: General intelligence

In 2025, I expect progress in training foundation models to slow down as we hit scaling limits and inference costs continue to rise.

Smaller Is Beautiful: Compact AI models redefine efficiency, bringing advanced capabilities to everyday devices

For years, the best AI models got bigger and bigger. But in 2024, some popular large language models were small enough to run on a smartphone.

Prices Tumble: AI price wars drive costs down as competition heats up

Fierce competition among model makers and cloud providers drove down the price of access to state-of-the-art models.
Animation showcasing 7 key NLP topics visually expanding on the screen.

When LLMs Propose Research Ideas: Stanford study finds AI matches human experts at writing research proposals

How do agents based on large language models compare to human experts when it comes to proposing machine learning research? Pretty well, according to one study.
Performance comparison for Gemini models across benchmarks.

Multimodal Modeling on the Double: Google introduces Gemini 2.0 Flash, a faster, more capable AI model

Google’s Gemini 2.0 Flash, the first member of its updated Gemini family of large multimodal models, combines speed with performance that exceeds that of its earlier flagship model, Gemini 1.5 Pro, on several measures.
Benchmark results for Phi-4, GPT, LLaMA-3.3, and Qwen 2.5 models.

Phi-4 Beats Models Five Times Its Size: Microsoft’s Phi-4 blends synthetic and organic data to surpass larger models in math and reasoning benchmarks

Microsoft updated its smallest model family with a single, surprisingly high-performance model.
Graph showing how training loss affects token prediction accuracy and hallucination elimination.

Getting the Facts Right: A memory method that reduces hallucinations in LLMs

Large language models that remember more hallucinate less.
o1 Family Benchmarks comparing pass rates across AIME, Codeforces, and GPQA.

Higher Reasoning: OpenAI debuts o1 and pro mode for $200/month

OpenAI launched not only its highly anticipated o1 model but also an operating mode that enables the model to deliver higher performance — at a hefty price.
Table comparing model performance on Mathvista, MMMU, ChartQA, DocVQA, and other tasks.

Mistral’s Vision-Language Contender: Mistral unveils Pixtral Large, a rival to top vision-language models

Mistral AI unveiled Pixtral Large, which rivals top models at processing combinations of text and images.
Flow diagram of an application using LLMs to process prompts and tools for responses.

Agents Open the Wallet: Stripe builds ecommerce agent toolkit for AI to securely spend money

One of the world’s biggest payment processors is enabling large language models to spend real money.

AI Power Couple Recommits: Amazon deepens Anthropic partnership with $4 billion investment

Amazon and Anthropic expanded their partnership, potentially strengthening Amazon Web Services’ AI infrastructure and lengthening the high-flying startup’s runway.