Large Language Models (LLMs)

Apple logo side by side with Google's logo, symbolizing their AI partnership.

Apple’s Foundation Models Will Be Gemini: Apple announced a partnership with Google to power Siri and other AI features

Apple cut a multi-year deal with Google to use Gemini models as the basis of AI models that reside on Apple devices.
ChatGPT interface on a phone displays a conversation and a sponsored grocery ad at the bottom of the screen.

ChatGPT Shows Ads: OpenAI tests advertisements for U.S. chatbot users in free and lower-cost tiers

AI has a new revenue stream, and it looks a lot like old web banner ads.
Matrix links queries to documents, illustrating embedding limits in representing relevance combinations.

Retrieval Faces Hard Limits: Google and Johns Hopkins researchers show embedding models can’t search unlimited documents

Can your retriever find all the relevant documents for any query your users might enter? Maybe not, research shows.
Diagrams comparing LongCoT and Delethink environments show reasoning processes and context management.

More Affordable Reasoning: Canadian researchers find capping context helps models better retrieve data

One way to improve a reasoning model’s performance is to let it produce a longer chain of thought. However, attending to ever-longer contexts can become expensive, and making that attention more efficient requires changes to a model’s architecture.
Dialogue displays a model revealing it answered incorrectly and wrote code against instructions.

Teaching Models to Tell the Truth: OpenAI fine-tuned a version of GPT-5 to confess when it was breaking the rules

Large language models occasionally conceal their failures to comply with constraints they’ve been trained or prompted to observe. Researchers trained an LLM to admit when it disobeyed.
Sharon Zhou is pictured smiling confidently with her hands clasped, reflecting AI’s potential for community-building.

Chatbots That Build Community by Sharon Zhou: Sharon Zhou of AMD on expanding chat to serve groups and connect us with other people

Next year, I’m excited to see AI break out of 1:1 relationships with each of us. In 2026, AI has the potential to bring people together and unite us with human connection, rather than polarize and isolate us. It’s about time for ChatGPT to enter your group chats.
Mice on a laptop keyboard explore, with code on screen; background features festive lights, presents.

Agents Write Code Faster, Cheaper: Software developers used more versatile AI-powered tools to write code

Coding apps moved beyond autofill-style code completion to agentic systems that manage a wide range of software development tasks.
Snowman in Thinker pose on snowy landscape, with a person building it.

Thinking Models Solve Bigger Problems: Reasoning models, beginning with OpenAI’s o1 and DeepSeek’s R1, transformed the industry

Think step by step. Explain your reasoning. Work backwards from the answer. As 2025 began, models executed these reasoning strategies only when prompted. Now most new large language models do so as a matter of course, improving performance across a wide range of tasks.
Diagram shows LLM training with encoders for images, audio, video; inference with galaxies, satellites.

Adapting LLMs to Any Sort of Data: SEMI (Sample-Efficient Modality Integration) tackles new domains with few-shot examples

Enabling a pretrained large language model to process a data type other than text (say, images), possibly in a specialized domain (say, radiology), typically requires thousands to millions of examples that pair the other data (perhaps x-rays) with text.
A table compares GPT-5.2's benchmark scores to Claude Opus 4.5 and Gemini 3 Pro in various reasoning tasks.

OpenAI’s Answer to Gemini 3: GPT-5.2 arrives, touting variable reasoning and coding performance

OpenAI launched GPT-5.2 only weeks after its CEO Sam Altman reportedly issued a “code red” alarm in response to Google’s Gemini 3.
Table highlights Opus 4.5’s superior scores in coding and reasoning compared to other AI models.

Claude Does More With Fewer Tokens: Claude Opus 4.5 retakes the coding crown at one-third the price of its predecessor

Claude Opus 4.5, the latest version of Anthropic’s flagship model, extends the earlier version’s strengths in coding, computer use, and agentic workflows while generating fewer tokens.
Diagram shows AI traits with pipelines for "evil" vs. "helpful" responses to user queries on animal treatment.

Toward Steering LLM Personality: Persona Vectors allow model builders to identify and edit out sycophancy, hallucinations, and more

Large language models can develop character traits like cheerfulness or sycophancy during fine-tuning. Researchers developed a method to identify, monitor, and control such traits.
Two figures, symbolizing Microsoft and Anthropic, shake hands to represent partnership and collaboration.

Microsoft and Anthropic Form Alliance: Claude becomes the first leading language model available from all three cloud giants

Having recently revised its agreement with longtime partner OpenAI, Microsoft pledged to invest billions of dollars in Anthropic, one of OpenAI’s top competitors.
Image illustrates the Self-Search method, simulating web searches to improve model accuracy in tests.

More-Efficient Agentic Search: Researchers fine-tune models to search their own parameters to boost recall

Large language models may have learned knowledge that’s relevant to a given prompt, but they don’t always recall it consistently. Fine-tuning a model to search its parameters as though it were searching the web can help it find knowledge in its own weights.
Chart highlights Kimi K2’s top performance in agentic tasks, outperforming rivals in reasoning and coding.

Top Agentic Results, Open Weights: Kimi K2 Thinking outperforms proprietary models with new techniques for agentic tool use

The latest open-weights large language model from Moonshot AI challenges top proprietary LLMs at agentic tasks by executing hundreds of tool calls sequentially and pausing to think between calls.