Large Language Models (LLMs)

xAI Blasts Off: SpaceX acquires xAI, announces plans for data centers in space

Elon Musk’s SpaceX acquired xAI, opening the door to richer financing of the merged entity’s AI research, a tighter focus on space applications of AI, and — if Musk’s dreams are realized — solar-powered data centers in space.

Claude Opus 4.6 Reasons More Over Harder Problems: Anthropic updates flagship model, places first on Intelligence Index

Anthropic updated its flagship large language model to handle longer, more complex agentic tasks.

AI Giants Share Wikipedia’s Costs: Wikimedia Foundation strikes deals with Amazon, Meta, Microsoft, Mistral AI, and Perplexity

On its 25th anniversary, Wikipedia celebrated with high-profile deals that make its data easier for AI companies to use in training their models, in exchange for financial support.

Training For Engagement Can Degrade Alignment: “Moloch’s Bargain” shows fine-tuning can affect social values

Individuals and organizations increasingly use large language models to produce media that helps them compete for attention. Does fine-tuning LLMs to encourage engagement, purchases, or votes affect their alignment with social values? Researchers found that it does.

Artificial Analysis Revamps Intelligence Index: Independent AI testing authority turns from saturated knowledge benchmarks to harder business tests

Artificial Analysis, which tests AI systems, updated the component evaluations in its Intelligence Index to better reflect large language models’ performance in real-world use cases.

Apple’s Foundation Models Will Be Gemini: Apple announced a partnership with Google to power Siri and other AI features

Apple cut a multi-year deal with Google to use Gemini models as the basis of AI models that reside on Apple devices.

ChatGPT Shows Ads: OpenAI tests advertisements for U.S. chatbot users in free and lower-cost tiers

AI has a new revenue stream, and it looks a lot like old web banner ads.

Retrieval Faces Hard Limits: Google and Johns Hopkins researchers show embedding models can’t search unlimited documents

Can your retriever find all the relevant documents for any query your users might enter? Maybe not, research shows.

More Affordable Reasoning: Canadian researchers find capping context helps models better retrieve data

One way to improve a reasoning model’s performance is to let it produce a longer chain of thought. However, attending to ever-longer contexts can become expensive, and making that attention more efficient requires changes to a model’s architecture.

Teaching Models to Tell the Truth: OpenAI fine-tuned a version of GPT-5 to confess when it was breaking the rules

Large language models occasionally conceal their failures to comply with constraints they’ve been trained or prompted to observe. Researchers trained an LLM to admit when it disobeyed.

Chatbots That Build Community by Sharon Zhou: Sharon Zhou of AMD on expanding chat to serve groups and connect us with other people

Next year, I’m excited to see AI break out of 1:1 relationships with each of us. In 2026, AI has the potential to bring people together and unite us with human connection, rather than polarize and isolate us. It’s about time for ChatGPT to enter your group chats.

Agents Write Code Faster, Cheaper: Software developers used more versatile AI-powered tools to write code

Coding apps moved beyond autofill-style code completion to agentic systems that manage a wide range of software development tasks.

Thinking Models Solve Bigger Problems: Reasoning models, beginning with OpenAI’s o1 and DeepSeek’s R1, transformed the industry

Think step by step. Explain your reasoning. Work backwards from the answer. As 2025 began, models executed these reasoning strategies only when prompted. Now most new large language models do so as a matter of course, improving performance across a wide range of tasks.

Adapting LLMs to Any Sort of Data: SEMI (Sample-Efficient Modality Integration) tackles new domains with few-shot examples

Enabling a pretrained large language model to process a data type other than text (say, images), possibly in a specialized domain (say, radiology), typically requires thousands to millions of examples that pair the other data (perhaps x-rays) with text.

OpenAI’s Answer to Gemini 3: GPT-5.2 arrives, touting variable reasoning and coding performance

OpenAI launched GPT-5.2 only weeks after its CEO Sam Altman reportedly issued a “code red” alarm in response to Google’s Gemini 3.