Dear friends,
Another year of rapid AI advances has created more opportunities than ever for anyone — including those just entering the field — to build software. In fact, many companies just can’t find enough skilled AI talent. Every winter holiday, I spend some time learning and building, and I hope you will too. This helps me sharpen old skills and learn new ones, and it can help you grow your career in tech.
To be skilled at building AI systems, I recommend that you:
- Take AI courses
- Practice building AI systems
- (Optionally) read research papers
Let me share why each of these is important.
I’ve heard some developers advise others to just plunge into building things without worrying about learning. This is bad advice! Unless you’re already surrounded by a community of experienced AI developers, plunging into building without understanding the foundations of AI means you’ll risk reinventing the wheel or — more likely — reinventing the wheel badly!
For example, during interviews with job candidates, I have spoken with developers who reinvented standard RAG document chunking strategies, duplicated existing evaluation techniques for Agentic AI, or ended up with messy LLM context management code. If they had taken a couple of relevant courses, they would have better understood the building blocks that already exist. They could still rebuild these blocks from scratch if they wished, or perhaps even invent something superior to existing solutions, but they would have avoided weeks of unnecessary work. So structured learning is important! Moreover, I find taking courses really fun. Rather than watching Netflix, I prefer watching a course by a knowledgeable AI instructor any day!

At the same time, taking courses alone isn’t enough. There are many lessons that you’ll gain only from hands-on practice. Learning the theory behind how an airplane works is very important to becoming a pilot, but no one has ever learned to be a pilot just by taking courses. At some point, jumping into the pilot's seat is critical! The good news is that, if you learn to use highly agentic coders, building is easier than it has ever been. And learning about AI building blocks might inspire you with new ideas for things to build. If I’m not feeling inspired about what projects to work on, I will usually either take courses or read research papers, and after doing this for a while, I always end up with many new ideas. Moreover, I find building really fun, and I hope you will too!
Finally, not everyone has to do this, but I find that many of the strongest candidates on the job market today at least occasionally read research papers. While I find research papers much harder to digest than courses, they contain a lot of knowledge that has not yet been translated to easier-to-understand formats. I give this much lower priority than either taking courses or practicing building, but if you have an opportunity to strengthen your ability to read papers, I urge you to do so too. (You can also watch an old video where I give advice on reading papers.) I find taking courses and building to be fun, and reading papers can be more of a grind, but the flashes of insight I get from reading papers are delightful.
Have a wonderful winter holiday and a Happy New Year. In addition to learning and building, I hope you'll also spend time with loved ones — that, too, is important!
Love,
Andrew
Top AI Stories of 2025

A New Era Dawns
2025 may be remembered as the dawn of AI’s industrial age. Innovations propelled model performance to new heights, AI-driven applications became indispensable, top companies battled over skilled practitioners, and construction of infrastructure drove growth in the U.S. gross domestic product. As in past winter holiday seasons, this special issue of The Batch traces the major themes of the past 12 months. The coming year promises to consolidate these changes as we weave the technology more tightly into the fabric of daily life.

Thinking Models Solve Bigger Problems
Think step by step. Explain your reasoning. Work backwards from the answer. As 2025 began, models executed these reasoning strategies only when prompted. Now most new large language models do it as a matter of course, improving performance across a wide range of tasks.
What happened: Late last year, OpenAI introduced the first reasoning, or “thinking,” model, o1, which baked in an agentic reasoning workflow. In January, DeepSeek-R1 showed the rest of the world how to build such capabilities. The result: immediate improvements in math and coding performance, more accurate answers to questions, more capable robots, and rapid progress in AI agents.
Driving the story: An early form of reasoning took off with “Large Language Models Are Zero-Shot Reasoners,” the paper that introduced the prompt addendum, “let’s think step by step.” The authors found that manually adding these words to a prompt improved a model’s output. Researchers soon realized they could train this capability into models so they would employ this and other reasoning strategies without explicit prompting. The key: fine-tuning via reinforcement learning (RL). Giving a pretrained LLM a reward for producing correct output trained it to “think” things through before it generated output. (A minimal sketch of both ideas appears after the list below.)
- The first few reasoning models were trained via RL specifically to solve math problems correctly, answer science questions accurately, and/or generate code that passed unit tests. This enabled o1-preview, for instance, to outperform its non-reasoning predecessor GPT-4o by 43 percentage points on AIME 2024 (competition math problems) and 22 percentage points on GPQA Diamond (PhD-level science questions), while it completed Codeforces’ coding problems in the 62nd percentile relative to competitive human coders, compared to GPT-4o’s 11th percentile.
- Reasoning models performed even better when they learned to use tools like calculators, search engines, or bash terminals. For example, on a challenging test of multimodal understanding and technical expertise in 100 domains, OpenAI o4-mini with tools achieved 17.7 percent accuracy, more than 3 points higher than it managed without tools.
- Robotic action models have been trained to reason via RL. For example, rewarding ThinkAct for reaching a goal position yielded roughly an 8 percent performance improvement on robotics tasks compared to non-thinking models like OpenVLA.
- Reasoning models also help agents to tackle difficult problems. For instance, AlphaEvolve used Google Gemini to repeatedly generate, evaluate, and change code, ultimately producing faster algorithms for real-world problems. Similarly, AI Co-Scientist used Gemini to generate scientific research proposals and then review, rank, and improve them. Among other results, it proposed a hypothesis to answer a longstanding question about microbial resistance to antibiotics. Human scientists independently proposed and validated the same hypothesis at about the same time.
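To make these two ideas concrete, here is a minimal Python sketch, a toy illustration rather than any provider’s actual API: zero_shot_cot shows the prompt addendum from the paper above, and correctness_reward shows the kind of binary reward signal used to fine-tune reasoning models via RL. The call_llm function is a hypothetical placeholder, stubbed so the script runs end to end.

```python
# Toy sketch: (1) zero-shot chain-of-thought prompting and (2) a binary
# correctness reward of the kind used in RL fine-tuning of reasoning models.
# call_llm is a hypothetical stand-in, not a real API.

def call_llm(prompt: str) -> str:
    """Stub that returns a canned reasoning trace so the example runs."""
    return "Step 1: 17 * 3 = 51. Step 2: 51 + 4 = 55. Answer: 55"

def zero_shot_cot(question: str) -> str:
    """Append the zero-shot reasoning cue to the prompt."""
    return call_llm(f"{question}\nLet's think step by step.")

def correctness_reward(model_output: str, correct_answer: str) -> float:
    """Return 1.0 if the final answer matches, else 0.0.
    Real pipelines parse answers more carefully and may also reward format."""
    final = model_output.split("Answer:")[-1].strip()
    return 1.0 if final == correct_answer else 0.0

if __name__ == "__main__":
    output = zero_shot_cot("What is 17 * 3 + 4?")
    print(output)
    print("reward:", correctness_reward(output, "55"))
```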
Yes, but: Reasoning models may not be as rational as they seem.
- In a controversial paper, Apple concluded that reasoning models couldn’t solve puzzles beyond a certain level of complexity, even when the models were given algorithms that solved them. The models’ inability to apply the algorithms calls into question apparent similarities between machine and human reasoning.
- Anthropic found that, while a model’s reasoning steps can help to explain how it reached a conclusion, they may also omit crucial information that contributed to the conclusion. For instance, reasoning models can be led to produce a particular output by including a hint in the prompt, but their reasoning steps may fail to mention the hint.
Where things stand: Reasoning dramatically improves LLM performance. However, better output comes at a cost. Gemini 3 Flash with reasoning enabled used 160 million tokens to run the benchmarks in Artificial Analysis’ Intelligence Index (and achieved a score of 71), while Gemini 3 Flash without reasoning used 7.4 million tokens (achieving a much lower score of 55). Moreover, generating reasoning tokens can delay output, adding to pressure on LLM inference providers to serve tokens faster. But researchers are finding ways to make the process more efficient. Claude Opus 4.5 and GPT-5.1 set to high reasoning achieve the same Intelligence Index score, but the former uses 48 million tokens, while the latter uses 81 million.

Big AI Lures Talent With Huge Pay
Leading AI companies fought a ferocious war for talent, luring top researchers and engineers from competitors with levels of compensation more commonly associated with pro sports.
What happened: In July, Meta launched a hiring spree to staff the new Meta Superintelligence Labs, offering up to hundreds of millions of dollars to researchers from OpenAI, Google, Anthropic, and other top AI companies. The offers included large cash bonuses and compensation for equity forfeited by leaving another company. Meta’s rivals, in turn, poached key employees from Meta and each other, driving up the market value of AI talent to unprecedented levels.
Driving the story: Meta upended traditional pay structures by offering pay packages worth as much as $300 million over four years, with liquid compensation that sometimes vastly exceeded the stock options that, at other companies, vest over many years. Having hired Scale AI CEO Alexandr Wang and key members of his team, Meta chief Mark Zuckerberg compiled a wish list of researchers to recruit, The Wall Street Journal reported.
- Zuckerberg made house calls to convince people to jump ship, sometimes bringing homemade soup. The effort netted, among others, OpenAI’s Jason Wei and Hyung Won Chung, two researchers who worked on reasoning models.
- Andrew Tulloch, who had co-founded Thinking Machines Lab with OpenAI’s ex-CTO Mira Murati, initially turned down a package at Meta that included bonuses worth $1.5 billion, The Wall Street Journal reported. A few months later, he changed his mind and joined Meta.
- Meta hired Ruoming Pang, who oversaw AI models at Apple. The pay package came to hundreds of millions of dollars over several years, Bloomberg reported. Meta’s offer exceeded the pay packages of all top Apple executives except its CEO, and Apple declined to counter it.
- Amid the turnover, Microsoft AI CEO Mustafa Suleyman took more than 20 researchers and engineers from Google, including Amar Subramanya, who was vice president of engineering.
- Elon Musk’s xAI hired over a dozen AI researchers and engineers from Meta. Musk decried his rival’s “insane” offers and touted his company’s “hyper merit-based” culture and greater potential for growth of equity.
Behind the news: The trajectory of salaries for AI engineers reflects AI’s evolution from academic curiosity to revolutionary technology.
- In 2011, when Google Brain started under the direction of Andrew Ng, AI talent was concentrated in academia. As neural networks found their way into commercial products like search engines and AI assistants, machine learning engineer roles became a standard corporate tier.
- In 2014, when Google acquired DeepMind, AI salaries significantly outpaced those in software engineering in general. DeepMind’s staffing costs came to around $345,000 per employee, The New York Times estimated. By 2017, when Google introduced the transformer architecture, top compensation had risen as high as $500,000.
- Circa 2023, with the rise of ChatGPT, compensation took another jump. Pay packages for top-level software engineers reached beyond $700,000, according to one report.
Where things stand: As 2026 begins, the AI hiring landscape is much changed. To fend off recruiters, OpenAI has offered more stock-based compensation than its competitors, accelerated the vesting schedule for stock options awarded to new employees, and handed out retention bonuses as high as $1.5 million, The Wall Street Journal reported. Despite talk of an AI bubble in 2025, high salaries are rational for companies that plan to spend tens of billions of dollars to build AI data centers: If you’re spending that much on hardware, why not spend a small percentage of the outlay on salaries?

Data-Center Buildout Goes Big
Top AI companies announced plans to build data centers that are projected to burn through trillions of dollars and gigawatts of electricity in the next few years.
What happened: The AI industry’s capital spending topped $300 billion this year alone, much of it allocated to building new data centers to process AI workloads. That spending was just a down payment, as companies mapped out ambitious plans to construct facilities the size of small towns with the energy needs of medium-size cities. The race to build enough processing power to satisfy hoped-for demand for inference and training could cost $5.2 trillion by 2030, the consultancy McKinsey & Company projected.
Driving the story: Top AI companies announced a cascade of data-center projects across the world. Each gigawatt of data-center capacity will cost roughly $50 billion to build.
- In January, OpenAI launched Stargate, a $500 billion project in partnership with Oracle, SoftBank, and the Emirati investment firm MGX. Ultimately the company announced plans to build 20 gigawatts of data-center capacity worldwide and forecast demand up to 5 times that amount. OpenAI CEO Sam Altman said he hopes eventually to add 1 gigawatt of capacity per week.
- Meta spent around $72 billion on infrastructure projects in 2025, primarily in the U.S., a figure executives said would rise considerably in 2026. The company’s Hyperion project includes a $27 billion, 5-gigawatt data center in rural Louisiana. The project’s funding deal will keep the assets and debt off Meta’s books.
- Microsoft spent $80 billion on data center projects worldwide in 2025, including facilities in Wisconsin and Atlanta that will be connected by a dedicated fiber-optic network to operate as a massive supercomputer. To supply electricity, the company signed a 20-year deal to restart the Three Mile Island nuclear reactor in Pennsylvania, which will supply 835 megawatts starting in 2028. The company also pledged to expand its European cloud and AI capacity to 200 data centers across Europe.
- Amazon expected to spend $125 billion on infrastructure in 2025 and more in 2026. Its $11 billion Project Rainier is a 2.2 gigawatt data center in Indiana that runs on 500,000 Amazon Trainium 2 chips. In addition, Amazon planned to spend roughly $14 billion to expand data centers in Australia, and about $21 billion in Germany between 2025 and 2029.
- Alphabet expected to spend up to $93 billion on infrastructure in 2025, up from a forecast of $75 billion. The company announced a $40 billion project to add 3 data centers in Texas through 2027. It also committed $15 billion to a project in India, announced roughly $6 billion in investments in Germany, and rolled out new or expanded projects in Australia, Malaysia, and Uruguay.
Yes, but: Can the U.S. economy and infrastructure support such immense investments? There are reasons to wonder.
- The expense of the data-center buildout will require an estimated $2 trillion in annual AI revenue by 2030, according to consultants at Bain & Co. That would be more than the combined 2024 earnings of Amazon, Apple, Alphabet, Microsoft, Meta, and Nvidia.
- The current electric grid may be insufficient to power these data centers. Two facilities in Silicon Valley are sitting idle because local utilities don’t have the capacity to connect them to the grid, according to Bloomberg.
- In mid-December, Blue Owl Capital, which had been in talks to help finance a $10 billion data center for Oracle and OpenAI, pulled out of the deal, Financial Times reported. The report cited concerns about Oracle’s growing debt amid the data-center buildout. Blue Owl continues to finance other Oracle-OpenAI data-center projects.
Where things stand: Despite concerns about an AI bubble, the boom in building infrastructure is generating real jobs and sales in an otherwise tepid economy. Investment in data centers and AI accounted for nearly all the growth of the U.S. gross domestic product in the first half of 2025, according to Harvard economist Jason Furman. At this stage, there is evidence to back the idea that 2025 lifted the curtain on a new industrial age.

Agents Write Code Faster, Cheaper
Coding apps moved beyond autofill-style code completion to agentic systems that manage a wide range of software development tasks.
What happened: Coding emerged as the application of agentic workflows with the most immediate business value. Claude Code, Google Gemini CLI, OpenAI Codex, and other apps turned coding agents into one of Big AI’s fiercest competitive battlegrounds. Smaller competitors developed their own agentic models to remain in the game.
Driving the story: When Devin, the pioneering agentic code generator, arrived in 2024, it raised the state of the art on the SWE-Bench benchmark of coding challenges from 1.96 percent to 13.86 percent. In 2025, coding agents that use the latest large language models routinely completed more than 80 percent of the same tasks. Developers embraced increasingly sophisticated agentic frameworks that enable models to work with agentic planners and critics, use tools like web search or terminal emulation, and manipulate entire code bases.
- When reasoning models arrived late in 2024, they immediately boosted coding power and cut costs, as reasoning enabled agents to map out tasks to be completed by less expensive models. The addition of variable reasoning budgets made it easier for agents to use a single model, devoting more tokens to planning and fewer to making simple edits. By the end of 2025, Gemini 3 Pro, Claude Opus 4.5, and GPT-5.2 emerged as top models for coding and agentic workflows.
- Open-weights models quickly followed. Z.ai GLM-4.5 and Moonshot Kimi K2 emerged as open-weights favorites, enabling automated-coding startups to cut their costs dramatically. Released in July, Qwen3-Coder provided a hefty 480-billion-parameter model trained on over 5 trillion tokens of code, nearly matching the performance of Claude Sonnet 4.
- Anthropic wrapped an agentic framework around Claude to create an application: Claude Code. Introduced in February, Claude Code was an instant hit and set expectations for what agentic coding systems should do. OpenAI responded with its Codex application based on coding-specialized versions of its GPT-5 series. Where Claude Code initially ran locally, the Codex app ran in a browser, helping to popularize coding agents that run in the cloud. By the end of the year, these agents were able to manage longer-running problems using multiple sub-agents — typically an initializer to start tasks and track progress and various coding agents to complete different tasks — each with its own context window. (A minimal sketch of this pattern appears after this list.)
- A tug-of-war between model makers and developers of integrated development environments (IDEs) led popular IDE providers like Anysphere (Cursor) and Cognition AI (Windsurf) to build their own models. Conversely, Google built its own IDE, Antigravity, which debuted in November.
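To make the sub-agent pattern above concrete, here is a minimal Python sketch. The names (call_llm, SubAgent, initializer, orchestrate) are illustrative, not any particular framework’s API; the point is simply that a planner breaks the goal into tasks and each worker runs in its own fresh, isolated context window. The call_llm function is stubbed so the script runs.

```python
# Toy sketch of the initializer-plus-sub-agents pattern: the initializer plans
# tasks, and each sub-agent completes one task in its own context window.
# call_llm is a hypothetical stand-in, not a real API.

from dataclasses import dataclass, field

def call_llm(context: list[str]) -> str:
    """Stub that echoes the latest task so the example runs."""
    return f"(model output for: {context[-1]})"

@dataclass
class SubAgent:
    """A worker whose context is isolated from other sub-agents."""
    name: str
    context: list[str] = field(default_factory=list)

    def run(self, task: str) -> str:
        self.context.append(f"Task: {task}")
        result = call_llm(self.context)
        self.context.append(result)
        return result

def initializer(goal: str) -> list[str]:
    """Plan the work; the plan is hard-coded here for illustration."""
    return [f"Write tests for {goal}", f"Implement {goal}", f"Refactor {goal}"]

def orchestrate(goal: str) -> dict[str, str]:
    """Dispatch each planned task to a sub-agent with a fresh context."""
    results = {}
    for i, task in enumerate(initializer(goal)):
        worker = SubAgent(name=f"coder-{i}")  # new, empty context per task
        results[task] = worker.run(task)
    return results

if __name__ == "__main__":
    for task, result in orchestrate("a URL parser").items():
        print(f"{task} -> {result}")
```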
Behind the news: Agentic systems steadily ratcheted up the state of the art on the popular SWE-Bench coding benchmark, and researchers looked for alternate ways to evaluate their performance.
- These efforts led to SWE-Bench Verified, SWE-Bench Pro, LiveBench, Terminal-Bench, 𝜏-Bench, CodeClash, and others.
- Because different providers trust (or cherry-pick) different benchmarks, it has become more difficult to evaluate agents’ performance. Choosing the right agent for a particular task remains a challenge.
Yes, but: At the beginning of 2025, most observers agreed that agents were good for generating run-of-the-mill code, documentation, and unit tests, but experienced human engineers and product managers performed better on higher-order strategic problems. By the end of the year, companies reported automating senior-level tasks. Microsoft, Google, Amazon, and Anthropic said they were generating increasing quantities of their own code.
Where things stand: In a short time, agentic coding has propelled vibe-coding from puzzling buzzword to burgeoning industry. Startups like Lovable, Replit, and Vercel enable users who have little or no coding experience to build web applications from scratch. While some observers worried that AI would replace junior developers, it turns out that developers who are skilled at using AI can prototype applications better and faster. Soon, AI-assisted coding may be regarded as simply coding, just as spellcheck and auto-complete are part of writing.

China’s AI Chip Industry Takes Root
The United States government’s effort to deprive China of AI computing power backfired as China turned the tables and banned U.S.-designed chips.
What happened: China’s government issued a directive that all new state-funded data centers must be built using chips made by domestic suppliers, Reuters reported in November. The policy emerged shortly after the U.S. reversed its years-long ban on sales to China of advanced chips manufactured using U.S. technology, including products of AI-chip leader Nvidia and its rival AMD. Rather than constrain China, U.S. policies spurred investment and innovation in China’s semiconductor industry.
Driving the story: The U.S. government aimed to block China’s access to AI on the belief that the technology would have as much geopolitical importance as oil. President Trump took a hard line in his first term from 2017 to 2021, limiting China’s access to cutting-edge technology, and he doubled down on this policy throughout 2025. However, the U.S. approach looked increasingly untenable as China’s semiconductor industry made surprising progress, the immense economic value of the AI chip market became clear, and trade restrictions locked Nvidia, now one of the world’s most valuable companies, out of its largest potential market.
- In April, Trump blocked sales to China of Nvidia and AMD’s lower-performing chips, which had been designed to satisfy earlier export controls. Nvidia said the tighter restrictions cost the company $5.5 billion.
- China’s Huawei CloudMatrix 384 system proved capable of delivering performance that rivals competing Nvidia systems. However, the system, a cluster of 384 Huawei Ascend 910C chips, harnesses roughly 5 times as many chips as Nvidia’s systems and consumes significantly more energy.
- In August, after Nvidia and AMD met with President Trump, who subsequently met with China’s President Xi Jinping, the White House reversed course. The U.S. reauthorized sales to China of made-for-export chips under an unprecedented deal that required U.S. chip vendors to give the government 15 percent of any resulting revenue.
- In October, exploding demand for AI hardware temporarily drove Nvidia’s market capitalization beyond a staggering $5 trillion, up from $3 trillion in mid-2024. At that valuation, shutting the company out of China potentially cost it tens of billions of dollars in value.
- China responded to the energy inefficiency of its domestic chips by offering energy subsidies of up to 50 percent to companies that purchased chips made by Huawei and other domestic suppliers. Over the past 15 years, China’s ability to generate electrical power has surged while U.S. power generation has remained relatively flat, according to data from the International Energy Agency. In 2025, President Trump signed executive orders to make more federal land available for lease and development by the energy industry and to fast-track permitting for nuclear reactors.
- In November, after China banned U.S. chips, President Trump further softened U.S. restrictions, allowing Nvidia and AMD to sell their made-for-export chips.
Behind the news: U.S. efforts met with some success in denying China access to the latest chip-manufacturing equipment. However, with respect to the chips themselves, the U.S. barriers proved to be porous.
- A thriving market for high-end Nvidia chips sprang up in China’s tech hub of Shenzhen.
- Chinese tech giants including Alibaba and ByteDance reportedly gained access to advanced AI computing power through cloud vendors who operate legally in other countries.
- DeepSeek reportedly stockpiled earlier Nvidia chips in advance of U.S. restrictions and optimized its DeepSeek-R1 and DeepSeek-V3 models to run well on outdated hardware.
Where things stand: China is signaling that it’s willing to do without American hardware. That may be a sign of confidence, given Huawei’s progress. It may also be a bluff, as some experts judge China’s semiconductor industry to still be years behind the frontier of high-volume chip fabrication. Either way, the hard-line U.S. strategy backfired, and its relaxation of trade restrictions is a concession to economic and diplomatic realities.