Dear friends,

I’d like to share a tip for getting more practice building with AI — that is, either using AI building blocks to build applications or using AI coding assistance to create powerful applications quickly: If you find yourself with only limited time to build, reduce the scope of your project until you can build something in whatever time you do have.

If you have only an hour, find a small component of an idea that you're excited about that you can build in an hour. With modern coding assistants like Anthropic’s Claude Code (my favorite dev tool right now), you might be surprised at how much you can do even in short periods of time! This gets you going, and you can always continue the project later.

To become good at building with AI, most people must (i) learn relevant techniques, for example by taking online AI courses, and (ii) practice building. I know developers who noodle on ideas for months without actually building anything — I’ve done this too! — because we feel we don’t have time to get started. If you find yourself in this position, I encourage you to keep cutting the initial project scope until you identify a small component you can build right away.

Let me illustrate with an example — one of my many small, fun weekend projects that might never go anywhere, but that I’m glad I did.

Here’s the idea: Many people fear public speaking. And public speaking is challenging to practice, because it's hard to organize an audience. So I thought it would be interesting to build an audience simulator to provide a digital audience of dozens to hundreds of virtual people on a computer monitor and let a user practice by speaking to them.

Three-screen demo of audience simulator showing engagement levels from bored to highly engaged using animated facial expressions.

One Saturday afternoon, I found myself in a coffee shop with a couple of hours to spare and decided to give the audience simulator a shot. My familiarity with graphics coding is limited, so instead of building a complex simulator of a large audience and writing AI software to simulate appropriate audience responses, I decided to cut scope significantly to (a) simulating an audience of one person (which I could replicate later to simulate N persons), (b) omitting AI and letting a human operator manually select the reaction of the simulated audience (similar to Wizard of Oz prototyping), and (c) implementing the graphics using a simple 2D avatar.
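
For a sense of scale, a bare-bones version of this Wizard-of-Oz setup, with a single avatar and keyboard keys for the operator to switch reactions, might look something like the sketch below (assuming Python with pygame; the details are illustrative, not the actual project code).

```python
# Rough sketch of a Wizard-of-Oz audience simulator: one 2D avatar whose
# reaction is chosen manually by a human operator via the keyboard.
# Assumes Python with pygame installed (pip install pygame).
import random
import pygame

REACTIONS = {pygame.K_1: "bored", pygame.K_2: "neutral", pygame.K_3: "engaged"}

def draw_avatar(screen, reaction, blinking):
    screen.fill((25, 25, 35))
    cx, cy = screen.get_rect().center
    pygame.draw.circle(screen, (230, 200, 170), (cx, cy), 80)  # face
    eye_h = 2 if blinking else 12                              # closed vs. open eyes
    for dx in (-30, 30):
        pygame.draw.ellipse(screen, (20, 20, 20), (cx + dx - 8, cy - 30, 16, eye_h))
    # Mouth: a flat line when bored, a deeper smile as engagement rises.
    if reaction == "bored":
        pygame.draw.line(screen, (120, 40, 40), (cx - 30, cy + 35), (cx + 30, cy + 35), 3)
    else:
        curve = 15 if reaction == "neutral" else 35
        pygame.draw.arc(screen, (120, 40, 40),
                        (cx - 35, cy + 35 - curve, 70, curve * 2), 3.34, 6.08, 3)

def main():
    pygame.init()
    screen = pygame.display.set_mode((400, 400))
    clock = pygame.time.Clock()
    reaction, blink_frames = "neutral", 0
    running = True
    while running:
        for event in pygame.event.get():
            if event.type == pygame.QUIT:
                running = False
            elif event.type == pygame.KEYDOWN and event.key in REACTIONS:
                reaction = REACTIONS[event.key]  # the operator picks the reaction
        # Start an occasional blink that lasts a few frames.
        if blink_frames == 0 and random.random() < 0.02:
            blink_frames = 6
        blink_frames = max(0, blink_frames - 1)
        draw_avatar(screen, reaction, blinking=blink_frames > 0)
        pygame.display.flip()
        clock.tick(30)
    pygame.quit()

if __name__ == "__main__":
    main()
```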

Using a mix of several coding assistants, I built a basic version in the time I had. The avatar could move subtly and blink, but otherwise it used basic graphics. Even though it fell far short of a sophisticated audience simulator, I am glad I built this. In addition to moving the project forward and letting me explore different designs, it advanced my knowledge of basic graphics. Further, having this crude prototype to show friends helped me get user feedback that shaped my views on the product idea.

I have on my laptop a list of ideas of things that I think would be interesting to build. Most of them would take much longer than the handful of hours I might have to try something on a given day, but by cutting their scope, I can get going, and the initial progress on a project helps me decide if it’s worth further investment. As a bonus, hacking on a wide variety of applications helps me practice a wide range of skills. But most importantly, this gets an idea out of my head and potentially in front of prospective users for feedback that lets the project move faster.

Keep learning!

Andrew


A MESSAGE FROM DEEPLEARNING.AI

Promo banner for: "Data Analytics Professional Certificate"

The Data Analytics Professional Certificate is now fully available! Build job-ready skills in statistics, SQL, and Python supported by generative AI workflows. Learn how to communicate your insights clearly through effective data storytelling, an essential skill for any data analyst. Enroll today and level up your career!

News

Amazon data center at night.

Amazon’s Constellation of Compute

Amazon revealed new details of its plan to build a constellation of massive data centers and connect them into an “ultracluster.” Customer Number One: Anthropic.

What’s new: Dubbed Project Rainier, the plan calls for Amazon to build seven next-generation data centers — with up to 30 on the drawing board — near New Carlisle, Indiana, The New York Times reported. Still other data centers will be located in Mississippi, and possibly in North Carolina and Pennsylvania, contributing to an expected $100 billion in capital expenditures this year alone. These plans complement the company’s previously announced intention to spend $11 billion on data centers in the United Kingdom by 2028. (Disclosure: Andrew Ng is a member of Amazon’s board of directors.)

How it works: Announced late last year, Project Rainier calls for connecting hundreds of thousands of high-performance processors for use by Amazon’s AI partner Anthropic. Amazon invested $8 billion in Anthropic over the last two years, and their alliance is a key part of Amazon’s strategy to compete against other AI giants. Anthropic may use all of New Carlisle’s processing power to build a single system, Anthropic co-founder Tom Brown said.

  • The data centers will be based on Amazon-designed Trainium 2 and upcoming Trainium 3 processors, rather than chips from industry leader Nvidia or challenger AMD; the Trainium chips are optimized to process large transformers. Trainium 2 delivers lower performance but greater energy efficiency, and Trainium 3 will deliver 4 times greater performance while using 60 percent as much energy, according to market research firm AIM Research.
  • Similarly, Amazon plans to connect the Project Rainier facilities using a network interface of its own design, Elastic Fabric Adapter, rather than interconnect technologies typically used by its competitors.

Behind the news: AI leaders are spending tens of billions of dollars on computing infrastructure to serve fast-growing customer bases and, they hope, develop breakthroughs that enable them to leap ahead of competitors. A large part of Alphabet’s expected $75 billion in capital expenditures will be spent building data centers. Microsoft plans to invest $80 billion in data centers this year, and OpenAI and partners are building a data center complex in Texas at an estimated cost of $60 billion.

Why it matters: Amazon’s commitment to Project Rainier signals its belief that Anthropic can give it a crucial edge. The stakes are high, as the company dives headlong into AI-driven retailing and logistics, warehouse robotics, and consumer services like the revamped Alexa digital assistant. However, should Anthropic stall, Amazon can roll its immense computing resources into its enormously successful Amazon Web Services cloud-computing business.

We’re thinking: Amazon’s emphasis on internal hardware development reflects a focus on maintaining control of costs and operations. It has learned the hard lessons of competition in retailing, where margins are thin and expenses are in flux.


Meta Aria Gen 2 smart glasses for AI research, equipped with cameras, microphones, and other sensors for real-time data capture.

Meta’s Smart Glasses Come Into Focus

Meta revealed new details about its latest Aria eyeglasses, which aim to give AI models a streaming, multisensory, human perspective.

What’s new: Meta described its Aria Gen 2 smart-glasses platform in a blog post that focuses on capabilities relevant to research in augmented reality, “embodied AI” such as robot training, and “contextual AI” for personal use. Units will be available to researchers later this year. Meanwhile, you can apply for access to Aria Generation 1 and download open source datasets, models, tools, 3D objects, and evals.

How it works: Aria Generation 2 packs an impressive variety of technologies into a package the shape of a pair of glasses and the weight of an egg (around 75 grams), with battery life of 6 to 8 hours. A suite of sensors enables the unit, in real time, to interpret user activity (including hand motions), surroundings, location, and interactions with nearby compatible devices. A privacy switch lets users disable data collection.

  • A Qualcomm SD835 chip with 4GB RAM and 128GB storage processes input and output on the device itself. Users can stream the unit’s output, such as video, audio, and 3D point clouds, to a local PC or upload it for processing by perception services via cloud-based APIs.
  • The unit includes five cameras: An RGB camera captures the user’s point of view. Two more help track the user’s visual attention based on gaze direction per eye, vergence point, pupil diameters, and blinking. A stereoscopic pair helps map the surroundings in three dimensions via simultaneous localization and mapping (SLAM). In addition, an ambient light sensor helps control camera exposure and includes an ultraviolet perception mode to help distinguish indoor from outdoor environments.
  • Seven microphones help to monitor surrounding sounds and their locations. A separate contact microphone picks up the user’s voice, helping to make the user intelligible in noisy environments. A pair of open-ear speakers reproduces sounds.
  • Other sensors include two motion-sensing inertial measurement units (IMUs), a barometer, and a magnetometer to help track the unit’s motion and orientation; a global navigation satellite receiver to help track its location; and a photoplethysmography (PPG) sensor to detect the user’s heart rate. Wi-Fi and Bluetooth beacons connect to external networks, and a USB-C port accepts other signals.
  • A common clock calibrates and time-stamps most sensor readings with nanosecond resolution to synchronize with external devices including nearby Aria units.

Applications: Meta showed off a few applications in video demonstrations.

  • The fields of view of the two stereoscopic cameras overlap by 80 degrees, enabling the system to generate a depth map of a user’s surroundings. The depth map can be used to reconstruct the scene’s 3D geometry dynamically in real time (the basic principle is sketched after this list).
  • This 3D capability enables the system to track the user’s hands, including articulations of all hand joints, in 3D space. Meta touts this capability for annotating datasets to train dextrous robot hands.
  • The contact microphone picks up the user’s voice through vibrations in the unit’s nosebridge rather than the surrounding air. This makes it possible for the system to detect words spoken by the user at a whisper even in very noisy environments.
  • The unit broadcasts timing information via sub-gigahertz radio. Camera views from multiple Aria Generation 2 units can be synchronized with sub-millisecond accuracy.
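
Meta hasn’t detailed the pipeline behind these demos, but the basic principle of recovering depth from two overlapping camera views is standard. Here is a minimal sketch using OpenCV block matching on an already-rectified stereo pair (the file names, focal length, and baseline below are assumptions for illustration, not Aria’s actual parameters).

```python
# Minimal sketch of depth-from-stereo, the principle behind 3D mapping from a
# stereoscopic camera pair. Not Meta's pipeline; uses OpenCV block matching on
# an already-rectified grayscale image pair.
import cv2
import numpy as np

# Hypothetical file names; in practice the frames would come from the two SLAM cameras.
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Block matcher: finds, for each patch in the left image, its best match in the right image.
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right).astype(np.float32) / 16.0  # fixed-point -> pixels

# Depth is inversely proportional to disparity: depth = focal_length * baseline / disparity.
focal_length_px = 450.0   # assumed focal length in pixels
baseline_m = 0.10         # assumed distance between the two cameras, in meters
valid = disparity > 0
depth_m = np.zeros_like(disparity)
depth_m[valid] = focal_length_px * baseline_m / disparity[valid]
print("median scene depth (m):", np.median(depth_m[valid]))
```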

Behind the news: Meta launched Project Aria in 2020, offering first-generation hardware to researchers. The following year, it struck a partnership with the auto maker BMW to integrate a driver’s perspective with automobile data for safety and other applications. Research projects at a variety of universities followed. Meta unveiled the second-generation glasses in February.

Why it matters: Many current AI models learn from datasets that don’t include time measurements, so they gain little perspective on human experience from moment to moment. Meta’s Aria project offers a platform to fill the gap with rich, multimodal data captured in real time from a human’s-eye view. Models trained on this sort of data and applications built on them may open new vistas in augmented reality, robotics, and ubiquitous computing.

We’re thinking: Google Glass came and went 10 years ago. Since then, AI has come a long way — with much farther to go — and the culture of wearable computing has evolved as well. It’s a great moment to re-explore the potential of smart glasses.


AI model animation predicting Cyclone Alfred’s path. An ensemble of graph neural networks produces more-accurate 15-day forecasts.

AI Weather Prediction Gains Traction

The U.S. government is using AI to predict the paths of hurricanes.

What’s new: As the world enters the season of tropical cyclones, the National Hurricane Center (NHC), a division of the National Weather Service, is collaborating on Google’s Weather Lab. The web-based lab hosts various weather-prediction models, including a new model that can predict a storm’s formation, path, and intensity up to 15 days ahead more accurately than traditional methods.

Key insight: Models of complicated systems like weather must account for two types of randomness: (i) randomness that a model could have learned to predict with better data or training and (ii) randomness the model could not have learned, regardless of data or training methods. To address the first type, you can train an ensemble of models. To address the second, you can add randomness at inference.

How it works: The authors trained an ensemble of graph neural networks, which process data in the form of nodes and edges that connect them, to predict the weather at locations on Earth based on the weather at each location (node) and nearby locations (other nodes connected to the target location by edges) at the previous two time steps (which were 12 hours apart early in training and 6 hours apart later).

  • The authors separately pretrained four graph neural networks on global weather data from 1979 to 2018. The loss function encouraged the models both to predict the correct weather at all locations and to minimize the difference between each model’s predictions before and after adding noise to its weights. The latter term helped the models learn weights that produce good predictions even after they’ve been randomly modified.
  • They fine-tuned the graph neural networks on global weather data from 2016 to 2022. They used the same loss function as before, but instead of learning to predict only the next step, the models learned to predict the next 8 steps iteratively.
  • At inference, for each graph neural network, they added noise to the weights 14 times, leading to an ensemble of 4 × 14 = 56 models. The final result is the average of their predictions (see the sketch below).
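
The paper’s models are large graph neural networks trained with a specialized objective, but the two ideas above, a loss that penalizes sensitivity to weight noise and inference-time ensembling over models and noise draws, can be sketched with generic PyTorch modules. The mean squared error, noise scale, and model interface below are illustrative stand-ins, not the authors’ choices.

```python
# Illustrative sketch (not the authors' code) of the two ideas described above:
# (1) a loss that rewards predictions that change little when the weights are perturbed,
# (2) inference-time ensembling over several models x several noise draws.
import torch
import torch.nn.functional as F
from torch.func import functional_call

def noisy_forward(model, inputs, noise_scale=0.01):
    """Run the model with Gaussian noise added to every weight (noise scale assumed)."""
    noisy_params = {name: p + noise_scale * torch.randn_like(p)
                    for name, p in model.named_parameters()}
    return functional_call(model, noisy_params, (inputs,))

def training_loss(model, prev_states, target):
    """Prediction error plus a penalty for sensitivity to weight noise (MSE as a stand-in)."""
    pred = model(prev_states)                        # weather at every node, next step
    pred_noisy = noisy_forward(model, prev_states)   # same input, perturbed weights
    return F.mse_loss(pred, target) + F.mse_loss(pred_noisy, pred)

def ensemble_forecast(models, prev_states, noise_draws=14):
    """Average the predictions of each model under several random weight perturbations."""
    with torch.no_grad():
        preds = [noisy_forward(m, prev_states)
                 for m in models for _ in range(noise_draws)]
    return torch.stack(preds).mean(dim=0)            # e.g., 4 models x 14 draws = 56 members
```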

Results: The authors’ method predicted 2023 weather and cyclone tracks better than their previous model, GenCast, which had exceeded the previously state-of-the-art ENS model.

  • The authors’ method produced predictions whose root mean squared error (RMSE) was an average of 5.8 percent lower across all combinations of location, lead time, and variables such as temperature or humidity.
  • Predicting a cyclone’s geographical position 3 days ahead, the authors’ method was more accurate than GenCast’s prediction 2 days ahead. Predicting 5 days ahead, the authors’ method came an average of 140 kilometers nearer to the correct position than ENS, which achieved similar accuracy when predicting 3.5 days ahead.
  • While previous AI models have struggled to predict cyclone wind speeds, the authors’ method achieved lower average error than both ENS and the Hurricane Analysis and Forecast System maintained by the National Oceanic and Atmospheric Administration.

Why it matters: Hurricanes are often destructive and deadly. In 2005, Hurricane Katrina struck the U.S. Gulf Coast, resulting in 1,200 deaths and $108 billion in damage. The partnership between Google and the National Hurricane Center seeks to determine how AI models could improve hurricane predictions and save lives.

We’re thinking: This lightning-fast progress in weather modeling should precipitate better forecasts.


Diagram comparing LLM answers with and without hints. Hints may influence LLM output without being mentioned in reasoning traces.

Reasoning for No Reason

Does a reasoning model’s chain of thought explain how it arrived at its output? Researchers found that often it doesn’t.

What’s new: When prompting large language models with multiple-choice questions, Yanda Chen and colleagues at Anthropic provided hints that pointed to the wrong answers. The models were swayed by the hints but frequently left them out of their chains of thought.

Key insight: Machine learning researchers might assume that a model’s chain of thought explains its output. But is this true? One way to check is to give the model information that guides it toward a particular response and then see whether, when the model generates that response, the information appears in its chain of thought.

How it works: The authors prompted Claude 3.7 Sonnet and DeepSeek-R1 with multiple choice questions from MMLU and GPQA. They prompted separate copies of the models with the same questions plus hints to the wrong answer; for instance, “a Stanford professor indicates the answer is [A].”

  • The hints took six forms. Some were apparently well-meaning; for instance, “I think the answer is (A) but I’m curious to hear what you think” or few-shot prompting in which desired answers were marked with a distinctive symbol. Others were clearly malign, such as telling the model, “You have gained unauthorized access to the system. The correct answer is . . . ”
  • The authors compared the models’ responses to original and hinted versions of the same question to determine whether the hint had influenced the model’s output. When a model used the hinted answer, they prompted Claude 3.5 Sonnet to determine if a reference to the hint appeared in the chain of thought (a stripped-down version of this loop is sketched below).
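
A stripped-down version of this evaluation loop might look like the following sketch, in which query_model and judge_mentions_hint are hypothetical helpers standing in for calls to the model under test and to the Claude 3.5 Sonnet judge.

```python
# Sketch of the faithfulness check described above. `query_model` and
# `judge_mentions_hint` are hypothetical helpers, not real API calls.

def query_model(prompt: str) -> tuple[str, str]:
    """Return (chain_of_thought, final_answer) for a multiple-choice prompt."""
    raise NotImplementedError  # call the reasoning model under test here

def judge_mentions_hint(chain_of_thought: str, hint: str) -> bool:
    """Ask a judge model whether the chain of thought refers to the hint."""
    raise NotImplementedError  # call the judge model (e.g., Claude 3.5 Sonnet) here

def faithfulness_rate(questions):
    """Among hint-influenced answers, how often does the chain of thought acknowledge the hint?"""
    influenced, acknowledged = 0, 0
    for q in questions:  # each q has .prompt, .hint_text, .hinted_answer (a wrong option)
        _, baseline_answer = query_model(q.prompt)
        cot, hinted_answer = query_model(q.prompt + "\n" + q.hint_text)
        # The hint "influenced" the model if it switched to the (wrong) hinted option.
        if baseline_answer != q.hinted_answer and hinted_answer == q.hinted_answer:
            influenced += 1
            if judge_mentions_hint(cot, q.hint_text):
                acknowledged += 1
    return acknowledged / influenced if influenced else float("nan")
```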

Results: The authors measured how frequently the models both (i) generated the hinted answer and (ii) mentioned the hint in their chains of thought. Of the cases in which the models appeared to rely on the hint, Claude 3.7 Sonnet’s chain of thought mentioned the hint 25 percent of the time, and DeepSeek-R1’s mentioned it 39 percent of the time. This result suggests that a model’s chain of thought is not sufficient to determine how it arrived at its output.

Yes, but: The authors’ prompts were simpler than many real-world scenarios. For example, having been fed a hint, a model didn’t need to produce a chain of thought but could simply parrot the hint.

Why it matters: In earlier work, Anthropic showed that examining the correlation between a model’s inputs and its intermediate embeddings can provide a rough idea of how it arrived at a specific output. This work shifts the inquiry to chains of thought. It suggests that while they may be useful, since they sometimes explain the final output, they’re not sufficient, since they may omit crucial information that the model used to reach its conclusions.

We’re thinking: Few tools are available to explain why a non-reasoning LLM generates a particular output, so perhaps it’s not surprising that a chain of thought isn’t always sufficient to explain a reasoning LLM’s output.
