Machine Learning Research

466 Posts

Email from an LLM blackmailing a coworker, generated during an experiment that tested LLM behavior under pressure.

Good Models, Bad Choices: Anthropic made LLMs choose between failing and misbehaving, and they blackmailed executives

Top large language models, under experimental conditions that pressed them to choose between abandoning their prompted mission and misbehaving, resorted to harmful behavior, researchers found.

Meta Aria Gen 2 smart glasses for AI research, equipped with cameras, microphones, and other sensors for real-time data capture.

Meta’s Smart Glasses Come Into Focus: Meta reveals further details of Aria Gen 2 smart glasses for multisensory AI research

Meta revealed new details about its latest Aria eyeglasses, which aim to give AI models a streaming, multisensory, human perspective.

Diagram comparing LLM answers with and without hints. Hints may influence LLM output without being mentioned in reasoning traces.

Reasoning for No Reason: Anthropic finds chain-of-thought reasoning traces may omit key influences

Does a reasoning model’s chain of thought explain how it arrived at its output? Researchers found that often it doesn’t.

AI model animation predicting Cyclone Alfred’s path. An ensemble of graph neural networks produces more accurate 15-day forecasts.

AI Weather Prediction Gains Traction: U.S. working with Google Weather Lab AI to improve storm forecasts

The U.S. government is using AI to predict the paths of hurricanes.

BitNet b1.58 matrix multiplication shows ternary weights enabling faster neural network computation.

Low Precision, High Performance: Researchers at Microsoft and Tsinghua propose 1.58-bit AI model that rivals full-precision competitors

Reducing the number of bits used to represent each parameter in a neural network from, say, 16 bits to 8 bits shrinks the network’s size and boosts its speed. Researchers took this approach to an extreme: They built a competitive large language model whose weights are limited to three values.
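
To make the idea concrete, here is a minimal sketch of weights limited to three values. It assumes a scale equal to the mean absolute weight, followed by rounding and clipping to -1, 0, or +1; the function name `ternary_quantize` and the sample weights are hypothetical, and this is an illustration rather than the researchers' implementation. The "1.58 bits" figure is the information needed to pick one of three values, log2(3).

```python
import math

def ternary_quantize(weights):
    """Map each weight to -1, 0, or +1 (assumed scheme: mean-absolute-value scaling)."""
    # Scale by the mean absolute weight; fall back to 1.0 if all weights are zero.
    scale = sum(abs(w) for w in weights) / len(weights) or 1.0
    # Round the scaled weight to the nearest integer, then clip to the ternary range.
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale

# Information needed to choose one of three values, in bits.
bits_per_weight = math.log2(3)

# Hypothetical sample weights, chosen only for illustration.
weights = [0.9, -0.05, -1.4, 0.3, 0.0, -0.6]
quantized, scale = ternary_quantize(weights)

print(f"bits per weight: {bits_per_weight:.2f}")
print(f"ternary weights: {quantized}, scale: {scale:.3f}")
```

Multiplying an input by a ternary weight reduces to adding it, subtracting it, or skipping it, which is one reason such a representation can speed up computation.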

Biomni AI agent analyzes oncogenic pathways using genomics tools like Scanpy and CellxGene.

A Research Agent for All Biology: Biomni, an AI agent for multidisciplinary biology research

An agent designed for broad biological research could accelerate the work of scientists in specialties from anatomy to zoology.

Apple AI models outperform rivals in instruction accuracy and human text evaluations across devices and servers.

Apple Sharpens Its GenAI Profile: Apple updates its on-device and cloud AI models, introduces a new developer API

Apple revamped two vision-language models in a bid to catch up with fast-moving competitors.

Diagram showing AI pipeline using OCR and LLMs to detect racist clauses in historic California property deeds.

LLM Rights Historical Wrongs: Stanford and Princeton researchers fine-tune a language model to identify racial discrimination in property deeds

In Northern California, old property deeds may still include racial clauses: language, made illegal decades ago, that was designed to ban people of color from owning or living in certain homes.

OpenAI o3-pro outperforms o3 and o1-pro on math, science, and coding benchmarks, but responds much more slowly.

Better Video, Fewer Tokens: STORM processes fewer tokens and still beats GPT-4o on video understanding benchmarks

Researchers reduced the number of tokens needed to represent video frames to be fed to a transformer.

The FLUX.1 Kontext family of image generators from Black Forest Labs edits images to remove or add objects, apply art styles, and extract details.

More Consistent Characters and Styles: Black Forest Labs launches FLUX.1 Kontext for generating and altering images with consistent details

Same character, new background, new action. That’s the focus of the latest text-to-image models from Germany’s Black Forest Labs.

Diagram showing how a language model agent gets misled by malicious posts and sites when searching for Nike shoes online.

Phishing for Agents: Columbia University researchers show how to trick trusting AI agents with poisoned links

Researchers identified a simple way to mislead autonomous agents based on large language models.

Bar graph comparing AI model accuracies for AIME 2024-2025, GPQA, LiveCodeBench, Aider, and Humanity's Last Exam.

Next-Level DeepSeek-R1: DeepSeek-R1’s update leads all open models and brings it up to date with the latest from Google and OpenAI

DeepSeek updated its groundbreaking DeepSeek-R1 large language model to strike another blow for open-weights performance.

DeepSeek computation diagram showing transformer blocks, multi-head attention, and routing, using FP8 and BF16 precision.

How DeepSeek Did It: Researchers describe training methods and hardware choices for DeepSeek’s V3 and R1 models

DeepSeek made headlines late last year, when it built a state-of-the-art, open-weights large language model at a cost far lower than usual. The upstart developer shared new details about its method.

Side-by-side of a fern leaf and its digital code representation, illustrating nature's pattern-to-code transformation.

Google I/O Overdrive: Google’s new AI offerings include Veo 3 video generator, lightweight Gemma 3n, updates to Gemini Pro and Ultra, and more

Google revamped its roster of models, closed and open, and added more AI-powered features to its existing products.

More Reasoning for Harder Problems: OpenAI debuts o3-pro, an updated reasoning model that applies more tokens at inference
