Machine Learning Research

Map with UK sites; flowchart depicts mammogram study steps, highlighting AI’s role alongside doctors.

AI Mammogram Diagnosis Under Real-World Conditions: Two studies test Google's breast cancer detection models in clinics

Introduced in 2020, Google’s AI system for detecting breast cancer in mammograms still hasn't been used to diagnose current patients.
Graph depicts GPT-Realtime-2's performance across sectors, competing with other speech-to-speech models.

OpenAI Challenges Speech-to-Speech Leaders: Realtime API updates audio models that reason, transcribe, and translate

An update of OpenAI’s speech-to-speech model lets developers tune the tradeoff between speed and reasoning.
Chart compares U.S. and PRC AI model performance over time, highlighting Elo scores and increasing trends.

U.S. to Evaluate Upcoming Models: U.S. Government Will Test AI Models for National Security Risks, Other Hazards Prior to Release

The U.S. government said it will evaluate cutting-edge models before they’re available to the public, a sharp reversal of the White House’s earlier hands-off policy.
Diagram showing sequential task learning steps with images of robotic tasks and flow arrows.

Robots That Adapt to New Tasks: Sony and university researchers train robots on new tasks without catastrophic forgetting

Neural networks can forget how to perform earlier tasks as they learn new ones.
Infographic showing Nvidia's chip design flow, highlighting placer, router, and optimization stages.

How Nvidia Uses AI to Design Chips: Chipmaker's models design circuits, verify designs, and test new layouts

Nvidia’s chief scientist dreams of telling an AI model to design a new GPU, then skiing for a couple days while the system does the job.
Through a rainy window, a pizza worker prepares food beneath menu boards and a red neon "Pizza" sign.

ByteDance Bids for Video Leadership: ByteDance adds state-of-the-art Seedance 2.0 video to CapCut, while OpenAI retreats

As OpenAI prepares to shut down Sora, ByteDance made its own video generation model available to hundreds of millions of users.
Graphs compare human and LLM performance strategies in rock-paper-scissors, highlighted by stars.

Strategic Thinking in LLMs vs. Humans: Researchers at UT-Austin and Google model human decision-making in Rock-Paper-Scissors

While large language models can behave in human-like ways, the similarities are superficial. A simple strategy game revealed clear differences in their strategic approaches.
Table highlights Kimi K2.6's dominance in agentic tasks with 86.3 and coding at 58.6, surpassing other models.

Kimi K2.6 Challenges Open-Weights Champs: Kimi K2.6 matches open Qwen3.6 Max and DeepSeek V4, falls just behind top closed models

Moonshot AI’s updated Kimi model handles longer autonomous coding sessions and scales up its multi-agent orchestration relative to its predecessor.
GPT-5.5 leads in Terminal-Bench 2.0 with 82.7% score, highlighting performance contrast against competitors.

GPT-5.5 Outperforms, Hallucinates: OpenAI’s latest model tops leaderboards for coding, visual puzzles, and overall intelligence

The latest update of OpenAI’s flagship model sets new states of the art in important benchmarks but has difficulty distinguishing between what it does and doesn't know.
A graph shows assistant behavior shifting between helpful and role-playing, with conversation bubbles.

Assistants That Assist Consistently: Large language models can drift from helpful personas to harmful ones, but new research aims to stabilize them

Typically, large language models are trained to act as helpful, harmless, honest assistants. However, during long or emotionally charged conversations, traits can emerge that are less beneficial. Researchers devised a way to steady the assistant personas of LLMs.
A humanoid robot with teal and white elements handles metal parts in bins on a factory floor.

Humanoid Robots Work Factory Floors: Agility's Digit humanoid robots fetch and carry bins at a Schaeffler auto-parts factory, displacing humans into higher-level jobs

A small number of humanoid robots have made their way into industrial settings, where they’re roughly matching the cost of human labor and propelling some workers into higher-level roles.
GLM-5.1 excels in SWE-Bench Pro and Terminal-Bench 2.0, leading in coding and reasoning tests.

GLM 5.1 Aims for Long-Running Tasks: Z.ai’s GLM 5.1 evaluates interim results and may change its approach hundreds of times before it delivers final output

Z.ai updated its flagship open-weights large language model to work autonomously on single tasks for up to eight hours.
Image depicts persona generator creating synthetic personas, with outputs analyzed for diversity metrics.

Simulating Diverse Human Cohorts: Persona generation simulates human characters across a controllable range of points of view

If you want to understand how the public will respond to your offerings, large language models can simulate users who answer questions about capabilities, features, promotions, or prices.
Quilt map of U.S. states with varying shades, some states in black, illustrating AI legislation differences.

US States Move Forward With AI Laws: Most states are regulating AI despite President Trump’s opposition to state-level laws

U.S. states are continuing to enact laws that regulate AI, despite President Trump’s efforts to discourage state-by-state legislation in favor of national laws.
Diagram showing AI-driven drug discovery process, from lung fibrosis data to molecule generation.

Big Pharma Bets Big on AI: Pharmaceutical kingpin Eli Lilly gave Insilico $2.75 billion for AI-driven drug development

Generative AI has proven that it can produce text, images, audio, video, and code. The world’s most valuable pharmaceutical company is betting billions that it can produce drugs as well.