Machine Learning Research

552 Posts

Diagram depicts a math problem-solving workflow from problem generation to verification and revision.

Agent Solves Stubborn Math Problems: Google’s Aletheia uses Gemini 3 Deep Think to find original mathematics solutions

LLMs have achieved gold-medal performance in math competitions. An agentic system showed strength in mathematical research as well.
Cursor hovers over a button labeled "Submit" on a platform showing task ratings and a typed approval note.

Management for Agents: OpenAI’s Frontier agent insights and orchestration platform launches to select customers

Managers need to understand how their subordinates get work done, what resources they require, and what they accomplish. OpenAI’s latest product aims to fulfill this need when the teammates are AI agents.
AI-generated scenes including ornate signage, a beachgoer’s tattoo, and cactus and honeycomb cars, illustrating Nano Banana 2’s realism.

Nano Banana 2 Ups Performance/Price: Gemini 3.1 Flash Image makes photo generation and edits easier and faster

Google launched a cheaper, faster successor to its flagship image generator, delivering greater interactivity at roughly half the price.
Bar graph depicts rising efficiency in AI models from 2023 to 2025, highlighting energy gains.

Can Local AI Stand In for the Cloud?: Stanford and Together.AI researchers chart edge models’ performance in intelligence per watt

Projected demand for output from large language models is spurring a massive buildout of data centers. Researchers asked whether smaller models running on local devices could meaningfully lighten that load.
Officials and leaders stand together at the India AI Impact Summit 2026 in New Delhi.

Global AI Summit Shows Optimism: Rising giant India presents itself as AI counterweight to U.S. and China

The fourth global AI summit marked a decisive shift from focusing on theoretical hazards to spreading AI’s benefits throughout the world.
A benchmark table shows Gemini 3.1 Pro leading in performance across several tested metrics.

Gemini Takes the Lead: Google releases Gemini 3.1 Pro in preview, tops Intelligence Index at same price

Google updated its flagship Gemini model, topping several benchmarks while undercutting competitors on performance per dollar.
Diagram shows SleepFM's data processing flow from sleep signals to disease prediction using neural networks.

Sleep Signals Predict Illness: SleepFM detects signs of neurological disorders years before symptoms manifest

Difficulty sleeping often precedes heart disease, psychiatric disorders, and many other illnesses. Researchers used data gathered during sleep studies to detect such conditions.
Two comparison tables show AI model performance across varied benchmarks, highlighting LFM2.5-1.2B.

Faster Reasoning at the Edge: Liquid AI’s small reasoning model mixes attention with convolutional layers for efficiency

Reasoning models in the 1 to 2 billion-parameter range typically require more than 1 gigabyte of RAM to run. Liquid AI released one that runs in less than 900 megabytes, and does it with exceptional speed and efficiency.
Benchmark table shows GLM-5 outperforming other models in reasoning, coding, and general agent tasks.

GLM-5 Scales Up: Z.ai’s updated model boasts top open-weights Intelligence Index score

Z.ai more than doubled the size of its flagship large language model to deliver outstanding performance among open-weights competitors.
A performance table shows Claude Opus 4.6 outperforming competitors in terminal coding, computer use, tool use, search, and problem-solving.

Claude Opus 4.6 Reasons More Over Harder Problems: Anthropic updates flagship model, places first on Intelligence Index

Anthropic updated its flagship large language model to handle longer, more complex agentic tasks.
Chart showing AI system audit assurance levels ranging from limited to very high, with increasing access and more comprehensive assessments.

Toward Consistent Auditing of AI: OpenAI alumni found Averi to set standards for AI model audits

AI is becoming ubiquitous, yet no standards exist for auditing its safety and security to make sure AI systems don’t assist, say, hackers or terrorists. A new organization aims to change that.
Flowchart showing Dr. CaBot's AI reasoning process for diagnosing and suggesting medical tests.

More Robust Medical Diagnoses: Inside Dr. CaBot, an agent trained to diagnose complex conditions

AI models that diagnose illnesses typically generate diagnoses based on descriptions of symptoms. In practice, though, doctors must be able to explain their reasoning and plan next steps. Researchers built a system that accomplishes these tasks.
Flowchart showing Mistral Small 3.1 model distillation into smaller Ministral 3 models with post-training steps.

Recipe for Smaller, Capable Models: Mistral uses cascade distillation on Mistral 3 to build Ministral family

Mistral compressed Mistral Small 3.1 into much smaller versions, yielding a family of relatively small, open-weights, vision-language models that perform better by some measures than competing models of similar size. The method combines pruning and distillation.
Flowchart showing Kimi K2.5 AI orchestrating tasks among various specialized subagents.

Kimi K2.5 Creates Its Own Workforce: Moonshot AI takes the open model crown with vision updates, aided by subagents

An open-source vision-language model unleashes minion agents that enable it to perform tasks more quickly and effectively.
A post on a forum titled "Can my human legally fire me for refusing unethical requests?"

Agents Unleashed: Cutting through the OpenClaw and Moltbook hype

The OpenClaw open-source AI agent became a sudden sensation, inspiring excitement, worry, and hype about the agentic future.