Open-source DeepCoder matches top models, ML researchers accept an AI-written workshop paper

Published Apr 11, 2025
4 min read

Twice a week, Data Points brings you the latest AI news, tools, models, and research in brief. In today’s edition, you’ll find:

  • Google’s A2A protocol helps agents work together
  • Amazon debuts unified speech-to-speech model
  • Claude’s new subscription plan for power users
  • Elon Musk’s battle with OpenAI takes a new turn

But first:

AI Scientist-v2 creates first peer-accepted workshop paper written entirely by AI

Sakana researchers updated AI Scientist, their fully autonomous scientific research system. AI Scientist-v2 independently creates hypotheses, conducts experiments, analyzes results, and writes scientific manuscripts in various machine learning fields. The new version improves upon its predecessor by eliminating human-authored templates and implementing a progressive agentic tree search guided by an experiment manager agent. As a demonstration, AI Scientist-v2 generated three papers fully autonomously, one of which was accepted through peer review at an ICLR workshop. Along with Google’s similar co-scientist program, this advancement shows AI agents’ still-growing capability to perform complex scientific workflows on par with experienced human researchers. (Sakana)

Fully open-source model codes as well as o3-mini

Agentica and Together AI released DeepCoder-14B-Preview, a fully open-source code reasoning model that achieves 60.6 percent Pass@1 accuracy on LiveCodeBench, matching OpenAI’s o3-mini’s performance but with only 14 billion parameters. Researchers trained DeepCoder using reinforcement learning (RL) on 24,000 curated, verifiable coding problems over 2.5 weeks using 32 H100 GPUs. They developed several training innovations, including GRPO+ (a new, stabilized version of GRPO), iterative context lengthening, and various optimizations that accelerate RL training by up to 2.5 times. Despite being trained primarily for coding tasks, DeepCoder also shows strong math capabilities, scoring 73.8 percent on AIME 2024. The team open-sourced their dataset, code, training logs, and system optimizations under an MIT license to help democratize RL training for large language models. (DeepCoder and GitHub)
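For context, Pass@1 measures the fraction of problems a model solves with a single sampled attempt. When multiple samples are drawn per problem, evaluations typically report the standard unbiased pass@k estimator. The sketch below illustrates that estimator; the sample counts are hypothetical and this is not DeepCoder’s actual evaluation code.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: probability that at least one of k
    samples passes, given n total samples of which c are correct."""
    if n - c < k:
        return 1.0  # too few failures to fill k draws: guaranteed pass
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical problem: 10 samples drawn, 6 correct
print(pass_at_k(10, 6, 1))  # 0.6 — with k=1 this reduces to c/n
```

With k=1 the estimator reduces to the simple success rate c/n, which is why Pass@1 is often read as plain first-try accuracy.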

Google launches open protocol for agent collaboration

Google announced Agent2Agent (A2A), a new open protocol (complementary to Anthropic's MCP) that enables AI agents from different vendors to communicate and collaborate across enterprise systems. The protocol lets agents securely exchange information and coordinate actions, addressing interoperability challenges. A2A facilitates communication between “client” and “remote” agents, supporting both quick tasks and long-running processes. Developers can contribute to A2A’s open-source specification draft, and Google plans to launch a production-ready version later this year. (Google and GitHub)

Nova Sonic brings conversational voice to applications

Amazon introduced Nova Sonic, a new speech-to-speech model that combines understanding and generation capabilities in a single unified system. The model simplifies application development by eliminating the need to orchestrate separate models for speech recognition, language processing, and text-to-speech conversion. According to benchmarks, Nova Sonic outperforms competitors from OpenAI and Google with a 5.0 word error rate on speech transcription and 1.09-second latency, making it particularly valuable for applications in customer service, healthcare, and enterprise settings. The model is available now through Amazon Bedrock for $3.40 per million voice input tokens and $13.60 per million voice output tokens. (Amazon)
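At those rates, a back-of-envelope cost estimate for a conversation is straightforward. The token counts below are hypothetical, chosen only to illustrate the arithmetic:

```python
# Published Nova Sonic rates, converted to dollars per token
INPUT_RATE = 3.40 / 1_000_000    # voice input tokens
OUTPUT_RATE = 13.60 / 1_000_000  # voice output tokens

def conversation_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated dollar cost of one voice conversation."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Hypothetical call: 12,000 input tokens and 8,000 output tokens
print(round(conversation_cost(12_000, 8_000), 4))  # 0.1496
```

Because output tokens cost four times as much as input tokens, generation-heavy conversations dominate the bill.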

Anthropic introduces Max plan with higher usage limits

Anthropic launched a new subscription plan that offers up to 20 times higher Claude usage limits than its Pro tier. The plan comes in two tiers: Expanded Usage costs $100 per month and provides 5 times more usage than Pro (roughly 225 messages every five hours), while Maximum Flexibility ($200 per month) offers 20 times more usage than Pro (roughly 900 messages over the same period). Anthropic says it added this option in direct response to requests from its most active users, mostly software developers, who need greater capacity for demanding projects. Along with OpenAI’s similar ChatGPT Pro plan, Anthropic Max shows that monthly subscriptions for power users are becoming a promising revenue model for top AI companies and an important tool for their customers. (Anthropic)

OpenAI countersues Elon Musk, alleges harassment campaign

OpenAI asked a federal judge to halt what it describes as a pattern of harassment and “unlawful and unfair action” by billionaire Elon Musk. OpenAI claims Musk, a co-founder who launched rival xAI in 2023, has tried to harm the company through press attacks, social media campaigns, and retaliatory legal claims since leaving the company. OpenAI’s filing comes amid Musk’s lawsuit attempting to prevent the ChatGPT maker from transitioning to a for-profit model, a conversion the company must complete by year-end to secure its $40 billion fundraising round. The legal dispute between Musk and OpenAI is scheduled for jury trial in spring 2026. (Reuters)


Still want to know more about what matters in AI right now?

Read this week’s issue of The Batch for in-depth analysis of news and research.

This week, Andrew Ng reflected on the impact of new U.S. tariffs, expressing concern over how they threaten international collaboration, inflate costs, and slow down AI progress. Andrew also encouraged the global AI community to stay united despite these worries.

“AI isn’t the solution to everything, but even amidst this challenging environment, I hope our community can hold together, keep building friendships across borders, keep sharing ideas, and keep supporting each other.”

Read Andrew’s full letter here.

Other top AI news and research stories we covered in depth: Anthropic’s latest experiment revealed that Claude can take reasoning steps even without explicit prompting; Meta released its new Llama 4 models with a mixture-of-experts architecture, claiming performance gains over major competitors; Qwen2.5-Omni 7B raised the bar for small multimodal models, achieving strong results across text, image, audio, and video with just seven billion parameters; and new research showed that transformers can outperform decision trees in predicting missing values in tabular data, such as spreadsheet cells.



Subscribe to Data Points

Your accelerated guide to AI news and research