U.S. chatbot use passes 50 percent, AA-Briefcase benchmark measures knowledge work

Data Points

Published: Jun 22, 2026

Reading time: 5 min read

In today’s edition of Data Points, you’ll learn about our top headlines, and more:

ARD, an open spec for discovery
North Mini Code gains traction
Security experts criticize U.S. government
Apple Intelligence, beyond Siri

But first:

Pew study shows chatbots on the rise

Nearly half of American adults now use AI chatbots, a 16-percentage-point jump from the share who reported using ChatGPT in 2024, with one in four using them daily, according to a Pew Research Center survey of 5,119 adults conducted in February 2026. ChatGPT dominates the field at 44 percent adoption, more than double 2023’s 18 percent. Gemini trails at 24 percent. People use chatbots primarily for search (42 percent) and work tasks (38 percent among employed adults), while smaller groups turn to them for image generation, medical advice, or emotional support. Adoption of other AI devices hasn’t kept the same pace: About a third own smart speakers, but far fewer have bought AI-enabled doorbells (18 percent) or thermostats (11 percent). Yet skepticism runs deep. Sixty percent read AI-generated search summaries without necessarily trusting the results. A majority of Americans believe AI poses data privacy risks and is advancing too quickly, and more predict it will harm them personally than help them. (Pew Research Center)

New benchmark tests models’ skill at white-collar work

Artificial Analysis released AA-Briefcase, a new evaluation framework designed to measure how well frontier AI models handle complex knowledge work tasks. The benchmark allows models up to 500 turns to complete tasks, with top performers like Claude Opus 4.8 averaging around 24 minutes per task. Interestingly, runtime doesn’t predict performance: for example, MiniMax-M3 takes longer than Opus but scores 240 Elo points lower, suggesting efficiency matters more than raw processing time. Turn count also shows weak correlation with results; Gemini 3.5 Flash uses nearly 90 turns per task on average yet underperforms leaders that use far fewer iterations. (Artificial Analysis)
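To put the 240-point gap in perspective, here is a minimal sketch of what such a difference would mean under the conventional Elo formula. It assumes AA-Briefcase uses the standard 400-point logistic scale, which the summary above does not confirm; the function name is illustrative.

```python
def elo_expected_score(rating_gap: float) -> float:
    """Expected score of the LOWER-rated side, given how many points it trails by.

    Assumes the conventional Elo curve (base 10, 400-point scale).
    """
    return 1.0 / (1.0 + 10.0 ** (rating_gap / 400.0))


gap = 240  # Elo points by which MiniMax-M3 trails, per the article
p_lower = elo_expected_score(gap)
print(f"{p_lower:.2f}")  # prints 0.20
```

Under that assumption, the lower-rated model would be expected to win roughly one head-to-head comparison in five.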

ARD builds on MCP, other specs to find tools and other resources

Hugging Face has built a reference implementation of the Agentic Resource Discovery (ARD) specification, an open standard co-developed by Microsoft, Google, Hugging Face, GoDaddy, and others that lets AI agents discover tools and services at runtime instead of relying on advance configuration. Currently, developers must hardcode tool URLs or dump all available capabilities into an LLM’s context window, but both approaches fail to scale. ARD uses a searchable registry with richer metadata like publisher identity and compliance attestations, letting agents search in natural language and dynamically invoke the right resource. Hugging Face’s Discover Tool wraps its existing Hub search infrastructure in the ARD specification, exposing thousands of Spaces, ML Skills, and MCP servers through both a REST API and command-line interface. The implementation shows that the same capability can surface as different resource types: A Space can be discovered as an AI skill or MCP server depending on what the client requests, proving the spec works across multiple formats. (Hugging Face)

Open coding model gains new implementations

Cohere’s North Mini Code, the company’s first open-weight coding model, designed for software engineers building agentic systems, continued to gain adoption with several quantized releases that run on smaller machines. The model totals 30 billion parameters but activates only 3 billion per token, a mixture-of-experts (MoE) design that lets it run on a single H100 GPU at FP8 precision. The architecture interleaves two attention types in a three-to-one ratio and routes each token through eight of 128 available experts, keeping compute efficient while maintaining capacity. Cohere optimized it specifically for code generation, agentic software engineering, and terminal tasks, with native tool use and interleaved thinking support. The model scores 33.4 on the Artificial Analysis Coding Index and achieves up to 2.8x higher output throughput than Devstral Small 2 on identical hardware. Weights ship under Apache 2.0 on Hugging Face, with access also available through Cohere’s API, OpenRouter, and a free hosted space, making it a self-hostable option for teams that want to run capable coding models without massive infrastructure. (MarkTechPost)
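For readers unfamiliar with expert routing, the sketch below shows generic top-k selection using the counts reported above (eight of 128 experts). It is not Cohere's implementation: the random router scores, the softmax over the selected experts, and the function name are all assumptions made for illustration.

```python
import math
import random

NUM_EXPERTS = 128  # experts available, per the article
TOP_K = 8          # experts used per token, per the article


def route_token(router_logits, top_k=TOP_K):
    """Return (expert_index, weight) pairs for the top_k highest-scoring experts."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:top_k]
    # Normalize with a softmax over the chosen experts only.
    # This is one common choice, assumed here for illustration.
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


random.seed(0)
# Stand-in router scores; a real model computes these from the token's hidden state.
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
assignment = route_token(logits)

print(len(assignment), "experts selected for this token")
print(f"share of experts used: {TOP_K / NUM_EXPERTS:.4f}")
```

Because only the selected experts run for a given token, per-token compute tracks the active parameters (3 billion) rather than the full 30 billion.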

A call for the White House to reverse course on Fable and Mythos

A group of over 100 cybersecurity professionals, executives, and researchers — including signatories from Nvidia, Google, Adobe, and academia — signed an open letter calling on the U.S. government to reverse export control restrictions placed on Anthropic’s Claude Fable 5 and Claude Mythos 5 large language models. The signatories argue that while these models are capable of finding software vulnerabilities and generating exploits, they are not uniquely powerful compared to other AI systems already in use. The letter states that restricting access to these tools actually harms American defenders by limiting their ability to audit and secure code faster than adversaries, especially given that Chinese models are only months behind U.S. capabilities. The signers acknowledge that Anthropic built multiple protections into Fable to prevent offensive cyber use and that the underlying research capabilities are necessary for any model intended to write secure code. They call for any future AI regulation to be grounded in transparent scientific evaluation, developed through democratic processes, and enforced fairly rather than through ad-hoc directives. (Free Fable)

Apart from Siri, Apple downplays the chatbot

Apple’s AI strategy for iOS 27 extends beyond the overhauled Siri, integrating machine learning into everyday apps like Messages, Calendar, and Home to solve practical problems without a chatbot interface. The company is embedding AI directly into services people already use: bill splitting in Apple Cash with photographed receipts, automatic password updates after data breaches, and intelligent call screens that surface relevant information from email. Other features include natural language Calendar event creation, a Shortcuts builder that works with natural language inputs, and intelligent grouping of Safari tabs. Apple’s philosophy is that AI should fade into the background, powering useful tools rather than appearing as conspicuous AI add-ons, so that existing software itself feels more capable. (TechCrunch)

Want to know more about what matters in AI right now?

Read the latest issue of The Batch for in-depth analysis of news and research.

Last week, Andrew talked about the U.S. government’s and Anthropic’s actions to control AI access, Anthropic’s release of Claude Fable 5 with restrictive guardrails, and the implications for AI sovereignty and open-source alternatives.

“I hope we can build a more free, more open world, where research is freely shared, and laws and societal norms shape a level playing field that allows everyone to make progress. A silver lining of the events of these past two weeks is now that everyone better realizes key points of instability of the current system, we can all work to create a more stable foundation.”

Read Andrew’s letter here.

Other top AI news and research covered in depth:

Independent tests of Claude Fable 5 have encountered Anthropic’s protective policies, raising questions about transparency in AI benchmarking.
New agentic tests like DeepSWE, ProgramBench, and ITBench-AA are pushing AI agents beyond traditional bug hunts, setting new standards for evaluation.
Nvidia is making a bold move with Nemotron 3 Ultra, focusing on speed and openness to capture the competitive AI hardware market.
Researchers have developed Privileged On-Policy Exploration (POPE), a reinforcement learning technique that trains models to expand on partial solutions, enhancing problem-solving capabilities.

Our 7-Day Voice AI Builder Challenge

Stop watching the terminal. Teach your agent to call you when it matters.

DeepLearning.AI and Vocal Bridge are teaming up to host a 7-day Voice AI Builder Challenge. Your mission will be to teach your AI coding assistant to call you the moment it needs human intervention.

Compete for:

Top-tier prizes for the highest-scoring builds
Real-time expert feedback to help you refine your assistant on the fly
A spot on our live, competitive leaderboard
Global visibility, with the winning skill featured on the official DeepLearning.AI GitHub repo

The waitlist is officially open. Join today to ensure you're notified the exact moment the challenge goes live.

Join the Waitlist Now!

Data Points is produced by human editors with AI assistance.
