Building a model for vision and speech, how Cloudflare thwarts unauthorized AI crawlers… by using AI

Published Mar 24, 2025 · 4 min read
[Image: Aerial view of a hedge maze with a large black spider at the center, surrounded by trees, benches, and paths.]

Twice a week, Data Points brings you the latest AI news, tools, models, and research in brief. In today’s edition, you’ll find:

  • Nvidia’s Nemotron adds reasoning to Llama models
  • Does ChatGPT make frequent users more lonely?
  • OpenAI’s o1-pro costs a pretty penny
  • Mistral Small 3.1 gives Gemma 3 27B some competition

But first:

New speech model enables real-time visual conversations

Kyutai released MoshiVis, an open vision-speech model that lets users hold natural voice conversations about images with low latency. The model adds 206 million trainable parameters on top of the existing Moshi speech model and uses a data-efficient training approach that requires minimal audio data by training on text-based image descriptions. MoshiVis may represent a significant step toward more natural multimodal AI interactions: the model can seamlessly switch between general conversation and discussing visual content while keeping latency low on consumer hardware. (Kyutai and arXiv)
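For readers curious how a small number of trainable parameters can ride on top of a frozen speech backbone, here is a minimal PyTorch sketch of a gated cross-attention adapter. It is illustrative only; the module names, dimensions, and gating scheme are assumptions, not Kyutai’s actual architecture.

```python
import torch
import torch.nn as nn

class GatedCrossAttentionAdapter(nn.Module):
    """Trainable adapter that injects image features into a frozen speech layer."""

    def __init__(self, d_model: int = 1024, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)
        # Gate starts at zero so the frozen backbone's behavior is unchanged
        # at the start of training.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, speech_hidden, image_tokens):
        # speech_hidden: (batch, seq, d_model) from the frozen speech model
        # image_tokens:  (batch, patches, d_model) from a vision encoder
        attended, _ = self.attn(self.norm(speech_hidden), image_tokens, image_tokens)
        return speech_hidden + torch.tanh(self.gate) * attended

adapter = GatedCrossAttentionAdapter()
# Only these parameters would be trained; the speech backbone stays frozen.
print(sum(p.numel() for p in adapter.parameters() if p.requires_grad))
```

Starting the gate at zero is a common trick when bolting a new modality onto a pretrained model: the adapter initially leaves the backbone’s behavior untouched and only gradually learns to mix in visual information.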

Cloudflare uses generative AI to fight unauthorized AI crawlers

Cloudflare launched AI Labyrinth, a new defense system that generates fake web pages to waste the resources of unauthorized AI crawlers that ignore “no crawl” directives. The system creates convincing but irrelevant content networks that serve as honeypots, helping Cloudflare identify and track unauthorized scrapers. AI crawlers now generate over 50 billion requests daily on Cloudflare’s network, nearly 1 percent of all web traffic the company handles. The approach marks a shift from traditional blocking methods and could make it harder for AI crawlers to extract useful data. (Cloudflare)
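As a rough illustration of the honeypot idea (not Cloudflare’s implementation), the sketch below generates decoy links that a human reader would never follow and flags any client that requests them. All paths and names are hypothetical.

```python
import secrets

DECOY_PATHS: set[str] = set()

def make_decoy_link() -> str:
    """Create a unique decoy URL and remember it for later detection."""
    path = f"/archive/{secrets.token_hex(8)}"
    DECOY_PATHS.add(path)
    # In a real page this link would be invisible to human readers and
    # disallowed in robots.txt, so only non-compliant crawlers follow it.
    return f'<a href="{path}" rel="nofollow">related reading</a>'

def looks_like_crawler(requested_path: str) -> bool:
    """Flag any client that requests a decoy-only URL."""
    return requested_path in DECOY_PATHS

snippet = make_decoy_link()
print(snippet)
print(looks_like_crawler(next(iter(DECOY_PATHS))))  # True
```

The real system reportedly goes further, serving whole networks of plausible-looking generated pages so that a crawler burns compute and bandwidth while revealing itself.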

Nvidia releases open reasoning models with shared training data

Nvidia unveiled a new family of open-weight reasoning models called Llama Nemotron, sharing not only the models but also 30 million training samples and detailed training methods. The three models, ranging from 8 billion to 253 billion parameters, feature toggleable reasoning capabilities; they are distilled from Meta’s open Llama models and further trained with DeepSeek-style reinforcement learning. This comprehensive release, which includes model weights, post-training data, and technical documentation, enables AI developers to better understand, modify, and build upon Nvidia’s work to create more capable AI systems. (Nvidia)
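Nvidia describes the reasoning toggle as a system-prompt switch. The hedged sketch below shows what calling such a model might look like against an OpenAI-compatible endpoint; the base URL, model identifier, and exact toggle string are assumptions to verify against Nvidia’s model cards.

```python
from openai import OpenAI

# Assumed: a local OpenAI-compatible server (e.g., vLLM) hosting the model.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

def ask(question: str, reasoning: bool) -> str:
    # Toggle string as described in Nvidia's documentation; confirm it for
    # the specific Nemotron variant you deploy.
    system = "detailed thinking on" if reasoning else "detailed thinking off"
    response = client.chat.completions.create(
        model="nvidia/Llama-3.1-Nemotron-Nano-8B-v1",  # assumed identifier
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(ask("How many primes lie between 10 and 30?", reasoning=True))
```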

OpenAI studies emotional impact of ChatGPT use

OpenAI and MIT Media Lab researchers analyzed 40 million ChatGPT interactions and conducted a four-week trial with nearly 1,000 participants to study how people emotionally engage with the AI system. The studies found that users who developed emotional bonds with ChatGPT were more likely to be lonely and dependent on the system, while participants who used voice chat with a voice of a gender different from their own reported higher levels of loneliness. Although the researchers acknowledge the limitations of self-reported emotional data, these findings begin to address how large language models affect human psychology and could help companies design safer AI interactions and attempt to make their models more “emotionally intelligent.” (OpenAI and MIT Media Lab)

OpenAI launches o1-pro in the API, its most expensive model yet

OpenAI’s reasoning model o1-pro is now available via the company’s Responses API at a price of $150 per million input tokens and $600 per million output tokens. This makes o1-pro easily the company’s most expensive model, surpassing GPT-4.5. Previously, o1-pro had been available only through the company’s monthly Pro subscription plan; this release opens it to developers who want to take advantage of the extra computing power it uses and the additional tokens it generates at inference time, which allow it to give more accurate and logically thorough answers than a standard model. (OpenAI)
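At these rates, per-request costs add up quickly. A back-of-the-envelope calculation, using hypothetical token counts and only the per-million prices from the announcement, looks like this:

```python
# Listed prices from OpenAI's announcement; token counts are hypothetical.
INPUT_PRICE_PER_M = 150.0   # USD per million input tokens
OUTPUT_PRICE_PER_M = 600.0  # USD per million output tokens

input_tokens = 10_000   # e.g., a long prompt with attached documents
output_tokens = 5_000   # reasoning models tend to emit many tokens

cost = (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
     + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M
print(f"${cost:.2f}")  # $4.50 for this single request
```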

Mistral releases new open multimodal model

Mistral AI released Mistral Small 3.1, a 24-billion-parameter open-weights model that processes text and images while running on consumer hardware like an RTX 4090 graphics card. The model outperforms Gemma 3 and similar-sized competitors on various knowledge and instruction-following benchmarks, handles up to 128,000 tokens of context, and operates at speeds of 150 tokens per second. The release shows how competition among open AI models continues to narrow the performance gap with proprietary alternatives while maintaining accessibility for developers. (Mistral)
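A quick bit of arithmetic suggests why a 24-billion-parameter model can fit on a 24 GB consumer GPU once quantized; the bytes-per-parameter figures below are generic assumptions, not Mistral’s published numbers.

```python
# Generic estimates, not Mistral's published figures.
params = 24e9      # 24 billion parameters
bytes_fp16 = 2     # 16-bit weights
bytes_int4 = 0.5   # 4-bit quantized weights

print(f"fp16 weights:  {params * bytes_fp16 / 1e9:.0f} GB")   # ~48 GB, too big for one RTX 4090
print(f"4-bit weights: {params * bytes_int4 / 1e9:.0f} GB")   # ~12 GB, leaves room for the KV cache
```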


Still want to know more about what matters in AI right now?

Read last week’s issue of The Batch for in-depth analysis of news and research.

Last week, Andrew Ng shared insights from AI Dev 25. He highlighted attendees’ strong interest in agentic AI and solving real-world problems over AGI hype. He also praised the event’s technical depth, emphasizing DeepLearning.AI’s “Learner First” mentality and the value of bringing developers together.

“There is something magical about bringing people together physically to share ideas, make friends, and to learn from and help each other. I hope we’ll be able to bring even more people together in the future.”

Read Andrew’s full letter here.

Other top AI news and research stories we covered in depth: Cohere’s Aya Vision outperformed multimodal rivals in text and image understanding, demonstrating fluency across a wide range of languages; AI Co-Scientist, Google’s new research agent, showed itself capable of generating hypotheses to aid drug discovery; the U.S. Copyright Office ruled that no new laws are needed to govern AI-generated works, noting the copyrightability of AI-assisted creations with sufficient human guidance; and MatterGen, a diffusion model, showcased its ability to design novel materials with tailored properties, advancing AI-driven material discovery.


Subscribe to Data Points

Your accelerated guide to AI news and research