In today’s edition of Data Points, you’ll learn more about:
- Tinker, Mira Murati’s simplified approach to fine-tuning
- OpenAI’s new video model and social app
- IBM’s embrace of Mamba for Granite 4.0 models
- Perplexity’s AI browser, now free worldwide
But first:
DeepSeek unveils sparse attention model for cheaper long-context inference
DeepSeek released V3.2-Exp, an experimental model with a new sparse attention system that cuts inference costs for long-context operations by up to 50 percent. The system employs a lightweight indexer that prioritizes specific excerpts of the context and a fine-grained token selection step that chooses which tokens within those excerpts to attend to, allowing the model to process long contexts with reduced server loads. The open-weight model is available on Hugging Face with an accompanying academic paper on GitHub, enabling third-party researchers to verify DeepSeek’s performance claims. This development addresses the growing challenge of inference costs, a critical bottleneck as AI applications scale. The model is available under an MIT license, or via API at $0.28/$0.42 per million input/output tokens. (DeepSeek)
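The mechanism is easiest to see in code. Below is a minimal PyTorch sketch of indexer-based sparse attention: a cheap low-dimensional scorer ranks past tokens for each query, and full attention runs only over the top-k selected positions. This illustrates the general technique rather than DeepSeek’s actual implementation; the function name, the index projections, and the top_k value are all assumptions.

```python
import torch
import torch.nn.functional as F

def sparse_attention(q, k, v, index_q, index_k, top_k=128):
    """Illustrative indexer-based sparse attention (not DeepSeek's code).

    q, k, v:          (seq, d) query/key/value projections
    index_q, index_k: (seq, d_index) cheap low-dimensional projections
                      used only to score which past tokens matter
    top_k:            number of past tokens each query may attend to
    """
    seq, d = q.shape

    # 1) Lightweight indexer: score every (query, key) pair cheaply.
    scores = index_q @ index_k.T                        # (seq, seq)

    # 2) Causal mask: each token sees only itself and earlier tokens.
    causal = torch.tril(torch.ones(seq, seq, dtype=torch.bool))
    scores = scores.masked_fill(~causal, float("-inf"))

    # 3) Token selection: keep the top-k scored positions per query.
    k_eff = min(top_k, seq)
    top_idx = scores.topk(k_eff, dim=-1).indices        # (seq, k_eff)

    # 4) Full attention, but only over the selected tokens.
    sel_k, sel_v = k[top_idx], v[top_idx]               # (seq, k_eff, d)
    attn = (sel_k @ q.unsqueeze(-1)).squeeze(-1) / d**0.5
    # Early queries have fewer than k_eff valid tokens; mask the rest.
    attn = attn.masked_fill(~causal.gather(1, top_idx), float("-inf"))
    weights = F.softmax(attn, dim=-1)                   # (seq, k_eff)
    return (weights.unsqueeze(-1) * sel_v).sum(dim=1)   # (seq, d)

# Toy usage: 1,024 tokens, each attending to at most 128 selected tokens.
q = k = v = torch.randn(1024, 64)
iq = ik = torch.randn(1024, 16)
out = sparse_attention(q, k, v, iq, ik)                 # (1024, 64)
```

Because each query attends to a fixed number of tokens rather than the full history, compute and key/value reads stop growing quadratically with context length, which is where the long-context savings come from.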
California enacts AI safety and transparency law SB 53
California Governor Gavin Newsom signed the Transparency in Frontier Artificial Intelligence Act (SB 53), requiring advanced AI companies with annual revenues of at least $500 million to report their safety protocols and disclose any major risks posed by their technologies. The law mandates that companies publicize their safety best practices in line with national and international standards, and report safety incidents to the state’s Office of Emergency Services. It also strengthens whistleblower protections for employees who warn about potential dangers. Last year, after intense industry lobbying, Newsom vetoed a stricter bill that would have required mandatory safety testing and kill switches. Industry response to this compromise bill has been mixed, with some large AI companies and tech leaders endorsing its approach and others rejecting its mandates as an overreach. (State of California)
Thinking Machines’ first product simplifies fine-tuning
Tinker launched today as a managed API service that lets researchers and developers fine-tune language models without managing distributed training infrastructure. The platform supports a range of open-weight models from small to large, including massive mixture-of-experts models like Qwen3-235B-A22B, and switching models requires changing only a single string in code. The service, the first from Mira Murati’s closely watched AI startup Thinking Machines, aims to make it easier to customize existing models; early users from Princeton, Stanford, Berkeley, and Redwood Research have already demonstrated success in specialized applications ranging from theorem proving to chemistry reasoning. Tinker is currently in private beta with free access to start, with usage-based pricing coming in the weeks ahead. (Thinking Machines)
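The sketch below is hypothetical, showing the general shape of a managed fine-tuning loop in which the service owns the training infrastructure and swapping base models is a one-string change. All names (FineTuningClient, train_step) are illustrative, not Tinker’s documented interface.

```python
from dataclasses import dataclass

@dataclass
class FineTuningClient:
    """Stand-in for a hosted fine-tuning service: the provider runs the
    distributed training; the caller supplies only data and settings."""
    base_model: str

    def train_step(self, batch):
        # A real managed service would ship this batch to remote GPUs
        # and return the training loss; here we just simulate the call.
        print(f"[{self.base_model}] training on {len(batch)} examples")
        return 0.0

# Swapping a small dense model for a massive mixture-of-experts model
# is meant to be a one-string change: only the identifier differs.
client = FineTuningClient(base_model="Qwen3-235B-A22B")

for batch in [["example A", "example B"], ["example C"]]:
    loss = client.train_step(batch)
```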
OpenAI launches Sora 2 with mobile video creation and sharing app
OpenAI released Sora 2, its updated video and audio generation model, along with a new iOS social app that allows users to create, remix, and share AI-generated videos. The model demonstrates significant improvements in physical accuracy, including realistic object physics, synchronized dialogue and sound effects, and the ability to follow complex multi-shot instructions while maintaining a consistent world state. A feature called “cameos” enables users to insert themselves or others into AI-generated scenes after a one-time video recording for identity verification. The company likens Sora 2 to a “GPT-3.5 moment” for video, a major leap in capabilities and user engagement from the original Sora model launched in February 2024. Some observers wonder whether the video app is more fun than useful, as OpenAI hunts for another “ChatGPT moment” to boost engagement. The Sora iOS app is initially available free in the U.S. and Canada with high usage limits. ChatGPT Pro users gain access to the higher-quality Sora 2 Pro model on sora.com, with an API release to follow. (OpenAI)
IBM releases Granite 4.0 models with hybrid architecture
IBM’s Granite 4.0 family features a novel hybrid Mamba/transformer architecture that reduces memory requirements by up to 70 percent while maintaining competitive performance. The models combine Mamba-2 layers with transformer blocks in a 9:1 ratio, enabling linear rather than quadratic scaling with sequence length and constant memory usage regardless of context size. So far, Granite 4.0 includes four variants: H-Small (32 billion total parameters/9 billion active), H-Tiny (7 billion total/1 billion active), H-Micro (3 billion dense), and Micro (3 billion conventional transformer). These models can run on significantly cheaper GPUs and handle workloads like long-context RAG systems and multiple concurrent sessions that would overwhelm conventional transformers. The models are available now on IBM’s watsonx.ai, through partners including Hugging Face, NVIDIA NIM, and Ollama, with Amazon SageMaker and Microsoft Azure support coming soon, all under Apache 2.0 licensing. Reasoning versions of all models and a Medium-sized model are expected soon. (IBM)
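As a rough illustration of the 9:1 interleaving, here is a minimal PyTorch sketch that stacks nine linear-cost, Mamba-style blocks for every quadratic-cost attention block. The block internals are placeholders (the state-space scan is stubbed with a linear layer), and the layer count and width are assumptions, not Granite 4.0’s actual configuration.

```python
import torch
import torch.nn as nn

D_MODEL = 512

class MambaStyleBlock(nn.Module):
    """Placeholder for a Mamba-2 state-space block: compute and memory
    grow linearly with sequence length, and the state is fixed-size."""
    def __init__(self, d):
        super().__init__()
        self.mix = nn.Linear(d, d)  # stub for the SSM scan
    def forward(self, x):
        return x + self.mix(x)

class AttentionBlock(nn.Module):
    """Standard self-attention block: quadratic in sequence length, kept
    periodically so the model retains precise token-to-token lookups."""
    def __init__(self, d):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, num_heads=8, batch_first=True)
    def forward(self, x):
        out, _ = self.attn(x, x, x, need_weights=False)
        return x + out

def build_hybrid_stack(n_layers=40, d=D_MODEL):
    """Interleave blocks in a 9:1 ratio: nine linear-cost Mamba-style
    blocks for every quadratic-cost attention block."""
    layers = [
        AttentionBlock(d) if (i + 1) % 10 == 0 else MambaStyleBlock(d)
        for i in range(n_layers)
    ]
    return nn.Sequential(*layers)

model = build_hybrid_stack()
x = torch.randn(1, 1024, D_MODEL)   # (batch, seq, d_model)
y = model(x)                        # same shape out
```

Keeping a few attention layers in the stack is the design compromise: the Mamba-style blocks carry most of the sequence at linear cost, while the occasional attention block preserves exact content-based retrieval across the context.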
Perplexity launches Comet browser worldwide for free
Perplexity released its AI-powered web browser Comet globally on Thursday, making it free to all users after initially restricting it to Perplexity Max subscribers, who pay $200 per month. The browser functions as a personal assistant that can search the web, organize tabs, draft emails, shop, and perform other tasks; millions of users had joined the waiting list while access was limited. Perplexity faces competition from Google’s Gemini integration in Chrome, Anthropic’s browser-based AI agent, and OpenAI’s Operator, all of which offer similar browser-based AI capabilities. The move to free access could help Perplexity gain market share in the increasingly crowded AI browser space, particularly after the company made an unsolicited $34.5 billion bid for Google’s Chrome browser in August. (CNBC)
Still want to know more about what matters in AI right now?
Read this week’s issue of The Batch for in-depth analysis of news and research.
This week, Andrew Ng talked about LandingAI’s Agentic Document Extraction (ADE) tool, which transforms PDF files into LLM-ready markdown text for use in sectors like healthcare, financial services, and law, emphasizing the upside of AI-based data extraction from complex documents.
“Before LLMs, many documents sat on individuals’ laptops or in businesses’ cloud storage buckets unexamined, because we did not have software that could make sense of them. But now that LLMs can make sense of text, there’s significant value in getting information out of the numerous PDF documents, forms, and slide decks we’ve stored for processing — if we are able to extract the information in them accurately.”
Read Andrew’s full letter here.
Other top AI news and research stories we covered in depth:
- OpenAI partners with Oracle, Nvidia, SoftBank, and more to build out 20 gigawatts of data center capacity, marking a significant step toward trillion-dollar spending.
- Researchers use genomic language models to create custom viruses, highlighting advancements in AI-generated viral genomes.
- Sweden’s STIM has built an ecosystem for training AI models on copyrighted music while ensuring compensation for original artists.
- Google’s AlphaEarth Foundations tracks the whole planet’s climate, land use, and potential for disasters in detail and at scale, modeling Earth in 10-meter squares.