Twice a week, Data Points brings you the latest AI news, tools, models, and research in brief. In today’s edition, you’ll find:
- Runway adds video-to-video and API
- Moshi, a new open speech model
- LlamaCoder, an open-source web app builder
- California restricts synthetic actors and election deepfakes
But first:
Qwen project releases over one hundred updated open models
Alibaba unveiled Qwen2.5, a suite of more than 100 open-source language models. The models include general-purpose, coding, and math-focused variants in multiple sizes up to 72 billion parameters. Qwen2.5 models introduce enhancements like longer text generation, better structured data handling, and more reliable JSON output. They show improved benchmark performance, with the 72B version competing with leading proprietary and open-source models on knowledge, reasoning, and instruction-following tasks. (GitHub)
Mistral cuts developer prices, introduces free tier
Mistral AI announced price reductions across its model lineup, with its flagship Mistral Large model seeing a 33% price cut to $2 per million input tokens. The company introduced a free tier for its development platform and released an improved 22-billion-parameter Mistral Small model under its research license. Mistral also added free vision capabilities to its chatbot using the Apache 2.0-licensed Pixtral 12B model, allowing users to analyze images without data privacy concerns. (Mistral AI)
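For a rough sense of what the new Mistral Large price means in practice, here is a minimal sketch of the arithmetic. It uses only the two figures quoted above ($2 per million input tokens, a 33% cut). The function names are illustrative, the 33% is assumed to be a rounded figure, and output-token pricing is not covered because the item does not state it.

```python
# Sketch of input-token cost arithmetic using the figures quoted in the item above.
PRICE_PER_MILLION_INPUT_USD = 2.00  # stated price after the cut
PRICE_CUT_FRACTION = 0.33           # "33% price cut"; assumed to be rounded


def input_cost_usd(num_input_tokens: int) -> float:
    """Cost in USD for a number of input tokens (output tokens excluded)."""
    if num_input_tokens < 0:
        raise ValueError("token count must be non-negative")
    return num_input_tokens / 1_000_000 * PRICE_PER_MILLION_INPUT_USD


def implied_previous_price_usd() -> float:
    """Per-million price before the cut, inferred from the new price and the percentage."""
    return PRICE_PER_MILLION_INPUT_USD / (1 - PRICE_CUT_FRACTION)


if __name__ == "__main__":
    print(f"250,000 input tokens: ${input_cost_usd(250_000):.2f}")
    print(f"Implied previous price: ${implied_previous_price_usd():.2f} per million")
```

The implied previous price comes out at roughly $3 per million input tokens, which is consistent with a cut of about one third.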
Runway adds video-to-video and a developer API
Runway revealed that its Gen-3 Alpha video model can now transform video styles using text prompts. Runway also introduced an API for its Gen-3 Alpha Turbo model, offering developers a way to incorporate video generation into their own applications. The API, currently in limited access, requires users to display a “Powered by Runway” banner and comes with two pricing plans. The platform charges 50 credits for videos up to 5 seconds and 100 credits for videos between 5 and 10 seconds, with a 10-second maximum duration. (Runway)
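The credit tiers described above can be sketched as a small lookup. This is not Runway's API; `credits_for_clip` is a hypothetical helper, and treating a clip of exactly 5 seconds as belonging to the lower tier is an assumption based on the "up to 5 seconds" wording.

```python
# Hypothetical helper that mirrors the credit tiers quoted in the item above.
MAX_DURATION_SECONDS = 10


def credits_for_clip(duration_seconds: float) -> int:
    """Return the credits charged for one generated clip of the given length."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    if duration_seconds > MAX_DURATION_SECONDS:
        raise ValueError("clips are limited to 10 seconds")
    # Assumption: exactly 5 seconds falls in the 50-credit tier.
    return 50 if duration_seconds <= 5 else 100


if __name__ == "__main__":
    clips = [4, 5, 8, 10]
    total = sum(credits_for_clip(seconds) for seconds in clips)
    print(f"Total credits for clips {clips}: {total}")
```

A helper like this is mainly useful for budgeting a batch of generations before submitting them.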
Kyutai releases new low-latency open-source audio model
Moshi is a new speech-text and speech-to-speech model from Kyutai. It uses Mimi, a new streaming neural audio codec that processes audio more efficiently than existing codecs. The model incorporates two audio streams, one for Moshi and one for the user, and predicts text tokens corresponding to its own inner monologue to improve generation quality. Moshi achieves impressively low latency and high performance. The developers released three models: the Mimi speech codec and two versions of Moshi fine-tuned on synthetic voices. All are available under open-source licenses for research and commercial use. (GitHub)
LlamaCoder’s app builder turns heads
Meta spotlighted Together AI’s LlamaCoder, an open-source web app that uses Llama 3.1 405B to generate complete web applications from user prompts. The app has gained significant traction in just over a month, with over 2,000 GitHub stars, hundreds of repository clones, and more than 200,000 generated applications. This rapid adoption demonstrates growing interest in open-source AI models for application development and highlights Llama 3.1’s potential to compete with closed-source alternatives. (Meta)
California approves laws to regulate AI in elections and movies
California Governor Gavin Newsom signed legislation aimed at protecting Hollywood actors and performers from unauthorized AI-generated digital clones. The new laws allow performers to back out of contracts with vague language about AI use; they also prevent commercial cloning of deceased performers without estate permission. Newsom also signed three bills to prohibit using artificial intelligence to create false images or videos for political ads. One law makes it illegal to create and publish deepfakes related to elections within 120 days before and 60 days after Election Day, while another requires large social media platforms to remove deceptive material. (AP News and AP News)
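For platforms that need to reason about the election deepfake window, the date arithmetic can be sketched as follows. This is an illustration only, not legal guidance: the function names are hypothetical, the inclusive boundaries are an assumption, and the November 5, 2024 date in the example is chosen purely for illustration.

```python
from datetime import date, timedelta

# Window lengths as described in the item above.
DAYS_BEFORE = 120
DAYS_AFTER = 60


def restricted_window(election_day: date) -> tuple[date, date]:
    """Return the (start, end) dates of the window around an election day."""
    start = election_day - timedelta(days=DAYS_BEFORE)
    end = election_day + timedelta(days=DAYS_AFTER)
    return start, end


def in_restricted_window(published: date, election_day: date) -> bool:
    """True if a publication date falls inside the window (boundaries assumed inclusive)."""
    start, end = restricted_window(election_day)
    return start <= published <= end


if __name__ == "__main__":
    election = date(2024, 11, 5)  # example date, chosen for illustration
    start, end = restricted_window(election)
    print(f"Window runs from {start.isoformat()} to {end.isoformat()}")
```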
Still want to know more about what matters in AI right now?
Read this week’s issue of The Batch for in-depth analysis of news and research.
This week, Andrew Ng highlighted the role of data engineering in AI and introduced a new professional certificate on Coursera.
“Data underlies all modern AI systems, and engineers who know how to build systems to store and serve it are in high demand. Today, far too many businesses struggle to build a robust data infrastructure, which leads to missed opportunities to create value with data analytics and AI.”
Read Andrew’s full letter here.
Other top AI news and research stories we covered in depth: OpenAI’s latest model excels in math, science, and coding, though its reasoning process isn’t visible; SambaNova increased inference speeds for Meta’s Llama 3.1 405B model; Amazon enhanced its warehouse automation by acquiring Covariant’s model-building talent and tech; and researchers proposed a method to reduce memorization in large language models, addressing privacy concerns.