Twice a week, Data Points brings you the latest AI news, tools, models, and research in brief. In today’s edition, you’ll find:
- DeepSeek’s inexpensive V2 model gets a new license
- Stable Video, now in four dimensions
- How Runway trained its video model
- Microsoft makes Phi-3 easier to fine-tune and deploy
But first:
A newer, bigger model from Mistral
Mistral AI released Mistral Large 2, a 123-billion-parameter language model with a 128,000-token context window that supports dozens of natural languages and more than 80 programming languages. The company claims Mistral Large 2 “sets a new frontier in terms of performance / cost of serving on evaluation metrics,” achieving 84% accuracy on MMLU in its pretrained version, which places it somewhere between Claude 3 Sonnet and GPT-4. The company also announced that it would deprecate older models on its platform to focus on NeMo, Large, Codestral, and Embed. Mistral Large 2 is available on Mistral’s platform and through major cloud providers, with different licensing options for research, non-commercial, and commercial use. (Mistral)
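For developers who want to try it, Mistral Large 2 can be called through Mistral’s chat completions API. Here’s a minimal sketch in Python, assuming the `mistral-large-latest` model alias resolves to Large 2 and that an API key is stored in the MISTRAL_API_KEY environment variable:

```python
import os
import requests

# Minimal sketch: call Mistral's chat completions endpoint directly.
# Assumes the "mistral-large-latest" alias points at Mistral Large 2
# and that an API key is set in the MISTRAL_API_KEY environment variable.
response = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json={
        "model": "mistral-large-latest",
        "messages": [
            {"role": "user", "content": "Summarize mixture-of-experts models in two sentences."}
        ],
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```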
Udio 1.5 gives users more musical control
Udio’s latest update introduces stem downloads, allowing users to separate tracks into vocals, bass, drums, and other elements for advanced mixing and remixing. The new audio-to-audio feature enables users to upload and reimagine their own tracks using AI, while key control lets creators specify musical keys in their prompts for more precise harmonic results. These tools give music makers more control over AI-generated compositions, opening up new creative possibilities for both amateurs and professionals. (Udio)
DeepSeek-V2 code released under permissive license
DeepSeek changed the license for DeepSeek-V2, a 236-billion-parameter mixture-of-experts language model that achieves strong performance while cutting training costs by 42.5% compared to its dense predecessor, DeepSeek 67B. The model pairs a novel attention mechanism (Multi-head Latent Attention) with a sparse mixture-of-experts feed-forward design to enable economical training and fast generation, outperforming many leading models on benchmarks across English, Chinese, coding, and math tasks. DeepSeek-V2 is released under a custom license that allows commercial use, with the code repository licensed under the MIT License. The company offers API access to the model through its platform, providing millions of free tokens to new users and a pay-as-you-go option at 14 cents per million input tokens and 28 cents per million output tokens. (Hugging Face)
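DeepSeek’s platform advertises an OpenAI-compatible API, so the standard openai Python client works with a swapped base URL. A minimal sketch, assuming the `deepseek-chat` model id serves DeepSeek-V2 and a key is set in DEEPSEEK_API_KEY:

```python
import os
from openai import OpenAI

# Sketch: DeepSeek exposes an OpenAI-compatible endpoint, so the standard
# openai client works with a swapped base_url. The "deepseek-chat" model id
# is an assumption; confirm it against DeepSeek's platform docs.
client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

completion = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Write a haiku about sparse experts."}],
)
print(completion.choices[0].message.content)

# Back-of-envelope cost at the listed prices: $0.14 per million input
# tokens and $0.28 per million output tokens.
usage = completion.usage
cost = usage.prompt_tokens * 0.14e-6 + usage.completion_tokens * 0.28e-6
print(f"Approximate cost: ${cost:.6f}")
```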
Stable Video 4D opens up generative video research
Stability AI introduced Stable Video 4D, a new AI model that transforms a single video of an object into eight novel-view videos. Users upload a video and specify desired 3D camera poses; the model then generates eight videos of the object from those perspectives, producing 5-frame videos across all 8 views in about 40 seconds. The model aims to improve consistency across spatial and temporal axes compared to previous approaches. Stable Video 4D is currently available on Hugging Face for researchers and developers to experiment with, but the model is still in a research phase, with ongoing work to refine its capabilities. (Stability AI)
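The weights can be pulled directly from Hugging Face for local experimentation. A minimal sketch using `huggingface_hub`; the repo id and file name below are assumptions, so check the model card before running:

```python
from huggingface_hub import hf_hub_download

# Sketch: fetch the Stable Video 4D checkpoint for local experimentation.
# The repo id "stabilityai/sv4d" and file name "sv4d.safetensors" are
# assumptions; confirm both against the model card on Hugging Face.
checkpoint_path = hf_hub_download(
    repo_id="stabilityai/sv4d",
    filename="sv4d.safetensors",
)
print(f"Checkpoint saved to {checkpoint_path}")
```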
Document leak says Runway trained its video model on YouTube
Video generation company Runway may have secretly scraped thousands of YouTube videos and pirated content to train its Gen-3 model. An internal spreadsheet obtained by 404 Media reveals that the company collected videos from popular YouTube channels, influencers, and media companies without their knowledge or consent. The leak offers a rare look at how Runway’s model was trained, and it raises significant questions about ethical data collection, particularly since Google has previously stated that such scraping violates YouTube’s terms of service. (404 Media)
Microsoft introduces serverless fine-tuning and endpoints for Phi model family
Microsoft announced significant updates to its Phi-3 family of small language models, including serverless fine-tuning capabilities for Phi-3-mini and Phi-3-medium. The company also made Phi-3-small available via a serverless endpoint, allowing developers to quickly build AI applications without managing infrastructure. These enhancements, along with improvements to Phi-3-mini’s performance in areas like instruction-following and structured output, aim to make AI development more efficient and accessible for a wide range of cloud and edge scenarios. (Microsoft Azure)
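Once a Phi-3 model is deployed to a serverless endpoint, it can be called over plain HTTPS. A minimal sketch; the endpoint URL, route, and auth header are placeholders, since Azure AI Studio shows the exact values on each deployment’s details page:

```python
import os
import requests

# Sketch: call a Phi-3 serverless endpoint deployed from Azure AI Studio.
# The endpoint URL and the chat-completions route are placeholders; Azure
# shows the exact URL and auth scheme on each deployment's details page.
endpoint_url = os.environ["AZURE_PHI3_ENDPOINT"]  # e.g. https://<name>.<region>.models.ai.azure.com
api_key = os.environ["AZURE_PHI3_KEY"]

response = requests.post(
    f"{endpoint_url}/v1/chat/completions",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "messages": [
            {"role": "user", "content": "List three on-device uses for a small language model."}
        ],
        "max_tokens": 256,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```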
Still want to know more about what matters in AI right now?
Read last week’s issue of The Batch for in-depth analysis of news and research.
This week, Andrew Ng shared his thoughts on why AI startups may want to begin by imagining a concrete product to test rather than a general problem to solve:
“If you are thinking about starting a new AI project, consider whether you can come up with a concrete vision to execute toward. Even if the initial vision turns out not to be quite right, rapid iteration will let you discover this sooner, and the learnings will let you switch to a different concrete idea.”
Read Andrew’s full letter here.
Other top AI news and research stories we covered in depth: all about OpenAI’s GPT-4o mini, Meta’s restriction of its multimodal models in the EU, why investors are stockpiling AI chips to attract startups, and VASA-1, a generative system that produces a talking-head video with appropriately expressive motion.