More New Open Models New models from Nvidia, Alibaba, and Stability AI expand open options

Jun 19, 2024
Reading time
2 min read
Different results from new text-to-image models from Nvidia, Alibaba, and Stability AI

A trio of powerful open and semi-open models give developers new options for both text and image generation. 

What’s new: Nvidia and Alibaba released high-performance large language models (LLMs), while Stability AI released a slimmed-down version of its flagship text-to-image generator.

How it works: The weights for Nvidia’s and Alibaba’s new models are fully open, while Stability AI’s are restricted.

  • Nvidia offers the Nemotron-4 340B family of language models, which includes a 340-billion parameter base model as well as versions fine-tuned to follow instructions and to serve as a reward model in reinforcement learning from human feedback. (Nemotron-4 340B-Reward currently tops the HuggingFace RewardBench leaderboard, which ranks reward models.) The models, which can work with 4,096 tokens of context, were pretrained on 9 trillion tokens that divide between English-language text, text in over 50 other natural languages, and code in more than 40 programming languages. 98 percent of the alignment training set was generated, and Nvidia also released the generation pipeline. The license allows people to use and modify the model freely except for illegal uses. 
  • Alibaba introduced the Qwen2 family of language models. Qwen2 includes base and instruction-tuned versions of five models that range in size from 500 million to 72 billion parameters and process context lengths between 32,000 and 128,000 tokens. The largest, Qwen2-72B, outperforms Llama 3-70B on MMLU, MMLU-Pro, HumanEval, and other benchmarks that gauge performance in natural language, mathematics, and coding. Qwen2-72B and Qwen2-72B-Instruct are available under a license that permits users to use and modify them in commercial applications up to 100 million monthly users. The smaller models are available under the Apache license, which allows people to use and modify them freely. Alibaba said it plans to add multimodal capabilities in future updates.
  • Stability AI launched the Stable Diffusion 3 Medium text-to-image generator, a 2 billion-parameter based on the technology that underpins Stable Diffusion 3. The model is intended to run on laptops and home computers that have consumer GPUs and is optimized for Nvidia and AMD hardware. It excels at rendering imaginary scenes and text; early users encountered inaccuracies in depicting human anatomy, a shortcoming that former Stability AI CEO Emad Mostaque, in a social post, attributed to tuning for safety. The license allows use of the model’s weights for noncommercial purposes. Businesses that have less than 1 million users and $1 million in revenue can license it, along with other Stability AI models, for $20 per month.

Why it matters: AI models that come with published weights are proliferating, and this week’s crop further extends the opportunity to build competitive AI applications. Nemotron-4 340B provides an exceptionally large model among open LLMs. Among smaller models, Qwen2-72B poses stiff competition for Llama 3-70B, which has energized the developer community since its May release. And Stable Diffusion 3 puts Stability AI’s image generation technology into the hands of developers working on edge devices.

We’re thinking: Given the difficulty of acquiring high-quality data to train LLMs, and that the terms of service for many leading models prohibit generating data to train other models, Nvidia’s choice to equip Nemotron-4 to generate synthetic data is especially welcome. And it makes sense from a business perspective: Making it easier for developers to train their own LLMs may be good for GPU sales.


Subscribe to The Batch

Stay updated with weekly AI News and Insights delivered to your inbox