The latest wave of large language models trained in Chinese is open source — at least for some users.
What’s new: Internet giant Alibaba released large language models that are freely available to smaller organizations. It followed Baichuan Intelligent Technology, a startup that released its own partly open models, and Beijing Academy of Artificial Intelligence, which announced that its WuDao 3.0 would be open source.
How it works: These pretrained models are small compared to, say, Meta’s Llama 2 (70 billion parameters) — but that may be a plus in China, where U.S. export restrictions have made chips for processing AI hard to get.
- Alibaba offers Qwen-7B and Qwen-7B-Chat. The models are freely available to small-scale users, but organizations with more than 100 million monthly active users require a license.
- Baichuan Intelligent Technology, a firm founded by Wang Xiaochuan, who also founded the search engine Sogou (now owned by Tencent), released Baichuan-13B and Baichuan-13B-Chat. The models are freely available to academic users. Commercial users require a license.
- Beijing Academy of Artificial Intelligence revealed its open source WuDao 3.0 model family to IEEE Spectrum. The family includes AquilaChat-7B and AquilaChat-33B (both fine-tuned for conversation), AquilaCode (fine-tuned to generate code from natural-language prompts), and WuDao Vision (for computer vision tasks). The new models upgrade and slim down the 1.75-trillion-parameter WuDao 2.0.
Behind the news: Developers in China are racing to cash in on chatbot fever. But they face unique hurdles.
- In September, the United States Commerce Department restricted the sale to China of high-performance AI chips, including Nvidia A100 and H100 GPUs. Some Chinese customers have found loopholes, but demand continues to outstrip supply.
- Language models and their output are restricted by law. Interim rules set to take effect on August 15 require government approval before generative AI products are released to the public. Developers have limited their recent chatbots to comply with restrictions on internet content.
Why it matters: The March leak of Meta’s LLaMA initiated a groundswell of open models that excel in English and a subsequent explosion of innovation and entrepreneurial activity. Competitive open models trained in Mandarin and other Chinese languages could spark similar developments in one of the world’s biggest countries — as long as developers hew to the law.
We’re thinking: High-profile models like ChatGPT and Bard, having been trained on huge amounts of English-language data, tend to know a lot about the histories, geographies, and societies of English-speaking countries but relatively little about places where other languages are spoken. Models trained on Chinese corpora will serve speakers of China’s languages far better, and open source models fine-tuned for Chinese users likely will play an important role.