Cost Containment for Generative AI Microsoft's quest to reduce the size and cost of language models

Published

Oct 18, 2023

Reading time

2 min read

Microsoft is looking to control the expense of its reliance on OpenAI’s models.

What’s new: Microsoft seeks to build leaner language models that perform nearly as well as ChatGPT but cost less to run, The Information reported.

How it works: Microsoft offers a line of AI-powered tools that complement the company’s flagship products including Windows, Microsoft 365, and GitHub. Known as Copilot, the line is based on OpenAI models. Serving those models to 1 billion-plus users could amount to an enormous expense, and it occupies processing power that would be useful elsewhere. To manage the cost, Microsoft’s developers are using knowledge distillation, in which a smaller model is trained to mimic the output of a larger one, as well as other techniques.

Microsoft’s agreement with OpenAI gives it unique access to outputs from OpenAI models. Distilling Open AI models has become the AI team’s top priority. Such models are already running in Bing Chat.
Microsoft AI research chief Peter Lee dedicated around 2,000 GPUs to training and validating distilled models, a fraction of the number used to train and validate GPT-4.
Orca, a 13-billion-parameter LLaMA 2 model that was fine-tuned on GPT-4 outputs, matched ChatGPT on the challenging BIG-Bench Hard benchmark. Nonetheless, it trailed GPT-4 on other benchmarks. (Microsoft reportedly considered releasing Orca on Azure as a competitor to GPT-4 and LLaMA 2, but LLaMA 2’s license restricts its ability to do so.)
The company is also developing smaller models from scratch. For instance, Phi-1 surpassed most open source models on benchmarks for generating Python code, such as HumanEval, despite being smaller by a factor of 10 and trained on less data by a factor of 100.

Behind the news: Microsoft has invested $10 billion in OpenAI. The deal promises the tech giant 75 percent of OpenAI’s operating profit until its investment is repaid, then 49 percent of further profits until reaching an unspecified cap. Meanwhile, Microsoft does have access to high-performing models from other sources. Its Azure cloud platform serves Meta’s LLaMA 2.

Why it matters: Serving large neural networks at scale is a challenge even for Microsoft, which has immense hardware resources and a favorable agreement with OpenAI. Running distilled and fine-tuned models can cut the cost for both tech giants and tiny startups.

We’re thinking: If users like Copilot so much they're running up a large bill in model inferences, that sounds like a positive sign!

Subscribe to The Batch