Fine-Tuning Simplified

Thinking Machines’ new Tinker API makes it easier to fine-tune models on many GPUs


The first offering from Thinking Machines Lab, the startup founded by former OpenAI CTO Mira Murati, aims to simplify — and democratize — the process of fine-tuning AI models.

What’s new: Tinker is an API that streamlines working with multiple GPUs to fine-tune large language models. Users control their algorithms while code behind the scenes handles scheduling, resource allocation, and recovery if a GPU crashes. You can join a waitlist for free access, but the company plans to start charging in the coming weeks. Tinker currently offers a selection of pretrained Qwen3 and Llama 3 models, with other open-weights options to come.

How it works: The API lets you work as though you were fine-tuning on a single device.

  • You select a model and write a fine-tuning script that loads your data and specifies a loss function, either one of Tinker’s predefined losses for supervised or reinforcement learning or one you write yourself (a sketch of such a script appears after this list). Tinker’s software determines, for instance, how to split the model and data among computing clusters.
  • During fine-tuning, the system builds and trains a LoRA adapter (a pair of small, low-rank matrices whose product is added to the pretrained model’s frozen weights, illustrated numerically below) for the task at hand.
  • Using LoRA also enables the system to share a single pool of compute among multiple fine-tuning runs, which reduces costs.
  • Tinker Cookbook offers implementations of fine-tuning methods.
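To make the workflow concrete, here is a minimal sketch of what a Tinker fine-tuning script could look like. The package, client, and method names (tinker, ServiceClient, create_lora_training_client, forward_backward, optim_step, save_state) and the data-loading helper are illustrative assumptions rather than a verified reproduction of the documented API; the Tinker Cookbook contains real examples.

```python
# Hypothetical sketch of a Tinker-style fine-tuning script. All names below
# are assumptions for illustration, not confirmed API signatures.
import tinker


def load_supervised_batches(path):
    """Hypothetical helper: yield batches of prompt/completion examples."""
    ...


service = tinker.ServiceClient()

# Request a LoRA training client for one of the supported open-weights models.
client = service.create_lora_training_client(base_model="Qwen/Qwen3-8B")

for batch in load_supervised_batches("my_dataset.jsonl"):
    # You own the training loop; Tinker schedules the heavy work on its GPU pool.
    client.forward_backward(batch, loss_fn="cross_entropy")  # predefined loss
    client.optim_step()  # update only the LoRA adapter weights

# Persist the adapter for later sampling, evaluation, or download.
client.save_state(name="my-finetune-checkpoint")
```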
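The sketch below is a framework-agnostic illustration of the LoRA idea that makes the shared compute pool possible; the hidden size and rank are arbitrary example values, not Tinker’s settings.

```python
# Numerical illustration of LoRA (not Tinker-specific code).
import numpy as np

d, r = 1024, 8                     # hidden size and LoRA rank (r << d)
W = np.random.randn(d, d)          # frozen pretrained weight matrix
A = np.random.randn(r, d) * 0.01   # small trainable matrix
B = np.zeros((d, r))               # small trainable matrix (starts at zero)

# At inference, the effective weight is the frozen base plus a low-rank update.
W_effective = W + B @ A

# Only A and B are trained, so each fine-tuning run stores ~2*d*r parameters
# instead of d*d, and many runs can share one copy of the frozen base model.
print(f"base params: {d*d:,}  adapter params: {2*d*r:,}")
```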

Behind the news: Several companies will fine-tune models on your data but don’t give you control over the training loop; OpenAI’s fine-tuning service works this way. Libraries like DeepSpeed offer control over fine-tuning and simplify parallelization across multi-GPU infrastructure, but they require you to request GPUs from cloud providers manually (if you don’t have your own) and to manage configuration files, which can be complicated.

Why it matters: Fine-tuning on multiple GPUs often means spending time figuring out how to allocate resources, debugging tricky APIs, and similar chores. Tinker saves that time, enabling model builders to spend it more productively. Academic researchers, startups, and mid-size companies that want to level up their investment in AI research and/or development are most likely to find it helpful.

We’re thinking: Tinker’s use of LoRA divides the cost of training base models among multiple fine-tuning runs, and potentially among users. This could enable users to experiment more within a fixed budget.
