Large-Model AI for Apple Devices
2026's Apple Foundation Models bring AI to MacBooks, iPhones, and the cloud

Image: Diagram of the AFM 3 Core model architecture, showing DRAM and NAND processes in sparsely activated LLM operations.

The third generation of Apple Foundation Models — fruit of Apple’s collaboration with Google — introduces a variation on the mixture-of-experts architecture that runs on local devices. 

What’s new: AFM 3 Core Advanced, a model designed to generate text and speech on some Apple devices, exceeds the processing efficiency of the popular mixture-of-experts architecture while occupying substantially less working memory. (Other models in the AFM 3 family, all custom-built and distilled from unspecified Google Gemini models, include AFM 3 Core, which also runs on Apple devices, and AFM 3 Cloud, AFM 3 Cloud Image, and AFM 3 Cloud Pro, which run on servers.)

  • Input/output: Text, images, speech in; text, speech out
  • Architecture: Modified mixture-of-experts transformer (20 billion parameters total, 1 to 4 billion parameters active)
  • Availability: Available in fall 2026 with Apple operating system updates to Macs and iPhone 17 Pro/Max/Air phones
  • Features: Text understanding and generation, speech understanding and generation, image understanding, tool use, skills, reasoning, 25 languages
  • Undisclosed: input/output size limits, benchmark performance, specific training data and methods
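For a rough sense of scale, the parameter counts above can be translated into weight storage. Apple hasn't disclosed the model's numeric precision, so the weight formats and bytes-per-parameter values below are assumptions, and the estimate covers weights only (no activations or KV cache).

```python
# Assumed bytes per parameter for common weight formats. Apple hasn't disclosed
# AFM 3 Core Advanced's precision, so these values are illustrative only.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

TOTAL_PARAMS_B = 20.0         # total parameters, in billions (from the article)
ACTIVE_PARAMS_B = (1.0, 4.0)  # range of active parameters, in billions (from the article)


def weight_gb(params_billion: float, fmt: str) -> float:
    """Weight storage in GB: billions of parameters times bytes per parameter.
    Ignores activations, the KV cache, and quantization overhead such as scales."""
    return params_billion * BYTES_PER_PARAM[fmt]


for fmt in BYTES_PER_PARAM:
    total = weight_gb(TOTAL_PARAMS_B, fmt)
    low, high = (weight_gb(p, fmt) for p in ACTIVE_PARAMS_B)
    print(f"{fmt}: all weights {total:.1f} GB, active weights {low:.1f}-{high:.1f} GB")
```

At an assumed 4-bit precision, all 20 billion parameters would occupy about 10 GB, while 1 to 4 billion active parameters would occupy 0.5 to 2 GB.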

How it works: AFM 3 Core Advanced is optimized to run on Apple silicon.

  • Like other models in its family, AFM 3 Core Advanced was trained on a mixture of publicly available information, licensed data, data collected from studies, and generated data. No user data or user interactions with models were included. The training process included pretraining, supervised fine-tuning, and reinforcement learning.
  • An alternative to mixture-of-experts (MoE) called Instruction-Following Pruning enables AFM 3 Core Advanced to run faster and more memory-efficiently. A typical MoE model uses routing layers within the model to choose which parts of the network (experts) to activate for each output token. Instead, AFM 3 Core Advanced uses a separate transformer to choose which experts to activate for some or all output tokens. Because the network doesn’t switch experts for every token, it can achieve faster inference than typical MoE models of the same size.
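The contrast between per-token routing and a separate selector can be sketched as a toy in Python. This assumes 8 experts with 2 active at a time and uses seeded random choices as stand-ins for both the learned router and the selector transformer; the function names are hypothetical. Apple hasn't published AFM 3's expert count, selector design, or how often it re-selects experts, so the sketch shows only the simplest case of one selection per response.

```python
import random

# Hypothetical configuration: Apple hasn't disclosed AFM 3's expert count or top-k.
NUM_EXPERTS = 8
TOP_K = 2


def per_token_router(token_position: int) -> frozenset[int]:
    """Typical MoE: a router inside the model picks experts for every token.
    A seeded RNG stands in for the learned routing layer."""
    rng = random.Random(token_position)
    return frozenset(rng.sample(range(NUM_EXPERTS), TOP_K))


def separate_selector(prompt: str) -> frozenset[int]:
    """Hypothetical stand-in for a separate selector transformer that reads the
    request once and picks one expert set for the whole response."""
    rng = random.Random(prompt)
    return frozenset(rng.sample(range(NUM_EXPERTS), TOP_K))


def count_expert_switches(expert_sets: list[frozenset[int]]) -> int:
    """Count how often the active expert set changes between consecutive tokens."""
    return sum(1 for prev, cur in zip(expert_sets, expert_sets[1:]) if prev != cur)


num_tokens = 100
moe_sets = [per_token_router(pos) for pos in range(num_tokens)]
chosen = separate_selector("Summarize this email")
selector_sets = [chosen] * num_tokens

print("expert switches, per-token routing:", count_expert_switches(moe_sets))
print("expert switches, separate selector:", count_expert_switches(selector_sets))
```

With the separate selector, the active expert set never changes during the response; with per-token routing, it changes from token to token, which is the overhead the selector approach avoids.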

Results: Apple has not yet published any benchmark results for AFM 3 models. It says it will release results later in the year. Like other AFM 3 models, AFM 3 Core Advanced outperformed the previous generation in Apple’s proprietary measurements of human preference.

Behind the news: In January, Apple struck a multi-year agreement with Google to use Gemini models as the basis of its AI models. As part of the AFM 3 launch, Apple Vice President of AI Amar Subramanya revealed that the models were “distillation-based, not a wholesale adoption of Gemini.” Concurrently, Apple announced plans for its Foundation Models Framework to accommodate models from other companies. Developers building on Apple hardware will be able to choose between AFM 3 models and alternatives that implement Apple’s LanguageModel protocol, including models from the Anthropic Claude and Google Gemini families.

Why it matters: AFM 3 Core Advanced’s architecture is a notable accomplishment. A typical mixture-of-experts model requires loading the whole model into active memory (RAM or VRAM), since loading experts token by token from flash storage is slow. AFM 3 Core Advanced uses the same experts across multiple tokens, so each expert it reads from flash serves many tokens rather than one. This makes it practical to keep the model in flash memory, which lets a larger, more capable model run on local devices.
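The memory argument can be illustrated with a toy simulation: a small RAM cache holds a few experts, and every cache miss counts as a slow read from flash. The expert count, cache size, LRU eviction policy, and random routing are all assumptions for illustration, not details of AFM 3's actual design.

```python
import random
from collections import OrderedDict

# Hypothetical sizes for illustration only.
NUM_EXPERTS = 32   # experts stored in flash
TOP_K = 4          # experts used per token
RAM_SLOTS = 8      # experts that fit in RAM at once


class ExpertCache:
    """LRU cache of experts held in RAM; every miss counts as a slow read from flash."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.slots: OrderedDict[int, None] = OrderedDict()
        self.flash_loads = 0

    def fetch(self, expert: int) -> None:
        if expert in self.slots:
            self.slots.move_to_end(expert)   # hit: mark as most recently used
            return
        self.flash_loads += 1                # miss: read the expert's weights from flash
        self.slots[expert] = None
        if len(self.slots) > self.capacity:
            self.slots.popitem(last=False)   # evict the least recently used expert


def count_flash_loads(expert_sets: list[list[int]]) -> int:
    """Replay one expert set per generated token and return the number of flash reads."""
    cache = ExpertCache(RAM_SLOTS)
    for experts in expert_sets:
        for expert in experts:
            cache.fetch(expert)
    return cache.flash_loads


rng = random.Random(0)
num_tokens = 200
per_token = [rng.sample(range(NUM_EXPERTS), TOP_K) for _ in range(num_tokens)]
fixed_set = rng.sample(range(NUM_EXPERTS), TOP_K)
reused = [fixed_set] * num_tokens

print("flash loads, per-token routing:", count_flash_loads(per_token))
print("flash loads, reused experts:   ", count_flash_loads(reused))
```

Under these assumptions, reusing one expert set reads each of its four experts from flash exactly once, while per-token routing keeps missing the small RAM cache. A real device would also prefetch and batch reads, which this toy ignores.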

We’re thinking: Constraints on memory and bandwidth make it impractical for most apps to download and run multi-billion-parameter models. So iOS developers have ample incentive to use streamlined models that are designed to run on iOS hardware. Moreover, Apple’s partnership with Google lets it tap into the latter’s expertise at running such models on phones.


Subscribe to The Batch
