Large-Model AI for Apple Devices

2026's Apple Foundation Models bring AI to MacBooks, iPhones, and the cloud

Published Jun 26, 2026

The third generation of Apple Foundation Models — fruit of Apple’s collaboration with Google — introduces a variation on the mixture-of-experts architecture that runs on local devices.

What’s new: AFM 3 Core Advanced, a model designed to generate text and speech on some Apple devices, runs more efficiently than a conventional mixture-of-experts model while occupying substantially less working memory. (Other members of the AFM 3 family, all custom-built and distilled from unspecified Google Gemini models, include AFM 3 Core, which also runs on Apple devices, and AFM 3 Cloud, AFM 3 Cloud Image, and AFM 3 Cloud Pro, which run on servers.)

Input/output: Text, images, speech in; text, speech out
Architecture: Modified mixture-of-experts transformer (20 billion parameters total, 1 to 4 billion parameters active)
Availability: Fall 2026, via Apple operating system updates for Macs and iPhone 17 Pro/Max/Air phones
Features: Text understanding and generation, speech understanding and generation, image understanding, tool use, skills, reasoning, 25 languages
Undisclosed: Input/output size limits, benchmark performance, specific training data and methods
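
The parameter counts above invite a quick estimate of how much memory the weights need. The sketch below is a back-of-the-envelope calculation, not Apple's figures: Apple has not disclosed how AFM 3 Core Advanced is quantized, so the 16-, 8-, and 4-bit precisions are assumptions, and `weight_gb` is a hypothetical helper. It counts weights only and ignores activations, caches, and any compression.

```python
# Back-of-the-envelope weight-memory estimates for AFM 3 Core Advanced.
# Parameter counts come from the spec list; the bit widths are assumptions,
# since Apple has not disclosed how the model is quantized.

TOTAL_PARAMS = 20e9            # 20 billion parameters total
ACTIVE_PARAMS = (1e9, 4e9)     # 1 to 4 billion parameters active


def weight_gb(params: float, bits_per_param: float) -> float:
    """Return weight storage in gigabytes (10**9 bytes), weights only."""
    return params * bits_per_param / 8 / 1e9


lo_frac = ACTIVE_PARAMS[0] / TOTAL_PARAMS
hi_frac = ACTIVE_PARAMS[1] / TOTAL_PARAMS
print(f"active fraction: {lo_frac:.0%} to {hi_frac:.0%}")

for bits in (16, 8, 4):        # hypothetical precisions
    total = weight_gb(TOTAL_PARAMS, bits)
    lo, hi = (weight_gb(p, bits) for p in ACTIVE_PARAMS)
    print(f"{bits:>2}-bit: all weights {total:.1f} GB, active {lo:.1f}-{hi:.1f} GB")
```

Under the 4-bit assumption, the full set of weights takes about 10 GB while the active slice takes 0.5 to 2 GB, which shows why a design that requires the whole model to sit in RAM is costly on a phone.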

How it works: AFM 3 Core Advanced is optimized to run on Apple silicon.

Like other models in its family, AFM 3 Core Advanced was trained on a mixture of publicly available information, licensed data, data collected from studies, and generated data. No user data or user interactions with the models were included. The training process included pretraining, supervised fine-tuning, and reinforcement learning.
An alternative to mixture-of-experts (MoE) routing called Instruction-Following Pruning enables AFM 3 Core Advanced to run faster and use less memory. A typical MoE model uses routing layers inside the network to choose which parts of the network (experts) to activate for each output token. AFM 3 Core Advanced instead uses a separate transformer to choose which experts to activate, and that choice holds across some or all output tokens. Because the network doesn’t switch experts at every token, it can run inference faster than a typical MoE model of the same size.
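
The difference between per-token routing and a separate expert selector can be sketched as a toy comparison. This is an illustration under stated assumptions, not Apple's Instruction-Following Pruning: the expert count, top-k value, and random router weights are hypothetical, and the separate transformer is approximated by running the same scoring function once on an averaged summary of the tokens. The helper `count_switches` counts how often the active expert set changes between consecutive tokens.

```python
import random

# Hypothetical sizes chosen only for illustration.
NUM_EXPERTS = 8
TOP_K = 2
DIM = 4

rng = random.Random(0)
# Router weights: one score vector per expert (random, hypothetical).
router = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]


def top_k_experts(features):
    """Score each expert by dot product and return the top-k expert indices."""
    scores = [sum(w * x for w, x in zip(row, features)) for row in router]
    ranked = sorted(range(NUM_EXPERTS), key=lambda e: scores[e], reverse=True)
    return frozenset(ranked[:TOP_K])


def per_token_routing(token_states):
    """Typical MoE: a routing layer picks experts anew for every token."""
    return [top_k_experts(h) for h in token_states]


def span_level_selection(prompt_summary, num_tokens):
    """Selector runs once (a stand-in for a separate transformer), and its
    choice is reused for every token in the span."""
    chosen = top_k_experts(prompt_summary)
    return [chosen] * num_tokens


def count_switches(plan):
    """Number of times the active expert set changes between consecutive tokens."""
    return sum(1 for a, b in zip(plan, plan[1:]) if a != b)


tokens = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(32)]
# Average each feature across tokens to get a crude "prompt summary".
prompt = [sum(col) / len(tokens) for col in zip(*tokens)]

print("per-token switches:", count_switches(per_token_routing(tokens)))
print("span-level switches:", count_switches(span_level_selection(prompt, len(tokens))))
```

The span-level plan never switches, while the per-token plan usually switches many times; each switch is a point where a memory-constrained device might need different weights.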

Results: Apple has not yet published any benchmark results for AFM 3 models; it says it will release results later in the year. Apple reports that, like other AFM 3 models, AFM 3 Core Advanced outperformed the previous generation in proprietary measurements of human preference.

Behind the news: In January, Apple struck a multi-year agreement with Google to use Gemini models as the basis of its AI models. As part of the AFM 3 launch, Apple Vice President of AI Amar Subramanya said the models are “distillation-based, not a wholesale adoption of Gemini.” Concurrently, Apple announced plans for its Foundation Models Framework to accommodate models from other companies. Developers building on Apple hardware will be able to choose between AFM 3 models and alternatives that implement Apple’s LanguageModel protocol, such as models from the Anthropic Claude and Google Gemini families.
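
The appeal of a shared protocol is that app code can depend on an interface rather than on one vendor's model. The following Python sketch shows that general pattern with `typing.Protocol`. It is an analogy only, not Apple's LanguageModel protocol, whose actual requirements the article does not describe: the method name `respond` and the classes `OnDeviceModel` and `ThirdPartyModel` are hypothetical stand-ins.

```python
from typing import Protocol


class LanguageModel(Protocol):
    """Hypothetical shared interface; NOT Apple's actual protocol."""

    def respond(self, prompt: str) -> str: ...


class OnDeviceModel:
    """Hypothetical stand-in for an on-device AFM 3 model."""

    def respond(self, prompt: str) -> str:
        return f"[on-device] {prompt}"


class ThirdPartyModel:
    """Hypothetical stand-in for a model from another provider."""

    def __init__(self, provider: str) -> None:
        self.provider = provider

    def respond(self, prompt: str) -> str:
        return f"[{self.provider}] {prompt}"


def summarize(model: LanguageModel, text: str) -> str:
    """App code depends only on the interface, so the model is swappable."""
    return model.respond(f"Summarize: {text}")


for model in (OnDeviceModel(), ThirdPartyModel("claude")):
    print(summarize(model, "meeting notes"))
```

Because `summarize` accepts any object with a matching `respond` method, swapping providers requires no change to the calling code.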

Why it matters: AFM 3 Core Advanced’s architecture is a notable accomplishment. A typical mixture-of-experts model must be loaded entirely into active memory (RAM or VRAM), because any expert may be needed for the next token and loading experts token by token from flash storage is slow. AFM 3 Core Advanced uses the same experts across multiple tokens, so it needs to fetch experts from storage only when its selection changes. This makes it practical to keep the model in flash memory, which lets a larger, more capable model run on local devices.
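
The memory argument can be made concrete by counting how often experts would have to be read from flash. This toy simulation assumes RAM holds only the currently active experts and that a span-level selector keeps its choice for 16 tokens; those assumptions, the expert counts, and the random expert choices are all hypothetical, chosen only to illustrate the effect of reusing experts across tokens.

```python
import random

# Hypothetical sizes chosen only for illustration.
NUM_EXPERTS = 8
TOP_K = 2
NUM_TOKENS = 64
SPAN = 16  # assumed number of tokens that share one expert selection


def flash_loads(plan):
    """Count expert loads from flash, assuming RAM holds only the
    currently active experts (a deliberately pessimistic cache)."""
    resident = set()
    loads = 0
    for active in plan:
        loads += len(active - resident)  # experts not already in RAM
        resident = set(active)           # evict everything else
    return loads


rng = random.Random(0)

# Typical MoE: the expert set can change on every token.
per_token = [frozenset(rng.sample(range(NUM_EXPERTS), TOP_K)) for _ in range(NUM_TOKENS)]

# Span-level selection: one choice reused for each block of SPAN tokens.
per_span = []
for _ in range(0, NUM_TOKENS, SPAN):
    chosen = frozenset(rng.sample(range(NUM_EXPERTS), TOP_K))
    per_span.extend([chosen] * SPAN)

print("per-token routing loads:", flash_loads(per_token))
print("span-level selection loads:", flash_loads(per_span))
```

With four spans of two experts each, the span-level plan needs at most 8 loads, while the per-token plan can need new experts at almost every step.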

We’re thinking: Constraints on memory and bandwidth make it impractical for most apps to download and run multi-billion-parameter models. So iOS developers have ample incentive to use streamlined models that are designed to run on iOS hardware. Moreover, Apple’s partnership with Google lets it tap into the latter’s expertise at running such models on phones.
