Skills remakes Claude with custom instructions; Google’s Veo 3.1 adds native audio and new editing tools

Published: Oct 20, 2025
Reading time: 4 min read

In today’s edition of Data Points, you’ll learn more about:

  • Microsoft’s Copilot Voice and Vision for Windows 11
  • HunyuanImage-3.0, a leaderboard-topping image generator
  • Adobe’s new plan for custom Firefly models
  • Ling-1T, an open non-thinking model that shines at reasoning

But first:

Anthropic launches Skills to customize Claude

Skills are portable folders containing instructions, scripts, and resources that Claude loads automatically when they’re relevant to the task at hand, keeping the model fast while giving it access to specialized instructions and expertise. The feature works across all Claude products, from the Claude apps (for Pro, Max, Team, and Enterprise users) to the Messages API and Claude Code, and multiple skills can stack together for complex workflows. By extending Claude’s capabilities beyond its base training, Anthropic hopes that Skills, like MCP, may become a standard way for advanced users (and their teammates) to interact with an AI model. Users can access Anthropic-created skills for common tasks like creating Excel spreadsheets and PowerPoint presentations, customize example skills from GitHub, or build their own using the skill-creator tool. (Anthropic)
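
To make the idea concrete, here’s a minimal Python sketch that scaffolds the kind of folder a skill might be: a SKILL.md with name and description metadata plus a bundled script. The file names and frontmatter fields follow Anthropic’s published examples, but treat the exact layout as an assumption and use the skill-creator tool or the official docs as the source of truth.

```python
from pathlib import Path

# Scaffold a minimal skill folder. The SKILL.md frontmatter fields (name,
# description) follow Anthropic's published examples; the exact layout here
# is an assumption, not the canonical format.
skill = Path("brand-report-skill")
skill.mkdir(exist_ok=True)

(skill / "SKILL.md").write_text(
    "---\n"
    "name: brand-report\n"
    "description: Formats quarterly reports using our brand colors and layout rules.\n"
    "---\n\n"
    "# Brand report skill\n\n"
    "When asked for a quarterly report, follow the layout rules above\n"
    "and run `scripts/build_charts.py` to generate the charts.\n"
)

(skill / "scripts").mkdir(exist_ok=True)
(skill / "scripts" / "build_charts.py").write_text(
    "# Placeholder chart-building script bundled with the skill\n"
    "print('building charts...')\n"
)
```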

Google updates Veo with audio generation and new editing tools

Google released Veo 3.1, an updated version of its video generation model that adds generated audio and tops other video models on multiple benchmarks. Google’s video editor Flow also gains new tools, “Ingredients to Video,” “Frames to Video,” and “Extend,” which can edit previously created videos without sound. Veo 3.1 also includes new editing capabilities: an “Insert” tool that adds elements to scenes while adjusting lighting and shadows, and a forthcoming “Remove” feature for erasing objects from videos. Veo 3.1 is available through the Gemini API, Vertex AI, the Gemini app, and Flow, with eight seconds of standard video costing about $3. (Google)
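
For developers, the Gemini API exposes video generation as a long-running job. Here’s a rough sketch of the generate-and-poll pattern using the google-genai Python SDK; the model ID and prompt are assumptions, so check the current documentation before relying on them.

```python
import time

from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment

# Start a video generation job. The model ID below is an assumption; check the
# Gemini API docs for the current Veo 3.1 identifier.
operation = client.models.generate_videos(
    model="veo-3.1-generate-preview",
    prompt="A drone shot gliding over a foggy pine forest at sunrise, with birdsong",
)

# Video generation runs as a long-running operation, so poll until it finishes.
while not operation.done:
    time.sleep(10)
    operation = client.operations.get(operation)

# Download the first generated clip (Veo 3.1 clips include generated audio).
video = operation.response.generated_videos[0]
client.files.download(file=video.video)
video.video.save("forest_sunrise.mp4")
```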

Microsoft integrates voice and vision AI into Windows 11

Microsoft’s latest OS update brings Copilot Voice and Copilot Vision capabilities to all Windows 11 PCs. Users can now activate Copilot with a “Hey Copilot” wake word and ask questions using natural language. Copilot Vision analyzes what’s on screen to provide guidance for tasks like troubleshooting, learning new apps, or editing projects. Microsoft says it hopes to make AI interaction as fundamental to computing as the mouse and keyboard (and also to sell lots of upgrades from the now-deprecated Windows 10). The new Copilot tools are available now for Windows 11 users via the Microsoft Store, with additional updates rolling out to Windows Insiders and Copilot Labs in the coming months. (Microsoft)

Tencent’s HunyuanImage-3.0 is a best-in-class text-to-image model

HunyuanImage-3.0 uses an autoregressive architecture instead of the diffusion transformer (DiT) approach common in most current image generators. (OpenAI’s GPT-Image is another exception.) Tencent’s model employs a Mixture of Experts (MoE) design with 64 experts and 80 billion total parameters, activating 13 billion parameters per token, making it the largest open MoE model for image generation. The unified architecture allows the model to reason about prompts and automatically expand brief descriptions with contextually relevant details drawn from its training data. HunyuanImage-3.0 currently tops LMArena’s image generator leaderboard, beating Google’s Nano Banana, GPT-Image, and other leading closed models. (Hugging Face)
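
The gap between 80 billion total and 13 billion active parameters comes from MoE routing, where each token is sent to only a few experts. The toy PyTorch sketch below illustrates top-k expert routing in general; it is a generic illustration, not Tencent’s architecture.

```python
import torch
import torch.nn as nn

class ToyMoE(nn.Module):
    """Toy mixture-of-experts layer that routes each token to a few experts.

    A generic illustration of why a model with many experts activates only a
    fraction of its parameters per token; this is not HunyuanImage-3.0's code.
    """

    def __init__(self, dim: int = 64, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, dim)
        scores = self.router(x)                          # (tokens, n_experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e              # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

tokens = torch.randn(16, 64)
print(ToyMoE()(tokens).shape)  # torch.Size([16, 64]); only 2 of 8 experts run per token
```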

Adobe introduces AI Foundry to customize Firefly for enterprises

AI Foundry retrains Adobe’s Firefly AI model with enterprise customers’ proprietary data, brand guidelines, and visual assets. Unlike Adobe’s existing custom Firefly models, which handle single concepts and image generation only, AI Foundry models will be multimodal and can understand multiple kinds of input simultaneously. Adobe teams work directly with clients to identify, transfer, and tag data before retraining the base Firefly model through a process called “continuous pre-training,” which the company describes as “deep tuning” rather than standard fine-tuning. The service aims to meet enterprise demand for more sophisticated AI customization while keeping client data separate and ensuring companies retain ownership of generated images. Early customers include Home Depot and Walt Disney Imagineering, with models deployed through Adobe’s Firefly Services API. (VentureBeat)

Ling-1T rivals GPT-5 in reasoning benchmarks

A Chinese research team released Ling-1T, a 1-trillion-parameter AI model that activates 50 billion parameters per token and was trained on over 20 trillion tokens. The model outperformed open-weights competitors like DeepSeek-V3.1 and is competitive with proprietary systems including GPT-5 and Gemini 2.5 Pro at mathematics, coding, and logical reasoning tasks. Ling-1T makes several atypical technical choices, including FP8 mixed-precision training for roughly 15 percent faster training, an “evolutionary chain-of-thought” training process, and a sentence-level reinforcement learning method called LPO that treats sentences rather than individual tokens as semantic units. Ling-1T is the largest FP8-trained foundation model to date and shows that open-source non-thinking models can match proprietary systems in complex reasoning while maintaining greater efficiency and transparency. The model is available for download at Hugging Face and ModelScope, but API pricing and commercial availability details were not announced. (Hugging Face)
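
As a rough illustration of the sentence-level idea (not Ling-1T’s actual LPO objective, which isn’t detailed here), the toy sketch below splits a sampled response into sentences and assigns one credit value per sentence instead of per token.

```python
import re

def sentence_level_credit(response: str, reward: float) -> list[tuple[str, float]]:
    """Toy illustration: assign one credit value per sentence rather than per token.

    Sketches the general idea of sentence-level credit assignment; it is not
    Ling-1T's actual LPO objective.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", response.strip()) if s]
    # Token-level schemes spread the reward across every token; here each full
    # sentence is treated as a single semantic unit with one credit value.
    return [(s, reward / len(sentences)) for s in sentences]

demo = sentence_level_credit(
    "First we factor the quadratic. Then we check both roots. The answer is x = 3.",
    reward=1.0,
)
for sentence, credit in demo:
    print(f"{credit:.2f}  {sentence}")
```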


Want to know more about what matters in AI right now?

Read the latest issue of The Batch for in-depth analysis of news and research.

Last week, Andrew Ng talked about the importance of disciplined evaluation and error analysis in AI development, emphasized that understanding root causes of errors can lead to faster progress, and introduced best practices for evaluating agentic systems.

“Rather than defining an error metric ahead of time, it is therefore typically more effective to first quickly build a prototype, then manually examine a handful of agent outputs to see where it performs well and where it stumbles. This allows you to focus on building datasets and error metrics — sometimes objective metrics implemented in code, and sometimes subjective metrics using LLM-as-judge — to check the system’s performance in the dimensions you are most concerned about.”
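
As a toy illustration of that workflow, the sketch below wires one subjective dimension into an LLM-as-judge check using the Anthropic Python SDK. The model name, criterion, and prompt are illustrative placeholders rather than recommendations from the letter; in practice, the dimensions you check would come from manually reviewing agent outputs first.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def judge_output(task: str, agent_output: str) -> bool:
    """LLM-as-judge check for one subjective dimension (here: completeness).

    The criterion and prompt are placeholders; pick dimensions surfaced by
    manual error analysis of real agent outputs.
    """
    msg = client.messages.create(
        model="claude-sonnet-4-5",  # model ID is an assumption; use whatever model you evaluate with
        max_tokens=10,
        messages=[{
            "role": "user",
            "content": (
                f"Task: {task}\n\nAgent output: {agent_output}\n\n"
                "Does the output fully address every part of the task? "
                "Answer with exactly PASS or FAIL."
            ),
        }],
    )
    return msg.content[0].text.strip().upper().startswith("PASS")

# Small eval set built from examples where manual review showed the agent stumbling.
eval_set = [
    ("Summarize the meeting notes and list action items.", "Here is a summary: ..."),
]
pass_rate = sum(judge_output(t, o) for t, o in eval_set) / len(eval_set)
print(f"pass rate: {pass_rate:.0%}")
```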

Read Andrew’s letter here.

Other top AI news and research stories covered in depth:

  • OpenAI strengthened its ties with AMD through a multi-billion-dollar chip deal, providing OpenAI with six gigawatts of computing power and up to 10% of AMD stock.
  • DeepSeek cut inference costs with DeepSeek-V3.2-Exp, which streamlines processing using a "Lightning Indexer" to boost efficiency.
  • Thinking Machines simplified training with its new Tinker API, which makes it easier to fine-tune models across many GPUs.
  • MolmoAct enhanced robotic capabilities by creating spatial maps that allow robots to plot their actions before carrying out text instructions.

Subscribe to Data Points

Your accelerated guide to AI news and research