This week's top AI news and research stories featured a new breed of audio generators; benchmarks that rank large language models’ performance on tasks related to taxes, finance, and law; a survey of manufacturers’ use of AI in engineering, design, procurement, and production; and a method that generates a 3D model from a single image, built on Stability AI’s video diffusion model. But first:
Llama 3 has landed
Meta released Llama 3, a new set of open-source large language models that perform strongly on industry benchmarks. The 8 billion and 70 billion parameter models, available now, offer improved reasoning capabilities over Llama 2; a promised 400 billion parameter version is still in training. The models will be widely available on AWS, Azure, Google Cloud, Hugging Face, and other leading cloud providers and hardware platforms. (See Meta’s press release for more)
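For readers who want to experiment, here is a minimal sketch of loading the 8 billion parameter instruct variant through Hugging Face's transformers library. It assumes you have been granted access to the gated meta-llama/Meta-Llama-3-8B-Instruct repository and have recent transformers and accelerate releases installed:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # gated repo; requires license acceptance

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # halves memory use vs. float32
    device_map="auto",            # spreads weights across available devices (needs accelerate)
)

messages = [{"role": "user", "content": "Explain attention in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=80)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```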
Meta expands AI assistant to compete with ChatGPT
Meta announced that its new AI assistant, powered by Llama 3, is being integrated into Instagram, Facebook, WhatsApp, and Messenger, as well as offered via a standalone website. The assistant now incorporates real-time search results from Bing and Google, and its image generation capabilities have been upgraded to create animations and high-resolution images on the fly. (Read an interview with Mark Zuckerberg at The Verge)
Grok-1.5V shows strong vision capabilities
Grok-1.5V, xAI’s first multimodal model, shows competitive performance across various computer vision benchmarks, including multi-disciplinary reasoning and understanding of documents, science diagrams, charts, screenshots, and photographs. Grok-1.5V particularly excels on the new RealWorldQA benchmark, which attempts to measure real-world spatial understanding, outperforming GPT-4V, Claude 3, and Gemini Pro in this domain. (See the release notes at xAI)
Adobe partners with AI companies to expand Premiere Pro's capabilities
In addition to developing its own Firefly AI video model, Adobe is offering what it calls a “sneak” preview of AI tools from Runway, Pika Labs, and OpenAI’s Sora integrated into Premiere Pro. These partnerships would give users access to a wider range of AI-powered features for generating B-roll, extending shots, and more. Integrating third-party AI models into Adobe’s software ecosystem could benefit both Adobe and the AI companies involved. (Read Adobe’s update or the story at The Verge)
OpenAI launches custom GPT-4 model for Japan
Along with opening a new office in Tokyo, OpenAI is providing early access to a custom version of GPT-4 optimized specifically for the Japanese language. The model offers improved performance in translating and summarizing Japanese text and operates up to 3x faster than its predecessor. The custom model has already shown promising results with the English learning app Speak, and OpenAI plans to release it more broadly via its API in the coming months. (Check out the news at OpenAI)
Next-generation Atlas boasts improved strength and flexibility
The new Atlas robot from Boston Dynamics features compact electric actuators that provide strength exceeding that of an elite human athlete and a range of motion surpassing human capabilities. Atlas’s unusual movements, showcased in an introductory video, are somewhat uncanny but allow for greater efficiency. Despite its unconventional design, Boston Dynamics CEO Robert Playter believes people will adapt to the robot's capabilities as long as it delivers sufficient productivity. (Read Playter’s interview at IEEE Spectrum)
New features and improvements in OpenAI's Assistants API Beta v2
OpenAI is launching an improved version of its Assistants API. The update includes a more powerful file search tool that can handle up to 10,000 files per assistant, simplified vector stores for indexing those files, and new controls for capping token usage and truncating conversation histories. The Assistants API now also supports streaming and the use of fine-tuned versions of GPT-3.5. (See the release notes on OpenAI’s website)
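To illustrate how these pieces fit together, here is a hedged sketch of the v2 flow using OpenAI's Python SDK; the model choice, file name, and prompts are placeholders, and exact parameter names may differ in later releases:

```python
from openai import OpenAI

client = OpenAI()

# Create a vector store and index a file into it for the file_search tool.
vector_store = client.beta.vector_stores.create(name="product-docs")
client.beta.vector_stores.file_batches.upload_and_poll(
    vector_store_id=vector_store.id,
    files=[open("manual.pdf", "rb")],  # placeholder document
)

# Attach the vector store to an assistant via tool_resources.
assistant = client.beta.assistants.create(
    model="gpt-4-turbo",  # illustrative model choice
    instructions="Answer questions using the indexed documents.",
    tools=[{"type": "file_search"}],
    tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
)

# Runs can now cap token usage and stream events as they arrive.
stream = client.beta.threads.create_and_run(
    assistant_id=assistant.id,
    thread={"messages": [{"role": "user", "content": "How do I reset the device?"}]},
    max_prompt_tokens=2000,
    max_completion_tokens=500,
    stream=True,
)
for event in stream:
    print(event.event)  # e.g. thread.run.created, thread.message.delta, ...
```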
Research: Infinite context with finite memory
Google researchers introduce Infini-attention, an efficient method that scales transformer language models to infinitely long inputs with bounded memory and computation. Infini-attention incorporates a compressive memory into the standard attention mechanism, enabling both local masked attention and long-term linear attention within a single Transformer block. Experiments demonstrate Infini-attention's effectiveness on long-context language modeling, passkey retrieval, and book summarization tasks, introducing minimal additional parameters while enabling fast streaming inference. (Read the research paper)
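For intuition, here is a minimal single-head PyTorch sketch of one segment's forward pass, following the paper's retrieve-then-update recipe (the ELU+1 feature map and sigmoid gate come from the paper; shapes, names, and the scalar gate are illustrative simplifications):

```python
import torch
import torch.nn.functional as F

def infini_attention_step(q, k, v, memory, z, beta):
    """One segment of Infini-attention for a single head.

    q, k, v: (batch, seg_len, dim) projections for the current segment
    memory:  (batch, dim, dim) compressive memory carried over from prior segments
    z:       (batch, dim, 1) running normalization term
    beta:    learned scalar that gates long-term vs. local attention
    """
    sigma_q = F.elu(q) + 1.0  # non-negative feature map used by linear attention
    sigma_k = F.elu(k) + 1.0

    # Retrieve long-term context from the compressive memory (linear attention).
    a_mem = (sigma_q @ memory) / (sigma_q @ z + 1e-6)

    # Standard causal dot-product attention over the current segment only.
    a_local = F.scaled_dot_product_attention(q, k, v, is_causal=True)

    # Fold the current segment's keys and values into the memory and normalizer.
    memory = memory + sigma_k.transpose(1, 2) @ v
    z = z + sigma_k.sum(dim=1).unsqueeze(-1)

    # Learned gate mixes long-term retrieval with local attention.
    gate = torch.sigmoid(beta)
    return gate * a_mem + (1 - gate) * a_local, memory, z
```

Because `memory` and `z` have a fixed size no matter how many segments have been processed, the model's footprint stays bounded while the effective context grows without limit.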
Research: The potential impact of advanced AI assistants on society
A new paper by Google DeepMind argues that as AI assistants become more advanced and capable of communicating in natural language, new challenges arise in terms of trust, privacy, and appropriate human-AI relationships. Safeguards must be put in place to ensure users can reliably identify automated assistants and maintain control over their interactions. AI assistants should also be designed to cooperate and coordinate with each other to meet human preferences and avoid unnecessary conflicts. (Check out the story at DeepMind)
Research: VASA-1 generates virtual characters with lifelike facial expressions and head movements
Microsoft researchers developed VASA, a new framework that generates highly realistic videos of talking faces from just a single still image and an audio clip. VASA-1, the first model based on this framework, does not clone or simulate voices; instead, it produces lip movements synchronized to the input audio while capturing a wide range of natural facial expressions and head motions. (Read the paper and see examples at Microsoft)
Research: How synthetic data fuels LLMs’ expansion
Synthetic data and instruction tuning are emerging as powerful tools for training large language models, enabling them to learn complex reasoning skills and reduce harmful biases. Models trained with these techniques, such as Anthropic's Claude and Meta’s Llama 2, demonstrate remarkable abilities to follow complex instructions, engage in open-ended dialogue, and even write code. However, synthetic datasets need to be carefully generated and curated in order to reap their full benefits. (Dig into the researchers’ arguments in the paper)
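As a concrete illustration of the pattern the paper surveys, a strong "teacher" model can generate instruction-response pairs that are filtered and then used to fine-tune a smaller model. Below is a minimal sketch; the teacher model choice, prompts, and length-based filter are illustrative assumptions, and real pipelines curate far more aggressively:

```python
import json
from openai import OpenAI

client = OpenAI()
TEACHER = "gpt-4-turbo"  # illustrative choice of teacher model

seed_tasks = [
    "Explain overfitting to a beginner.",
    "Write a Python function that reverses a linked list.",
]

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model=TEACHER, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

with open("synthetic_sft.jsonl", "w") as f:
    for seed in seed_tasks:
        # Step 1: have the teacher invent a new instruction in the spirit of the seed.
        instruction = ask(f"Write one new, self-contained instruction similar in spirit to: {seed}")
        # Step 2: have the teacher answer its own instruction.
        response = ask(instruction)
        # Step 3: crude quality filter; real pipelines dedupe, verify, and score.
        if len(response) > 40:
            f.write(json.dumps({"instruction": instruction, "response": response}) + "\n")
```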
AI defeats human pilot in simulated dogfight
The United States Air Force recently announced that an AI agent successfully piloted the X-62 VISTA aircraft during a simulated dogfight against a human F-16 pilot above California’s Edwards Air Force Base in 2023. The flight, conducted under DARPA's Air Combat Evolution (ACE) program, is part of a broader test of autonomous flight capabilities. “Dogfighting was the problem to solve so we could start testing autonomous artificial intelligence systems in the air,” said chief test pilot Bill Gray. (Read more at The Aviationist)
AI Index 2024 highlights AI's rapid advancements and societal impact
The seventh edition of the AI Index report, the most comprehensive to date, reveals that AI has surpassed human performance in several tasks and is having a significant impact on many sectors, including labor and scientific discovery. But the report also highlights concerns regarding the lack of standardization in evaluation criteria, the increasing costs of training state-of-the-art AI models, and the growing public nervousness toward AI products and services. (Explore the full report)
Microsoft’s billion-dollar bet on AI in the UAE
Microsoft announced a $1.5 billion investment in G42, an Abu Dhabi-based AI group, marking the company's first major AI partnership in the Middle East. The deal has raised concerns among US officials due to G42's alleged links to the Chinese government, which the company denies. (See the story at CNN)