Hugging Face gives researchers and startups GPU access: Plus, Meta unveils a new mixed-modal model

May 24, 2024

This week's top AI news and research stories featured everything we know about OpenAI's GPT-4o, highlights from Google’s annual I/O developers’ conference, Sony Music's declaration opting out of AI training, and details about Meta's Emu Edit model. But first:

OpenAI debuts new data analysis tools for ChatGPT (OpenAI)
Paid ChatGPT users can now upload files directly from Google Drive and Microsoft OneDrive, interact with tables and charts using natural language, and customize charts for presentations. When users upload or import a data file, ChatGPT can now write and execute Python code to analyze or visualize that data on users’ behalf. These features may make it easier for those with limited coding skills to conduct in-depth analyses and let experts save time on routine data tasks.

Reddit partners with OpenAI (Reddit)
Content from Reddit’s vast forums will be used to power ChatGPT and other AI products. The collaboration will give Reddit new AI-powered features for its users and moderators, while OpenAI will advertise on Reddit. (Full terms were not disclosed.) OpenAI now has deals with global newspapers, software forums, and a wide variety of other publishers, giving it special access to timely and high-quality training material.

Hugging Face commits $10 million in free GPU access to academics and startups (The Verge)
The program, called ZeroGPU, is accessible through Hugging Face’s Spaces platform, which already hosts over 300,000 AI demos. The shared Nvidia A100s can be used concurrently by multiple users or applications; unutilized capacity will be made available to others. Hugging Face’s goal is to counter the centralization of AI around tech giants and closed models by making state-of-the-art AI technologies more accessible.

Meta’s research arm introduces Chameleon (ArXiv)
Chameleon can natively process both text and images together, allowing it to perform a wide range of mixed-modal tasks with impressive results. Meta’s researchers say the key is Chameleon’s fully token-based architecture (representing images as well as text as tokens) and training on datasets that combine text with images. Chameleon outperforms many leading and specialized models (including GPT-4V and Gemini Pro) when answering questions about images, describing pictures, writing relevant text, and creating images from text prompts.
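
To make the idea of a fully token-based architecture concrete, here is a toy sketch of how text tokens and image tokens might share one ID space and one sequence. It is not Meta's implementation: the vocabulary sizes, the begin/end-of-image markers, and the function name are all hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

# All sizes and marker ids below are hypothetical, for illustration only.
TEXT_VOCAB_SIZE = 1000       # ids 0..999 are text tokens
IMAGE_CODEBOOK_SIZE = 256    # ids 1000..1255 are image codes
BOI = TEXT_VOCAB_SIZE + IMAGE_CODEBOOK_SIZE  # begin-of-image marker (1256)
EOI = BOI + 1                                # end-of-image marker (1257)

Segment = Tuple[str, List[int]]  # ("text" or "image", local token ids)


def to_shared_ids(segments: List[Segment]) -> List[int]:
    """Flatten interleaved text and image segments into one id sequence."""
    ids: List[int] = []
    for kind, local_ids in segments:
        if kind == "text":
            if any(not 0 <= t < TEXT_VOCAB_SIZE for t in local_ids):
                raise ValueError("text token id out of range")
            ids.extend(local_ids)
        elif kind == "image":
            if any(not 0 <= c < IMAGE_CODEBOOK_SIZE for c in local_ids):
                raise ValueError("image code out of range")
            # Shift image codes past the text vocabulary so ids never collide.
            ids.append(BOI)
            ids.extend(TEXT_VOCAB_SIZE + c for c in local_ids)
            ids.append(EOI)
        else:
            raise ValueError(f"unknown segment kind: {kind}")
    return ids


if __name__ == "__main__":
    mixed = [("text", [5, 17]), ("image", [0, 3, 255]), ("text", [42])]
    print(to_shared_ids(mixed))
```

Because every element ends up as an integer in a single vocabulary, one sequence model can in principle read and emit text and images in any order, which is the property the researchers highlight.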

Google’s Project IDX enters open beta (Google/Project IDX)
Google’s AI-assisted, browser-based integrated development environment (IDE) offers now-familiar features like code completion, debugging tools, and a chat-assisted sidebar, all powered by Gemini. Whenever IDX modifies snippets or suggests new code, it also links back to the original source and its associated license, ensuring proper attribution. Although Google is entering a competitive market, IDX aims to attract developers by showcasing Gemini’s AI advancements and integrating with the company’s cloud services.

Anthropic introduces prompt generation tool for Claude (Anthropic)
The tool aims to solve new users’ “blank page problem” by providing a starting point for testing and iteration, incorporating best practices like chain of thought and separating data from instructions. Users can access the prompt generator directly on the Console or analyze the underlying prompt and architecture using a Google Colab notebook. The generator addresses a common challenge for AI users: efficiently crafting effective (and often larger and more complex) prompts that yield high-quality results.
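
As a rough illustration of the two practices named above, the sketch below builds a prompt that keeps the instructions apart from the data and asks for step-by-step reasoning before the answer. It is not the output of Anthropic's generator; the template wording, the tag names, and the `build_prompt` helper are assumptions made for this example.

```python
from string import Template

# Hypothetical template: the instructions stay fixed, and the data is
# wrapped in XML-style tags so it cannot be mistaken for instructions.
PROMPT_TEMPLATE = Template(
    "You will be given a customer review inside <review> tags.\n"
    "Classify its sentiment as positive, negative, or mixed.\n"
    "First reason step by step inside <thinking> tags, "
    "then give the final label inside <answer> tags.\n\n"
    "<review>\n$review\n</review>"
)


def build_prompt(review: str) -> str:
    """Fill the data slot without touching the instructions."""
    return PROMPT_TEMPLATE.substitute(review=review.strip())


if __name__ == "__main__":
    print(build_prompt("  Battery life is great, but the screen scratches easily.  "))
```

Keeping the data in its own tagged slot means the same instructions can be reused and tested across many inputs, which is the kind of starting point for iteration the tool is meant to provide.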

ElevenLabs launches AI-powered screen reader app (KnowTechie/Bloomberg)
ElevenLabs Reader: AI Audio is the billion-dollar AI voice cloning startup’s first consumer app. The free app can read web pages, PDFs, and other documents aloud using a selection of 11 AI-generated voices. The app marks ElevenLabs’ expansion into the broader AI voice market beyond its current focus on entertainment and media production.

Microsoft offers China-based AI staff relocation amid U.S. crackdown (CNBC/WSJ)
Microsoft reportedly asked hundreds of its China-based employees working on cloud computing and AI to consider relocating to other countries. One source said Microsoft offered 700 to 800 Chinese engineers the opportunity to transfer to the U.S., Ireland, Australia, or New Zealand. The move comes as the U.S. government tightens restrictions on China’s access to advanced technology, citing concerns over potential military applications and cybersecurity threats.

Falcon 2 open-source models boast rich vision-to-text capabilities (AetosWire)
Abu Dhabi’s Technology Innovation Institute released Falcon 2, a family of large language models that includes Falcon 2 11B and Falcon 2 11B VLM. The latter is the institute’s first multimodal model, capable of converting visual inputs into textual outputs. Both models are Apache 2.0 open-source, multilingual, and perform on par with Gemma 7B and better than Llama 3 8B, according to benchmarks and Hugging Face leaderboards.
