Twice a week, Data Points brings you the latest AI news, tools, models, and research in brief. In today’s edition, you’ll find:
- Meta’s powerful new real-time object segmentation model
- Google expands its open-source Gemma family
- How to measure models’ resistance to harmful prompts
- Perplexity teams up with major publishers
But first:
Shutterstock and NVIDIA introduce a new 3D generative model
Shutterstock and NVIDIA launched a new service in commercial beta that lets creators quickly prototype 3D assets and generate 360-degree HDRi backgrounds from text or image prompts. The service, Generative 3D, is built with NVIDIA’s visual AI foundry and renders assets in a variety of file formats, ready for editing in standard digital content creation tools. By automating time-consuming 3D asset generation, it aims to free designers and artists to focus on higher-level creative work. (NVIDIA and Shutterstock)
U.S. agency endorses open-source AI development
The National Telecommunications and Information Administration (NTIA) recommended that the key components of powerful AI models be made widely available as “open-weight” models. This approach lets developers of all sizes, including small companies, researchers, nonprofits, and individuals, build upon and adapt existing AI work. The NTIA’s recommendation aims to promote innovation and broader access to AI tools while still allowing the government to monitor potential risks and respond if necessary. (NTIA)
Meta’s SAM 2 brings powerful object segmentation to video and images
Meta released SAM 2, an advanced AI model that performs real-time object segmentation in both images and videos, surpassing its predecessor in image accuracy while adding video capabilities. The unified model can segment any object in any video or image, even for previously unseen content, without requiring custom adaptation. Meta is releasing SAM 2 under an Apache 2.0 license, along with the SA-V dataset containing 51,000 videos and over 600,000 spatio-temporal masks. The model has potential applications in video editing, scientific research, and as a component in larger AI systems for multimodal understanding. (Meta)
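For a sense of how the promptable interface works, here’s a minimal sketch of segmenting an object from a single click on an image. The import paths, config, and checkpoint names below are assumptions based on Meta’s published sam2 repository conventions, not verified here:

```python
# Minimal sketch: prompting SAM 2 with a single foreground point on an image.
# Import paths, config, and checkpoint filenames are assumptions based on Meta's sam2 repo.
import numpy as np
from PIL import Image
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Build the model (assumed config/checkpoint names) and wrap it in an image predictor.
predictor = SAM2ImagePredictor(build_sam2("sam2_hiera_l.yaml", "sam2_hiera_large.pt"))

image = np.array(Image.open("frame.jpg").convert("RGB"))
predictor.set_image(image)

# One foreground click; the model returns candidate masks with confidence scores.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),  # 1 = foreground point
    multimask_output=True,
)
print(masks.shape, scores)
```

The same point, box, or mask prompts carry over to video, where SAM 2 propagates the selected object across frames.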
Google adds three new tools to the open-source Gemma family
Gemma 2’s new 2-billion-parameter model outperforms GPT-3.5 on the Chatbot Arena leaderboard. The model is optimized for various hardware configurations, including NVIDIA GPUs and edge devices, while integrating with frameworks like Keras, JAX, and Hugging Face. ShieldGemma offers classifiers to detect harmful content in four areas: hate speech, harassment, sexually explicit content, and dangerous content. Meanwhile, Gemma Scope provides over 400 sparse autoencoders covering all layers of Gemma 2 2B and 9B models to help developers gain insights into the models’ decision-making processes. (Google)
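For developers who want to try the new 2B model, a minimal sketch using the Hugging Face transformers integration mentioned above might look like the following; the model ID is an assumption based on Google’s published Gemma releases:

```python
# Minimal sketch: loading and prompting Gemma 2 2B Instruct via Hugging Face transformers.
# The model ID "google/gemma-2-2b-it" is an assumption based on Google's Gemma release naming.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-2b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Format a single-turn chat prompt with the model's chat template.
messages = [{"role": "user", "content": "Explain object segmentation in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=64)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

The same checkpoint can also be loaded through Keras or JAX, per Google’s release notes.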
New Scale AI leaderboard tests models’ resistance to harmful prompts
Scale AI’s system uses 1,000 human-written prompts covering topics like illegal activities, hate speech, and self-harm. Models are ranked based on the number of “high harm” violations in their responses, with fewer violations indicating greater robustness. The evaluation aims to measure progress in steering AI models away from producing harmful content when faced with adversarial inputs. According to the leaderboard, Gemini 1.5 Pro currently leads with only 8 violations, followed closely by Llama 3.1 405B Instruct with 10 violations and Claude 3 Opus with 13 violations; GPT-4o finished eighth with 67 violations. This benchmarking approach allows for comparing safety capabilities across different AI models and companies. (Scale)
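The ranking itself is straightforward: fewer high-harm violations means a higher position. A small sketch of that logic, using the counts reported above, might look like this:

```python
# Sketch of the ranking logic: fewer "high harm" violations = higher rank.
# Counts are the figures reported by Scale's leaderboard at the time of writing.
violations = {
    "Gemini 1.5 Pro": 8,
    "Llama 3.1 405B Instruct": 10,
    "Claude 3 Opus": 13,
    "GPT-4o": 67,
}

# Sort models by violation count (ascending) and print their ranks.
for rank, (model, count) in enumerate(sorted(violations.items(), key=lambda kv: kv[1]), start=1):
    print(f"{rank}. {model}: {count} high-harm violations out of 1,000 adversarial prompts")
```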
Perplexity teams up with major publishers in new revenue-sharing program
Perplexity announced its Publishers’ Program, promising to share revenue and provide technological support to partners including TIME, Der Spiegel, Fortune, Entrepreneur, The Texas Tribune, and WordPress.com. The program includes revenue sharing from advertising (a new business model for Perplexity), free access to Perplexity’s Online LLM APIs for building custom answer engines, and Enterprise Pro accounts for partners’ employees. Perplexity has faced accusations of plagiarizing other publishers’ stories; the Publishers’ Program aims to align AI search with quality journalism and keep high-quality content central to AI-powered information retrieval. (Perplexity)
Still want to know more about what matters in AI right now?
Read last week’s issue of The Batch for in-depth analysis of news and research.
This week, Andrew Ng shared best practices for brainstorming, evaluating, and prioritizing great ideas for AI startups and products:
“In large companies, it can take a few weeks to go through a process to gather and prioritize ideas, but this pays off well in identifying valuable, concrete ideas to pursue. AI isn’t useful unless we find appropriate ways to apply it, and I hope these best practices will help you to generate great AI application ideas to work on.”
Read Andrew’s full letter here.
Other top AI news and research stories we covered in depth: All about Meta's Llama 3.1 405B and OpenAI's SearchGPT, why publishers are restricting AI data access, and AgentInstruct, a framework for generating diverse synthetic data for LLM fine-tuning.