Cobot’s Proxie robot tackles warehouse tasks

VBench++, a new benchmark suite for AI video

Published Nov 25, 2024 (3 min read)

Twice a week, Data Points brings you the latest AI news, tools, models, and research in brief. In today’s edition, you’ll find:

Anthropic and Amazon strengthen their ties
Windsurf blends copilots with agents in one IDE
Mistral introduces Pixtral Large to its APIs and chat platform
H’s first product launch is a business agent

But first:

Proxie, a new warehouse robot developed by Amazon alumni

Collaborative Robotics (aka Cobot), led by former Amazon executive Brad Porter, unveiled Proxie, a mobile robot designed to assist with cart-moving tasks in various facilities. The two-armed, four-wheeled robot is currently being tested by Maersk and Mayo Clinic, with other companies exploring its potential use. Cobot aims to develop increasingly capable robots that can work alongside humans, leveraging advancements in AI for more sophisticated manipulation and communication. (Cobot)

New benchmarks aim to standardize evals for video generation

Researchers developed VBench++, a benchmark suite that evaluates video generation quality across 16 dimensions, including subject identity consistency and motion smoothness. The authors report that its scores align with human perception; the suite also provides insights into model strengths and weaknesses and can evaluate both text-to-video and image-to-video generation. This open-source benchmark aims to drive progress in video generation by offering a standardized way to assess and compare model performance across various technical and trustworthiness aspects. (arXiv)

Amazon invests $4 billion in Anthropic, deepening partnership

Amazon invested an additional $4 billion in Anthropic, bringing its total investment to $8 billion and making AWS Anthropic’s primary cloud and training partner. Anthropic will collaborate closely with AWS on developing Trainium accelerators, optimizing machine learning hardware, and advancing the chips’ training capabilities. This partnership will also give AWS customers early access to fine-tuning Anthropic’s models with their own data. Anthropic gains access to funding to continue its research and development, and Amazon has the opportunity to show its chips can rival Nvidia’s for high-end training and inference. (Amazon and Anthropic)

New software development tool integrates copilots and agents

Codeium launched a new integrated development environment (IDE) called Windsurf, featuring an AI system called Cascade. Windsurf combines collaborative and independent AI capabilities, aiming to improve on how software developers use copilot and agent technologies. Cascade integrates codebase analysis, advanced code search tools, and human action tracking to facilitate AI-human collaboration during the coding process. The company claims its system offers better performance and integration than similar tools, particularly when working with existing codebases. (Codeium)

Mistral AI unveils powerful multimodal model and enhanced platform

Mistral AI announced Pixtral Large, a 124-billion-parameter text and image model that outperforms leading competitors on benchmarks like MathVista, DocVQA, and VQAv2. The company integrated Pixtral Large into its Le Chat platform, which now offers features such as real-time coding, PDF analysis, image generation, web search, and the ability to create task-specific agents. These updates establish Mistral AI as a noteworthy player in the multimodal AI market, showcasing competitive capabilities in visual understanding and mathematical reasoning tasks compared to established models like GPT-4 and Gemini. (Mistral)

H unveils Runner H, its first AI product for business automation

H, a Paris startup founded by Google alumni, announced Runner H, an agentic AI for business tasks like quality assurance and process automation. The product is built on H’s proprietary 2-billion-parameter language model and will be available through APIs, with free access initially and paid access to follow. This launch marks H’s first product release after a tumultuous period following its $220 million seed round and the departure of three co-founders. (H Company)

Still want to know more about what matters in AI right now?

Read last week’s issue of The Batch for in-depth analysis of news and research.

Last week, Andrew Ng explored an emerging trend of writing text to be read specifically by AI models, discussing how it parallels SEO and how incentives might drive authors to create content tailored for LLM consumption.

“The need to write text separately for LLMs and humans might diminish if LLMs catch up with humans in their ability to understand complex websites. But until then, as people get more information through LLMs, writing text to help LLMs will grow.”

Read Andrew’s full letter here.

Other top AI news and research stories we covered in depth: Next-gen models show limited gains as AI giants rethink their training strategies amidst the breakdown of scaling laws; AI creates an interactive Minecraft-like world in real time, eliminating the need for a game engine; TSMC halts advanced chip production for Chinese companies following new U.S. orders, escalating chip restrictions; and researchers achieve a 20 percent reduction in transformer training costs with minimal performance loss, paving the way for more efficient AI development.
