Benchmarks

15 Posts


Humanized Training for Robot Arms: New Research Improves Robot Performance and Adaptability

Robots trained via reinforcement learning usually study videos of robots performing the task at hand. A new approach used videos of humans to pre-train robotic arms.
2 min read

Toward Next-Gen Language Models

A new benchmark aims to raise the bar for large language models. What’s new: Researchers at 132 institutions worldwide introduced the Beyond the Imitation Game benchmark (BIG-bench), which includes tasks that humans
2 min read

AI: A Progress Report

A new study showcases AI’s growing importance worldwide. What’s new: The fifth annual AI Index from Stanford University’s Institute for Human-Centered AI documents rises in funding, regulation, and performance.
2 min read

Transformer Variants Head to Head

The transformer architecture has inspired a plethora of variations. Yet researchers have used a patchwork of metrics to evaluate their performance, making them hard to compare. New work aims to level the playing field.
2 min read

Computation as a National Resource

How much processing power do various nations have on hand to drive their AI strategy? An international trade group aims to find out. The Organisation for Economic Co-operation and Development (OECD) is launching an effort to measure the computing capacity available in countries around the world.
1 min read

Prosperity of the Commons

A new consortium of companies, schools, and research labs is building open tools for next-generation machine learning. MLCommons aims to foster innovation in machine learning by developing new benchmarks, datasets, and best practices.
1 min read

Dynamic Benchmarks

Benchmarks provide a scientific basis for evaluating model performance, but they don’t necessarily map well to human cognitive abilities. Facebook aims to close the gap through a dynamic benchmarking method that keeps humans in the loop.
2 min read

Do Muppets Have Common Sense?

Two years after it pointed a new direction for language models, Bert still hovers near the top of several natural language processing leaderboards. A new study considers whether Bert simply excels at tracking word order or learns something closer to common sense.
2 min read

Optimizer Shootout

Everyone has a favorite optimization method, but it’s not always clear which one works best in a given situation. New research aims to establish a set of benchmarks. Researchers evaluated 14 popular optimizers using the Deep Optimization Benchmark Suite, which some of them introduced last year.
2 min read

Built for Speed

Chips specially designed for AI are becoming much faster at training neural networks, judging from recent trials. MLPerf, an organization that’s developing standards for hardware performance in machine learning tasks, released results from its third benchmark competition.
1 min read

Running Fast, Standing Still

Machine learning researchers report better and better results, but some of that progress may be illusory. Some models that appear to set a new state of the art haven’t been compared properly to their predecessors, Science News reports based on several published surveys.
1 min read

Toward Open-Domain Chatbots

Progress in language models is spawning a new breed of chatbots and, unlike their narrow-domain forebears, they have the gift of gab. Recent research tests the limits of conversational AI.
2 min read

What Language Models Know

Watson set a high bar for language understanding in 2011, when it famously whipped human competitors in the televised trivia game show Jeopardy! IBM’s special-purpose AI required around $1 billion. Research suggests that today’s best language models can accomplish similar tasks right off the shelf.
2 min read

Leveling the Playing Field

Deep reinforcement learning has given machines apparent hegemony in vintage Atari games, but their scores have been hard to compare — with one another or with human performance — because there are no rules governing what machines can and can’t do to win. Researchers aim to change that.
2 min read

BERT Is Back

Less than a month after XLNet overtook BERT, the pole position in natural language understanding changed hands again. RoBERTa is an improved BERT pretraining recipe that beats its forebear, becoming the new state-of-the-art language model, for the moment.
2 min read
