Different graphs showing switch transformer data

Bigger, Faster Transformers: Increasing parameter counts without slowing inference

Performance in language tasks rises with the size of the model, yet as a model's parameter count grows, so does the time it takes to generate output. New work pumps up the number of parameters without slowing down the network.
Series of images showing improvements in a multilingual language translator

Better Zero-Shot Translations: A method for improving transformer NLP translation

Train a multilingual language translator to translate between Spanish and English and between English and German, and it may be able to translate directly between Spanish and German as well. New work proposes a simple path to better machine translation between languages.
Graphs and data related to visualized tokens (or vokens)

Better Language Through Vision: Study improved BERT performance using visual tokens.

For children, associating a word with a picture that illustrates it helps them learn the word’s meaning. Research aims to do something similar for machine learning models. Researchers improved a BERT model’s performance on some language tasks by training it on a large dataset of image-word pairs.
GPT-Neo related animation

Language Models Want to Be Free: How EleutherAI is developing a GPT-3 clone.

A grassroots research collective aims to make a GPT-3 clone that’s available to everyone. EleutherAI, a loose-knit group of independent researchers, is developing GPT-Neo, an open source, free-to-use version of OpenAI’s gargantuan language model.
Data and graphs related to a new model capable of detecting tremors

Quake Watch: AI model detects earthquakes and estimates epicenters.

Detecting earthquakes is an important step toward warning surrounding communities that damaging seismic waves may be headed their way. A new model detects tremors and provides clues to their epicenter.
Data related to adversarial learning

Adversarial Helper: Adversarial learning can improve vision and NLP.

Models that learn relationships between images and words are gaining a higher profile. New research shows that adversarial learning, usually a way to make models robust to deliberately misleading inputs, can boost vision-and-language performance.
AI-generated images with the model DALL-E

Tell Me a Picture: OpenAI's two new multimodal AI models, CLIP and DALL·E

Two new models show a surprisingly sharp sense of the relationship between words and images. OpenAI, the for-profit research lab, announced a pair of models that have produced impressive results in multimodal learning: CLIP and DALL·E.
Ilya Sutskever

Ilya Sutskever: OpenAI’s co-founder on building multimodal AI models

The past year was the first in which general-purpose models became economically useful. GPT-3, in particular, demonstrated that large language models have surprising linguistic competence and the ability to perform a wide variety of useful tasks.
Matthew Mattina

Matthew Mattina: Arm research leader explains how TinyML is bringing AI to phones and other edge devices

Look at the tip of a standard #2 pencil. Now, imagine performing over one trillion multiplication operations in the area of that pencil tip every second. This can be accomplished using today’s 7nm semiconductor technology.
Data showing how new pretrained language models might learn facts like weight and cost

The Measure of a Muppet: How pretrained NLP models learn scalar attributes like weight and cost.

The latest pretrained language models have shown a remarkable ability to learn facts. A new study drills down on issues of scale, showing that such models might learn the approximate weight of a dog or cost of an apple, at least to the right order of magnitude.
Examples of contrastive learning

Learning From Words and Pictures: A contrastive learning method that pairs medical x-rays with text

It’s expensive to pay doctors to label medical images, and the relative scarcity of high-quality training examples can make it hard for neural networks to learn features that make for accurate diagnoses.
Data related to Nvidia's Pay Attention When Required (PAR) approach

Selective Attention: More efficient NLP training without sacrificing performance

Large transformer networks work wonders with natural language, but they require enormous amounts of computation. New research slashes processor cycles without compromising performance.
Proof Search Tree

The Proof Is in the Network: A transformer model that generates mathematical proofs

OpenAI’s Generative Pre-trained Transformer (GPT) architecture has created coherent essays, images, and code. Now it generates mathematical proofs as well.
AI medical chatbot having a conversation with a patient

GPT-3 Is No MD: GPT-3 lacks medical problem-solving skills.

The world’s most sophisticated language model won’t replace your doctor anytime soon. Researchers at Nabla, an AI-enabled healthcare platform, found that GPT-3 lacks the logical reasoning skills to be a useful medical chatbot.
Illustration of two witches with half a pumpkin each and the moon behind them

The AI Community Splinters: Could geopolitics drive a wedge in the AI community?

Will geopolitical rivalries fragment international cooperation in machine learning? Countries competing for AI dominance may lash out at their rivals.
