DeepNet graph of layers vs. years
Machine Learning Research

Pile on the Layers!: DeepNorm Allows Transformers to Accommodate More Layers

Adding layers to a neural network puts the “deep” in deep learning, but it also increases the chance that the network will get stuck during training. A new approach effectively trains transformers with an order of magnitude more layers than previous methods.
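The core trick behind DeepNorm is small enough to sketch. The α and β constants below follow the DeepNet paper's formulas for decoder-only models; the function names and the standalone-residual framing are illustrative, not the paper's code:

```python
def deepnorm_alpha_beta(num_layers: int):
    """DeepNorm constants for a decoder-only transformer with N layers.

    alpha scales the residual branch; beta scales the initialization of
    certain sublayer weights. Per the DeepNet paper (decoder-only case):
    alpha = (2N)^(1/4), beta = (8N)^(-1/4).
    """
    alpha = (2 * num_layers) ** 0.25
    beta = (8 * num_layers) ** -0.25
    return alpha, beta


def deepnorm_residual(x, sublayer_out, alpha, layer_norm):
    # DeepNorm replaces the usual LayerNorm(x + f(x)) with
    # LayerNorm(alpha * x + f(x)); up-weighting the skip connection
    # keeps per-layer updates bounded as depth grows.
    return layer_norm(alpha * x + sublayer_out)
```

With hundreds of layers, α grows slowly (for N = 100, α ≈ 3.76), which is enough to stabilize training at depths where standard post-LayerNorm transformers diverge.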
Animation showing the optimization of a physical design

Airfoils Automatically Optimized: DeepMind AI Research Simulates Fluid Dynamics

Engineers who design aircraft, aqueducts, and other objects that interact with air and water use numerical simulations to test potential shapes, but they rely on trial and error to improve their designs. A neural simulator can optimize the shape itself.
Variational Neural Cellular Automata (VNCA) overview

Tech Imitates Life, Life Imitates Art: Image Generation Technique Works Pixel By Pixel

The computational systems known as cellular automata reproduce patterns of pixels by iteratively applying simple rules based loosely on the behavior of biological cells. New work extends their utility from reproducing images to generating new ones.
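To make "simple rules applied iteratively" concrete, here is a classic (non-neural) cellular automaton, Conway's Game of Life — a minimal illustration of the mechanism, not the variational neural version the new work builds on:

```python
def life_step(grid):
    """One update of Conway's Game of Life on a 2D list of 0/1 cells.

    Every cell looks only at its eight neighbors (wrapping at the edges)
    and applies the same local rule -- the defining trait of a cellular
    automaton: a live cell survives with 2 or 3 live neighbors, and a
    dead cell comes alive with exactly 3.
    """
    h, w = len(grid), len(grid[0])
    nxt = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            n = sum(
                grid[(r + dr) % h][(c + dc) % w]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            )
            nxt[r][c] = 1 if (n == 3 or (grid[r][c] and n == 2)) else 0
    return nxt
```

Neural cellular automata keep this structure but replace the hand-written rule with a small learned network, so the update itself can be trained to reproduce — or, in the new work, generate — target images.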
The framework of Virtual Outlier Synthesis (VOS)

Right-Sizing Confidence: Object Detector Lowers Confidence for Unfamiliar Inputs

An object detector trained exclusively on urban images might mistake a moose for a pedestrian and express high confidence in its poor judgment. New work enables object detectors, and potentially other neural networks, to lower their confidence when they encounter unfamiliar inputs.
One driver passing another driver who is out of gas

Straight & Narrow: Linear Regression for Machine Learning

Linear regression may be the key statistical method in machine learning, but it didn’t get to be that way without a fight. Two eminent mathematicians claimed credit for it, and 200 years later the matter remains unresolved.
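The method both mathematicians claimed is least squares, which fits in a few lines of standard-library Python (a minimal one-variable sketch; real workloads use a linear-algebra library):

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept.

    Simple linear regression: the slope is the covariance of x and y
    divided by the variance of x, and the intercept makes the line pass
    through the point of means.
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = cov / var
    return slope, my - slope * mx
```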
Architecture of CXV

Upgrade for Vision Transformers: Convolution-Inspired Changes Improve Efficiency

Vision Transformer and models like it use a lot of computation and memory when processing images. New work modifies these architectures to run more efficiently while adopting helpful properties from convolutions.
Didactic diagram of a hypothetical embedded-model architecture

Image Generation + Probabilities: New Method Boosts Performance for Normalizing Flow

If you want to both synthesize data and find the probability of any given example — say, generate images of manufacturing defects to train a defect detector and identify the highest-probability defects — you may use the architecture known as a normalizing flow.
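What makes that dual use possible is that a flow is an invertible map with a tractable Jacobian. A toy one-dimensional affine flow shows both directions (the class and its parameters are illustrative; real flows stack many learned invertible layers):

```python
import math


class AffineFlow:
    """Minimal 1D normalizing flow: x = mu + sigma * z, with z ~ N(0, 1).

    Because the map is invertible with a known Jacobian, the same model
    can sample (run forward from base noise) and score any x exactly
    (run the inverse and apply the change-of-variables formula).
    """

    def __init__(self, mu, sigma):
        self.mu, self.sigma = mu, sigma

    def sample(self, z):
        # Forward direction: transform base noise into data.
        return self.mu + self.sigma * z

    def log_prob(self, x):
        # Inverse direction, plus log|det Jacobian| = -log sigma.
        z = (x - self.mu) / self.sigma
        log_base = -0.5 * (z * z + math.log(2 * math.pi))
        return log_base - math.log(self.sigma)
```

In the defect-detection scenario above, `sample` would generate synthetic defects while `log_prob` would rank real examples by how typical they are under the learned distribution.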
Shifted Patch Tokenization (SPT) | Locality Self-Attention (LSA)

Less Data for Vision Transformers: Boosting ViT Performance With Less Training Data

Vision Transformer (ViT) outperformed convolutional neural networks in image classification, but it required more training data. New work enabled ViT and its variants to outperform other architectures with less training data.
GLaM model architecture

Efficiency Experts: Mixture of Experts Makes Language Models More Efficient

The emerging generation of trillion-parameter language models takes significant computation to train. Activating only a portion of the network at a time can cut the requirement dramatically and still achieve exceptional results.
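The routing idea can be sketched with top-k gating over a pool of experts — a simplified illustration of sparse mixture-of-experts routing in the spirit of GLaM, with function names and list-based experts chosen for clarity rather than taken from any real system:

```python
import math


def top_k_gate(scores, k=2):
    """Pick the k highest-scoring experts and softmax-normalize their scores.

    Sparse routing: only the chosen experts run on a given token, so
    compute per token stays roughly constant even as the total expert
    count (and hence parameter count) grows.
    """
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    mx = max(scores[i] for i in top)
    exps = {i: math.exp(scores[i] - mx) for i in top}  # stable softmax
    total = sum(exps.values())
    return {i: exps[i] / total for i in top}


def moe_forward(x, experts, scores, k=2):
    # Weighted sum over only the selected experts' outputs; the
    # unselected experts are never evaluated.
    gate = top_k_gate(scores, k)
    return sum(w * experts[i](x) for i, w in gate.items())
```

With k fixed at 1 or 2, adding more experts grows capacity without growing per-token computation — the efficiency lever the article describes.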
AI generated images with different descriptions

More Realistic Pictures From Text: How the GLIDE Diffusion Model Generates Images from Text

OpenAI’s DALL·E got an upgrade that takes in text descriptions and produces images in styles from hand-drawn to photorealistic. The new version is a rewrite from the ground up. It uses the earlier CLIP zero-shot image classifier to represent text descriptions.
Jurassic-X's software infrastructure

Neural Nets + Rules = Truer Text: Jurassic-X NLP Can Solve Math, Check Facts, and More

A new approach aims to cure text generators of their tendency to produce nonsense. AI21 Labs launched Jurassic-X, a natural language processing system that combines neural networks and rule-based programs.
Deep Symbolic Regression

From Sequences to Symbols: Transformers Extend AI's Mathematical Capabilities

Given a sequence of numbers, neural networks have proven adept at discovering a mathematical expression that generates it. New work uses transformers to extend that success to a further class of expressions.
Grokking: A dramatic example of generalization far after overfitting on an algorithmic dataset

Learning After Overfitting: Transformers Continue Learning After Overfitting Data

When a model trains too much, it can overfit, or memorize, the training data, which reduces its ability to analyze similar-but-different inputs. But what if training continues? New work found that overfitting isn’t the end of the line.
Stock Market Simulation using cGANs

Stock-Trading Test Bed: AI system simulates stock market performance.

If you buy or sell stocks, it’s handy to test your strategy before you put real money at risk. Researchers devised a fresh approach to simulating market behavior.
Overview of Graph Hyper Network (GHN-2)

Who Needs Training? Graph neural network selects optimal weights for image tasks.

When you’re training a neural network, it takes a lot of computation to optimize its weights using an iterative algorithm like stochastic gradient descent. Wouldn’t it be great to compute the best parameter values in one pass? A new method takes a substantial step in that direction.