Many things in life have a positive side and a negative side. For instance, a new AI system might help democratize access, and at the same time it might be more accessible to people who have internet access than those who don’t. Thus, it could be either praised for helping people or criticized for not helping enough. These days, a determined critic or politician can point to almost anything, good or bad, and find cause to celebrate or denigrate it depending on their agenda.
We know from studies of social media that posts that arouse anger are more likely to reach a large audience than those that encourage feelings of contentment. This means that whenever an event occurs — even a good one — naysayers have a larger megaphone than supporters. (This isn’t altogether new. Juicy gossip has always traveled faster than mundane truth.) For example, fear mongering about artificial general intelligence seems to be a persistent meme even though AI’s benefits vastly outweigh its harms.
What can we do about this? I’d like to see us do more to support each other. If an uncivil critic has a larger megaphone than we do, we can respond together with a public show of support. When I tweet about some topics — support for Ukraine against Russian aggression, for instance — I find that an occasional hostile response can make me pull back. But I try to ignore the hostility and continue to support the causes that I believe in.
The psychologist John Gottman says that successful relationships have a ratio of five positive interactions to one negative interaction. I don't know whether a ratio like this applies to communities, but I would love to hear members of the AI community cheering for each other most of the time — even if, a smaller fraction of the time, we also need to discuss and fix problems that deserve sharp criticism.
Over the past couple of years, I’ve seen members of the AI community express a lot of support for one another, but I’ve also noticed a growing tendency to criticize, especially on Twitter. To be clear, AI has many problems, including bias, unfairness, and harmful use cases, and we need to discuss and fix them. But if the AI community is to keep growing — which I hope we will — we need to invite others into an environment of mutual support and respect.
I had dinner with a few AI friends last weekend. Rod Brooks, Kai-Fu Lee, Tom Mitchell, and I reminisced about the early days of AI, when everyone knew each other and we often supported each other in the ambitious research directions that many were pursuing. The community continued to welcome newcomers for decades, which allowed us to grow and make a lot of progress.
In that spirit, I hope we’ll put more energy into strengthening our community and focus our critical impulses on the most pressing issues. Let’s give each other the love, respect, and support that will keep the field growing for a long time to come.
Toward Next-Gen Language Models
A new benchmark aims to raise the bar for large language models.
What’s new: Researchers at 132 institutions worldwide introduced the Beyond the Imitation Game benchmark (BIG-bench), which includes tasks that humans perform well but current state-of-the-art models don’t.
How it works: The authors selected over 200 tasks based on 10 criteria such as being sensible to humans, not solved by current language models, and “not solvable by memorizing the internet.” Many involve atypical problems such as identifying a single move that will win a game of chess, guessing a movie title from a series of emojis, and playing a role in a mock courtroom trial.
- The tasks are zero- or few-shot, meaning that a model is given a small number of example prompt-and-response pairs (or, in the zero-shot case, none at all) and is expected to respond to a novel prompt. (In this way, BIG-bench is used to test models, not to fine-tune them.)
- The authors ran the tasks on various sizes of OpenAI’s GPT-3, Google’s PaLM, and dense and sparse varieties of Google’s BIG-G (based on LaMDA).
- They also posed the tasks to a team of humans, who were allowed to search the internet as they performed the tasks.
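The zero- and few-shot protocol described above can be sketched in a few lines. Here `query_model` is a hypothetical stand-in for whichever model is under test, and the prompt format is an illustrative assumption, not BIG-bench’s actual API:

```python
def build_few_shot_prompt(examples, query):
    """Format example prompt-and-response pairs followed by a novel query."""
    lines = [f"Q: {q}\nA: {a}" for q, a in examples]
    lines.append(f"Q: {query}\nA:")
    return "\n\n".join(lines)

def evaluate(task_examples, test_items, query_model):
    """Score a model on a task without any fine-tuning (few-shot evaluation)."""
    correct = 0
    for query, target in test_items:
        prompt = build_few_shot_prompt(task_examples, query)
        if query_model(prompt).strip() == target:
            correct += 1
    return correct / len(test_items)
```

A zero-shot run is the same call with an empty `task_examples` list.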
Results: No model, regardless of size, outperformed the best-performing human on any task. However, for some tasks, the best-performing model beat the average human. For example, answering multiple-choice questions about Hindu mythology, the best model scored around 76 percent, the average human scored roughly 61 percent, and the best human scored 100 percent (random chance was 25 percent). Generally, larger models performed better than smaller ones. For example, BIG-G’s average accuracy on three-shot, multiple-choice tasks was nearly 33 percent with a few million parameters but around 42 percent with over a hundred billion parameters.
Why it matters: BIG-bench’s creators argue that benchmarks like SuperGLUE, SQuAD2.0, and GSM8K focus on narrow skills. Yet the latest language models, after pretraining on huge datasets scraped from the internet, show unexpected abilities such as solving simple arithmetic problems. BIG-bench’s diverse, few-shot tasks give researchers new ways to track such emergent capabilities as models, data, and training methods evolve.
We’re thinking: Devising tasks that can’t be solved by memorizing the internet may push researchers to develop algorithms — including ones that enable complex forms of reasoning — that generalize well even with limited amounts of training data.
Wind in the Forecast
Machine learning is making wind power more predictable.
What’s new: Engie SA, a multinational energy utility based in France, is the first customer for an AI-powered tool from Google that predicts the energy output of wind farms, Bloomberg reported. The company plans to deploy the system on 13 wind farms in Germany.
How it works: Google’s DeepMind subsidiary trained a neural network to predict energy output from wind farms up to 36 hours ahead of time. The training data included historical weather forecasts and unspecified data from wind turbines.
- Engie will use the system to predict how much energy will be available to sell to electricity providers in coming days.
- Accurate predictions should also enable Engie to reduce its use of fossil fuels and nuclear power. If the system predicts low wind-power output, the company can plan to bring other energy sources online.
- In a 2019 blog post, Google reported that the increased reliability afforded by its algorithm would add 20 percent to the value of wind energy.
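Google hasn’t published the model’s details, so the following is only a toy illustration of the general idea: learn a mapping from forecast weather features to expected power, then apply it to a forecast up to 36 hours out. It uses a single wind-speed feature and ordinary least squares in pure Python; a real turbine’s power curve is nonlinear and depends on many more variables.

```python
def fit_linear(xs, ys):
    """Ordinary least squares for one feature: power ~ w * wind_speed + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b

def predict_output(forecast_speeds, w, b):
    """Map forecast wind speeds (e.g., 36 hours ahead) to expected power."""
    return [w * s + b for s in forecast_speeds]
```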
Behind the news: Google isn’t the only firm employing machine learning to squeeze more electricity out of renewable resources.
- Microsoft recently partnered with Danish wind turbine manufacturer Vestas Wind Systems to develop a reinforcement-learning system that helps keep turbines pointed in the optimal direction.
- Israeli startup Xfloat built a system that keeps floating solar panels facing the sun as it moves across the sky.
Why it matters: Wind and solar power are notoriously uncertain, leading utilities to default to fossil fuels, which are available on-demand. Predicting wind-energy yields can reduce some of that uncertainty, helping utilities benefit from advantages such as renewables’ lower overhead and easing dependence on fossil-fuel and nuclear sources.
We’re thinking: Stopping climate change isn’t the only motivation to cut dependence on fossil fuels. The conflict in Ukraine has contributed to a global shortage of oil and gas, causing energy prices to spike. Alternative sources can help make the global economy less reliant on oil producers and more resilient to disruptions in supply.
A MESSAGE FROM DEEPLEARNING.AI
The DeepLearning.AI community continues to grow, thanks to Pie & AI ambassadors like Emilio Soria-Olivas of Valencia, Spain. We’re thrilled to share his accomplishments. Sign up to become a Pie & AI ambassador and learn how you could be featured as well!
People who suffer from gastrointestinal conditions such as irritable bowel syndrome are number two when it comes to describing the characteristics of their own poop.
What’s new: The smartphone app Dieta helps patients to keep gastrointestinal illnesses in check by tracking their own behaviors and symptoms. It includes a computer vision model that recognizes medically salient characteristics of excrement as accurately as doctors and better than most patients, a recent study found.
How it works: The app enables patients to log symptoms such as nausea, constipation, and abdominal pain; behaviors like exercise, sleep, and meals; treatments including medications, supplements, and diet; and feelings of illness or wellbeing. It also helps patients experiment on themselves, recommending lifestyle changes and treatments and enabling patients to forward the results to caregivers. A computer vision model classifies feces according to characteristics that are useful in diagnosis.
- Patients use the app to take a picture of their stool. The model classifies the excreta in five aspects: size, consistency, fragmentation, indistinct edges, and type according to the Bristol Stool Scale.
- To train the model, the developers collected and classified 68,000 photos submitted by users including the startup’s founder.
- A clinical version lets patients chat with caregivers and provides a location tracker that flags unplanned bathroom visits (for instance, pulling off a freeway to attend to an urgent matter).
Behind the news: Machine learning engineers have trained other models to peer into the toilet.
- Moxie, a smartphone app that debuted in 2020, similarly classifies poop according to the Bristol Stool Scale. A 2020 review by Wired found that it mistook a photo of the reviewer’s face for a bowel movement.
- In 2020, researchers from Duke and Stanford developed the Precision Health Toilet. The device uses a suite of sensors to evaluate waste for factors like consistency and blood content (a risk factor for cancer and other ailments).
Why it matters: Roughly 40 percent of adults worldwide may suffer from gastrointestinal conditions, according to a 2021 study. Tracking bowel movements helps to diagnose these conditions earlier and more accurately.
We’re thinking: We’re grateful that someone — other than us — builds models that classify the Bristol Stool Scale.
Pile on the Layers!
Adding layers to a neural network puts the “deep” in deep learning, but it also increases the chance that the network will get stuck during training. A new approach effectively trains transformers with an order of magnitude more layers than previous methods.
What’s new: A team at Microsoft led by Hongyu Wang and Shuming Ma developed DeepNorm, a normalization function that enables transformers to accommodate up to 1,000 layers. (Their models, dubbed DeepNet, topped out at 3.8 billion parameters.)
Key insight: When training a transformer, layer normalization often is used to scale layer inputs, promoting faster learning. The gradient that layer normalization passes back to earlier layers is inversely proportional to the magnitude of its input. The authors found that the greater the number of layers, the higher the likelihood of a very large update to parameter values in a single training step. Large updates produce larger inputs to layer normalization, so earlier layers receive smaller and smaller gradients until their parameter values stop changing and performance stops improving. (This issue is related to the familiar vanishing gradient problem, but its cause is different. In the familiar scenario, gradients from later layers diminish as they backpropagate through the network. In this case, the combination of layer normalization and unusually large updates shrinks the gradients.) Limiting the total change in parameter values would prevent large updates, which should enable deeper networks to continue training without getting stuck.
How it works: The authors trained a transformer, applying DeepNorm to the residual connections in each attention and feed-forward layer.
- To avoid large parameter updates, DeepNorm scaled up each residual connection’s skip path by an author-derived constant. Mathematically, residual connections usually output x + f(x), where f(x) is the function computed by the previous layer. DeepNorm changes them to output α·x + f(x), where α is a constant greater than 1 that the authors derive from the network’s depth.
- Given the output of the residual connections, DeepNorm applied layer normalization.
- DeepNorm also scaled down the initial parameter values to avoid large updates in early training.
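In code, the steps above amount to a one-line change to the standard residual block. This is a minimal pure-Python sketch (vectors as lists, no learned gain or bias in the normalization); `alpha` stands in for the depth-dependent constant the authors derive:

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def deepnorm_residual(x, sublayer, alpha):
    """DeepNorm residual: scale the skip connection by alpha (> 1 for deep
    networks), then apply layer normalization: LN(alpha * x + f(x))."""
    fx = sublayer(x)
    combined = [alpha * xi + fi for xi, fi in zip(x, fx)]
    return layer_norm(combined)
```

With alpha = 1 this reduces to the standard post-layer-norm residual; values greater than 1 damp the relative contribution of each sublayer’s update, which is what keeps very deep stacks trainable.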
Results: The authors evaluated DeepNets of various depths on tasks that involve translating text between English and over 100 other languages. The DeepNets outperformed all competitors of equal depth, between 36 and 1,000 layers, as well as some with an order of magnitude fewer layers (and an order of magnitude more parameters). For instance, translating English into German and back, a 200-layer DeepNet achieved 28.9 BLEU, while a 200-layer dynamic linear combination of layers (a state-of-the-art transformer variant) achieved 27.5 BLEU. Seven other 200-layer models, including a transformer without the authors’ modifications, diverged during training. On the [OPUS-100](https://opus.nlpl.eu/opus-100.php) multilingual dataset, a DeepNet with 200 layers and 3.2 billion parameters achieved 23.0 BLEU, while M2M-100 (a transformer variant with 48 layers and 12 billion parameters) achieved 18.4 BLEU.
Why it matters: Scaling up neural networks has driven a lot of improvement over the past decade. This work points a way toward even deeper models.
We’re thinking: DeepNets are deep and narrow, making previous models look shallow and wide by comparison. Since training ginormous (1,000-layer, super-wide) models is very expensive, we’d do well to find the ideal tradeoff between deep and narrow versus shallow and wide.