Last week, I wrote about the grand challenge of artificial general intelligence. Other scientific and engineering grand challenges inspire me as well. For example, fusion energy, extended lifespans, and space colonization have massive potential to remake civilization (for good or ill).
These grand challenges share a few traits:
- A solution would transform the way most people live, hopefully — but not necessarily — for the better.
- Brilliant engineers have been working toward these goals for decades. While they might be reached within our lifetimes, there’s no guarantee.
- They’re technically complex. Thus, it’s difficult for a layperson (and often even experts) to chart a path forward.
Despite their extreme uncertainty, such projects fill my mind with hopes and dreams. Fusion energy promises a safe, clean, unlimited source of electricity. The ability to harvest energy from the fusion of atoms could mitigate climate change and remake geopolitics by empowering all countries to become energy-independent.
Extended lifespans could enable people to accumulate greater wisdom. Of course, they could also concentrate wealth and power in the hands of the longest-lived individuals and create difficult demographic challenges. Purported longevity compounds like resveratrol have fallen short of their promise, but I’m excited by studies on the use of metformin and other compounds to lengthen lifespans.
Space colonization that carries robots and, someday, humans to distant planets, solar systems, and ultimately galaxies would extend the future course of human history beyond the duration of Earth and into a practically unlimited future. Spacefaring technology would lead humanity into uncharted realms much like Homo sapiens’ departure from Africa led to a global civilization.
Like artificial general intelligence, these grand challenges have motivated their share of overhyped startups, scorn from skeptics, and tireless enthusiasm from believers. Yet I hope to see progress in all of them within my lifetime. (If we manage to extend lifetimes, that could be a very long time.)
The most exciting thing is that AI developers can play a role in achieving them!
- DeepMind recently used AI to control fusion reactions. More generally, AI is helping to design and simulate large-scale physical systems.
- AI is making inroads into many aspects of healthcare, including drug discovery. These efforts span scientific research as well as startups that focus on human longevity.
- Automated control has a longstanding role in space exploration. The latency of communication between Earth and distant planets makes it infeasible to control in real time, say, a vehicle on Mars using a joystick on Earth. Fun fact: Jagriti Agrawal, a founding team member of Kira Learning (disclosure: an AI Fund portfolio company), wrote software that runs on NASA’s Perseverance Mars rover.
AI is not a panacea. But as a general-purpose technology, it can be applied to these grand challenges and others. Whenever I’m interested in a topic, be it climate change or quantum computing, my background in AI makes it easier to strike up a fruitful conversation with domain experts. All of us in AI have tools that could be useful to them.
AI for President
A deepfake of South Korea’s new president helped propel him into office.
What’s new: Yoon Suk-yeol, who won the country’s March 9 election, campaigned using videos that featured an AI-generated likeness of himself answering voters’ questions. No deception was involved; viewers were informed that they were watching a computer animation.
How it works: Seoul-based DeepBrain AI created Yoon’s avatar using 20 hours of audio and video of the candidate captured in front of a green screen, totaling around 3,000 spoken sentences, according to France24.
- Every day for two months prior to the election, Yoon’s campaign team selected a question and scripted an answer to be delivered by the avatar, which was dubbed AI Yoon.
- At first, AI Yoon delivered remarks about policy, but the scripts became more casual as AI Yoon told viewers about his Myers-Briggs personality type and favorite karaoke songs. The avatar also lobbed insults at Yoon’s opponent Lee Jae-myung and the incumbent president.
- At first, Lee disparaged AI Yoon. Two weeks before the election, though, he deployed his own avatar. Unlike AI Yoon, Lee’s doppelganger was based on recordings of actual campaign appearances.
Behind the news: The first known political use of deepfakes occurred in 2020, when Indian politician Manoj Tiwari altered a campaign video to show himself delivering the same message in various local languages. The technology has also fueled political scandals. In 2019, a Malaysian government minister said a video that captured him engaging in extramarital sex was a deepfake. Earlier that year, speculation that a video of Gabon’s president, Ali Bongo, was a deepfake had spurred an attempted coup.
Why it matters: Yoon, who is known for his gruff, no-nonsense personality, created a digital double designed to resonate positively with the young voters who were deemed critical to his victory. While some critics dismissed the gambit, Yoon’s success suggests a bright future for campaign-sanctioned fakes tailored to appeal to particular groups.
We’re thinking: A politician used a deepfake to make himself seem more authentic! How’s that for irony?
Know When to Fold ’Em
Lose too much money at Texas hold ’em, and you may get an AI-generated warning.
What’s new: Casinos and gaming websites are using machine learning to flag gamblers who show signs of addictive behavior, The New York Times reported.
How it works: Gambling businesses risk losing their licenses if they facilitate ruinous behavior. Moreover, they make more money on gamblers who pace themselves than those who lose their shirts. Denmark-based Mindway AI mitigates these risks by flagging worrisome behavior among its clients’ customers. The system is mainly employed by online betting platforms, including Flutter Entertainment and Entain, but brick-and-mortar casinos have adopted it as well.
- The company trains a custom model for each client.
- As a baseline, psychologists who have expertise in compulsive gambling score a portion of the client’s existing customers according to 14 risk factors such as betting amounts, time of day played, and bank withdrawals. They label each player with one of three risk levels and train the model to match the labels.
- At inference, the system monitors each player’s behavior and generates a risk level.
- The casino or website can act on the automated risk assessments at its discretion, potentially warning players of a worrisome trend in their behavior before they get into trouble. While Mindway CEO Rasmus Kjærgaard recommends that clients address potential issues by phone, many send an email or pop-up notification instead.
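The pipeline described above — experts label players’ feature vectors with risk levels, a model is trained to match the labels, and each player gets a risk level at inference — can be sketched roughly as follows. The feature set and the nearest-centroid classifier here are illustrative stand-ins; Mindway’s actual 14 risk factors and model architecture are not public.

```python
def centroid(vectors):
    """Element-wise mean of a list of equal-length feature vectors."""
    return [sum(v[i] for v in vectors) / len(vectors)
            for i in range(len(vectors[0]))]

def train(labeled):
    """labeled maps a risk level ("low"/"high"/...) to the feature
    vectors of players that experts assigned to that level."""
    return {level: centroid(vecs) for level, vecs in labeled.items()}

def predict(model, player):
    """Assign the risk level whose centroid is nearest (squared Euclidean)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(player, c))
    return min(model, key=lambda level: sq_dist(model[level]))

# Toy data: three hypothetical features standing in for the real 14 risk
# factors, e.g. (average bet size, hours played per day, withdrawals per week).
labeled = {
    "low":  [[10, 1, 0], [20, 2, 1]],
    "high": [[500, 9, 6], [800, 12, 8]],
}
model = train(labeled)
print(predict(model, [15, 1, 1]))  # a player resembling the low-risk group
```

Training a separate model per client, as Mindway does, amounts to rebuilding this table of per-level statistics from each client’s own labeled customers.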
Yes, but: Gambling addicts may not respond well to receiving automated messages telling them they have a problem, Brett Abarbanel, a gambling researcher at the University of Nevada Las Vegas, told The New York Times.
Behind the news: Face recognition also plays a role in identifying problem gamblers. For instance, casinos in Macau have used the technology to identify high rollers and offer them perks. The city’s gambling authority stated that these systems were used only for security.
Why it matters: As many as 10 million people suffer from compulsive gambling in the U.S. alone. Identifying problem gamblers helps combat the spiral of debt, substance abuse, and mental health issues that often follow. Of course, casinos benefit, too, if their patrons can remain solvent enough to keep pumping money back into the house.
We’re thinking: For decades, the gambling industry has used data science to help casino operators. It’s heartening to see it applying AI to help its customers.
A MESSAGE FROM DEEPLEARNING.AI
Learn how to generate images using generative adversarial networks (GANs)! The Generative Adversarial Networks Specialization makes it easy to understand everything from foundational concepts to advanced techniques. Enroll today
Barnyard Sentiment Analysis
Neural networks may help farmers make sure their animals are happy.
What’s new: Researchers led by Elodie Briefer and Ciara Sypherd at the University of Copenhagen developed a system that interprets the moods behind a pig’s grunts and squeals.
How it works: The authors trained convolutional neural networks to classify porcine expressions using a database of 7,414 vocal sounds made by animals engaged in 19 situations like feeding, fighting, running, or being led to a slaughterhouse.
- Experts in animal behavior classified each call’s sentiment as positive or negative using the situations as guides. For example, noises recorded while an animal was feeding or being reunited with a familiar snout were labeled positive. Those recorded during a fight or in a slaughterhouse were labeled negative.
- The authors trained two ResNet-50s on spectrograms of the calls. One network classified calls as positive or negative while the other labeled the situation.
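As a rough illustration of the preprocessing step, here is a minimal spectrogram computation in plain Python. It uses a naive DFT for clarity; real pipelines would use an FFT library, and the frame length, hop, and windowing below are arbitrary choices, not the paper’s.

```python
import cmath
import math

def spectrogram(signal, frame_len=64, hop=32):
    """Magnitude spectrogram: one row of positive-frequency magnitudes per
    windowed frame. A CNN such as ResNet-50 consumes this 2-D array as an
    image rather than the raw waveform."""
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        # Hann window to reduce spectral leakage at the frame edges
        windowed = [s * 0.5 * (1 - math.cos(2 * math.pi * n / (frame_len - 1)))
                    for n, s in enumerate(frame)]
        row = []
        for k in range(frame_len // 2):  # positive frequencies only
            # naive DFT bin k (an FFT computes all bins far faster)
            acc = sum(w * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n, w in enumerate(windowed))
            row.append(abs(acc))
        rows.append(row)
    return rows
```

Feeding the same spectrogram to two classification heads — one for sentiment, one for situation — mirrors the paper’s use of two ResNet-50s on identical inputs.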
Results: The models achieved 91.5 percent accuracy classifying the sentiment of calls and 81.5 percent identifying the situation. A method that classified calls without machine learning achieved 61.7 percent and 19.5 percent respectively.
Behind the news: The noises an animal makes aren’t the only indication of its wellbeing, but they offer a window into its mental state.
- Earlier work used feed-forward and generalized regression neural networks to forecast feeding behavior and detect pneumonia in pigs.
- Researchers at several universities in South Korea developed a convolutional neural network that classified whether cows were hungry, in heat, or coughing based on their utterances.
- Such technology could help humans, too. Zoundream, a startup based in Basel and Barcelona, plans to market a translator that interprets infant cries as expressions of hunger, pain, gas, or needing a hug.
Why it matters: The authors plan to develop a tool that would monitor hogs’ behavior and anticipate their needs. Science has shown that animals are capable of complex emotions, prompting countries like Australia and the United Kingdom to pass laws that protect livestock welfare. Systems that evaluate animals’ emotional states could help farms stay in regulatory compliance and make better homes for the creatures in their care, as well as reassure consumers that their food was produced humanely.
We’re thinking: This work has awakened our interest in programming with EIEIO.
Who Needs Training?
When you’re training a neural network, it takes a lot of computation to optimize its weights using an iterative algorithm like stochastic gradient descent. Wouldn’t it be great to compute the best parameter values in one pass? A new method takes a substantial step in that direction.
What's new: Boris Knyazev and colleagues at Facebook developed Graph Hyper Network (GHN-2), a graph neural network that computed weights that enabled arbitrary neural network architectures to perform image recognition tasks. (A neural network that finds weights for another neural network is known as a hypernetwork.) GHN-2 improves on a similar hypernetwork, GHN-1, proposed by a different team.
Key insights: GHN-1 learned based on how well a given architecture using generated weights performed the task. GHN-2 improved its predecessor’s performance by drawing on insights from training conventional neural networks:
- A greater number of training examples per batch can improve trained performance.
- Connections between non-adjacent layers can carry information across successive layers without degradation.
- Normalization can moderate representations that grow too large or too small.
GNN basics: A graph neural network processes datasets in the form of a graph made up of nodes connected by edges (say, customers connected to products they’ve purchased or research papers connected to other papers they cite). During execution, it uses a vanilla neural network to update the representation of each node based on the representations of neighboring nodes.
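A single round of that update can be sketched in a few lines. This toy version — average the neighbors’ features, then apply a shared linear transform and ReLU — is one common choice, not the only one, and the shapes are invented for illustration.

```python
def message_pass(feats, edges, weight):
    """One round of message passing on an undirected graph.

    feats:  dict mapping node id -> feature vector
    edges:  list of (u, v) pairs
    weight: square matrix (list of rows), the shared transform
    Each node's new representation is ReLU(weight @ mean of its
    neighbors' features); nodes with no neighbors get a zero vector.
    """
    dim = len(weight)
    nbrs = {n: [] for n in feats}
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    new_feats = {}
    for n in feats:
        if nbrs[n]:
            mean = [sum(feats[m][i] for m in nbrs[n]) / len(nbrs[n])
                    for i in range(dim)]
        else:
            mean = [0.0] * dim
        new_feats[n] = [max(0.0, sum(weight[i][j] * mean[j] for j in range(dim)))
                        for i in range(dim)]
    return new_feats
```

Stacking several such rounds lets information propagate beyond immediate neighbors — the same mechanism GHN-2 applies to nodes that represent layers of a network architecture.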
How it works: GHN-2 consists of an embedding layer, a gated graph neural network, which uses a gated recurrent unit (a type of recurrent network layer) to update node representations, and a convolutional neural network. Its input is a neural network architecture in graph form, where each node represents a set of weights for an operation/layer such as convolution, pooling, or self-attention, and each edge is a connection from one operation/layer to the next. Its output is a set of weights for each operation/layer. The authors trained it to generate weights for classifying images in CIFAR-10 or ImageNet using a dataset of 1 million randomly generated neural network architectures composed of convolutional layers, pooling layers, self-attention layers, and so on.
- Given a batch of architectures and a batch of images, GHN-2 learned to generate weights for all architectures, applying what it learned in processing previous batches to the next. Then it used the images to test the resulting models.
- As it trained, it added connections between layers in a given architecture, analogous to skip connections in a ResNet. These connections allowed information to pass directly from earlier layers to later ones when updating the representation of each node, reducing the amount of information lost over successive updates. (They were discarded when running the architecture with the generated weights.)
- Having added temporary connections, it processed the architecture in three steps. (1) It created an embedding of each layer. (2) It passed the embeddings through the gated graph neural network that updated them in the order in which a typical neural network, rather than a graph neural network, would execute. (3) It passed the updated embeddings through a convolutional neural network to produce new weights for the input architectures.
- Prior work found that models produced by hypernetworks generate representations whose values tend to be either very high or very low. GHN-2 normalized, or rescaled, the weights to moderate this effect.
- Given a batch of network architectures and a set of images from CIFAR-10 or ImageNet during training, GHN-2 assigned weights in a way that minimized the difference between the networks’ predicted classes and the actual classes.
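The core training idea in the steps above — the task loss backpropagates through the generated weights into the hypernetwork — can be shown with a deliberately tiny stand-in: a linear hypernetwork that maps a fixed “architecture embedding” to the weights of a linear model. Everything below is an invented minimal analogue; GHN-2 itself runs a gated graph network over computation graphs.

```python
import random

def train_hypernet(data, embed, steps=3000, lr=0.05):
    """Train a linear hypernetwork H: the generated weights are w = H @ embed,
    and H is updated with the gradient of the downstream task loss, which
    flows through w -- the defining trait of hypernetwork training."""
    d = len(embed)
    H = [[0.0] * d for _ in range(d)]
    for _ in range(steps):
        x, y = random.choice(data)
        # forward: generate weights, then apply them to the task input
        w = [sum(H[i][j] * embed[j] for j in range(d)) for i in range(d)]
        pred = sum(wi * xi for wi, xi in zip(w, x))
        err = pred - y  # loss is 0.5 * err**2
        for i in range(d):
            for j in range(d):  # dL/dH[i][j] = err * x[i] * embed[j]
                H[i][j] -= lr * err * x[i] * embed[j]
    return H

def generate_weights(H, embed):
    """One forward pass of the hypernetwork: weights for the 'architecture'."""
    return [sum(row[j] * embed[j] for j in range(len(embed))) for row in H]
```

After training, a single matrix-vector product produces usable weights — the analogue of GHN-2 generating parameters for an unseen architecture in a fraction of a second.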
Results: Architectures similar to those in the training set generally performed better using parameter values generated by GHN-2 than by GHN-1. So did architectures that were wider, deeper, or denser than those in the training set. Parameter values generated by GHN-2 yielded average CIFAR-10 accuracy of 66.9 percent versus GHN-1’s 51.4 percent. While GHN-2 outperformed GHN-1 on ImageNet, neither model produced great parameter values for that task. For instance, architectures similar to those in the training set and outfitted with parameter values from GHN-2 produced an average top-5 accuracy of 27.2 percent compared to GHN-1’s 17.2 percent.
Why it matters: GHN-2 took only a fraction of a second to generate better-than-random parameter values, while training a ResNet-50 to convergence on ImageNet can take over one week on a 32GB Nvidia V100 GPU. (To be fair, after that week-plus of training, the ResNet-50’s accuracy can be 92.9 percent — a far better result.)
We're thinking: The authors also found that initializing a model with GHN-2 boosted its accuracy after fine-tuning with a small amount of data. How much additional time did the initialization save compared to conventional initialization and fine-tuning?