Dear friends,

I tested positive for Covid on Monday and mentioned this on social media. I’m grateful to the many people who wished me well. Reading your messages made me feel better. My unscientific feeling is that they helped clear my nose a bit!

I was surprised by a tiny minority who responded to my catching Covid with vitriol. For example, one person posited that I’d been waiting to catch Covid so I could announce it for a social media “win.”

Next Monday will be February 14, and I wish you an early Happy Valentine’s Day. Reading over the social media replies reminded me of the importance of spreading love, not hate.

I feel blessed to have received support from many people this week and over the years, so a little vitriol doesn’t bother me. But the potential impact of hateful speech on others worries me. Insults, put-downs, and unwarranted criticism diminish people and discourage them from living up to their highest potential.

If you’re ever feeling put upon, know that I’m on your side. I know you’re doing your best and don’t need unnecessary flak.

People need love. So, as we approach Valentine’s Day, I hope you’ll tell people in your life that you love them. You care for them. You wish them well. Let’s do this on February 14 and every day of the year.

With love and affection,


DeepLearning.AI Exclusive

Working AI: Silent Power

Finnish entrepreneur Kai Saksela, a structural engineer by training, studied deep learning. Now he's using neural networks to recognize sounds that signal danger in electrical equipment. Read more


Competitive Coder

Programming is hard. Programming competitions are harder. Yet transformers proved themselves up to the task.

What’s new: Yujia Li, David Choi, Junyoung Chung, and a team at DeepMind built AlphaCode, a system that beat roughly half of competitors in coding contests where many examples of program inputs and outputs were available.

Key insight: Previous work showed that transformers can generate code, though their output doesn’t always solve the task at hand. But transformers can generate millions of candidate solutions to the same problem, and the candidates can be filtered automatically by checking their outputs against example test cases. Those that remain are likely to solve the problem.

How it works: The authors trained a transformer to generate programs based on problems from a dataset they built containing 13,000 challenges mainly from Codeforces, a platform that hosts coding contests. Each problem included hundreds of solution programs (incorrect as well as correct) along with roughly 100 examples of test cases (expected inputs and outputs) mostly created by the authors.

  • The authors pretrained a transformer on 86 million programs in 12 programming languages. Given the first part of a program, the transformer learned to generate the next part.
  • They fine-tuned the model to generate each program in their challenge dataset conditioned on the problem description, difficulty, programming language, suggested techniques that might solve the problem, and whether the solution was correct. They used the GOLD loss function, which encouraged the model to sharpen predictions it already assigned some probability to and to spend less effort on tokens it found unlikely. In this way, rather than spreading probability over many mediocre candidates, the model increased its chance of generating at least one correct program over many tries.
  • They fine-tuned a second transformer to generate test-case inputs given a problem description.
  • At inference, they randomly sampled a difficulty and suggested techniques, and they told the first transformer to generate a correct solution. They repeated this 1 million times and filtered out programs that failed the example test cases in the problem statement. This left thousands of programs.
  • To filter the programs further, they used the second transformer to generate 50 test-case inputs and ran the remaining programs on those 50 inputs. Then they clustered programs that produced the same outputs and randomly picked one from each of the 10 largest clusters. This procedure yielded 10 diverse programs to be entered into a contest.
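The sample-filter-cluster loop described above can be sketched as follows. This is an illustrative stand-in, not DeepMind’s code: `gen_solution`, `gen_test_input`, and `run` are hypothetical callables wrapping the two transformers and a sandboxed program executor.

```python
from collections import defaultdict
import random

def sample_filter_cluster(problem, example_tests, gen_solution, gen_test_input,
                          run, n_samples=1_000_000, n_inputs=50, n_submit=10):
    # 1) Sample many candidate programs, each conditioned on a randomly
    #    chosen difficulty (suggested-technique tags omitted for brevity).
    candidates = [gen_solution(problem, difficulty=random.randint(800, 3500))
                  for _ in range(n_samples)]

    # 2) Keep only candidates that pass every example test case.
    survivors = [p for p in candidates
                 if all(run(p, t.input) == t.output for t in example_tests)]

    # 3) Cluster survivors by behavior: programs that produce identical
    #    outputs on model-generated inputs land in the same cluster.
    extra_inputs = [gen_test_input(problem) for _ in range(n_inputs)]
    clusters = defaultdict(list)
    for p in survivors:
        behavior = tuple(run(p, x) for x in extra_inputs)
        clusters[behavior].append(p)

    # 4) Submit one randomly chosen program from each of the largest clusters.
    largest = sorted(clusters.values(), key=len, reverse=True)[:n_submit]
    return [random.choice(cluster) for cluster in largest]
```

The clustering step matters because thousands of survivors may be near-duplicates; picking one program per behavioral cluster spends the limited submission budget on diverse solutions.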

Results: The authors used AlphaCode in 10 simulated Codeforces competitions, allowing it two hours to generate solutions for each. Ranked against more than 5,000 Codeforces competitors, it placed in the top 54 percent on average. It correctly solved 34.2 percent of problems in the validation set.

Why it matters: AlphaCode generated 1 million possible solutions and culled the bad ones to solve problems it had never seen before and beat a substantial portion of competitive human programmers. It goes to show that there are still benefits to be gained from scaling up.

We’re thinking: AlphaCode is an impressive demonstration of high-throughput code generation and testing. That said, considering its performance on the validation set, there’s still a distance to go.

New Supercomputer on the Block

Facebook’s parent company is staking its future on a new compute cluster.
What’s new: Meta unveiled AI Research SuperCluster (RSC), which is designed to accelerate training of large models for applications like computer vision, natural language processing, and speech recognition.

How it works: The company began building RSC in 2020, aiming for a system capable of training trillion-parameter models and processing up to an exabyte (1 billion gigabytes) of data. It currently incorporates 6,080 Nvidia A100s, the chip vendor’s flagship graphics processing unit (GPU).

  • Compared to its unnamed predecessor, RSC can perform computer vision tasks up to 20 times faster and train large-scale natural language models three times faster. Meta plans to add 9,920 more GPUs this year to further accelerate training across the board.
  • Meta highlighted the system’s data-protection features. Its previous research infrastructure used only publicly available data to avoid compromising user privacy. RSC is designed to process user data while maintaining privacy and security: the data undergoes a privacy review before use, remains encrypted until training time, and resides on storage infrastructure isolated from the wider network.
  • The ability to tap internal data is expected to supercharge development of multimodal AI and home robots.

Behind the news: RSC’s emphasis on data protection has a backstory. French regulators recently fined the company $238 million for failing to allow users to disable tracking software. In September, Irish regulators fined Facebook’s WhatsApp messaging service nearly $270 million for lack of transparency around how it uses the data it collects. Those actions came after the U.S. Federal Trade Commission responded to violations of user privacy by imposing a historic $5 billion penalty as well as restrictions on the company’s structure and operations.

Why it matters: Specialized in-house processing capacity is a strategic asset in the era of cloud computing. RSC is essential to Meta’s aspiration to build an immense virtual reality community it calls the metaverse. Microsoft and Nvidia likewise have built their own bespoke infrastructure.

We’re thinking: Less than a decade ago, the cutting-edge AI supercomputer was a $100,000 cluster (that Andrew Ng worked on). How much bigger — and, unfortunately, less accessible — these systems have become!


Join us on February 16, 2022, at 10 a.m. Pacific Time for a live session with Sadie St. Lawrence, founder and CEO of Women in Data. Learn which skills you need for a career in machine learning and artificial intelligence.

The Limits of Pretraining

The higher the accuracy of a pretrained model, the better its performance after fine-tuning, right? Not necessarily.

What’s new: Samira Abnar and colleagues at Google Research conducted a meta-analysis of image-recognition experiments and performed some of their own. They analyzed the relationship between model performance after pretraining and after fine-tuning in a variety of tasks.

Key insight: To find out whether higher pretrained accuracy always leads to higher fine-tuned accuracy, it would be necessary to run thousands of experiments while varying hyperparameter values systematically for each task. A simpler way is to extrapolate the relationship from the results of existing experiments.

How it works: The authors re-examined 4,800 experiments performed on diverse architectures: Vision Transformers, MLP-Mixers, and ResNets. The models had been pretrained to classify labeled images in JFT or ImageNet 21K. They were tested on 25 tasks, including classifying objects, classifying the orientation of objects, and diagnosing diabetic retinopathy, after fine-tuning via few-shot learning or transfer learning. In few-shot learning, the last layer was replaced and trained on 25 examples. In transfer learning, the whole network was fine-tuned on 1,000 examples.

  • For each model and fine-tuned task, the authors plotted pretrained accuracy on the horizontal axis and fine-tuned accuracy on the vertical axis. The resulting swaths of clustered dots generally rose nonlinearly until they reached a plateau.
  • The authors fit a curve to the best results in each task. Then they extended that curve to extrapolate fine-tuned accuracy if pretrained accuracy were 100 percent.
  • In their own experiments, they varied the size of the pretraining set (JFT), number of parameters in the model (Vision Transformer), and number of epochs in pretraining. Then they repeated the steps above.
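The fit-and-extrapolate step might look something like the sketch below, run on synthetic data. The saturating form (fine-tuned error as a power law of pretraining error plus an irreducible floor) is one plausible parameterization, not necessarily the paper’s exact one, and `fit_saturating_curve` is a hypothetical helper.

```python
import numpy as np

def fit_saturating_curve(pre_acc, fine_acc):
    """Fit fine-tuned error = k * (pretraining error)^alpha + floor.

    Grid-search the irreducible-error floor; for each candidate floor,
    the remaining power law reduces to a linear fit in log-log space.
    """
    pre_err = 1.0 - np.asarray(pre_acc)
    fine_err = 1.0 - np.asarray(fine_acc)
    best = None
    for floor in np.linspace(0.0, 0.5, 501):
        resid = fine_err - floor
        if np.any(resid <= 0):  # this floor overshoots the data
            continue
        alpha, log_k = np.polyfit(np.log(pre_err), np.log(resid), 1)
        pred = np.exp(log_k) * pre_err ** alpha + floor
        sse = np.sum((pred - fine_err) ** 2)
        if best is None or sse < best[0]:
            best = (sse, np.exp(log_k), alpha, floor)
    return best[1], best[2], best[3]  # k, alpha, floor

# Synthetic stand-in for "best fine-tuned accuracy at each pretrained accuracy."
pre = np.linspace(0.5, 0.95, 10)
fine = 1.0 - (0.8 * (1.0 - pre) ** 1.5 + 0.12)

k, alpha, floor = fit_saturating_curve(pre, fine)
# Extrapolating to pretrained accuracy = 1.0, fine-tuned accuracy
# saturates at 1 - floor: no amount of pretraining gets past it.
ceiling = 1.0 - floor
```

A nonzero fitted floor is what produces the plateau the authors observed: pushing pretrained accuracy toward 100 percent buys less and less fine-tuned accuracy.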

Results: Higher pretrained accuracy generally yielded higher fine-tuned accuracy — but it reached a point of diminishing returns. In some cases, higher pretrained accuracy yielded worse fine-tuned accuracy. Moreover, pretrained models of equal accuracy didn’t necessarily perform equally well on different fine-tuned tasks. The authors’ own experiments matched the curves they derived from earlier work, leading them to conclude that dataset size, number of parameters in a model, and length of training don’t significantly influence the relationship between pretrained and fine-tuned accuracy.

Why it matters: More pretraining doesn’t necessarily result in a better fine-tuned model.

We’re thinking: One limiting factor in the value of pretraining accuracy may be the relevance of the pretrained task to the fine-tuned task. No matter how well a model classifies ImageNet, it may not easily learn how to diagnose medical images. A rigorous framework for managing the tradeoff between pretraining and fine-tuning would be useful.

No Hardhat Required

Workers can operate a forklift in their pajamas and never leave their bedrooms, thanks to a new generation of AI-assisted robots.

What’s new: Companies are pairing semi-autonomous vehicles with remote human operators to execute tasks that the vehicles can’t handle on their own, Wired reported.

Distance driving: Robots that use AI to navigate around warehouses or perform manual labor can encounter situations that weren’t well represented in their training data. When that happens, a remote operator can step in — sometimes from a continent away.

  • Phantom Auto employs people to operate forklifts from consoles equipped with a steering wheel, pedals, and screens that display a vehicle’s front, rear, and side views. If an operator’s connection lags or fails, a machine learning algorithm takes over to navigate the machine and bring it to a safe stop.
  • The logistics company ArcBest modified Phantom Auto forklifts to maneuver equipment around its warehouses autonomously, looping in human operators only for complex tasks like stacking pallets and unloading trucks.
  • Pod, an autonomous electric truck from Swedish firm Einride, autonomously moves cargo within manufacturing facilities. AI coordinates the movements of multiple vehicles, but remote human operators stand by to take the wheel if a vehicle gets into trouble.
  • Online retailer Ocado uses autonomous robot arms to grasp items from storage bins. If a particular item, such as an unfamiliar product, confuses the system, human operators based in Mexico and the Philippines guide the arm through the task.

Why it matters: The pandemic has left 597,000 logistics jobs unfilled in the U.S. alone, according to the National Bureau of Economic Research. A combination of automation and remote human intervention might cover the gap, provide more flexible working conditions, and smooth wrinkles in global supply chains.

We’re thinking: Even as remote-controlled, semi-autonomous technology creates new jobs, it’s bound to eliminate others. Companies need to devote ample resources to retraining and upskilling workers whose jobs are at risk.

