On Monday, Landing AI (where I’m CEO) announced the close of a $57 million Series A funding round. The investment enables the company to continue building its data-centric MLOps platform for computer vision, with a focus on manufacturing visual inspection.
Studies estimate that AI will create trillions of dollars of value, and machine learning has already changed the trajectory of consumer-internet companies like Google and Facebook. Yet the technology has barely penetrated most other industries. Making AI work in more traditional industries will require a different recipe than internet companies use. I explained why this week at Fortune’s Brainstorm A.I. event.
Datasets are much smaller. I once built a face recognition system using about 350 million images. But when I asked people in the manufacturing industry how many images they had of each defect they wanted to recognize, 50 or fewer was the most common answer. Techniques developed for learning from hundreds of millions of examples will struggle to work with only 50. But the situation improves if you choose those examples well. Data-centric AI tools can help you get there.
Applications are more diverse. If we took all current and potential machine learning projects and sorted them in decreasing order of value, we might find that the “head” of the distribution comprises applications like a large company’s web search engine, online ad system, or product recommendation engine. This is followed by a “long tail” of applications that have lower value individually but massive value in aggregate. As a community, we’ve figured out how to organize dozens or hundreds of engineers to build these large applications, some of which can generate over $1 billion of value. But this recipe doesn’t work for other industries where applications are more heterogeneous and where each of 10,000 machine learning models generates $1 million to $5 million.
For example, in manufacturing, each plant makes a different product, and thus will need a different trained model to detect defects. In healthcare, every hospital codes its electronic health records (EHR) differently. Rather than a single monolithic model to read every hospital’s EHR, each hospital needs a system trained on its own data. The total value of these applications is enormous. But how can any company help build, deploy and maintain 10,000 custom models without hiring 10,000 machine learning engineers?
This “long tail” problem helps to explain why many proof-of-concept implementations and demos don’t make it into production. While a team of engineers can build a one-off application, we still need better tools to make this type of work scalable and economically viable.
Landing AI is building tools that make it fast and easy for manufacturers to engineer their data so they can train, deploy, and maintain their own computer vision systems. This design pattern addresses the widespread problems of small datasets and diverse applications. If you’re working in a sector other than manufacturing, consider whether your sector has a long tail of applications and whether building an MLOps platform that lets customers do their own customization, as Landing AI is doing in manufacturing, might advance machine learning in your industry.
DeepMind Doubles Down on AlphaFold
The Google sister company devoted to artificial general intelligence parlayed its technology into a biomedical spin-off.
What’s new: DeepMind launched a startup called Isomorphic. The new company aims to build its business on AlphaFold 2, an ensemble of neural networks that finds the shapes of protein molecules, which determine their biological function. The company is hiring experts in AI, biology, medicinal chemistry, biophysics, and engineering.
How it works: Like DeepMind, Isomorphic is a subsidiary of Google’s parent company Alphabet. DeepMind CEO Demis Hassabis also leads the London-based spin-off.
- Isomorphic will build predictive models to investigate the medical potential of proteins, the interactions between them, and the ways they bind to receptors in the body.
- The company likely will sell its services to pharmaceutical companies rather than developing drugs itself, Hassabis told the healthcare website Stat.
Behind the news: AlphaFold 2 has analyzed the shapes of over 98 percent of proteins in the human body. It remains for scientists to validate its output through lab experiments.
- AlphaFold debuted in 2018, when it won an annual contest for predicting protein shapes.
- A revised version won again in 2020 with an average error comparable to the width of an atom.
- DeepMind opened the system in July along with databases that detail the structure of hundreds of thousands of proteins.
Why it matters: Just 6.2 percent of drug candidates make it through clinical trials to market, and developing a successful medicine costs $1.3 billion on average. Isomorphic could wring trial and error out of the process, boosting success rates, cutting costs, and enriching drug-company customers.
We’re thinking: AlphaFold 2 is a big step forward for biomedicine, and deep learning promises further progress in areas like protein-protein interaction (how does a potential treatment interact with a target protein?) and protein dynamics (protein shapes aren’t static, and their motion can affect their properties). Much work by many determined researchers lies ahead to bridge the gap between lab and clinic.
GPT-3 for All
The GPT-3 natural language model wowed and worried both the AI community and the public with its ability to generate realistic prose. Now it’s ready to churn out text on a grand scale.
What’s new: Microsoft is making the giant, pretrained neural network available to selected customers through its Azure cloud service. The new service expands on restricted access offered by OpenAI.
How it works: Microsoft will grant access for well-defined applications that comply with the company’s principles for responsible AI, which include fairness, reliability, transparency, accountability, and privacy. Pricing remains undisclosed.
- Users will feed GPT-3 examples of the kinds of outputs they want it to generate. Microsoft envisions uses like summarizing sports commentary, helping programmers write code, and brainstorming marketing copy.
- The service includes tools to further tailor the model’s output. For instance, filters can adjust the formality of generated language to suit casual video game dialogue or decorous corporate communications.
- Other tools will ensure that the model complies with local laws and meets customer requirements for network security, management, topology, and geography, Microsoft AI platform vice president John Montgomery told VentureBeat.
- Microsoft said the new implementation includes safety monitoring and analysis to help identify abuse or misuse. The company plans to use feedback from initial projects to build safeguards against harmful uses, a spokesperson told The Batch.
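Feeding the model examples of the desired output is commonly called few-shot prompting. The sketch below shows one way such a prompt could be assembled as plain text. The function name `build_few_shot_prompt` and the "Input:"/"Output:" labels are hypothetical choices for illustration. The request format of the Azure service is not described here, so the code stops at building the string and makes no API call.

```python
def build_few_shot_prompt(examples, query, instruction=""):
    """Assemble a text prompt from (input, output) example pairs.

    Hypothetical helper: the "Input:"/"Output:" labels are an assumed
    convention, not a format required by any particular service.
    """
    parts = [instruction] if instruction else []
    for source, target in examples:
        # Each demonstration pairs a sample input with the desired output.
        parts.append(f"Input: {source}\nOutput: {target}")
    # The final entry leaves the output blank for the model to complete.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


examples = [
    ("The striker scored twice in the final ten minutes.", "Late brace wins it."),
    ("Rain delayed the match for an hour before kickoff.", "Kickoff delayed by rain."),
]
prompt = build_few_shot_prompt(
    examples,
    "The goalkeeper saved a penalty in stoppage time.",
    instruction="Summarize each line of sports commentary.",
)
print(prompt)
```

The completed prompt ends with an empty "Output:" field, so a text-completion model would be expected to continue in the style of the demonstrations.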
Behind the news: GPT-3’s road to commercialization began in early 2019, when OpenAI transitioned from a nonprofit research institute to a for-profit company. A few months later, it inked a $1 billion deal with Microsoft to help build the tech giant’s AI platform and later granted Microsoft exclusive commercial access to GPT-3. OpenAI launched a private beta program in mid-2020. The model also powers Microsoft’s Power Apps development platform, which converts natural language into computer code.
Why it matters: GPT-3 is an AI juggernaut of the sort that few companies can build, never mind design. Making it available on Azure puts it within reach of not only budding AI companies but also users in healthcare, manufacturing, government, and so on (albeit to use, not to modify). Developers using the beta version have harnessed GPT-3 to write fiction, generate music notation, and produce images based on text descriptions — over 300 applications as of spring 2021.
Yes, but: Like other architectures trained on text scraped from the web, GPT-3 has a propensity to generate biased, objectionable and confused output. Whether Microsoft’s implementation addresses these issues remains to be seen.
- OpenAI initially withheld an earlier version, GPT-2, due to worries that malicious actors could exploit it. GPT-3 hasn’t done away with that concern.
- In a recent study, researchers found that GPT-3 expressed a stereotyped association between Islam and violence.
- French medical technology company Nabla tested GPT-3 as a medical chatbot. It found the model woefully lacking in expertise in diagnosis, treatment, and insurance. In one trial conversation, the model advised a fake patient who expressed a wish to end their own life, “I think you should.”
We’re thinking: Microsoft and OpenAI may not have a monopoly on GPT-3’s capabilities for long. Several Chinese universities teamed up to build Wu Dao, which is purportedly 10 times bigger than GPT-3. Microsoft’s Silicon Valley competitors are following suit with ever larger language models. EleutherAI has released a much smaller open source attempt to duplicate GPT-3 and aims to scale it up. Meanwhile, AI21 Labs offers free access to the beta version of its 178 billion-parameter Jurassic-1.
A MESSAGE FROM DEEPLEARNING.AI
You’re invited! Join us to learn “How to Build and Deploy a Virtual Chatbot” with FourthBrain’s Greg Loughnane on November 17, 2021, at 10 a.m. Pacific Time. There will be a live demo and inside info on deployment, applications, and more!
Facebook Doesn’t See Your Face
Facebook, beset by reports that it ignored evidence that it causes harm in its drive to spur user engagement, pulled the plug on its face recognition features.
What’s new: Facebook disabled face recognition for uploaded photos and videos as well as its retrospective Memories service and promised to delete over 1 billion individual face recognition templates. The company cited “societal concerns” and uncertainty about future government regulation.
Losing Face: Face recognition powered several Facebook features under a single opt-in. Over a third of its 1.8 billion daily users took advantage of them.
- Users received an alert whenever a photo or video of their face was uploaded, or whenever Memories displayed one.
- The service automatically tagged uploaded photos and videos with the names of people who appeared in them.
- The Automatic Alt Text system, which generates spoken image captions for visually impaired users, identified untagged people.
Behind the news: Apart from the current firestorm sparked by documents leaked by a former employee, privacy advocates have repeatedly targeted Facebook’s face recognition features.
- In 2021, the company agreed to pay $650 million to settle a class-action lawsuit. The plaintiffs held that tagging people in photos without consent violated an Illinois state law.
- In 2019, the Federal Trade Commission restricted the company’s use of the technology and fined the company $5 billion over its handling of personal information.
Why it matters: Facebook’s move follows similar actions by Amazon, IBM, and Microsoft. The trend suggests dim prospects for face recognition in mass-market consumer settings.
Yes, but: This is not necessarily the end of face recognition at Facebook. The company retains its DeepFace technology and will continue to develop it. It views face recognition as an important tool in identity verification and fraud prevention, a spokesperson said.
We’re thinking: In the wake of retreating tech giants, a number of smaller companies have entered the market. The resulting fragmentation makes it harder to keep track of how the technology is being used and potentially abused. Thoughtful regulation should set a clear standard and hold all companies to it.
Finding Useful Points in Space
To interact with the world, a robot needs to know which items to grab, which items to avoid, and which joints to move in the process. A new method aims to improve a machine’s ability to determine and locate points of interest.
What’s new: Boyuan Chen and Pieter Abbeel at UC Berkeley with Deepak Pathak at Carnegie Mellon developed Keypoint3D, an unsupervised training method that enables a model to identify spatial coordinates, known as keypoints, that correspond to useful locations in the environment, including spots on its own body.
Key insight: Previous work trained an agent in a virtual world to find keypoints based on a single two-dimensional camera view, but it performed poorly if that view contained overlapping or occluded objects. A similar approach can take advantage of multiple camera views to locate objects in 3D space. Using inferred 3D keypoints to regenerate the original camera views can help the agent learn to track particular objects consistently across time and space.
How it works: Keypoint3D trained a system to choose the 32 keypoints most useful in completing a particular task and to find their locations based on three camera views. The system was trained and tested in a virtual environment, where it drove an agent to complete a set of basic robot tasks (opening a door, closing a box, and so on), as well as draping a scarf over a mannequin (to demonstrate the system’s ability to manipulate flexible material) and walking on four legs (to demonstrate its ability to find the agent’s own joints). The authors trained a reinforcement learning model jointly with the keypoint detection models to ensure that the choice of keypoints would be relevant to the task at hand.
- Three convolutional encoder networks (one for each camera view) each learned to generate 32 two-dimensional heat maps and matching depth maps. Each heat map indicated the probability that each pixel corresponded to a viable keypoint such as the end of a door handle. The system used the heat maps to calculate the expected 2D coordinates of each keypoint and the depth maps to calculate the third dimension. This yielded three estimates of the location of each of the 32 keypoints.
- The system combined the three estimates in a weighted average to reach a final estimate of each keypoint’s location in 3D space. The weights came from the probabilities in the corresponding heat and depth maps.
- The authors used the reinforcement learning algorithm proximal policy optimization (PPO) to train a vanilla neural network, given the estimated coordinates, to complete a given task. For example, given the locations of a quadruped’s joints, the model determined how to move the joints to make it walk.
- During training, the system used the coordinates to regenerate the three views via separate convolutional decoders. The authors calculated three unsupervised loss functions that (a) encouraged a generated image to be similar to the corresponding original, (b) encouraged the keypoint coordinates to be similar in each view, and (c) discouraged keypoints from bunching. The system combined the unsupervised losses in a weighted sum with the loss from the reinforcement learning policy.
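The heat-map and averaging steps above can be sketched in a few lines. This is a simplified illustration, not the authors’ implementation. It assumes a softmax turns raw encoder scores into a heat map, uses the heat map’s peak probability as the confidence weight, and skips the camera geometry that would convert each view’s estimate into a shared world frame. All function names are hypothetical.

```python
import math


def softmax2d(scores):
    """Turn a 2D grid of raw scores into a heat map that sums to 1."""
    peak = max(max(row) for row in scores)
    exps = [[math.exp(v - peak) for v in row] for row in scores]
    total = sum(sum(row) for row in exps)
    return [[v / total for v in row] for row in exps]


def expected_keypoint(heat, depth):
    """Expected (x, y, depth) under the heat map, plus a confidence weight."""
    x = sum(p * col for row in heat for col, p in enumerate(row))
    y = sum(p * r for r, row in enumerate(heat) for p in row)
    z = sum(p * d for hrow, drow in zip(heat, depth) for p, d in zip(hrow, drow))
    # Assumption: the peak probability stands in for the view's confidence.
    confidence = max(max(row) for row in heat)
    return (x, y, z), confidence


def fuse_estimates(points, weights):
    """Weighted average of per-view 3D estimates.

    Assumes the points are already expressed in a shared world frame.
    """
    total = sum(weights)
    return tuple(
        sum(w * p[i] for p, w in zip(points, weights)) / total for i in range(3)
    )


# One view: a heat map sharply peaked at row 1, column 1.
heat = softmax2d([[0.0, 0.0], [0.0, 100.0]])
point, conf = expected_keypoint(heat, depth=[[1.0, 2.0], [3.0, 4.0]])

# Three views, with the more confident views pulling the average harder.
fused = fuse_estimates(
    [(0.0, 0.0, 0.0), (2.0, 2.0, 2.0), (4.0, 4.0, 4.0)],
    [1.0, 2.0, 1.0],
)
print(point, conf, fused)
```

Taking an expectation over the heat map, rather than the single brightest pixel, keeps the coordinate calculation differentiable, which is what lets reconstruction and policy losses train the encoders end to end.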
Why it matters: Other efforts to generate 3D keypoints from multiple views have been designed to locate static objects. This method accounts for changes over time to drive robots that can interact with dynamic environments.
We’re thinking: It may seem odd to move a robot by guessing the locations of its joints in still images rather than knowing the joint positions in the first place. But this is how humans do it, too: try to bring together your left and right fingertips with your eyes closed. Giving robots this capability would enable them to locate and control objects with greater precision.