I decided last weekend not to use a learning algorithm. Sometimes, a non-machine-learning method works best.
Now that my daughter is a little over two years old and highly mobile, I want to make sure the baby gate that keeps her away from the stairs is always shut. It’s easy to forget and leave it open when walking through. How do you do this?
I started designing a system where I’d collect images of the gate both open and shut, and train a neural network to distinguish between the two. Then I would use TensorRT to deploy the model on a Raspberry Pi computer, which would beep if the gate were left open for more than 60 seconds.
I got as far as wiring up the system. Then I found a refrigerator-door alert widget that does the same job by sensing when a magnet is separated from a detector.
It goes to show that sometimes you don’t need a big neural network to do the job. (But when you do need one, it’s handy.) That’s why it’s nice to have a portfolio of techniques. Then we can better pick the right one for a given job.
Perhaps one lesson here is to pick the right sensor: To do the job with a camera, I needed a computer vision algorithm. But with a magnetic sensor, making the decision to beep when the gate is left open becomes trivial.
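To illustrate how simple the decision becomes with a magnetic sensor, here is a minimal sketch of the beep logic. The `GateAlarm` class, its field names, and the idea of passing in timestamps by hand are assumptions made for illustration; only the 60-second threshold comes from the text, and the widget itself surely works differently inside.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GateAlarm:
    """Decides when to beep, given open/closed readings from a magnetic sensor."""

    max_open_seconds: float = 60.0     # threshold taken from the text
    opened_at: Optional[float] = None  # time the gate was first seen open

    def update(self, gate_is_open: bool, now: float) -> bool:
        """Record one sensor reading and return True if the alarm should beep."""
        if not gate_is_open:
            # Magnet is back next to the detector: reset the timer.
            self.opened_at = None
            return False
        if self.opened_at is None:
            # First reading since the gate opened: start timing.
            self.opened_at = now
        return now - self.opened_at > self.max_open_seconds


alarm = GateAlarm()
readings = [(True, 0.0), (True, 30.0), (True, 61.0), (False, 62.0)]
beeps = [alarm.update(is_open, t) for is_open, t in readings]
```

No training data, model, or camera is involved: the sensor already reports the one bit the decision needs.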
Medical AI Gets a Grip
Surgical robots perform millions of delicate operations annually under human control. Now they’re getting ready to operate on their own.
What’s new: Researchers at UC Berkeley, UC San Francisco, and SRI International trained a machine learning system to pilot a da Vinci two-armed surgical robot through a task that tested its dexterity, precision, and speed, The New York Times reported.
How it works: The system learned via imitation learning to lift tiny plastic rings off a pegboard, pass them from one claw to the other, and slide them onto different pegs. The task is an exercise for surgeons learning to perform laparoscopic procedures, in which a camera and other specialized instruments are inserted into the patient’s body through a small incision.
- The authors trained an ensemble of four convolutional neural networks on 180 RGBD (red, green, blue, plus depth) video clips of human surgeons using the robot to demonstrate an error and how to correct it, as well as information about the robot’s joint positions. The system learned to perform the task, but its precision degraded over time as the cables that control the robot’s limbs stretched, causing the model to miss its targets.
- To compensate for the gradual loss of precision, the authors trained an LSTM on motion-capture data of the robot’s joint positions as the machine performed random motions autonomously.
- Together, the two models proved more agile, precise, and rapid on the ring-and-peg test than human surgeons.
Behind the news: AI already assists physicians in a few small but important procedures. For instance, a robotic tool from the Dutch company Microsure, which helps suture tiny incisions on blood vessels, uses AI to stabilize shaking in the operator’s hands.
Why it matters: This is a nice example of an algorithm that handles concept drift in robotic control. A lot of work in model-based reinforcement learning assumes a fixed model. But just as the dynamics of a human arm change as the arm tires — and a surgeon must adapt to control that tiring arm — we want learning algorithms to adapt to gradual changes in the robot’s dynamics.
We’re thinking: We’re looking to AI systems that help optimize nutrition, exercise, and sleep to help steer us clear of AI systems that wield a scalpel!
Crosswords Go the Way of Go
A hybrid of deep learning and symbolic AI took the prize at a major puzzle competition.
What’s new: A system called Dr. Fill outscored nearly 1,300 human contestants at April’s annual American Crossword Puzzle Tournament, Slate reported.
How it works: Oregon polymath Matt Ginsberg debuted a logic-based system at the tournament in 2012, taking 11th place. This year, Ginsberg paired his model with a neural crossword solver developed at UC Berkeley.
- Trained on a database of 6 million paired clues and answers, the Berkeley system reads puzzle clues and generates candidate words.
- It serves candidates to the symbolic system, which calculates the probability that each one is the correct answer based on factors like the number of letters and whether its spelling conflicts with intersecting words.
- At the late-April tournament, Dr. Fill made only three errors and solved the final puzzle in 49 seconds — over two minutes ahead of the fastest human.
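The division of labor described above can be sketched roughly as follows. This is not Dr. Fill's code: the function names, the example clue scores, and the hard accept/reject filter are all assumptions, and the real symbolic system computes probabilities rather than simply discarding candidates. The sketch only shows how word length and crossing letters can prune and re-rank a neural model's guesses.

```python
from typing import Dict, List, Tuple


def fits_slot(word: str, length: int, crossing_letters: Dict[int, str]) -> bool:
    """True if the word has the right length and agrees with letters
    already placed by intersecting words (index in slot -> letter)."""
    if len(word) != length:
        return False
    return all(word[i] == letter for i, letter in crossing_letters.items())


def rank_candidates(
    candidates: List[Tuple[str, float]],
    length: int,
    crossing_letters: Dict[int, str],
) -> List[Tuple[str, float]]:
    """Keep candidates that fit the slot, renormalize their scores so they
    sum to 1, and return them sorted from most to least probable."""
    fitting = [
        (word.upper(), score)
        for word, score in candidates
        if fits_slot(word.upper(), length, crossing_letters)
    ]
    total = sum(score for _, score in fitting)
    if total == 0:
        return []
    ranked = [(word, score / total) for word, score in fitting]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical scores a neural solver might give for one clue.
neural_guesses = [("OREO", 0.5), ("COOKIE", 0.3), ("OLEO", 0.2)]

# A four-letter slot whose second square already holds an "R".
best = rank_candidates(neural_guesses, length=4, crossing_letters={1: "R"})
```

Here the crossing letter eliminates OLEO and the length check eliminates COOKIE, leaving OREO as the only candidate.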
Behind the news: Founded in 1978, the American Crossword Puzzle Tournament requires competitors to complete eight puzzles in two days. The three fastest and most accurate competitors face off on a final puzzle to vie for the $3,000 grand prize.
Why it matters: Neural networks and symbolic systems are often seen as competing approaches. Together, they can help solve previously elusive problems.
We’re thinking: What’s a 12-letter catchphrase that describes a persistent attitude toward gaining knowledge and skills?
A MESSAGE FROM DEEPLEARNING.AI
The first two courses in our Machine Learning Engineering for Production (MLOps) Specialization are live on Coursera!
Virtual Reality in Real Time
Ideally, real-time 3D applications such as virtual and augmented reality transition smoothly between different viewpoints of a scene — but generating a fresh perspective can take time. New research speeds the process.
What’s new: Stephan Garbin and colleagues at Microsoft developed FastNeRF, a system that accelerates the photorealistic 3D rendering method known as Neural Radiance Fields (NeRF) to visualize scenes from any angle at a brisk 200 frames per second.
Key insight: To visualize one frame of a 3D scene, you need to know the position of a virtual camera and the directions of a set of virtual light rays that extend from the camera through each pixel in the frame. (The objects behind the pixels have a basic color that may be modified by lights, shadows, occlusion, and transparency.) NeRF computes a pixel’s color by combining the color and transparency of all points that lie along the associated ray, which requires hundreds of neural network inferences and is tough to pull off in real time. FastNeRF manages the computational burden through a two-part workaround. First, rather than calculating on the fly, it pre-computes and stores information about all possible rays and points along them. Second, to avoid having to store every possible combination of ray and point (1,024^3 * 1,024^2 values, assuming 1,024 samples per spatial dimension), it stores each point’s basic color and transparency based on its position, and the shift in its color due to a ray’s direction (1,024^3 + 1,024^2 values).
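A quick calculation shows how much the second trick saves. It reads the two counts in the paragraph above as 1,024 cubed times 1,024 squared for the joint cache and 1,024 cubed plus 1,024 squared for the split cache. The variable names are ours, and the count is of cache entries only, ignoring how many numbers each entry holds.

```python
samples = 1024  # samples per dimension, as assumed in the text

# One entry for every (position, direction) pair.
joint_entries = samples**3 * samples**2

# One table indexed by position plus a separate table indexed by direction.
split_entries = samples**3 + samples**2

savings_factor = joint_entries / split_entries

print(f"joint cache: {joint_entries:,} entries")
print(f"split cache: {split_entries:,} entries")
print(f"reduction:   about {savings_factor:,.0f}x")
```

The joint cache would need 2^50 entries (about a quadrillion), while the split cache needs only a little over a billion, roughly a millionfold reduction.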
How it works: FastNeRF uses two vanilla neural networks to compute information based on a point’s position (the position network) and a ray’s direction (direction network). The authors trained the system on Synthetic NeRF, which contains 360-degree views of real-world objects like model ships and LEGO constructions, and frontal views of objects in Local Light Field Fusion.
- FastNeRF evenly samples points throughout the scene. The position network calculates each point’s transparency as well as a vector that represents its basic color. It stores the results.
- Similarly, FastNeRF evenly samples rays pointing in all directions. The direction network calculates a vector that represents how each ray’s direction would affect the color of all points along that ray. It stores that result as well.
- To compute a pixel’s value, FastNeRF combines the transparency, basic color, and the effect of the ray’s direction for every point along the ray.
- It weights each point’s color (from the position network) by the output of the direction network. Then it weights each point’s color by its transparency. Finally, it sums the twice-weighted color of all points along the ray.
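The per-pixel combination in the steps above might look something like the sketch below. It is a simplified illustration, not the authors' implementation. It assumes that the position cache holds a few RGB components per point, that the direction cache holds one weight per component, and that transparency is expressed as an opacity between 0 and 1 and applied with standard front-to-back compositing. All names and example values are made up.

```python
from typing import List, Sequence


def point_color(
    components: Sequence[Sequence[float]],
    direction_weights: Sequence[float],
) -> List[float]:
    """Blend a point's cached RGB components using the weights cached
    for the ray's direction."""
    return [
        sum(w * comp[ch] for w, comp in zip(direction_weights, components))
        for ch in range(3)
    ]


def render_pixel(
    points_components: Sequence[Sequence[Sequence[float]]],
    opacities: Sequence[float],
    direction_weights: Sequence[float],
) -> List[float]:
    """Sum the weighted colors of all points along one ray, nearest first."""
    pixel = [0.0, 0.0, 0.0]
    light_remaining = 1.0  # fraction of the ray not yet blocked
    for components, opacity in zip(points_components, opacities):
        color = point_color(components, direction_weights)
        weight = light_remaining * opacity
        for ch in range(3):
            pixel[ch] += weight * color[ch]
        light_remaining *= 1.0 - opacity
    return pixel


# Two points on a ray: a half-transparent red one in front of an opaque blue one.
red_point = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
blue_point = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
pixel = render_pixel(
    [red_point, blue_point],
    opacities=[0.5, 1.0],
    direction_weights=[0.5, 0.5],
)
```

Because both lookups come from precomputed tables, rendering a pixel needs only multiplications and additions like these, with no network inference at render time.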
Results: Running on a high-end consumer graphics board, FastNeRF performed over 3,000 times faster than NeRF. For example, it rendered a scene of a LEGO tractor in 0.0056 seconds versus NeRF’s 17.46 seconds. Despite its speed, on Synthetic NeRF, FastNeRF achieved 29.97 dB peak signal-to-noise ratio, which gauges how well a generated image reproduces the original (higher is better), versus NeRF’s 29.54 dB.
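For readers unfamiliar with the metric, the sketch below computes peak signal-to-noise ratio using its standard definition, 10 times the base-10 logarithm of the squared maximum pixel value divided by the mean squared error. It also checks the speedup implied by the two reported render times. The function name, the flat-list image representation, and the sample pixel values are assumptions for illustration.

```python
import math
from typing import Sequence


def psnr(
    reference: Sequence[float],
    generated: Sequence[float],
    max_value: float = 1.0,
) -> float:
    """Peak signal-to-noise ratio in dB between two images given as
    flat sequences of pixel values in the range [0, max_value]."""
    if len(reference) != len(generated) or len(reference) == 0:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value**2 / mse)


# A two-pixel toy image where every pixel is off by 0.1.
example_db = psnr([0.0, 1.0], [0.1, 0.9])

# Speedup implied by the render times reported for the LEGO tractor scene.
speedup = 17.46 / 0.0056
```

The toy image scores about 20 dB, and the reported times work out to a speedup of roughly 3,100x, consistent with "over 3,000 times faster."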
Why it matters: The authors reduced an unmanageable quantity of high-dimensional data to a practical size by dividing the information based on point position and ray direction between two models. A similar approach could be useful in applications that require optimization over many input parameters, such as drug discovery and weather modeling.
We’re thinking: Augmented and virtual reality promise to bring powerful new approaches in education, entertainment, and industry — if we can make them cheap, easy, and fast enough. Deep learning is helping us get there.
Algorithms for Orcas
A combination of computer vision and drones could help restore dwindling killer whale populations.
What’s new: Researchers at Oregon State University and conservation groups SR3 and Vulcan developed a system that assesses the health of orcas, Geekwire reported.
How it works: The researchers fly drones off the coast of British Columbia and Washington State to capture video of orcas as they swim near the water’s surface. Four machine learning models, collectively called the Aquatic Mammal Photogrammetry Tool, analyze the imagery.
- The first model identifies video frames that include orcas and draws bounding boxes around the creatures. Next, a segmentation model outlines their bodies. A landmark detector then identifies each animal's snout, dorsal fins, and other parts and uses their relative shape and position to estimate its health. The fourth model identifies individuals based on the shape of the grey patch behind the dorsal fin.
- Analyzing photos of orcas for signs of ill health used to take six months. The system cuts that time to weeks or days.
- The results can inform policymakers about the need for protective measures, such as limiting the number of salmon that commercial fishermen are allowed to catch in order to leave more for the orcas.
Behind the news: Conservationists are getting help from machine learning across the animal kingdom.
- An open source project is developing an AI-equipped collar to protect elephants from poachers.
- A system developed at University of Southern California suggests optimal patrol routes to help park rangers in Cambodia intercept poachers.
- Wildlife.ai is a nonprofit hub that organizes AI projects for identifying threatened species of frogs, fish, and other animals.
Why it matters: With detailed information about the health of individual creatures, conservationists can respond more quickly when they’re in trouble. The developers plan to open-source their work so it can be adapted to other populations of orcas and possibly other species of aquatic mammals.
We’re thinking: The Pacific Northwest orca population has shrunk to 75 individuals, the lowest number in 30 years. We hope for a rebound.