Animal Animations From Video: ZooBuilder AI Tool Converts Animal Videos into Animations

Published Apr 06, 2022

A video game studio is looking to machine learning to cut the effort and expense of populating three-dimensional scenes with animated animals.

What’s new: Ubisoft showed off ZooBuilder, a pipeline of machine learning tools that converts videos of animals into animations. The system is a prototype and hasn’t been used in any finished games.

How it works: In the absence of an expensive dataset that depicts animals in motion, researchers at Ubisoft China and elsewhere generated synthetic training data from the company’s existing keyframe animations of animals. They described the system in an earlier paper.

Given a rendered keyframe animation, the team used virtual cameras to capture the animal’s image from a variety of perspectives. They composited photorealistic backgrounds behind the animals and augmented the images by flipping, changing contrast, adding noise, and converting color to grayscale.
For each animation frame and perspective, they extracted 3D coordinates of each of 37 joints and converted them into 2D coordinates. These procedures provided 2D and 3D labels for their dataset.
They fine-tuned a pretrained OpenPose model (originally developed for human figures) on the 2D data. Given a 2D image, it learned to extract 2D joint coordinates.
To track the temporal relationships between connected joints, they fine-tuned a pretrained 2D-to-3D human pose estimator on the extracted 2D data. It learned to map the joints in five consecutive rendered frames to their corresponding 3D coordinates.
At inference, the system accepts a video of an animal in the wild along with a skeleton (a collection of joints), a 3D mesh, and a skin. It uses a pretrained YOLOv3 object detector to locate animals within bounding boxes and crops the video frames accordingly. OpenPose finds 2D joint coordinates, and Pose_3D (the fine-tuned 2D-to-3D pose estimator) translates them into a sequence of 3D coordinates. The system converts the 3D coordinates into an animation file, applies it to the mesh, covers the mesh with the skin, and outputs an animation. Human animators refine the result.
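
The data-preparation steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Ubisoft's code: the pinhole camera model, the function names, the image representation (rows of RGB tuples with values in [0, 1]), and all parameter values are hypothetical. Only the four augmentations and the five-frame window come from the article. For simplicity, the sketch applies contrast and noise to the grayscale image.

```python
import random

WINDOW = 5  # consecutive frames per training example, as reported in the article


def project_joint(point, focal_length=1.0):
    """Project a camera-space 3D joint (x, y, z) to 2D with a pinhole model (assumed)."""
    x, y, z = point
    if z <= 0:
        raise ValueError("joint must lie in front of the camera (z > 0)")
    return (focal_length * x / z, focal_length * y / z)


def flip_horizontal(image, joints_2d):
    """Mirror the image and negate joint x so the labels stay aligned.

    Assumes joint coordinates are centered on the image (x = 0 at the middle).
    A real skeleton would also need its left/right joint labels swapped.
    """
    flipped = [list(reversed(row)) for row in image]
    return flipped, [(-x, y) for x, y in joints_2d]


def to_grayscale(image):
    """Convert rows of (r, g, b) pixels to rows of luminance values."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row] for row in image]


def adjust_contrast(gray, factor):
    """Scale each pixel's distance from mid-gray by `factor`, clamped to [0, 1]."""
    return [[min(1.0, max(0.0, 0.5 + factor * (p - 0.5))) for p in row] for row in gray]


def add_noise(gray, sigma, rng):
    """Add Gaussian noise with standard deviation `sigma`, clamped to [0, 1]."""
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row] for row in gray]


def sliding_windows(frames, size=WINDOW):
    """Return every run of `size` consecutive frames."""
    return [frames[i:i + size] for i in range(len(frames) - size + 1)]


# Hypothetical usage: one tiny synthetic frame with a single joint label.
rng = random.Random(0)
image = [[(0.2, 0.4, 0.6), (0.9, 0.9, 0.9)]]
joints_2d = [project_joint((0.5, 0.25, 2.0))]
image, joints_2d = flip_horizontal(image, joints_2d)
gray = add_noise(adjust_contrast(to_grayscale(image), 1.2), 0.01, rng)
windows = sliding_windows(list(range(8)))  # frame indices stand in for per-frame joints
```

Each five-frame window of 2D joints would be paired with the matching 3D joint coordinates to form one training example for the 2D-to-3D estimator.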

Yes, but: ZooBuilder initially was limited to cougars and had trouble tracking them when parts of their bodies were occluded or out of the frame, and when more than one creature was in the frame. Whether Ubisoft has overcome these limitations is not clear.

Behind the news: Machine learning is playing an increasing role in 3D graphics.

Kinetix offers an internet-based tool that converts uploaded videos of humans in motion into animated 3D models. The company has partnered with Adobe to accept models and animations from Mixamo, which uses machine learning to automate 3D animation.
Disney researchers used StyleGAN2 to create realistic animated faces.

Why it matters: It can take months of work to animate a 3D creature using the typical keyframe approach. Automated systems like this promise to make animators more productive and could free them to focus on the fine points of how an animal's personality shows in its motion.

We’re thinking: There’s face recognition for cows, speech recognition for birds, sentiment analysis for pigs, and now OpenPose for cougars. What will the animals steal from us next?!
