Creating a virtual representation of a scene using traditional polygons and texture maps involves several complex operations, and even neural-network approaches have required manual preprocessing. Researchers from the Samsung AI Center and Skolkovo Institute of Science and Technology propose a new deep-learning pipeline that visualizes scenes with far less fuss.
What’s new: Aliev et al.’s Neural Point Based Graphics technique rapidly produces realistic images in an end-to-end process. It does particularly well with thin objects that are hard to model using a polygonal mesh, such as shoe laces and bicycle tires. You can see it in action here.
Key insight: There’s no need to model surfaces to represent a scene. Point clouds and corresponding images together contain enough information for a neural network to generate realistic images. Moreover, neural networks can fill in missing information such as parts of objects hidden from view, which simplifies scene modeling.
How it works: The system starts with a point cloud representing a scene, an image of the scene, camera parameters including viewing angle, and a randomly initialized vector representation of each point that encodes shape and surface properties.
- Using traditional graphics libraries and algorithms, it pixelizes a scene’s point cloud and vectors into a multi-channel raw image.
- A rendering network based on the U-Net architecture takes the raw image as input. It learns simultaneously to improve the vectors and generate a final RGB image by minimizing the difference between generated and ground-truth images.
- Once trained, the system can accept a new camera position to generate corresponding viewpoints from a given pixel cloud and learned vectors.
Results: The researchers compared photographic input and generated images from a variety of data sets, including consumer cameras, across several scene-capture techniques, including traditional and deep learning methods. Their system scored highest on a number of measures of image similarity. While its rendering of synthetic scenes isn’t as realistic as that achieved by state-of-the-art ray tracing methods, it produces good-looking images roughly 2,000 times faster.
Why it matters: Neural Point-Based Graphics is a distinct step forward for end-to-end scene capture. By demonstrating that point clouds and images — which can come from a smartphone — together can represent scenes in realistic detail, this research opens the door for refinements that could ultimately compete with the best current methods in a much simpler pipeline.
We’re thinking: Just as neural networks have replaced rule-based systems in computer vision and language applications, they’re on track to have a similar impact in graphics. Given its simplicity and speed, this approach could facilitate real-time applications such as video games, virtual reality, and augmented reality.