KinectFusion: Real-time 3D Reconstruction and Interaction Using a Moving Depth Camera
📜 Abstract
KinectFusion enables a user holding and moving a standard Kinect camera to rapidly create detailed 3D models of a real-world scene. This allows models of far greater detail to be created than previously possible using only the input from a single viewpoint in space. Using an off-the-shelf GPU, we are able to reconstruct complex, dense 3D scenes in seconds, at significantly greater levels of detail than previously possible: 300k to 1 million triangles per model, compared with existing Kinect-based Structure-from-Motion systems, which reconstruct only sparse point clouds. We demonstrate various novel interactive applications enabled by KinectFusion by constructing and operating on detailed 3D scene reconstructions, including their use as a compelling augmented reality environment and as a real-time, physics-enabled sandbox.
✨ Summary
*KinectFusion: Real-time 3D Reconstruction and Interaction Using a Moving Depth Camera* was presented at UIST 2011 and introduced a breakthrough in real-time 3D reconstruction using Microsoft’s Kinect sensor. Whereas prior Kinect-based methods could reconstruct only sparse point clouds, KinectFusion produces complete, high-detail models, opening up applications in fields like augmented reality and human-computer interaction.
Using GPU computation, KinectFusion processes incoming depth frames rapidly, building dense 3D models with hundreds of thousands to millions of polygons in real time. Interactive applications such as augmented reality scenarios and physics-based simulations were showcased, demonstrating the system’s versatility.
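At the heart of this pipeline is volumetric depth-map fusion: each incoming depth frame is integrated into a truncated signed distance function (TSDF) volume via a running weighted average, which is what lets the model grow denser and less noisy as the camera moves. Below is a minimal NumPy sketch of that fusion step, assuming a simple pinhole camera model and a volume anchored at the world origin; the function name, grid layout, and parameter values here are illustrative choices, not values from the paper, and the actual system performs this update per voxel on the GPU.

```python
import numpy as np

def integrate_frame(tsdf, weights, depth, K, cam_pose,
                    voxel_size=0.01, trunc=0.03):
    """Fuse one depth frame (metres) into the TSDF volume in place.

    tsdf, weights : (nx, ny, nz) float arrays; tsdf initialised to +trunc
    depth         : (h, w) depth image in metres (0 = no reading)
    K             : 3x3 pinhole intrinsics; cam_pose : 4x4 camera-to-world
    """
    nx, ny, nz = tsdf.shape
    # World coordinates of every voxel centre (volume anchored at origin).
    ii, jj, kk = np.meshgrid(np.arange(nx), np.arange(ny), np.arange(nz),
                             indexing="ij")
    pts_w = np.stack([ii, jj, kk], axis=-1).reshape(-1, 3) * voxel_size
    # Transform voxel centres into the camera frame.
    w2c = np.linalg.inv(cam_pose)
    pts_c = pts_w @ w2c[:3, :3].T + w2c[:3, 3]
    z = pts_c[:, 2]
    # Project into the depth image (safe divisor avoids z = 0 warnings).
    zs = np.where(z > 1e-6, z, np.inf)
    u = (pts_c[:, 0] * K[0, 0] / zs + K[0, 2]).round().astype(np.int64)
    v = (pts_c[:, 1] * K[1, 1] / zs + K[1, 2]).round().astype(np.int64)
    h, w = depth.shape
    valid = (z > 1e-6) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    d = np.zeros_like(z)
    d[valid] = depth[v[valid], u[valid]]
    valid &= d > 0  # keep only voxels with an actual depth reading
    # Signed distance along the viewing ray, positive in front of surface;
    # skip voxels far behind the observed surface.
    raw = d - z
    update = valid & (raw >= -trunc)
    sdf = np.clip(raw, -trunc, trunc)
    # Running weighted average (Curless & Levoy style fusion).
    idx = np.flatnonzero(update)
    w_old = weights.flat[idx]
    tsdf.flat[idx] = (tsdf.flat[idx] * w_old + sdf[idx]) / (w_old + 1.0)
    weights.flat[idx] = w_old + 1.0

# Example: a 128^3 volume covering ~1.3 m per side.
tsdf = np.full((128, 128, 128), 0.03, dtype=np.float32)
weights = np.zeros_like(tsdf)
```

A full pipeline would alternate this fusion step with ICP alignment of each new frame against a raycast view of the current volume, which is how KinectFusion obtains the camera pose fed in here.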
This paper has influenced numerous studies and industrial applications, particularly where real-time 3D reconstruction is critical. It has contributed to advancements in virtual/augmented reality and interactive gaming, where detailed, real-time scene understanding is necessary.
For instance, it has been cited in work on mobile robot navigation (Collett et al., 2013) and in augmented reality frameworks such as those developed by Niantic, Inc. It has also influenced graphics rendering techniques that require intricate environment modeling.
Key references to this work include:

- Nießner, M., Zollhöfer, M., Izadi, S., & Stamminger, M. (2013). “Real-time 3D reconstruction at scale using voxel hashing”. ACM Transactions on Graphics (TOG).
- Roth, H., & Vona, M. (2012). “Moving Volume KinectFusion”. BMVC.
- Newcombe, R. A., Fox, D., & Seitz, S. M. (2015). “DynamicFusion: Reconstruction and tracking of non-rigid scenes in real-time”. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.