Background
The ability to accurately detect and track objects in 3D space has long been an important capability in fields ranging from sports science to robotics. Historically, marker-based motion capture has served as the gold standard for tracking; however, its reliance on physical infrared markers placed on the object is a significant limitation. A key aim of this research was to investigate methods of generalized multi-view object tracking using opportunistic features. In addition, we sought to explore methods of dense object reconstruction using a hand-held camera system, with the aim of creating high-quality object meshes directly from video.
Approach
The primary tracking approach investigated in this project is a multi-stage pipeline that uses synchronized RGB images from multiple cameras to locate and track a rigid-body object via opportunistic features. The user first selects the desired target object. The system then generates an initial segmentation mask and solves for visual correspondences between views, establishing the object's initial 3D location. Following initialization, the same correspondence-matching method is applied between successive frames, allowing the system to estimate the object's inter-frame motion and solve for its new position. Additionally, we developed a dense object reconstruction method that uses a standard stereo camera and represents the object geometry with a truncated signed distance function (TSDF). To evaluate our approaches, we created a dataset of object movements with ground truth from multiple marker-based motion capture systems. These movements included object-only motion (balls) as well as human-object interaction (baseball bats). Figure 1 highlights a sample of the human-object interaction data captured during the project, with overlays demonstrating tracking performance.
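As a rough illustration of the initialization step, the sketch below matches features inside the object's segmentation mask across two calibrated views and triangulates them to obtain an initial 3D location. The projection matrices P1 and P2, the masks, and the choice of ORB features with a ratio test are assumptions made for illustration; the actual pipeline's feature and matching methods may differ.

import cv2
import numpy as np

def triangulate_object_center(img1, img2, mask1, mask2, P1, P2):
    """Estimate an approximate 3D object position from one synchronized image pair.

    P1, P2 are assumed 3x4 camera projection matrices from prior calibration.
    """
    orb = cv2.ORB_create(nfeatures=1000)
    # Restrict feature detection to the object's segmentation mask in each view.
    kp1, des1 = orb.detectAndCompute(img1, mask1)
    kp2, des2 = orb.detectAndCompute(img2, mask2)
    if des1 is None or des2 is None:
        return None

    # Brute-force Hamming matching with a ratio test to keep reliable correspondences.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    matches = matcher.knnMatch(des1, des2, k=2)
    good = [m[0] for m in matches if len(m) == 2 and m[0].distance < 0.75 * m[1].distance]
    if len(good) < 8:
        return None

    pts1 = np.float32([kp1[m.queryIdx].pt for m in good]).T  # 2 x N pixel coordinates
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good]).T

    # Triangulate matched points and convert from homogeneous coordinates.
    pts4d = cv2.triangulatePoints(P1, P2, pts1, pts2)
    pts3d = (pts4d[:3] / pts4d[3]).T  # N x 3 points in the world frame

    # Use the median of the triangulated cloud as a simple, outlier-tolerant location estimate.
    return np.median(pts3d, axis=0)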
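The TSDF-based reconstruction step can likewise be sketched with an off-the-shelf integration routine such as Open3D's, assuming depth maps computed from the stereo camera and known per-frame camera poses. The voxel size, truncation distance, and depth scaling below are placeholder values, not parameters taken from our system.

import open3d as o3d
import numpy as np

def reconstruct_mesh(color_images, depth_images, poses, intrinsic):
    """color_images/depth_images: per-frame arrays; poses: 4x4 camera-to-world transforms."""
    volume = o3d.pipelines.integration.ScalableTSDFVolume(
        voxel_length=0.005,              # 5 mm voxels (placeholder)
        sdf_trunc=0.02,                  # truncation distance of the signed distance function
        color_type=o3d.pipelines.integration.TSDFVolumeColorType.RGB8,
    )
    for color, depth, pose in zip(color_images, depth_images, poses):
        rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
            o3d.geometry.Image(color),
            o3d.geometry.Image(depth),
            depth_scale=1000.0,          # assumes depth stored in millimeters
            depth_trunc=3.0,             # ignore depth beyond 3 m
            convert_rgb_to_intensity=False,
        )
        # Open3D expects the world-to-camera extrinsic, i.e. the inverse of the pose.
        volume.integrate(rgbd, intrinsic, np.linalg.inv(pose))
    # Extract the fused object surface as a triangle mesh.
    return volume.extract_triangle_mesh()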

Figure 1: Baseball bat tracking example.
Accomplishments
Over the course of this IR&D project, we constructed a data capture pipeline spanning multiple motion capture systems and collected a comprehensive dataset of object motion. We also developed a robust 3D object tracking pipeline that produces accurate 3D object trajectories and supports a variety of object types. Figure 2 compares the baseball bat trajectory generated by our 3D tracking pipeline with the ground-truth trajectory obtained from a traditional motion capture system. While further work will explore extending this approach to full 6 degree-of-freedom (DOF) object poses, this effort represents a significant advancement in object tracking research.
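For comparisons like the one in Figure 2, a natural error metric is the per-frame Euclidean distance between the estimated and ground-truth positions. The minimal sketch below computes summary statistics of that distance, assuming the two trajectories are already time-synchronized and expressed in the same coordinate frame; it is illustrative and not necessarily the exact metric used in the figure.

import numpy as np

def trajectory_error(estimated, ground_truth):
    """estimated, ground_truth: N x 3 arrays of per-frame 3D positions in meters."""
    # Per-frame Euclidean distance between tracked and ground-truth positions.
    errors = np.linalg.norm(np.asarray(estimated) - np.asarray(ground_truth), axis=1)
    return {
        "rmse": float(np.sqrt(np.mean(errors ** 2))),
        "mean": float(np.mean(errors)),
        "max": float(np.max(errors)),
    }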

Figure 2: Bat tracking performance comparison.