BEAST3D: Animal behavioral analysis and neural encoding from multi-view video via Gaussian splatting
Title: BEAST3D: Leveraging Gaussian Splatting for Neural Encoding and Animal Behavior Analysis in Multi-View Video
Abstract:
While multi-view video has become a standard method for capturing the three-dimensional movements of animals in experimental contexts, deriving comprehensive 3D data from these recordings continues to pose significant technical hurdles. Traditional supervised pose estimation is hindered by the need for labor-intensive manual labeling, whereas off-the-shelf 3D reconstruction tools, typically trained on broad scene datasets, struggle with the unique imagery and limited viewpoints characteristic of laboratory environments. To overcome these obstacles, we introduce BEAST3D, a self-supervised pretraining framework designed to learn 3D visual representations directly from unlabeled, calibrated multi-view video.
BEAST3D employs a vision transformer to predict 3D Gaussian splats, which facilitate the reconstruction of unseen camera angles via differentiable rendering. Concurrently, the model isolates the animal from its surroundings through segmentation. By conditioning directly on known camera parameters, BEAST3D can reconstruct 3D structures using as few as four views. This approach contrasts sharply with general-purpose models, which rely on dense, overlapping viewpoints to estimate camera geometry, a condition rarely met in lab settings.
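
To illustrate what conditioning on known camera parameters can mean geometrically, the following is a minimal sketch, not the BEAST3D implementation, of projecting 3D Gaussian centers into a calibrated view with a standard pinhole model. The function name `project_gaussian_means`, the intrinsics `K`, the extrinsics `R` and `t`, and the example points are all illustrative assumptions; a full differentiable splat renderer would also project covariances and alpha-composite the Gaussians, which is omitted here.

```python
import numpy as np

def project_gaussian_means(means_world, K, R, t):
    """Project N x 3 world-space points into pixel coordinates.

    Assumes a pinhole camera with x_cam = R @ x_world + t and a 3 x 3
    intrinsic matrix K whose last row is [0, 0, 1]. All names here are
    hypothetical and chosen for illustration only.
    """
    means_cam = means_world @ R.T + t           # world frame -> camera frame
    depth = means_cam[:, 2]
    in_front = depth > 0                        # only these points are visible
    safe_depth = np.where(in_front, depth, np.nan)
    uvw = means_cam @ K.T                       # apply intrinsics
    pixels = uvw[:, :2] / safe_depth[:, None]   # perspective divide
    return pixels, depth, in_front

# Assumed example calibration: 500 px focal length, principal point (320, 240),
# camera axes aligned with the world and placed 2 units from the origin.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 2.0])

means = np.array([[0.0, 0.0, 0.0],
                  [0.1, -0.2, 0.0],
                  [0.0, 0.0, -3.0]])            # last point lies behind the camera

pixels, depth, in_front = project_gaussian_means(means, K, R, t)
```

In this sketch, points behind the camera receive NaN pixel coordinates and are flagged by `in_front`, so a caller can mask them out before any rendering or loss computation. Because the calibration is supplied rather than estimated, the projection needs no overlap between views.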
Our extensive evaluation across four distinct species confirms that BEAST3D generates rich, viewpoint-invariant features that transfer robustly to three key downstream applications: novel view synthesis, which serves as a benchmark for the fidelity of the learned 3D representations; multi-view pose estimation, which yields the sparse keypoint trajectories essential for behavioral analysis; and neural encoding, which correlates 3D behavioral metrics with concurrent neural activity data. Consequently, BEAST3D offers a flexible framework for behavioral analysis that capitalizes on the 3D structural information inherent in contemporary multi-view laboratory recordings.
Source: arXiv Generated at: 2026-06-03 00:00:00 UTC



