Time-varying scene meshes are widely used for volumetric video, enabling immersive six degrees of freedom (6DoF) interaction. However, their large data size compared to 2D video poses significant challenges for efficient delivery and real-time streaming, and existing mesh compression methods are not well suited to dynamic, full-scene content.
To the best of our knowledge, TSMC is the first method to exploit temporal redundancy for inter-frame coding of large, unbounded scene meshes. Unlike prior approaches limited to static or object-level meshes, TSMC supports full-scene compression with complex spatial and temporal variations, and can also benefit hybrid 3DGS frameworks that combine meshes with Gaussian splats.
TSMC first identifies dynamic regions using a SAM3-based segmentation approach. For these regions, it constructs a volume-tracked reference mesh to handle self-contact and computes displacement fields by tracking vertex motion across frames. Static backgrounds are encoded once per group of frames, while each dynamic region is represented by its reference mesh together with displacement fields that are compressed using the Karhunen-Loève transform (KLT) and Laplacian coordinates, achieving substantial size reduction with high visual fidelity.
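
As a rough illustration of the displacement-field idea, the following minimal sketch applies a KLT across the frames of one group and keeps only the leading basis vectors. It is not the TSMC implementation: the function names `klt_compress` and `klt_decompress`, the array layout `(frames, vertices, 3)`, the use of an SVD to obtain the KLT basis, and the synthetic test motion are all assumptions made for illustration, and the Laplacian-coordinate step is omitted.

```python
import numpy as np

def klt_compress(displacements, num_components):
    """Project per-frame displacement fields onto a truncated KLT basis.

    displacements: array of shape (F, V, 3), the offset of each of V vertices
    from the reference mesh in each of F frames (assumed layout).
    """
    num_frames, num_vertices, _ = displacements.shape
    X = displacements.reshape(num_frames, num_vertices * 3)
    mean = X.mean(axis=0)
    Xc = X - mean
    # The right singular vectors of the centered data are the eigenvectors of
    # its covariance matrix, i.e. the KLT basis, ordered by decreasing energy.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    basis = Vt[:num_components]   # (k, 3V)
    coeffs = Xc @ basis.T         # (F, k) per-frame coefficients
    return mean, basis, coeffs

def klt_decompress(mean, basis, coeffs, num_vertices):
    """Rebuild the (F, V, 3) displacement fields from the truncated basis."""
    X = coeffs @ basis + mean
    return X.reshape(-1, num_vertices, 3)

# Synthetic example: 8 frames, 50 vertices, motion built from 2 modes,
# so two KLT components reconstruct it up to floating-point error.
rng = np.random.default_rng(0)
modes = rng.normal(size=(2, 50 * 3))
weights = rng.normal(size=(8, 2))
disp = (weights @ modes).reshape(8, 50, 3)

mean, basis, coeffs = klt_compress(disp, num_components=2)
recon = klt_decompress(mean, basis, coeffs, num_vertices=50)
max_error = float(np.abs(recon - disp).max())
```

In this toy setting the group is stored as one mean field, two basis vectors, and an 8×2 coefficient matrix instead of eight full displacement fields. Real motion is not exactly low rank, so the number of retained components would trade size against reconstruction error.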