Hestia: Voxel-Face-Aware Hierarchical Next-Best-View Acquisition for Efficient 3D Reconstruction

Cheng-You Lu¹, Zhuoli Zhuang¹, Nguyen Thanh Trung Le¹, Da Xiao¹, Yu-Cheng Chang¹, Thomas Do¹, Srinath Sridhar², Chin-Teng Lin¹

¹University of Technology Sydney ²Brown University
WACV 2026

Abstract

Advances in 3D reconstruction and novel view synthesis have enabled efficient and photorealistic rendering. However, image capture for reconstruction is still either largely manual or constrained by simple preplanned trajectories. To address this issue, recent works propose generalizable next-best-view planners that do not require online learning. Nevertheless, their robustness and performance remain limited across various shapes. Hence, this study introduces Voxel-Face-Aware Hierarchical Next-Best-View Acquisition for Efficient 3D Reconstruction (Hestia), which addresses the shortcomings of reinforcement learning-based generalizable approaches for five-degree-of-freedom viewpoint prediction. Hestia systematically improves these planners through four components: a more diverse dataset to promote robustness, a hierarchical structure to manage the high-dimensional continuous action search space, a close-greedy strategy to mitigate spurious correlations, and a face-aware design to avoid overlooking geometry. Experimental results show that Hestia achieves non-marginal improvements, with at least a 4% gain in coverage ratio, while reducing Chamfer Distance by 50% and maintaining real-time inference. In addition, Hestia outperforms prior methods by at least 12% in coverage ratio with a 5-image budget and remains robust to object placement variations. Finally, we demonstrate that Hestia, as a next-best-view planner, is feasible for real-world applications.

BibTex

@InProceedings{Lu_2026_WACV,
    author    = {Lu, Cheng-You and Zhuang, Zhuoli and Le, Nguyen Thanh Trung and Xiao, Da and Chang, Yu-Cheng and Do, Thomas and Sridhar, Srinath and Lin, Chin-Teng},
    title     = {Hestia: Voxel-Face-Aware Hierarchical Next-Best-View Acquisition for Efficient 3D Reconstruction},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {March},
    year      = {2026},
    pages     = {5302-5312}
}


We propose Hestia, a generalizable RL-based next-best-view planner that actively predicts viewpoints for data capture in 3D reconstruction tasks.


A voxel is worth more than a ray. Hestia treats each voxel as a cube with six faces rather than as a single point. This reduces the information loss inherent in point approximations and yields a more accurate representation of the voxel.
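To illustrate why faces carry more information than a point, the sketch below computes the six face centers of an axis-aligned voxel and flags which faces are oriented toward a camera. It is an illustration only and not Hestia's implementation: the names `face_centers` and `faces_toward_camera` are hypothetical, the voxel is assumed to be axis-aligned, and the test is a simple back-face check that ignores occlusion by other voxels.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

# Outward unit normals of the six faces of an axis-aligned cube.
FACE_NORMALS: List[Vec3] = [
    (1.0, 0.0, 0.0), (-1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0), (0.0, -1.0, 0.0),
    (0.0, 0.0, 1.0), (0.0, 0.0, -1.0),
]


def face_centers(center: Vec3, size: float) -> List[Vec3]:
    """Return the centers of the six faces of a cube-shaped voxel."""
    half = size / 2.0
    return [
        (center[0] + half * n[0], center[1] + half * n[1], center[2] + half * n[2])
        for n in FACE_NORMALS
    ]


def faces_toward_camera(center: Vec3, size: float, camera: Vec3) -> List[bool]:
    """Flag each face whose outward normal points toward the camera position.

    Assumption: this is only a back-face test; occlusion is not modeled.
    """
    flags = []
    for normal, fc in zip(FACE_NORMALS, face_centers(center, size)):
        to_camera = (camera[0] - fc[0], camera[1] - fc[1], camera[2] - fc[2])
        dot = sum(n * d for n, d in zip(normal, to_camera))
        flags.append(dot > 0.0)
    return flags


# A camera on the +x axis is faced by one face; a camera on the diagonal by three.
on_axis = faces_toward_camera((0.0, 0.0, 0.0), 1.0, (3.0, 0.0, 0.0))
diagonal = faces_toward_camera((0.0, 0.0, 0.0), 1.0, (3.0, 3.0, 3.0))
```

A point approximation would mark the voxel as simply seen or unseen in both cases, whereas the per-face flags record that the two viewpoints observe different parts of the same voxel.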

Larger and more diverse training set. Our training set, processed from Objaverse, is two orders of magnitude larger than those used in previous studies and includes at least 18 more categories than prior datasets.

Greedy design. Hestia adopts a near-greedy optimization strategy by reducing the discount factor to avoid spurious correlations.
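The effect of the discount factor can be seen in the standard discounted return, G = sum over t of gamma^t * r_t. The sketch below is a generic illustration, not Hestia's training code: the reward values and the two gamma settings (0.99 and 0.1) are hypothetical, since the page does not state the discount factor actually used.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Compute G = sum_t gamma**t * r_t for one episode."""
    g = 0.0
    # Accumulate from the last step backward: G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical per-step rewards, e.g. coverage gained by each new view.
rewards = [0.2, 0.1, 0.4]

far_sighted = discounted_return(rewards, gamma=0.99)  # later rewards count almost fully
near_greedy = discounted_return(rewards, gamma=0.1)   # dominated by the immediate reward
fully_greedy = discounted_return(rewards, gamma=0.0)  # equals the first reward only
```

With a small gamma, the return credited to an action is dominated by its immediate reward, so the planner is less likely to attribute gains from much later views to the current choice.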

Real-World Demo

PCD Reconstruction

Benchmark
