Tencent Unveils Voyager: A High‑Power AI Model for Turning Video Into 3D Worlds
Overview of Voyager
Tencent’s new AI model, Voyager, extends the company’s Hunyuan suite, which already includes Hunyuan3D‑2 for text‑to‑3D generation and HunyuanVideo for video synthesis. Voyager focuses on converting existing video clips into three‑dimensional worlds that can be explored interactively.
Training Methodology
Researchers built software that automatically analyzes video footage to extract camera movements and compute per‑frame depth. This approach removed the need for labor‑intensive manual labeling of thousands of hours of footage. The system processed more than 100,000 video clips drawn from both real‑world recordings and renders generated with the Unreal Engine.
Hardware Requirements
Running Voyager at 540p resolution requires at least 60 GB of GPU memory, and Tencent recommends 80 GB for better results. The model runs on a single GPU or across multiple GPUs; with eight GPUs, processing is about 6.69 times faster than on one.
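To put the scaling figure in context, the short Python sketch below works out what a 6.69x speedup on eight GPUs implies. The numbers come from the paragraph above; the function names are illustrative, and the Amdahl's-law estimate assumes that model applies to this workload, which the source does not state.

```python
# Figures reported in the article.
NUM_GPUS = 8
REPORTED_SPEEDUP = 6.69
MIN_MEMORY_GB = 60
RECOMMENDED_MEMORY_GB = 80


def parallel_efficiency(speedup: float, num_gpus: int) -> float:
    """Fraction of ideal linear scaling actually achieved."""
    return speedup / num_gpus


def amdahl_serial_fraction(speedup: float, num_gpus: int) -> float:
    """Serial fraction s implied by Amdahl's law, S = 1 / (s + (1 - s) / N).

    Assumption: the workload follows Amdahl's model; the article does not say so.
    """
    return (num_gpus / speedup - 1) / (num_gpus - 1)


def memory_tier(gpu_memory_gb: float) -> str:
    """Classify a GPU against the article's 540p memory figures."""
    if gpu_memory_gb < MIN_MEMORY_GB:
        return "insufficient"
    if gpu_memory_gb < RECOMMENDED_MEMORY_GB:
        return "minimum"
    return "recommended"


if __name__ == "__main__":
    eff = parallel_efficiency(REPORTED_SPEEDUP, NUM_GPUS)
    serial = amdahl_serial_fraction(REPORTED_SPEEDUP, NUM_GPUS)
    print(f"parallel efficiency: {eff:.1%}")
    print(f"implied serial fraction (Amdahl): {serial:.2%}")
    print(f"48 GB card: {memory_tier(48)}")
```

Under these assumptions, eight GPUs achieve roughly 84% of ideal linear scaling, which corresponds to a serial share of a little under 3% of the work.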
Licensing Restrictions
The model’s license prohibits use in the European Union, the United Kingdom and South Korea. In addition, any commercial deployment serving more than 100 million monthly active users requires a separate license from Tencent.
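The two conditions above can be expressed as a simple pre-deployment check. This is only a sketch of the rules as the article summarizes them, not the license text itself; the region labels, function name and return strings are hypothetical.

```python
# Region labels are hypothetical shorthand for the areas named in the article.
RESTRICTED_REGIONS = {"EU", "UK", "KR"}

# Threshold from the article: more than 100 million monthly active users.
MAU_THRESHOLD = 100_000_000


def license_check(region: str, monthly_active_users: int, commercial: bool) -> str:
    """Apply the two restrictions summarized in the article, in order."""
    if region.upper() in RESTRICTED_REGIONS:
        return "prohibited region"
    if commercial and monthly_active_users > MAU_THRESHOLD:
        return "separate license required"
    return "no restriction listed in the article"


if __name__ == "__main__":
    print(license_check("uk", 5_000, commercial=False))
    print(license_check("US", 150_000_000, commercial=True))
    print(license_check("US", 100_000_000, commercial=True))
```

The comparison is strict because the article says "more than" 100 million, so a deployment at exactly that figure falls below the threshold in this sketch.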
Benchmark Performance
In the WorldScore benchmark created by Stanford University researchers, Voyager achieved the highest overall score of 77.62, surpassing WonderWorld’s 72.69 and CogVideoX‑I2V’s 62.15. Voyager excelled in object control (66.92), style consistency (84.89) and subjective quality (71.09). It placed second in camera control with a score of 85.95, behind WonderWorld’s 92.98.
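For readers who want the gaps between models at a glance, the following Python sketch ranks the scores quoted above and computes the margins. It uses only the figures given in this article; the helper names are illustrative.

```python
# Scores as quoted in the article (WorldScore benchmark).
overall = {"Voyager": 77.62, "WonderWorld": 72.69, "CogVideoX-I2V": 62.15}
camera_control = {"WonderWorld": 92.98, "Voyager": 85.95}


def ranked(scores: dict[str, float]) -> list[tuple[str, float]]:
    """Return (model, score) pairs sorted from highest to lowest score."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def margin(scores: dict[str, float], leader: str, other: str) -> float:
    """Point difference between two models, rounded to two decimals."""
    return round(scores[leader] - scores[other], 2)


if __name__ == "__main__":
    for model, score in ranked(overall):
        print(f"{model}: {score}")
    print("Voyager vs WonderWorld (overall):",
          margin(overall, "Voyager", "WonderWorld"))
    print("WonderWorld vs Voyager (camera control):",
          margin(camera_control, "WonderWorld", "Voyager"))
```

On these figures, Voyager leads WonderWorld by 4.93 points overall, while WonderWorld leads Voyager by 7.03 points in camera control.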
Deployment Considerations
Despite strong benchmark results, the model’s computational demands present challenges for widespread adoption. Developers seeking faster inference can leverage the xDiT framework for parallel processing across multiple GPUs.
Future Outlook
Voyager’s ability to generate coherent 3D worlds from video marks a step toward more immersive generative experiences, though real‑time interactive applications may still be some way off given the hardware the model requires.