Scaling Interactive World Models to 1000-Frame Horizons via Pose-Free Hierarchical Memory

MeiGen-AI/Infinite-World

Infinite-World

Scaling Interactive World Models to 1000-Frame Horizons via Pose-Free Hierarchical Memory

arXiv Project Page

Ruiqi Wu1,2,3*, Xuanhua He4,2*, Meng Cheng2*, Tianyu Yang2, Yong Zhang2‡, Zhuoliang Kang2, Xunliang Cai2, Xiaoming Wei2, Chunle Guo1,3†, Chongyi Li1,3, Ming-Ming Cheng1,3

1Nankai University   2Meituan   3NKIARI   4HKUST

*Equal Contribution   †Corresponding Author   ‡Project Leader

 

Demo

 


Highlights

Infinite-World is a robust interactive world model with:

  • Real-World Training — Trained on real-world videos without requiring perfect pose annotations or synthetic data
  • 1000+ Frame Memory — Maintains coherent visual memory over 1000+ frames via Hierarchical Pose-free Memory Compressor (HPMC)
  • Robust Action Control — Uncertainty-aware action labeling ensures accurate action-response learning from noisy trajectories

Infinite-World Framework

Installation

Environment: Python 3.10, CUDA 12.4 recommended.

1. Create conda environment

conda create -n infworld python=3.10
conda activate infworld

2. Install PyTorch with CUDA 12.4

pip install torch==2.6.0 torchvision==0.21.0 --index-url https://download.pytorch.org/whl/cu124

3. Install Python dependencies

pip install -r requirements.txt
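
After the three steps above, a quick sanity check can confirm that the interpreter and PyTorch build match what this README recommends. The following is a minimal sketch, not part of the repository: the function name `check_environment` and the report keys are hypothetical, and the expected versions are copied from the commands above. It only uses `sys`, `importlib`, and standard PyTorch attributes, and it does not fail if PyTorch is missing.

```python
import importlib.util
import sys

# Values copied from this README; adjust if you deliberately use other versions.
EXPECTED_PYTHON = (3, 10)
EXPECTED_TORCH = "2.6.0"
EXPECTED_CUDA = "12.4"


def check_environment():
    """Return a dict comparing the current environment with the README's recommendations."""
    report = {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "python_ok": sys.version_info[:2] == EXPECTED_PYTHON,
        "torch_installed": importlib.util.find_spec("torch") is not None,
    }
    if report["torch_installed"]:
        import torch  # imported lazily so the check also runs before step 2

        report["torch"] = torch.__version__
        # Wheels from the cu124 index report a version such as "2.6.0+cu124".
        report["torch_ok"] = torch.__version__.startswith(EXPECTED_TORCH)
        # torch.version.cuda is None for CPU-only builds.
        report["cuda_build"] = torch.version.cuda
        report["cuda_build_ok"] = torch.version.cuda == EXPECTED_CUDA
        report["cuda_available"] = torch.cuda.is_available()
    return report


for key, value in check_environment().items():
    print(f"{key}: {value}")
```

A `False` for `cuda_available` usually points to a driver or wheel mismatch rather than a problem with this repository.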

Checkpoint Configuration

All model paths are configured in configs/infworld_config.yaml. Paths are relative to the project root unless absolute.
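
The path rule stated above (relative paths are resolved against the project root, absolute paths are used as given) can be illustrated with a few lines of `pathlib`. This is a sketch of the described behaviour only. The helper name `resolve_model_path` and the example directories are hypothetical, and the repository's own loader may implement the rule differently.

```python
from pathlib import Path


def resolve_model_path(project_root, configured_path):
    """Resolve a path from the config: absolute paths pass through, relative ones join the root."""
    root = Path(project_root)
    path = Path(configured_path)
    return path if path.is_absolute() else root / path


# Hypothetical project root and values, for illustration only.
print(resolve_model_path("/opt/Infinite-World", "checkpoints/models/Wan2.1_VAE.pth"))
print(resolve_model_path("/opt/Infinite-World", "/data/ckpt/Wan2.1_VAE.pth"))
```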

Download checkpoints

Download checkpoints from https://huggingface.co/MeiGen-AI/Infinite-World and place files under checkpoints/:

| File / directory | Config key | Description |
| --- | --- | --- |
| models/Wan2.1_VAE.pth | vae_cfg.vae_pth | VAE weights |
| models/models_t5_umt5-xxl-enc-bf16.pth | text_encoder_cfg.checkpoint_path | T5 text encoder |
| models/google/umt5-xxl (folder) | text_encoder_cfg.tokenizer_path | T5 tokenizer |
| infinite_world_model.ckpt | checkpoint_path | DiT model weights |
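
Before running anything, it can help to confirm that the downloaded files landed where the table above expects them. The sketch below is not part of the repository. The helper `missing_checkpoints` is a hypothetical name, and it assumes the listed files sit directly under `checkpoints/` with the relative paths shown in the table.

```python
from pathlib import Path

# (relative path under checkpoints/, config key, is_directory), copied from the table above.
EXPECTED = [
    ("models/Wan2.1_VAE.pth", "vae_cfg.vae_pth", False),
    ("models/models_t5_umt5-xxl-enc-bf16.pth", "text_encoder_cfg.checkpoint_path", False),
    ("models/google/umt5-xxl", "text_encoder_cfg.tokenizer_path", True),
    ("infinite_world_model.ckpt", "checkpoint_path", False),
]


def missing_checkpoints(checkpoint_dir="checkpoints"):
    """Return (relative path, config key) pairs for expected entries that are absent."""
    base = Path(checkpoint_dir)
    missing = []
    for rel, key, is_dir in EXPECTED:
        target = base / rel
        present = target.is_dir() if is_dir else target.is_file()
        if not present:
            missing.append((rel, key))
    return missing


for rel, key in missing_checkpoints():
    print(f"missing: checkpoints/{rel} (config key: {key})")
```

Run it from the project root. No output means every expected entry was found.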

Results

Quantitative Comparison

| Model | Motion Smoothness↑ | Dynamic Degree↑ | Aesthetic Quality↑ | Imaging Quality↑ | Avg. Score↑ | Memory↓ | Fidelity↓ | Action↓ | ELO Rating↑ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Hunyuan-GameCraft | 0.9855 | 0.9896 | 0.5380 | 0.6010 | 0.7785 | 2.67 | 2.49 | 2.56 | 1311 |
| Matrix-Game 2.0 | 0.9788 | 1.0000 | 0.5267 | 0.7215 | 0.8068 | 2.98 | 2.91 | 1.78 | 1432 |
| Yume 1.5 | 0.9861 | 0.9896 | 0.5840 | 0.6969 | 0.8141 | 2.43 | 1.91 | 2.47 | 1495 |
| HY-World-1.5 | 0.9905 | 1.0000 | 0.5280 | 0.6611 | 0.7949 | 2.59 | 2.78 | 1.50 | 1542 |
| Infinite-World | 0.9876 | 1.0000 | 0.5440 | 0.7159 | 0.8119 | 1.92 | 1.67 | 1.54 | 1719 |
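
The README does not say how the Avg. Score column is computed. The numbers are consistent with an unweighted mean of the four preceding quality metrics, and the short sketch below recomputes it under that assumption. The mean formula is an inference from the table, not a statement from the authors. The values are copied from the rows above.

```python
# Per model: ([motion smoothness, dynamic degree, aesthetic quality, imaging quality], reported Avg. Score)
ROWS = {
    "Hunyuan-GameCraft": ([0.9855, 0.9896, 0.5380, 0.6010], 0.7785),
    "Matrix-Game 2.0": ([0.9788, 1.0000, 0.5267, 0.7215], 0.8068),
    "Yume 1.5": ([0.9861, 0.9896, 0.5840, 0.6969], 0.8141),
    "HY-World-1.5": ([0.9905, 1.0000, 0.5280, 0.6611], 0.7949),
    "Infinite-World": ([0.9876, 1.0000, 0.5440, 0.7159], 0.8119),
}


def mean_score(metrics):
    """Unweighted mean of the metric values (assumed definition of Avg. Score)."""
    return sum(metrics) / len(metrics)


for model, (metrics, reported) in ROWS.items():
    print(f"{model}: recomputed {mean_score(metrics):.4f}, reported {reported:.4f}")
```

Any difference in the last digit comes from rounding in the published table.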

Citation

If you find this work useful, please consider citing:

@article{wu2026infiniteworld,
  title={Infinite-World: Scaling Interactive World Models to 1000-Frame Horizons via Pose-Free Hierarchical Memory},
  author={Wu, Ruiqi and He, Xuanhua and Cheng, Meng and Yang, Tianyu and Zhang, Yong and Kang, Zhuoliang and Cai, Xunliang and Wei, Xiaoming and Guo, Chunle and Li, Chongyi and Cheng, Ming-Ming},
  journal={arXiv preprint arXiv:2602.02393},
  year={2026}
}

License

This project is released under the MIT License.
