Skip to content

MaskFlow: Discrete Flows For Flexible and Efficient Long Video Generation

License

Notifications You must be signed in to change notification settings

CompVis/maskflow

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

MaskFlow: Discrete Flows for Flexible and Efficient Long Video Generation

This repository represents the official implementation of the paper titled "[MaskFlow: Discrete Flows for Flexible and Efficient Long Video Generation".

Website Paper License visitors

Michael Fuest, Vincent Tao Hu, Björn Ommer

teaser

TLDR

MaskFlow is a chunkwise autoregressive approach to long video generation that uses frame-level masking and confidence-based heuristic sampling to produce seamless, high-quality video sequences efficiently. Instead of generating entire videos at once, MaskFlow generates overlapping chunks of frames, where each new chunk is conditioned on previously generated frames to ensure temporal consistency. During training, the model learns to reconstruct partially masked frames, making it naturally suited for extending video sequences while maintaining coherence. The frame-level masking strategy aligns perfectly with chunkwise generation, enabling the model to handle different levels of corruption while ensuring smooth transitions. To further speed up inference, we incorporate confidence-based heuristic sampling, selectively unmasking only the most confidently predicted tokens at each step. This approach allows MaskFlow to generate long videos with greater flexibility and efficiency than traditional methods..

🎓 Citation

Please cite our paper:

@InProceedings{fuest2025maskflow,
      title={MaskFlow: Discrete Flows for Flexible and Efficient Long Video Generation},
      author={Michael Fuest and Vincent Tao Hu and Björn Ommer},
      booktitle = {Arxiv},
      year={2025}
}

✅ Updates

  • Mar. 4th, 2025: Training code released.
  • Feb. 16th, 2025: Arxiv released.

📦 Training

FaceForensics

CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch --num_processes 4 --num_machines 1 --multi_gpu --main_process_ip 127.0.0.1 --main_process_port 8868 train_acc_vq.py model=dlatte_xl2 compile=pre mixed_precision=fp16 dynamic.scheduling_matrix=full_sequence dynamic=maskflow dynamic.scheduler=sigmoid dynamic.time_cond=1 dynamic.mask_ce=1 input_tensor_type=btwh tokenizer=sd_vq_f8 data=ffs_indices data.sample_fid_every=50_000 data.batch_size=2 data.sample_fid_bs=1 data.sample_fid_n=10_0 data.sample_fid_every=400_000 data.sample_vis_n=1 data.sample_vis_every=50_000 data.num_workers_per_gpu=12 ckpt_every=200_000 data.train_steps=400_000 dynamic.reweigh_loss=snr dynamic.cum_snr_decay=0.8 dynamic.snr_clip=6.0 dynamic.use_fused_snr=1 dynamic.objective=pred_x0 dynamic.noise_level=random_all tokenizer.latent_size=32 dynamic.sampler=mgm dynamic.sampling_timesteps=20 dynamic.n_context_frames=2 dynamic.sampling_window_stride=12 dynamic.sampling_horizon=16 dynamic.sampling_timesteps=20

DMLab

CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch --num_processes 4 --num_machines 1 --multi_gpu --main_process_ip 127.0.0.1 --main_process_port 8868 train_acc_vq.py model=dlatte_b2 compile=pre mixed_precision=fp16 dynamic.scheduling_matrix=full_sequence dynamic=maskflow dynamic.scheduler=sigmoid dynamic.time_cond=1 dynamic.mask_ce=1 input_tensor_type=btwh tokenizer=sd_vq_f8 data=dmlab_indices data.sample_fid_every=50_000 data.batch_size=3 data.sample_fid_bs=1 data.sample_fid_n=10_0 data.sample_fid_every=400_000 data.sample_vis_n=1 data.sample_vis_every=50_000 data.num_workers_per_gpu=12 ckpt_every=200_000 data.train_steps=400_000 dynamic.reweigh_loss=snr dynamic.cum_snr_decay=0.8 dynamic.snr_clip=6.0 dynamic.use_fused_snr=1 dynamic.objective=pred_x0 dynamic.noise_level=random_all tokenizer.latent_size=32 dynamic.sampler=mgm dynamic.sampling_timesteps=20 dynamic.n_context_frames=2 dynamic.sampling_window_stride=12 dynamic.sampling_horizon=16 dynamic.sampling_timesteps=20

Evaluation

FaceForensics

TODO

DMLab

TODO

Weights

TODO

Dataset Preparation

TODO

🎫 License

This work is licensed under the Apache License, Version 2.0 (as defined in the LICENSE).

By downloading and using the code and model you agree to the terms in the LICENSE.

License

About

MaskFlow: Discrete Flows For Flexible and Efficient Long Video Generation

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published