Implement DDIM Inversion for CogVideoX #709

Merged
merged 6 commits into from
Feb 27, 2025

LittleNyima
Contributor

There have been some attempts at performing DDIM Inversion on CogVideoX (#397, #689), but as far as I know, these efforts have not yielded very good results. Recently, I successfully implemented DDIM Inversion on CogVideoX, and the results are as follows.

Invert and reconstruct with an empty prompt:

1.mp4

Invert and reconstruct with an editing prompt:

2.mp4

I am currently writing up the method and cleaning up the related code; both will be added to this PR soon.

@LittleNyima
Contributor Author

This link is a brief introduction to the purpose and implementation of this PR.

@LittleNyima LittleNyima marked this pull request as ready for review February 20, 2025 13:46
@LittleNyima LittleNyima changed the title from "[WIP] Implement DDIM Inversion for CogVideoX" to "Implement DDIM Inversion for CogVideoX" Feb 20, 2025
@zRzRzRzRzRzRzR
Member

zRzRzRzRzRzRzR commented Feb 22, 2025

So happy to see your contribution!
Is this code for CogVideoX-2B only, or for the whole CogVideoX series?

@LittleNyima
Contributor Author

So happy to see your contribution! Is this code for CogVideoX-2B only, or for the whole CogVideoX series?

My development is based on the 5B variant, and it should work on the 2B model as well. I will test it on various base models in the coming days.

@a-r-r-o-w

a-r-r-o-w commented Feb 22, 2025

@LittleNyima Wow, this is super cool 🔥🔥🔥

Would you like to also showcase this in community pipelines?

@LittleNyima
Contributor Author

@LittleNyima Wow, this is super cool 🔥🔥🔥

Would you like to also showcase this in community pipelines?

@a-r-r-o-w Thanks for your invitation, and I totally would!

@zRzRzRzRzRzRzR
Member

I can try it in the next two days. If it runs on both 5B and 2B, I will merge this code and modify the readme to add a reference link.

@zRzRzRzRzRzRzR
Member

zRzRzRzRzRzRzR commented Feb 23, 2025

Could you share your testing environment? In my environment, your code fails when loading the model's pipeline:

pipeline = CogVideoXPipeline.from_pretrained(model_path, torch_dtype=dtype).to(device=device)

Error

Segmentation fault (core dump)

I haven't located the problem yet. I am using an A100 with over 128 GB of memory, so it is unlikely to be a hardware issue. My CUDA version is 12.4.
My package versions are:

torch                    2.6.0
torchvision              0.21.0
diffusers                0.32.2

But the cli_demo runs normally for me.

image

My initial suspicion is that you have rewritten some classes, which has led to conflicts.

@zRzRzRzRzRzRzR
Member

zRzRzRzRzRzRzR commented Feb 23, 2025

Update: the line of code mentioned above now runs successfully after setting the following:

export LD_PRELOAD=/lib/x86_64-linux-gnu/libstdc++.so.6

However, this does not work with CogVideoX-2b, because the DDIMInverseScheduler class in diffusers does not provide a method for linspace, so an error is raised. This method is available in CogVideoXDDIMScheduler. @a-r-r-o-w Could you check this issue?

@a-r-r-o-w

The DDIMInverseScheduler class does not provide a method for linspace, and an error is reported. This method is available in CogVideoXDDIMScheduler.

DDIMInverseScheduler is only compatible with models that use DDIMScheduler. Since we're using a slightly modified objective in CogVideoX (which is why we needed the separate CogVideoXDDIMScheduler), the corresponding inverse scheduler will be different, but it is not implemented in diffusers.
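
For readers unfamiliar with what an inverse scheduler does, here is a minimal sketch of the textbook deterministic DDIM update in its epsilon-prediction form. It is an illustration under assumptions: it is not the modified objective used by CogVideoXDDIMScheduler and not the code from this PR, and the function name `ddim_step` and all numeric values are made up. The same formula serves for sampling (moving toward less noise) and for inversion (moving toward more noise); only the direction of the two noise levels changes.

```python
import math


def ddim_step(x, eps, alpha_bar_from, alpha_bar_to):
    """Deterministic DDIM update (eta = 0) from one noise level to another.

    `eps` is the model's noise prediction at the current level. Sampling moves
    to a larger alpha_bar (less noise); inversion moves to a smaller one.
    """
    # Predict the clean sample implied by the current latent and noise estimate.
    x0_pred = (x - math.sqrt(1.0 - alpha_bar_from) * eps) / math.sqrt(alpha_bar_from)
    # Re-noise that prediction to the target level using the same noise estimate.
    return math.sqrt(alpha_bar_to) * x0_pred + math.sqrt(1.0 - alpha_bar_to) * eps


# Toy scalar example with made-up values (hypothetical, for illustration only).
x_start, eps = 0.8, -0.3
a_less_noisy, a_more_noisy = 0.9, 0.5

x_inverted = ddim_step(x_start, eps, a_less_noisy, a_more_noisy)  # inversion direction
x_restored = ddim_step(x_inverted, eps, a_more_noisy, a_less_noisy)  # sampling direction
print(abs(x_restored - x_start) < 1e-9)
```

The round trip is exact here only because the same `eps` is reused in both directions. With a real model the noise prediction differs between the two noise levels, so inversion is an approximation, and a scheduler built on a different objective needs its own matching inverse.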

@zRzRzRzRzRzRzR
Member

zRzRzRzRzRzRzR commented Feb 23, 2025

I tried adding it this way, but the result is not right.
This is the original video:

output.mp4

output_reconstruction.mp4:

output_reconstruction.mp4

output_inversion.mp4:

output_inversion.mp4

For 5B, the result is correct.

@LittleNyima
Contributor Author

I guess this is likely an issue about RoPE, which is not applied in the 2B model.

@zRzRzRzRzRzRzR
Member

I guess this is likely an issue about RoPE, which is not applied in the 2B model.

image_rotary_emb does not exist in 2B, so the code never applies RoPE.

Member

@zRzRzRzRzRzRzR zRzRzRzRzRzRzR left a comment


The import order needs to be fixed.

@LittleNyima
Contributor Author

image_rotary_emb does not exist in 2B, and the code does not perform the job of rope.

Yes, that's exactly what I mean. RoPE is important for the control process, and perhaps a similar mechanism is also needed in the 2B model to make the model aware of spatial relationships.

@zRzRzRzRzRzRzR
Member

zRzRzRzRzRzRzR commented Feb 26, 2025

I think we can note at the beginning of the code that CogVideoX-2B is not supported, and merge the rest first.

@LittleNyima
Contributor Author

I think we can note at the beginning of the code that CogVideoX-2B is not supported, and merge the rest first.

I agree with this. Over the past two days, I have tried to add support for the 2B model, but I have not obtained satisfactory results. I will check whether the model supports RoPE during inference and, if it does not, raise an error with a descriptive message.
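
A minimal sketch of such a check, for illustration only and not necessarily the code merged in this PR. It assumes the transformer config exposes a `use_rotary_positional_embeddings` flag, as the diffusers CogVideoX transformer config does; the helper name `ensure_rope_supported` and the stand-in config objects are hypothetical.

```python
from types import SimpleNamespace


def ensure_rope_supported(transformer_config) -> None:
    """Raise a descriptive error if the model does not use RoPE."""
    # Assumption: the config carries `use_rotary_positional_embeddings`;
    # a missing attribute is treated the same as False.
    if not getattr(transformer_config, "use_rotary_positional_embeddings", False):
        raise NotImplementedError(
            "DDIM inversion requires a model with rotary positional embeddings "
            "(e.g. CogVideoX-5B); models without RoPE, such as CogVideoX-2B, "
            "are not supported."
        )


# Stand-in configs for illustration; with diffusers the real object would be
# `pipeline.transformer.config`.
config_with_rope = SimpleNamespace(use_rotary_positional_embeddings=True)
config_without_rope = SimpleNamespace(use_rotary_positional_embeddings=False)

ensure_rope_supported(config_with_rope)  # passes silently
try:
    ensure_rope_supported(config_without_rope)
except NotImplementedError as err:
    print(f"Rejected: {err}")
```

Failing early with a clear message avoids the silent quality degradation reported above for the 2B model.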

@LittleNyima
Contributor Author

@zRzRzRzRzRzRzR The import order is fixed, and an exception is now raised if RoPE is not supported. I think this script is now ready for merging.

@zRzRzRzRzRzRzR zRzRzRzRzRzRzR merged commit eb66c9c into THUDM:main Feb 27, 2025
@LittleNyima LittleNyima deleted the feature/ddim-inversion branch February 27, 2025 05:29
Labels
good first issue Good for newcomers

5 participants