Implement DDIM Inversion for CogVideoX #709

LittleNyima · 2025-02-18T09:58:29Z

There have been some attempts at performing DDIM Inversion on CogVideoX (#397 #689), but as far as I know, these efforts have not yielded very good results. Recently, I successfully implemented DDIM Inversion on CogVideoX, and the results are as follows.

Invert and reconstruct with an empty prompt:

1.mp4

Invert and reconstruct with an editing prompt:

2.mp4

I am currently organizing the method introduction and related code, and these will be updated in this PR soon.
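For readers unfamiliar with the technique named above, the following is a minimal sketch of generic DDIM inversion on a single scalar, assuming a plain epsilon-prediction model. It is not the code from this PR: `toy_eps`, the `schedule` values, and all function names are hypothetical, and CogVideoX uses its own scheduler and objective.

```python
import math


def predict_x0(x_t, eps, alpha_bar_t):
    """Estimate the clean sample from a noisy sample and the predicted noise."""
    return (x_t - math.sqrt(1.0 - alpha_bar_t) * eps) / math.sqrt(alpha_bar_t)


def ddim_move(x_t, eps, alpha_bar_from, alpha_bar_to):
    """Deterministic DDIM update (eta = 0) between two noise levels.

    Sampling moves toward a larger alpha_bar (less noise); inversion applies
    the same formula toward a smaller alpha_bar (more noise).
    """
    x0 = predict_x0(x_t, eps, alpha_bar_from)
    return math.sqrt(alpha_bar_to) * x0 + math.sqrt(1.0 - alpha_bar_to) * eps


def run_schedule(x, alpha_bars, eps_fn):
    """Walk x through consecutive noise levels, querying eps_fn at each step."""
    for a_from, a_to in zip(alpha_bars[:-1], alpha_bars[1:]):
        x = ddim_move(x, eps_fn(x, a_from), a_from, a_to)
    return x


def toy_eps(x, alpha_bar):
    """Hypothetical stand-in for a trained denoiser; not a real model."""
    return 0.1 * x


schedule = [0.99, 0.8, 0.5, 0.2]                          # clean -> noisy
latent = run_schedule(1.0, schedule, toy_eps)              # inversion
restored = run_schedule(latent, schedule[::-1], toy_eps)   # reconstruction
print(f"inverted latent: {latent:.4f}, reconstruction: {restored:.4f}")
```

With a fixed noise estimate a single step round-trips exactly. With a real model the noise is predicted at slightly different points on the way up and on the way down, so the reconstruction is only approximate.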

LittleNyima · 2025-02-20T13:46:24Z

This link is a brief introduction to the purpose and implementation of this PR.

zRzRzRzRzRzRzR · 2025-02-22T09:09:13Z

So happy to see your contribution!
Is this code only for CogVideoX-2B, or for the whole CogVideoX series?

LittleNyima · 2025-02-22T16:14:14Z

> So happy to see your contribution! This code is for CogVideoX-2B or all series of CogVideoX?

My development is based on the 5B variant, and it should work on the 2B model as well. I will test it on various base models in the coming days.

a-r-r-o-w · 2025-02-22T23:04:46Z

@LittleNyima Wow, this is super cool 🔥🔥🔥

Would you like to also showcase this in community pipelines?

LittleNyima · 2025-02-23T03:57:33Z

> @LittleNyima Wow, this is super cool 🔥🔥🔥
>
> Would you like to also showcase this in community pipelines?

@a-r-r-o-w Thanks for your invitation, and I totally would!

zRzRzRzRzRzRzR · 2025-02-23T04:47:27Z

I can try it in the next two days. If it runs on both 5B and 2B, I will merge this code and modify the readme to add a reference link.

zRzRzRzRzRzRzR · 2025-02-23T07:43:20Z

Could you share your testing environment? In my environment, your code fails when loading the model's pipeline:

pipeline = CogVideoXPipeline.from_pretrained(model_path, torch_dtype=dtype).to(device=device)

Error:

Segmentation fault (core dumped)

I haven't located the problem yet. I am using an A100 with over 128 GB of memory, so it is unlikely to be a hardware issue, and my CUDA version is 12.4.
My package versions are:

torch                    2.6.0
torchvision              0.21.0
diffusers                0.32.2

But the cli_demo runs normally for me.

My initial suspicion is that you have rewritten some classes, which has led to conflicts.

inference/ddim_inversion.py
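When comparing environments as requested above, the package versions can be collected with a few lines of standard-library Python. This is only a convenience sketch; `report_versions` is a hypothetical helper and the default package list simply mirrors the three packages listed in this comment.

```python
from importlib import metadata


def report_versions(packages=("torch", "torchvision", "diffusers")):
    """Return a {package: version} mapping, marking missing packages."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


# Print one aligned line per package, similar to `pip list` output.
for name, version in report_versions().items():
    print(f"{name:<24} {version}")
```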

zRzRzRzRzRzRzR · 2025-02-23T13:47:08Z

To add: after setting the following environment variable, the line of code mentioned above runs successfully.

export LD_PRELOAD=/lib/x86_64-linux-gnu/libstdc++.so.6

However, this does not work with CogVideoX-2b, because the DDIMInverseScheduler class in diffusers does not provide the linspace option, so an error is raised. That option is available in CogVideoXDDIMScheduler. @a-r-r-o-w Could you check this issue?

a-r-r-o-w · 2025-02-23T14:04:53Z

> The DDIMInverseScheduler class does not provide a method for linspace, and an error is reported. This method is available in CogVideoXDDIMScheduler.

DDIMInverseScheduler is only compatible with models that use DDIMScheduler. Since we're using a slightly modified objective in CogVideoX (which is why we needed the separate CogVideoXDDIMScheduler), the corresponding inverse scheduler would be different, and it is not implemented in diffusers.
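To make the linspace point concrete, here is a small pure-Python sketch of three common timestep-spacing conventions and of a scheduler that rejects a spacing it does not support. The formulas follow the conventions used by diffusers schedulers as I understand them (with `steps_offset` omitted); the function names and the `supported` default are assumptions for illustration, not the library's API.

```python
def leading(num_train, num_steps):
    """Evenly strided timesteps that start at 0 and stop short of the end."""
    step_ratio = num_train // num_steps
    return [i * step_ratio for i in range(num_steps)][::-1]


def trailing(num_train, num_steps):
    """Evenly strided timesteps that always include the last training step."""
    step_ratio = num_train / num_steps
    return [round(num_train - i * step_ratio) - 1 for i in range(num_steps)]


def linspace(num_train, num_steps):
    """Timesteps spread evenly over the closed range [0, num_train - 1]."""
    if num_steps == 1:
        return [0]
    gap = (num_train - 1) / (num_steps - 1)
    return [round(i * gap) for i in range(num_steps)][::-1]


SPACINGS = {"leading": leading, "trailing": trailing, "linspace": linspace}


def set_timesteps(spacing, num_train, num_steps, supported=("leading", "trailing")):
    """Hypothetical scheduler entry point that only accepts some spacings."""
    if spacing not in supported:
        raise ValueError(f"{spacing!r} is not supported; choose one of {supported}")
    return SPACINGS[spacing](num_train, num_steps)


for name, fn in SPACINGS.items():
    print(f"{name:<9} {fn(1000, 4)}")
```

The three modes visit different timesteps for the same step count, so an inverse scheduler that lacks the mode used by the forward scheduler cannot retrace the same trajectory.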

zRzRzRzRzRzRzR · 2025-02-23T14:17:33Z

I tried adding it this way, but the result is not right.
This is the original video:

output.mp4

output_reconstruction.mp4:

output_reconstruction.mp4

output_inversion.mp4:

output_inversion.mp4

For 5B, this is correct

LittleNyima · 2025-02-23T16:31:21Z

> I tried to increase it this way, but the effect is not right
> This is the original video:
>
> output.mp4
>
> output_reconstruction.mp4:
>
> output_reconstruction.mp4
>
> output_inversion.mp4:
>
> output_inversion.mp4
>
> For 5B, this is correct

I guess this is likely an issue with RoPE, which is not applied in the 2B model.

zRzRzRzRzRzRzR · 2025-02-24T06:18:23Z

> > I tried to increase it this way, but the effect is not right
> > This is the original video:
> > output.mp4
> > output_reconstruction.mp4:
> > output_reconstruction.mp4
> > output_inversion.mp4:
> > output_inversion.mp4
> > For 5B, this is correct
>
> I guess this is likely an issue about RoPE, which is not applied in the 2B model.

image_rotary_emb does not exist in 2B, so the code never applies RoPE.

zRzRzRzRzRzRzR

The import order needs to be fixed.

inference/ddim_inversion.py

LittleNyima · 2025-02-24T08:16:24Z

> > > I tried to increase it this way, but the effect is not right
> > > This is the original video:
> > > output.mp4
> > > output_reconstruction.mp4:
> > > output_reconstruction.mp4
> > > output_inversion.mp4:
> > > output_inversion.mp4
> > > For 5B, this is correct
> >
> > I guess this is likely an issue about RoPE, which is not applied in the 2B model.
>
> image_rotary_emb does not exist in 2B, and the code does not perform the job of rope.

Yes, that's exactly what I mean. RoPE is important for the control process, and perhaps a similar mechanism is also needed in the 2B model to make the model aware of spatial relationships.
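As a small illustration of why rotary embeddings make attention position-aware, the sketch below applies a single-frequency, one-dimensional RoPE to 2-D query and key vectors. It is a generic toy, not CogVideoX's 3D rotary embedding; `theta`, the vectors, and the function names are all made up for the example.

```python
import math


def rotate(vec, position, theta=0.5):
    """Rotate a 2-D vector by position * theta (one RoPE frequency pair)."""
    x, y = vec
    angle = position * theta
    c, s = math.cos(angle), math.sin(angle)
    return (x * c - y * s, x * s + y * c)


def rope_score(q, k, q_pos, k_pos):
    """Attention logit between a rotated query and a rotated key."""
    qr = rotate(q, q_pos)
    kr = rotate(k, k_pos)
    return qr[0] * kr[0] + qr[1] * kr[1]


def plain_score(q, k):
    """Attention logit without any positional information."""
    return q[0] * k[0] + q[1] * k[1]


q, k = (1.0, 0.5), (0.3, -0.7)
print("same offset, shifted:", rope_score(q, k, 3, 1), rope_score(q, k, 10, 8))
print("different offset:    ", rope_score(q, k, 3, 2))
print("no RoPE:             ", plain_score(q, k))
```

The rotated score depends only on the offset between the two positions, while the plain score ignores position entirely, which is the kind of spatial awareness the comment above refers to.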

zRzRzRzRzRzRzR · 2025-02-26T05:54:29Z

I think we can state at the top of the code that the script does not support CogVideoX-2B; the other parts can be merged first.

LittleNyima · 2025-02-26T06:00:08Z

> I think we can write at the Code beginning that the model does not support CogVideoX-2B, and for the other parts, I think we can merge them first.

I agree. Over the past two days I have tried to add support for the 2B model, but I have not obtained satisfactory results. I will check whether the model supports RoPE during inference and, if not, raise an error with a descriptive message.
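The check described above could look roughly like the following sketch. It is an illustration under stated assumptions rather than the merged code: `require_rope` is a hypothetical helper, the config objects are plain stand-ins, and the `use_rotary_positional_embeddings` field name is assumed to match the transformer config exposed by diffusers.

```python
from types import SimpleNamespace


def require_rope(transformer_config):
    """Fail early when the model was not built with rotary embeddings."""
    # Assumed field name; a missing attribute is treated as "no RoPE".
    if not getattr(transformer_config, "use_rotary_positional_embeddings", False):
        raise NotImplementedError(
            "DDIM inversion in this script requires a model with RoPE; "
            "models without rotary positional embeddings are not supported."
        )


# Stand-in configs for illustration only; real ones come from the loaded model.
config_with_rope = SimpleNamespace(use_rotary_positional_embeddings=True)
config_without_rope = SimpleNamespace(use_rotary_positional_embeddings=False)

require_rope(config_with_rope)  # passes silently
try:
    require_rope(config_without_rope)
except NotImplementedError as err:
    print(f"rejected: {err}")
```

Raising before any inversion work starts keeps the failure cheap and the message clear, instead of producing a corrupted reconstruction.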

LittleNyima · 2025-02-26T07:58:19Z

@zRzRzRzRzRzRzR The import order is fixed, and an exception is now raised if RoPE is not supported. I think this script is now ready for merging.

LittleNyima added 3 commits February 18, 2025 09:50

Initialize DDIM Inversion script

dd76b2b

Implement an unverified version that should be further tested

58d66c8

stable version

250a0bc

LittleNyima marked this pull request as ready for review February 20, 2025 13:46

LittleNyima changed the title ~~[WIP] Implement DDIM Inversion for CogVideoX~~ Implement DDIM Inversion for CogVideoX Feb 20, 2025

zRzRzRzRzRzRzR reviewed Feb 23, 2025

inference/ddim_inversion.py (outdated)

inference/ddim_inversion.py

zRzRzRzRzRzRzR added the "good first issue" label Feb 23, 2025

zRzRzRzRzRzRzR assigned LittleNyima, zRzRzRzRzRzRzR and OleehyO Feb 23, 2025

make the style of argparser consistent with repo

e0bf395

zRzRzRzRzRzRzR reviewed Feb 24, 2025

inference/ddim_inversion.py

inference/ddim_inversion.py (outdated)

zRzRzRzRzRzRzR mentioned this pull request Feb 25, 2025

Work plan and enhancement / 工作计划和用户诉求 #194

Open

LittleNyima and others added 2 commits February 26, 2025 15:22

Merge branch 'THUDM:main' into feature/ddim-inversion

d6bb910

fix import order and deprecate for CVX 2B models

2c33c09

zRzRzRzRzRzRzR approved these changes Feb 26, 2025

zRzRzRzRzRzRzR merged commit eb66c9c into THUDM:main Feb 27, 2025

LittleNyima deleted the feature/ddim-inversion branch February 27, 2025 05:29

LittleNyima mentioned this pull request Mar 4, 2025

Add CogVideoX DDIM Inversion to Community Pipelines huggingface/diffusers#10956

Merged

6 tasks
