Implement DDIM Inversion for CogVideoX #709
This link is a brief introduction to the purpose and implementation of this PR.
So happy to see your contribution!
My development is based on the 5B variant, and it should work on the 2B model as well. I will test it on various base models in the coming days.
@LittleNyima Wow, this is super cool 🔥🔥🔥 Would you like to also showcase this in community pipelines?
@a-r-r-o-w Thanks for your invitation, and I totally would!
I can try it in the next two days. If it runs on both 5B and 2B, I will merge this code and modify the readme to add a reference link.
Update: the line of code you mentioned now runs successfully.
However, this does not work with CogVideoX-2b, because diffusers
DDIMInverseScheduler is only compatible with models that use DDIMScheduler. Since we're using a slightly modified objective in CogVideoX (which is why we needed the separate CogVideoXDDIMScheduler), the corresponding inverse scheduler will be different but is not implemented in diffusers.
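The comment above does not spell out how the CogVideoX objective differs. As a hedged illustration of why an inverse scheduler has to match the forward scheduler's parametrization, the sketch below assumes a v-prediction parametrization (an assumption, not something stated in this thread) and shows how the same model output maps to a different noise estimate than under plain epsilon-prediction. The function name `v_to_x0_and_eps` is hypothetical and not part of diffusers.

```python
import math


def v_to_x0_and_eps(x_t, v, alpha_bar):
    """Convert a v-prediction into (x0, eps) estimates.

    Assumes the usual definitions (an assumption for this sketch):
        x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps
        v   = sqrt(alpha_bar) * eps - sqrt(1 - alpha_bar) * x0
    """
    a = math.sqrt(alpha_bar)
    b = math.sqrt(1.0 - alpha_bar)
    x0 = a * x_t - b * v
    eps = a * v + b * x_t
    return x0, eps


# Toy scalar check: build x_t and v from known x0 and eps, then recover them.
x0_true, eps_true, alpha_bar = 0.8, -0.5, 0.36
x_t = math.sqrt(alpha_bar) * x0_true + math.sqrt(1.0 - alpha_bar) * eps_true
v = math.sqrt(alpha_bar) * eps_true - math.sqrt(1.0 - alpha_bar) * x0_true
x0_rec, eps_rec = v_to_x0_and_eps(x_t, v, alpha_bar)
```

If an inverse scheduler treated `v` as if it were `eps`, every inversion step would start from the wrong noise estimate, which is consistent with the point that the stock inverse scheduler cannot simply be reused.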
I tried adding it this way, but the result is not right.
output.mp4
output_reconstruction.mp4: output_reconstruction.mp4
output_inversion.mp4: output_inversion.mp4
For 5B, this is correct.
I guess this is likely an issue with RoPE, which is not applied in the 2B model.
image_rotary_emb does not exist in 2B, so the code does not apply RoPE for that model.
The import order needs to be fixed.
Yes, that's exactly what I mean. RoPE is important for the control process, and perhaps a similar mechanism is also needed in the 2B model to make the model aware of spatial relationships.
I think we can note at the beginning of the code that CogVideoX-2B is not supported, and I think we can merge the other parts first.
I agree with this. In the past two days, I have tried to add support for the 2B model, but I have not obtained satisfactory results. I will check during inference whether the model supports RoPE and, if not, raise an error with a descriptive message.
@zRzRzRzRzRzRzR The import order is fixed, and an exception is now raised if RoPE is not supported. I think this script is now ready for merging.
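The RoPE guard described above could look roughly like the following. This is a minimal sketch, not the code merged in this PR: it assumes the transformer config exposes a boolean flag named `use_rotary_positional_embeddings`, and it uses a `SimpleNamespace` stand-in instead of loading a real model. The helper name `ensure_rope_supported` is hypothetical.

```python
from types import SimpleNamespace


def ensure_rope_supported(transformer_config):
    """Raise early if the model does not use rotary position embeddings.

    `use_rotary_positional_embeddings` is an assumed config attribute;
    a missing attribute is treated as "RoPE not supported".
    """
    if not getattr(transformer_config, "use_rotary_positional_embeddings", False):
        raise NotImplementedError(
            "This inversion script requires a model with RoPE; "
            "models without rotary embeddings (e.g. CogVideoX-2B) are not supported."
        )


# Stand-in configs for illustration only.
config_with_rope = SimpleNamespace(use_rotary_positional_embeddings=True)
config_without_rope = SimpleNamespace(use_rotary_positional_embeddings=False)

ensure_rope_supported(config_with_rope)  # passes silently

try:
    ensure_rope_supported(config_without_rope)
    rejected = False
except NotImplementedError:
    rejected = True
```

Failing before any sampling starts avoids producing a plausible-looking but incorrect reconstruction on an unsupported model.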
There have been some attempts at performing DDIM Inversion on CogVideoX (#397 #689), but as far as I know, these efforts have not yielded very good results. Recently, I successfully implemented DDIM Inversion on CogVideoX, and the results are as follows.
Invert and reconstruct with an empty prompt:
1.mp4
Invert and reconstruct with an editing prompt:
2.mp4
I am currently organizing the method introduction and related code, and these will be updated in this PR soon.
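For readers unfamiliar with the technique, here is a toy sketch of generic deterministic DDIM inversion and reconstruction on a single scalar. It assumes the textbook epsilon-prediction update, not CogVideoX's scheduler or this PR's implementation, and the constant `eps_fn` is a hypothetical stand-in for the denoising model.

```python
import math


def ddim_step(x, eps, alpha_from, alpha_to):
    """Deterministic DDIM update between two cumulative-alpha levels.

    Moving to a smaller alpha adds noise (inversion); moving to a larger
    alpha removes it (reconstruction). The formula is the same both ways.
    """
    x0 = (x - math.sqrt(1.0 - alpha_from) * eps) / math.sqrt(alpha_from)
    return math.sqrt(alpha_to) * x0 + math.sqrt(1.0 - alpha_to) * eps


def run(x, eps_fn, alphas):
    """Walk x through consecutive alpha levels using the model's noise estimate."""
    for a_from, a_to in zip(alphas[:-1], alphas[1:]):
        x = ddim_step(x, eps_fn(x, a_from), a_from, a_to)
    return x


# 50 levels from nearly clean (0.999) to very noisy (about 0.01).
alphas = [0.999 - i * (0.999 - 0.01) / 49 for i in range(50)]


def eps_fn(x, alpha):
    # Hypothetical model: a constant noise estimate, so the round trip is exact.
    return 0.3


latent = 1.25
noisy = run(latent, eps_fn, alphas)          # inversion: clean -> noisy
restored = run(noisy, eps_fn, alphas[::-1])  # reconstruction: noisy -> clean
```

With a real model the noise estimate changes between the state before and after each step, so inversion is only approximate and reconstruction quality depends on the step count and on how well the scheduler matches the model.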