Trying to support Ruyi-Mini-7B #18
Great work! I can't wait to use TeaCache to accelerate diffusion models.

I'm currently trying to integrate TeaCache with Ruyi-Models. However, I think I may have made a mistake, because I'm not getting a good L1 difference visualization as described in the paper. Here are the results I obtained.

I've tried several timestep embeddings and model outputs, but the timestep embeddings don't seem to correlate strongly with Ruyi's model output. Could you help me identify which timestep embedding and model output should be used to achieve a better correlation? Thank you!
Thank you for your interest in our work.
Looking forward to your feedback and a PR to support Ruyi-Models.
Yes, I have referred to the implementation of TeaCache4HunyuanVideo, but the model structure is different, so I'm not sure whether the timestep embeddings and model outputs I used are appropriate. As for the results shown above, I don't think they demonstrate a strong correlation. Or is the level of correlation shown above acceptable for using TeaCache?
Hi, thank you for your work on this project! I have a question about the computation of the coefficients. If the coefficient is fitted from the normed output hidden_states (using the L1 relative residue between steps) against the time-modulated inputs (using the L1 relative values after norm1 of the first block), does that imply the cached residual output should be updated based on the normed output, rather than on the transformer outputs before the final layer? Additionally, are there any suggested metrics or evidence that could help verify that the coefficients are computed correctly and derived from the appropriate modulated input and output of the transformer blocks?
Hi @KivenJonathan, thank you for your interest in our work. I don't quite follow your first point: what is the difference between the 'normed output' and the 'transformer outputs before the final layer'? In my understanding, they are the same. You can plot the rescaled data to check.
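For reference, a minimal sketch of such a rescaled-plot check (all names below are illustrative, not taken from the Ruyi or TeaCache code):

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_rescaled(input_rel_l1, output_rel_l1):
    """Overlay two per-step rel-L1 curves after min-max rescaling,
    so their trends can be compared even when their scales differ."""
    def rescale(x):
        x = np.asarray(x, dtype=np.float64)
        return (x - x.min()) / (x.max() - x.min() + 1e-8)

    steps = np.arange(len(input_rel_l1))
    plt.plot(steps, rescale(input_rel_l1), label="modulated input rel-L1")
    plt.plot(steps, rescale(output_rel_l1), label="residual output rel-L1")
    plt.xlabel("denoising step")
    plt.ylabel("rescaled rel-L1")
    plt.legend()
    plt.show()
```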
Oh, I see where I made a mistake earlier. I used the wrong input and output values for the visualization, and I've corrected that now. However, the visualization still doesn't look particularly ideal. In it, I used the hidden_states before any blocks as the input, the hidden_states after all blocks minus the input as the residual output, and the normed hidden_states minus the input as the residual norm output. I've searched extensively and made several attempts, but I still haven't identified the issue, sadly. Here is the code for calculating the L1 Rel Distance:

```python
import torch

def relative_l1_distance(tensor1, tensor2):
    # Mean absolute difference, normalized by the mean magnitude of tensor1.
    l1_distance = torch.abs(tensor1 - tensor2).mean()
    norm = torch.abs(tensor1).mean()
    return (l1_distance / norm).to(torch.float32)
```

I plan to continue experimenting with different inputs to see if anything changes.
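Continuing from the snippet above, a quick sanity check on dummy tensors (the shapes are arbitrary, for illustration only):

```python
a = torch.randn(2, 16, 64)            # e.g. hidden_states at one step
b = a + 0.01 * torch.randn_like(a)    # slightly perturbed copy
print(relative_l1_distance(a, b))     # small value for similar features
print(relative_l1_distance(a, -a))    # 2.0, the dissimilar extreme
```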
According to the visualization, you could use 'Time with Conditions'. It's hard to get the curves exactly equal; a similar trend is fine, increasing or decreasing at the same time. Polynomial fitting helps reduce the estimation error.
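A minimal sketch of that fitting step, assuming `input_dists` and `output_dists` hold the per-step rel-L1 values collected across runs (the function name and the degree-4 default are illustrative, not from the TeaCache code):

```python
import numpy as np

def fit_rescale_poly(input_dists, output_dists, degree=4):
    """Fit a polynomial that maps the cheap-to-measure input rel-L1
    distance to an estimate of the output rel-L1 distance."""
    coefficients = np.polyfit(input_dists, output_dists, deg=degree)
    return np.poly1d(coefficients)  # callable: est_out = poly(input_dist)
```

At inference time, the fitted polynomial rescales each measured input distance before it is compared against the caching threshold.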
By the way, tensor1 should be the feature from the last timestep and tensor2 the feature from the current timestep.
Thank you for your quick reply. I've generated some additional visualizations based on different inputs, and the trends appear similar. Interestingly, both the residual output and the residual norm output tend to fluctuate during the first few steps, which makes them difficult to align with the timestep embeddings. Perhaps I could try forcing the first few steps to run without caching. In any case, I'll try polynomial fitting first. As for tensor1 and tensor2, I used a …
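For reference, a sketch of how one denoising step could combine that warm-up idea with the fitted polynomial and a cached residual, in the spirit of TeaCache (the function, the `state` layout, and all defaults here are illustrative):

```python
def teacache_step(hidden_states, modulated_input, state, rescale_fn,
                  run_blocks, step, num_steps, warmup=3, threshold=0.1):
    """One step with a TeaCache-style skip decision.

    state: dict with keys 'prev_input', 'cached_residual', 'accumulated',
    initialized to None, None, and 0.0. rescale_fn: polynomial from the
    fitting sketch above. run_blocks: the expensive transformer forward."""
    force = step < warmup or step == num_steps - 1   # never skip these steps
    if not force and state["prev_input"] is not None:
        dist = relative_l1_distance(state["prev_input"], modulated_input)
        state["accumulated"] += float(rescale_fn(dist.item()))
        skip = state["accumulated"] < threshold
    else:
        skip = False
    state["prev_input"] = modulated_input
    if skip:                                         # reuse cached residual
        return hidden_states + state["cached_residual"]
    out = run_blocks(hidden_states)                  # full computation
    state["cached_residual"] = out - hidden_states
    state["accumulated"] = 0.0
    return out
```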
Sounds good. Looking forward to your final result. |
Collecting data for polynomial fitting does take a considerable amount of time. So far I have generated about 100 videos and collected the L1 Rel Distances. After performing the polynomial fitting, it appears that the … I then used that value to integrate TeaCache into Ruyi.

Sometimes the generated videos show no obvious differences, while at other times they are acceptable but do exhibit some notable differences. I think this could be normal; is that correct?

Additionally, I'd like to confirm one more thing: when applying polynomial fitting, the input is the L1 Rel Distance of …

Thank you.
The difference depends on the extent of the speedup. A speedup of less than 1.6x should work well across many models and prompts. The output can also be the L1 Rel Distances of the Transformer Residual Output before norm.
The difference is acceptable if the visual quality doesn't degrade much.
Yes, I think the visual quality is almost the same, although there are some differences. I think this might be caused by the inconsistency between …

I also tried using the L1 relative distances of the …

I expect to finish this work and close this issue by the end of the week if everything goes well. Thank you for your help.
Finally, I have organized the code and submitted it to the Ruyi-Models GitHub repository. TeaCache can now be used directly in Ruyi, and it really does make video generation faster. Given that, I'm wondering whether I should still submit a Pull Request to the TeaCache repository, since it might be somewhat redundant?
Congratulations! It's okay to keep it in Ruyi-Models. I will update the README. |