
Trying to support Ruyi-Mini-7B #18

Closed
cellzero opened this issue Jan 3, 2025 · 16 comments

Comments

cellzero commented Jan 3, 2025

Great work! I can't wait to use TeaCache to accelerate diffusion models.

I'm currently trying to integrate TeaCache with Ruyi-Models. However, I think I might have made a mistake, as I'm not getting a good L1 difference visualization like the one in the paper. Here are the results I obtained.

[Image: Ruyi_L1_Visualization]

  • Time Embs is the original timestep embeddings.
  • Time With Conditions is the timestep embeddings with condition embeddings (such as text and image) added.
  • Time Modulated Inputs is the values after transformer.block[0].norm1.
  • Transformer Inputs is the input to the transformer.
  • Transformer Outputs is the output right after all transformer blocks, before the final layer.
  • Transformer Final Outputs is the final output of the transformer.

I've tried several timestep embeddings and model outputs, but it seems that the timestep embeddings don't have a strong correlation with Ruyi's model output. Could you help me identify which timestep embedding and model output should be used to achieve better correlation? Thank you!
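
For reference, features like these can be captured with forward hooks, roughly along these lines (a simplified sketch; only the transformer.block[0].norm1 path comes from the description above, the other names and the assumption that the transformer's first positional input is the hidden_states are placeholders):

```python
def attach_capture_hooks(transformer, storage):
    # Sketch: collect per-step tensors for the curves above into `storage` (a dict of lists).
    # Assumes the transformer's first positional input is the hidden_states (not verified).
    def save(name, tensor):
        storage.setdefault(name, []).append(tensor.detach().float().cpu())

    handles = [
        # values right after transformer.block[0].norm1, as described above
        transformer.block[0].norm1.register_forward_hook(
            lambda mod, inp, out: save("Time Modulated Inputs", out)
        ),
        transformer.register_forward_pre_hook(
            lambda mod, inp: save("Transformer Inputs", inp[0])
        ),
    ]
    return handles  # call h.remove() on each handle when finished
```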

LiewFeng (Collaborator) commented Jan 3, 2025

Thank you for your interest in our work.

  1. It seems that Ruyi-Models uses a transformer similar to HunyuanVideo's. You may try to leverage the coefficients in TeaCache4HunyuanVideo.
  2. You may also refer to the implementation of teacache_forward to select the features used to draw the plot (and to calculate the coefficients), e.g., modulated_inp and residual_output; see the sketch below.

Looking forward to your feedback and PR to support Ruyi-Models.
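
Roughly, the decision logic in teacache_forward looks like the following (a paraphrased sketch, not the exact upstream code; coeffs, threshold, and state are placeholder names):

```python
import numpy as np

def should_calc(state, modulated_inp, coeffs, threshold):
    # Paraphrased sketch: accumulate the rescaled relative L1 distance of the
    # modulated input between steps; recompute the blocks only when the
    # accumulated value exceeds the threshold, otherwise reuse the cached residual.
    if state.get("previous_modulated_input") is None:
        state["accumulated"] = 0.0
        calc = True
    else:
        prev = state["previous_modulated_input"]
        rel_l1 = ((modulated_inp - prev).abs().mean() / prev.abs().mean()).item()
        state["accumulated"] = state.get("accumulated", 0.0) + np.polyval(coeffs, rel_l1)
        calc = state["accumulated"] >= threshold
        if calc:
            state["accumulated"] = 0.0
    state["previous_modulated_input"] = modulated_inp.detach()
    return calc

# In the denoising loop (still a sketch):
#   if should_calc(state, modulated_inp, coeffs, threshold):
#       out = run_transformer_blocks(hidden_states, ...)        # full compute
#       state["previous_residual"] = out - hidden_states        # cache the residual
#       hidden_states = out
#   else:
#       hidden_states = hidden_states + state["previous_residual"]  # reuse cache
```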

cellzero (Author) commented Jan 3, 2025

Yes, I have referred to the implementation of TeaCache4HunyuanVideo, but the model structure is different, so I'm uncertain whether the timestep embeddings and model outputs I used are appropriate.

Regarding the results shown above, I think they may not demonstrate a strong correlation. Or is the correlation presented above acceptable for using TeaCache?

LiewFeng (Collaborator) commented Jan 3, 2025

  1. The output for visualization is the residual output, i.e., (output hidden_states - input hidden_states), as shown in the link in the last reply.
  2. Using the normed output hidden_states in Ruyi-Models is suggested. You may also try the output hidden_states before normalization.
  3. Time Embs shows a decent correlation. You can reduce the estimation error with rescaling.
  4. Make sure you are using the relative L1 distance instead of the plain L1 distance.
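
In code terms, the residuals in points 1 and 2 are roughly (a minimal sketch with hypothetical variable names):

```python
def residuals(hidden_in, hidden_out, hidden_normed):
    # hidden_in:     hidden_states entering the first transformer block
    # hidden_out:    hidden_states right after the last transformer block
    # hidden_normed: hidden_states after the final norm, before the output projection
    residual_output = hidden_out - hidden_in           # point 1
    residual_norm_output = hidden_normed - hidden_in   # point 2 (suggested variant)
    return residual_output, residual_norm_output
```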

@KivenJonathan

>   1. The output for visualization is the residual output, i.e., (output hidden_states - input hidden_states), as shown in the link in the last reply.
>   2. Using the normed output hidden_states in Ruyi-Models is suggested. You may also try the output hidden_states before normalization.
>   3. Time Embs shows a decent correlation. You can reduce the estimation error with rescaling.
>   4. Make sure you are using the relative L1 distance instead of the plain L1 distance.

Hi, thank you for your work on this project!

I have a question regarding the computation of the coefficients. If the coefficients are computed from the normed output hidden_states (using the relative L1 residue between steps) together with the time-modulated inputs (the relative L1 of the values after the first block's norm1), does this imply that the cached residual output should be updated based on the normed output, rather than on the transformer outputs before the final layer?

Additionally, are there any suggested metrics or checks that could help verify that the coefficients are computed correctly and derived from the appropriate modulated input and output of the transformer blocks?

LiewFeng (Collaborator) commented Jan 3, 2025

Hi @KivenJonathan, thank you for your interest in our work. I don't quite get your first point: what's the difference between the 'normed output' and the 'transformer outputs before the final layer'? In my understanding, they are the same.

You can plot with the rescaled data to check it.
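
For example, one way to do that check (a sketch; it assumes you already have the two per-step distance lists and fitted polynomial coefficients):

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_rescaled_check(input_dists, output_dists, coeffs):
    # input_dists:  per-step relative L1 distances of the modulated inputs
    # output_dists: per-step relative L1 distances of the residual outputs
    # coeffs:       polynomial coefficients fitted to map the former to the latter
    rescaled = np.polyval(coeffs, np.asarray(input_dists))
    steps = np.arange(len(output_dists))
    plt.plot(steps, output_dists, label="measured residual rel. L1")
    plt.plot(steps, rescaled, label="rescaled estimate from modulated inputs")
    plt.xlabel("denoising step")
    plt.ylabel("relative L1 distance")
    plt.legend()
    plt.show()
```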

cellzero (Author) commented Jan 3, 2025

>   1. The output for visualization is the residual output, i.e., (output hidden_states - input hidden_states), as shown in the link in the last reply.
>   2. Using the normed output hidden_states in Ruyi-Models is suggested. You may also try the output hidden_states before normalization.
>   3. Time Embs shows a decent correlation. You can reduce the estimation error with rescaling.
>   4. Make sure you are using the relative L1 distance instead of the plain L1 distance.

Oh, I see where I made a mistake earlier. I used the wrong input and output values for visualization, and I've corrected that now. However, the visualization still doesn't seem particularly ideal.

[Image: l1_rel_distances]

In the visualization, I used the hidden_states before any blocks as the input, the hidden_states after all the blocks minus the input as the residual output, and the normed hidden_states minus the input as the residual norm output.

I've searched extensively and made several attempts, but sadly I still haven't identified the issue.

Here is the code for calculating the L1 Rel Distance:

```python
import torch

def l1_rel_distance(tensor1, tensor2):
    # mean absolute difference, normalized by the mean magnitude of tensor1
    l1_distance = torch.abs(tensor1 - tensor2).mean()
    norm = torch.abs(tensor1).mean()
    relative_l1_distance = l1_distance / norm
    return relative_l1_distance.to(torch.float32)
```

I plan to continue experimenting with different inputs to see if there are any changes.

LiewFeng (Collaborator) commented Jan 4, 2025

According to the visualization, you may use 'Time with Conditions'. It's hard to get them exactly equal; it's okay as long as they show a similar trend, increasing or decreasing at the same time. Polynomial fitting helps reduce the estimation error.
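
The fit itself can be as simple as the following (a sketch; the degree is an assumption to tune against your collected data):

```python
import numpy as np

def fit_rescaling(modulated_input_dists, residual_output_dists, deg=4):
    # Fit a polynomial that maps the cheap-to-compute distances (e.g. Time with
    # Conditions or Time Modulated Inputs) to the residual-output distances
    # collected over many prompts. deg=4 is an assumption; tune it.
    x = np.asarray(modulated_input_dists, dtype=np.float64)
    y = np.asarray(residual_output_dists, dtype=np.float64)
    return np.polyfit(x, y, deg)

# usage: coeffs = fit_rescaling(x_dists, y_dists); estimate = np.polyval(coeffs, d)
```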

LiewFeng (Collaborator) commented Jan 4, 2025

By the way, tensor1 should be the feature from the previous timestep and tensor2 the feature from the current timestep.

cellzero (Author) commented Jan 4, 2025

Thank you for your quick reply. I've generated some additional visualizations based on different inputs, and the trends appear to be similar.

Interestingly, both the residual output and the residual norm output tend to fluctuate during the first few steps, making it difficult to align them with the timestep embeddings. Perhaps I could try forcing the first few steps to run without caching. Anyway, I think I should try polynomial fitting first.

As for tensor1 and tensor2, I used a for step in range(1, 25) loop, where tensor1 corresponds to the step value and tensor2 to the step + 1 value. I think this is essentially the same thing for visualization purposes.
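
i.e., roughly the following (a simplified sketch; `features` stands for the list of per-step tensors of one curve):

```python
def step_distances(features):
    # features[i] is the tensor captured at denoising step i
    distances = []
    for step in range(1, len(features)):
        prev, curr = features[step - 1], features[step]   # tensor1 = previous step
        rel = (curr - prev).abs().mean() / prev.abs().mean()
        distances.append(rel.item())
    return distances
```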

LiewFeng (Collaborator) commented Jan 4, 2025

Sounds good. Looking forward to your final result.

cellzero (Author) commented Jan 9, 2025

Collecting data for polynomial fitting does take a considerable amount of time. So far I have generated about 100 videos and collected the L1 Rel Distances. After performing polynomial fitting, it appears that Time Modulated Inputs and Transformer Residual Norm Output match best.

I then used that value to integrate TeaCache into Ruyi. Sometimes, the generated videos show no obvious differences, while at other times, the videos are acceptable but do exhibit some notable differences. I think this could be a normal occurrence; is that correct?

Additionally, I would like to confirm one more thing. When applying polynomial fitting, the input is the L1 Rel Distance of Time Modulated Inputs, and the output is the L1 Rel Distances of Transformer Residual Norm Output. I hope I haven’t made any mistakes.

Thank you.

LiewFeng (Collaborator) commented Jan 9, 2025

The difference depends on the extent of the speedup. A speedup of less than 1.6x should work well for many models and prompts.

The output can also be the L1 Rel Distance of the Transformer Residual Output before the norm.

LiewFeng (Collaborator) commented Jan 9, 2025

A difference is acceptable if the visual quality doesn't degrade much.

cellzero (Author) commented Jan 9, 2025

Yes, I think the visual quality is almost the same, although there are some differences. I think this might be caused by the inconsistency between the Time Modulated Inputs and the Transformer Residual Norm Output during the first several steps.

I also tried using the L1 relative distances of the Transformer Residual Output before normalization as the polynomial fitting output. It appears that the Time Modulated Inputs and the Transformer Residual Norm Output match better (more closely aligned and with less noise), so I used the latter to test the generated video results.

I expect to finish this work and close this issue by the end of the week if everything goes fine.

Thank you for your help.

@cellzero (Author)

Finally, I have organized the code and submitted it to the Ruyi-Models GitHub repository. TeaCache can now be used directly in Ruyi, and it's great to be able to generate videos faster.

Therefore, I'm wondering whether I should still submit a Pull Request to the TeaCache repository, as it might be somewhat redundant.

@LiewFeng (Collaborator)

Congratulations! It's okay to keep it in Ruyi-Models. I will update the README.
