
[examples] add controlnet sd3 example #9249

Merged: 11 commits merged into huggingface:main on Sep 11, 2024

Conversation

@DavyMorgan (Contributor) commented Aug 23, 2024

What does this PR do?

Fixes #8834

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@DavyMorgan (Contributor, Author)

My implementation is based on the official examples for ControlNet and SD3 DreamBooth

@kadirnar (Contributor)

@DavyMorgan

Great. Are you going to write ControlNet training code for Flux? I really need it.

@DavyMorgan changed the title from "add controlnet sd3 example" to "[examples] add controlnet sd3 example" on Aug 24, 2024
@DavyMorgan (Contributor, Author)

> @DavyMorgan Great. Are you going to write ControlNet training code for Flux? I really need it.

Hi, a nice implementation of ControlNet Flux with training scripts can be found at https://github.com/XLabs-AI/x-flux.

@kadirnar (Contributor)

> Hi, a nice implementation of ControlNet Flux with training scripts can be found at https://github.com/XLabs-AI/x-flux.

I have been using that library for a week, and multi-GPU training is not working. I need multi-GPU support to train on large datasets.

@DavyMorgan (Contributor, Author)

@yiyixuxu Could you take a look at this PR when you get a chance? It addresses issue #8834 and provides an example for controlnet+sd3. Thanks!

@yiyixuxu (Collaborator) commented Sep 4, 2024

ohh thanks for your PR!

@haofanwang @wangqixun would you be able to give this PR a review too?

@yiyixuxu requested a review from sayakpaul on September 4, 2024 at 01:24
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@sayakpaul (Member) left a comment

Thank you for working on this! I have left a couple of comments.

Could you also share some results from your experiments?

And as @yiyixuxu mentioned, it'd be great to have this PR reviewed by @haofanwang @wangqixun as they were the first ones to have come up with SD3 ControlNets.

@xduzhangjiayu (Contributor) commented Sep 4, 2024

> My implementation is based on the official examples for ControlNet and SD3 DreamBooth

Hi,
I tried to run your script with mixed_precision=fp16 but got an unexpected error in

    model_pred = transformer(
        hidden_states=noisy_model_input,
        timestep=timesteps,
        encoder_hidden_states=prompt_embeds,
        pooled_projections=pooled_prompt_embeds,
        block_controlnet_hidden_states=control_block_res_samples,
        return_dict=False,
    )[0]

The error is RuntimeError: mat1 and mat2 must have the same dtype, but got Float and Half.
noisy_model_input is fp16 in the script, though, so do you have any advice on how to solve this? Thanks so much!

It can be fixed by following the review comments.
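For illustration only, here is a minimal sketch of the kind of dtype fix being discussed: casting the ControlNet residuals to the transformer's weight dtype before the forward pass. The variable names follow the snippet above; `weight_dtype` is an assumption standing in for the fp16 dtype used under mixed precision, and this is not necessarily the exact change adopted in the PR.

```python
import torch

weight_dtype = torch.float16  # assumed dtype under fp16 mixed precision

# Cast the ControlNet residuals to the transformer's weight dtype so the
# matmuls inside the transformer see one consistent dtype.
control_block_res_samples = [
    sample.to(dtype=weight_dtype) for sample in control_block_res_samples
]

model_pred = transformer(
    hidden_states=noisy_model_input,
    timestep=timesteps,
    encoder_hidden_states=prompt_embeds,
    pooled_projections=pooled_prompt_embeds,
    block_controlnet_hidden_states=control_block_res_samples,
    return_dict=False,
)[0]
```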

@DavyMorgan force-pushed the controlnet-sd3-example branch from 3360355 to d9bf0d2 on September 9, 2024 at 01:33
@DavyMorgan (Contributor, Author) commented Sep 9, 2024

@sayakpaul @xduzhangjiayu Thank you very much for your kind reviews. I have updated the code according to your comments and suggestions, and I have added two experimental images as results to the README. Would you mind taking another look?

@DavyMorgan (Contributor, Author)

> Thank you for working on this! I have left a couple of comments.
>
> Could you also share some results from your experiments?
>
> And as @yiyixuxu mentioned, it'd be great to have this PR reviewed by @haofanwang @wangqixun as they were the first ones to have come up with SD3 ControlNets.

Please find the result images at the bottom of README_sd3.md :)

@sayakpaul (Member)

Thanks for the changes @DavyMorgan! Let's also add a test similar to https://github.com/huggingface/diffusers/blob/main/examples/controlnet/test_controlnet.py?
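For reference, a hypothetical sketch of what such an example test could look like, in the spirit of the linked test_controlnet.py. The CLI flags, tiny model id, dataset name, and output file name below are assumptions for illustration, not taken from the final PR.

```python
import os
import subprocess
import tempfile
import unittest


class ControlNetSD3ExampleSmokeTest(unittest.TestCase):
    def test_train_controlnet_sd3_runs(self):
        # Run the training script for a handful of steps on tiny test models,
        # then check that ControlNet weights were written to the output dir.
        with tempfile.TemporaryDirectory() as tmpdir:
            cmd = [
                "python", "examples/controlnet/train_controlnet_sd3.py",
                "--pretrained_model_name_or_path=hf-internal-testing/tiny-sd3-pipe",
                "--dataset_name=hf-internal-testing/fill10",
                f"--output_dir={tmpdir}",
                "--resolution=64",
                "--train_batch_size=1",
                "--max_train_steps=4",
            ]
            subprocess.run(cmd, check=True)
            self.assertTrue(
                os.path.isfile(os.path.join(tmpdir, "diffusion_pytorch_model.safetensors"))
            )
```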


@sayakpaul (Member)

Could you run make style && make quality?

@DavyMorgan (Contributor, Author)

> Could you run make style && make quality?

@sayakpaul Yeah, I have run make style && make quality, which fixed the import order and a few style issues. Thanks a lot!

@DavyMorgan (Contributor, Author) commented Sep 9, 2024

In the Fast tests for PRs / PyTorch Example CPU tests (pull_request):

09/09/2024 13:20:09 - INFO - __main__ - Initializing controlnet weights from transformer
Traceback (most recent call last):
  File "/__w/diffusers/diffusers/examples/controlnet/train_controlnet_sd3.py", line 1415, in <module>
    main(args)
  File "/__w/diffusers/diffusers/examples/controlnet/train_controlnet_sd3.py", line 997, in main
    controlnet = SD3ControlNetModel.from_transformer(transformer)
  File "/__w/diffusers/diffusers/src/diffusers/models/controlnet_sd3.py", line 254, in from_transformer
    controlnet.transformer_blocks.load_state_dict(transformer.transformer_blocks.state_dict(), strict=False)
  File "/opt/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2215, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for ModuleList:
	size mismatch for 0.norm1_context.linear.weight: copying a param with shape torch.Size([64, 32]) from checkpoint, the shape in current model is torch.Size([192, 32]).
	size mismatch for 0.norm1_context.linear.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([192]).

@sayakpaul It seems that hf-internal-testing/tiny-sd3-pipe has a size mismatch with SD3ControlNetModel. Should I use "InstantX/SD3-Controlnet-Canny" in the test script, even though all the existing test code uses pretrained models from hf-internal-testing?

@sayakpaul (Member)

We need to use a smaller ControlNet model. We should be able to initialize it from the transformer of hf-internal-testing/tiny-sd3-pipe, following the https://github.com/huggingface/diffusers/blob/main/examples/controlnet/test_controlnet.py script.
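As an illustration only, a minimal sketch of the initialization pattern being suggested (assuming the tiny pipe exposes a "transformer" subfolder; note that, as the traceback above shows, the ControlNet configuration has to match the tiny transformer's dimensions for this to succeed):

```python
from diffusers import SD3ControlNetModel, SD3Transformer2DModel

# Load the tiny SD3 transformer used by the internal test models.
transformer = SD3Transformer2DModel.from_pretrained(
    "hf-internal-testing/tiny-sd3-pipe", subfolder="transformer"
)

# Build a small ControlNet whose blocks are initialized from the transformer,
# mirroring what train_controlnet_sd3.py does when no ControlNet checkpoint
# is provided ("Initializing controlnet weights from transformer").
controlnet = SD3ControlNetModel.from_transformer(transformer)
```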

@DavyMorgan (Contributor, Author) commented Sep 10, 2024

> We need to use a smaller ControlNet model. We should be able to initialize it from the transformer of hf-internal-testing/tiny-sd3-pipe, following the https://github.com/huggingface/diffusers/blob/main/examples/controlnet/test_controlnet.py script.

@sayakpaul Thanks. I have updated the test to use the smaller SD3 model from the official test script.

@DavyMorgan (Contributor, Author)

I see. I have also added a tiny ControlNet model based on the official test script for ControlNet-SD3. The example test now passes on my local machine. @sayakpaul

@DavyMorgan (Contributor, Author)

@sayakpaul It seems that the failure in the fast pipeline tests is unrelated to this PR. WDYT?

@sayakpaul (Member) left a comment

Thanks for your contributions!

import torch

base_model_path = "stabilityai/stable-diffusion-3-medium-diffusers"
controlnet_path = "sd3-controlnet-out/checkpoint-6500/controlnet"
@sayakpaul (Member)

This seems like a local path. Can we update this to a checkpoint on the Hub?

@DavyMorgan (Contributor, Author)

Yeah sure, I will upload my checkpoint to the hub.
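For illustration, a hedged sketch of what the README inference snippet could look like once the checkpoint lives on the Hub. The repo id "DavyMorgan/sd3-controlnet-fill50k" is a placeholder, not the actual uploaded checkpoint; the prompt and conditioning image come from the results table below.

```python
import torch
from diffusers import SD3ControlNetModel, StableDiffusion3ControlNetPipeline
from diffusers.utils import load_image

base_model_path = "stabilityai/stable-diffusion-3-medium-diffusers"
controlnet_path = "DavyMorgan/sd3-controlnet-fill50k"  # placeholder Hub repo id

controlnet = SD3ControlNetModel.from_pretrained(controlnet_path, torch_dtype=torch.float16)
pipe = StableDiffusion3ControlNetPipeline.from_pretrained(
    base_model_path, controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

control_image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/conditioning_image_1.png"
)
image = pipe(
    "pale golden rod circle with old lace background",
    control_image=control_image,
    num_inference_steps=20,
).images[0]
image.save("output.png")
```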

Comment on lines +148 to +151
| | |
|-------------------|:-------------------------:|
| | pale golden rod circle with old lace background |
| ![conditioning image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/conditioning_image_1.png) | ![pale golden rod circle with old lace background](https://huggingface.co/datasets/DavyMorgan/sd3-controlnet-results/resolve/main/step-6500.png) |
@sayakpaul (Member)

Seems like there are artifacts in the output image but it could also be because of overfitting. Any comments?

@DavyMorgan (Contributor, Author)

I guess I can include more sample images for validation.

Comment on lines +71 to +79
def image_grid(imgs, rows, cols):
    assert len(imgs) == rows * cols

    w, h = imgs[0].size
    grid = Image.new("RGB", size=(cols * w, rows * h))

    for i, img in enumerate(imgs):
        grid.paste(img, box=(i % cols * w, i // cols * h))
    return grid
@sayakpaul (Member)

We can use the make_image_grid() utility function from diffusers.utils.

@DavyMorgan (Contributor, Author)

Got it! Thanks.
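A minimal sketch of the suggested replacement, using the make_image_grid utility shipped in diffusers.utils; the list of solid-color PIL images is purely illustrative stand-in data.

```python
from diffusers.utils import make_image_grid
from PIL import Image

# Illustrative inputs: four solid-color images standing in for validation outputs.
images = [Image.new("RGB", (256, 256), color) for color in ("red", "green", "blue", "white")]

# Replaces the hand-rolled image_grid() helper shown above.
grid = make_image_grid(images, rows=2, cols=2)
grid.save("validation_grid.png")
```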

@sayakpaul merged commit c002731 into huggingface:main on Sep 11, 2024
8 checks passed
@xduzhangjiayu (Contributor) commented Sep 13, 2024 via email

@DavyMorgan (Contributor, Author)

> Yes, I'm sure it is the text encoders that occupy the memory. After computing the embeddings, the text encoders are still in GPU memory (which means clear_objs_and_retain_memory doesn't work). I think it may be related to my accelerate config; I will check later. Have you tried training with a large dataset (at least 1000k images)? I think it may need a lot of CPU RAM when pre-computing the text embeddings.

(The email quoted @DavyMorgan's earlier inline review reply on examples/controlnet/train_controlnet_sd3.py, in the block that builds train_dataloader from make_train_dataset and defines compute_text_embeddings(prompt, text_encoders, tokenizers): "Are you sure it is the text encoders that occupy the memory? The GPU memory can be other models, as the vae, transformer, and controlnet models are still in memory. Besides, as we periodically run validation, the text encoders will also be loaded every validation_steps steps. From my experiments, previously I needed to separate training and validation onto two distinct GPUs, and after the above update I only need one GPU to run the script. During training, the text embeddings from the text encoders are in memory, though there is a cached copy on disk so that they will not be recomputed in your next run as long as the configs are the same.")

I only tested the fill50k dataset. As we use the datasets library, I believe it will handle the memory/disk issue well. You can check https://huggingface.co/docs/datasets/en/about_arrow#memory-mapping.
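As an aside, a generic sketch of the pattern under discussion: pre-compute the prompt embeddings once, then drop the text encoders and release GPU memory. The compute_text_embeddings, text_encoders, and tokenizers names follow the quoted snippet, while the dataset field name and overall structure are illustrative assumptions rather than the script's actual code.

```python
import gc
import torch

# Pre-compute prompt embeddings for the whole dataset (illustrative loop).
with torch.no_grad():
    cached_embeddings = [
        compute_text_embeddings(example["prompt"], text_encoders, tokenizers)
        for example in train_dataset
    ]

# The encoders are no longer needed for training; drop the references and
# release the VRAM they were holding.
del text_encoder_one, text_encoder_two, text_encoder_three, text_encoders
gc.collect()
torch.cuda.empty_cache()
```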

@xduzhangjiayu (Contributor)

OK, once again thank you very much for your reply!

@DavyMorgan DavyMorgan mentioned this pull request Oct 21, 2024
sayakpaul added a commit that referenced this pull request Dec 23, 2024
* add controlnet sd3 example

* add controlnet sd3 example

* update controlnet sd3 example

* add controlnet sd3 example test

* fix quality and style

* update test

* update test

---------

Co-authored-by: Sayak Paul <[email protected]>
Successfully merging this pull request may close these issues.

Will the training code of SD3 Controlnet be released?
6 participants