# ImageBART
#### [NeurIPS 2021](https://nips.cc/)

<br/>
[Patrick Esser](https://github.com/pesser)\*,
[Robin Rombach](https://github.com/rromb)\*,
[Andreas Blattmann](https://github.com/ablattmann)\*,
[Björn Ommer](https://ommer-lab.com/)<br/>
\* equal contribution

[arXiv](https://arxiv.org/abs/2108.08827) | [BibTeX](#bibtex) | [Poster](assets/imagebart_poster.pdf)

## Requirements
A suitable [conda](https://conda.io/) environment named `imagebart` can be created
and activated with:

```
conda env create -f environment.yaml
conda activate imagebart
```

## Get the Models

We provide pretrained weights and hyperparameters for models trained on the following datasets:

* FFHQ:
  * [4 scales, geometric noise schedule](https://ommer-lab.com/files/ffhq_4_scales_geometric.zip): `wget -c https://ommer-lab.com/files/ffhq_4_scales_geometric.zip`
  * [2 scales, custom noise schedule](https://ommer-lab.com/files/ffhq_2_scales_custom.zip): `wget -c https://ommer-lab.com/files/ffhq_2_scales_custom.zip`
* LSUN, 3 scales, custom noise schedules:
  * [Churches](https://ommer-lab.com/files/churches_3_scales.zip): `wget -c https://ommer-lab.com/files/churches_3_scales.zip`
  * [Bedrooms](https://ommer-lab.com/files/bedrooms_3_scales.zip): `wget -c https://ommer-lab.com/files/bedrooms_3_scales.zip`
  * [Cats](https://ommer-lab.com/files/cats_3_scales.zip): `wget -c https://ommer-lab.com/files/cats_3_scales.zip`
* Class-conditional ImageNet:
  * [5 scales, custom noise schedule](https://ommer-lab.com/files/cin_5_scales_custom.zip): `wget -c https://ommer-lab.com/files/cin_5_scales_custom.zip`
  * [4 scales, geometric noise schedule](https://ommer-lab.com/files/cin_4_scales_geometric.zip): `wget -c https://ommer-lab.com/files/cin_4_scales_geometric.zip`

Download the respective files and extract their contents to a directory `./models/`.

Moreover, we provide all the required VQGANs as a .zip at [https://ommer-lab.com/files/vqgan.zip](https://ommer-lab.com/files/vqgan.zip),
whose contents have to be extracted to `./vqgan/`.
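
For example, the weights for FFHQ with the geometric schedule plus the VQGAN archive can be fetched and unpacked as follows (a minimal sketch that assumes the zips unpack directly into the expected subfolders; substitute the other archive names from the list above as needed):

```shell script
mkdir -p models vqgan

# pretrained ImageBART weights (here: FFHQ, 4 scales, geometric schedule)
wget -c https://ommer-lab.com/files/ffhq_4_scales_geometric.zip
unzip ffhq_4_scales_geometric.zip -d models/

# the VQGANs required by all models
wget -c https://ommer-lab.com/files/vqgan.zip
unzip vqgan.zip -d vqgan/
```
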
## Get the Data
Running the training configs or the [inpainting script](scripts/inpaint_imagebart.py) requires
a dataset available locally. For ImageNet and FFHQ, see the data preparation instructions in this repo's parent project, [taming-transformers](https://github.com/CompVis/taming-transformers).
The LSUN datasets can be conveniently downloaded via the script available [here](https://github.com/fyu/lsun).
We performed a custom split into training and validation images, and provide the corresponding filenames
at [https://ommer-lab.com/files/lsun.zip](https://ommer-lab.com/files/lsun.zip).
After downloading, extract them to `./data/lsun`. The bedrooms/cats/churches subsets should
also be placed (or symlinked) at `./data/lsun/bedrooms`, `./data/lsun/cats`, and `./data/lsun/churches`, respectively.

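A possible layout, assuming the LSUN subsets have been exported to folders named `bedrooms`, `cats`, and `churches` (the exact paths depend on how you ran the download script):

```shell script
mkdir -p data/lsun
unzip lsun.zip -d data/lsun/           # our train/validation split filenames
# symlink (or move) the exported LSUN subsets to the expected locations
ln -s /path/to/lsun/bedrooms data/lsun/bedrooms
ln -s /path/to/lsun/cats     data/lsun/cats
ln -s /path/to/lsun/churches data/lsun/churches
```
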
## Inference

### Unconditional Sampling
We provide a script for sampling from unconditional models trained on the LSUN-{churches,bedrooms,cats} and FFHQ datasets.

#### FFHQ

On the FFHQ dataset, we provide two distinct pretrained models: one with a chain of length 4 and a geometric noise schedule as proposed by Sohl-Dickstein et al. [[1]](#references), and another with a chain of length 2 and a custom schedule.
Sampling from these models can be started with
```shell script
CUDA_VISIBLE_DEVICES=<gpu_id> streamlit run scripts/sample_imagebart.py configs/sampling/ffhq/<config>
```
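
For instance, to sample from the FFHQ model with the geometric schedule (the same config used in the editing section below):

```shell script
CUDA_VISIBLE_DEVICES=0 streamlit run scripts/sample_imagebart.py configs/sampling/ffhq/ffhq_4scales_geometric.yaml
```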

#### LSUN
For the models trained on the LSUN datasets, use
```shell script
CUDA_VISIBLE_DEVICES=<gpu_id> streamlit run scripts/sample_imagebart.py configs/sampling/lsun/<config>
```

### Class-Conditional Sampling on ImageNet

To sample from class-conditional ImageNet models, use
```shell script
CUDA_VISIBLE_DEVICES=<gpu_id> streamlit run scripts/sample_imagebart.py configs/sampling/imagenet/<config>
```

### Image Editing with Unconditional Models

We also provide a script for image editing with our unconditional models. For the FFHQ model with the geometric schedule, this can be started with
```shell script
CUDA_VISIBLE_DEVICES=<gpu_id> streamlit run scripts/inpaint_imagebart.py configs/sampling/ffhq/ffhq_4scales_geometric.yaml
```
resulting in samples similar to the following.

## Training
In general, there are two options for training the autoregressive transition probabilities of the
reverse Markov chain: (i) train them jointly, taking into account a weighting of the
individual scale contributions, or (ii) train them independently, which means that each
training process optimizes a single transition and the scales must be stacked after training.
We conduct most of our experiments using the latter option, but provide configurations for both cases.

### Training Scales Independently
For training scales independently, each transition requires a separate optimization process, which can
be started via

```
CUDA_VISIBLE_DEVICES=<gpu_id> python main.py --base configs/<data>/<config>.yaml -t --gpus 0,
```
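
Note the trailing comma in `--gpus 0,`: PyTorch Lightning parses it as the list of GPU indices `[0]` rather than the integer `0`. A sketch of single- and multi-GPU invocations (the `<config>` placeholder is kept; multi-GPU behavior depends on the Lightning version pinned in the environment):

```shell script
# single GPU (note the trailing comma)
CUDA_VISIBLE_DEVICES=0 python main.py --base configs/ffhq/<config>.yaml -t --gpus 0,
# two GPUs
CUDA_VISIBLE_DEVICES=0,1 python main.py --base configs/ffhq/<config>.yaml -t --gpus 0,1
```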

We provide training configs for a four-scale training of FFHQ using a geometric schedule,
a four-scale geometric training on ImageNet, and various three-scale experiments on LSUN.
See also the overview of our [pretrained models](#get-the-models).


### Training Scales Jointly

For completeness, we also provide a config to run a joint training with 4 scales on FFHQ.
Training can be started by running

```
CUDA_VISIBLE_DEVICES=<gpu_id> python main.py --base configs/ffhq/ffhq_4_scales_joint-training.yaml -t --gpus 0,
```


## Shout-Outs
Many thanks to all who make their work and implementations publicly available.
For this work, these were in particular:

- The extremely clear and extensible encoder-decoder transformer implementations by [lucidrains](https://github.com/lucidrains):
https://github.com/lucidrains/x-transformers
- Emiel Hoogeboom et al.'s paper on multinomial diffusion and argmax flows: https://arxiv.org/abs/2102.05379


## References

[1] Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., & Ganguli, S. (2015). Deep Unsupervised Learning using Nonequilibrium Thermodynamics. *Proceedings of the 32nd International Conference on Machine Learning*, PMLR 37:2256-2265.

## BibTeX

```
@article{DBLP:journals/corr/abs-2108-08827,
  author    = {Patrick Esser and
               Robin Rombach and
               Andreas Blattmann and
               Bj{\"{o}}rn Ommer},
  title     = {ImageBART: Bidirectional Context with Multinomial Diffusion for Autoregressive
               Image Synthesis},
  journal   = {CoRR},
  volume    = {abs/2108.08827},
  year      = {2021}
}
```