Fix issues with unwrapping modules with accelerate #963

Merged: 1 commit merged into master from accelerate-correctly-unwrap-modules on May 16, 2023

Conversation

BenjaminBossan (Collaborator)

There were a few issues at once that are now being fixed:

  • When unwrapping, pass keep_fp32_wrapper=True, or else the modules are not completely unwrapped and cannot be pickled. The test that assumed pickling would fail is now passing.
  • Only net.module_ was unwrapped, but there can be other modules and criteria. Those are now also unwrapped.
  • In some circumstances, unwrapping is undesired, for instance when users want to continue training or to benefit from AMP during inference. Therefore, an option was added to prevent automatic unwrapping: set unwrap_after_train=False (a usage sketch is shown below).

Note: The first fix could also work for bf16, but I cannot test it on my machine, so the test still assumes that a PicklingError is raised.
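
To illustrate the last point, here is a minimal, non-authoritative sketch of how the new option could be used. It assumes the usual `AccelerateMixin` subclassing pattern from the skorch docs (with `AccelerateMixin` in `skorch.hf`); the module and data are made up for the example and are not part of this PR:

```python
import numpy as np
from accelerate import Accelerator
from torch import nn
from skorch import NeuralNetClassifier
from skorch.hf import AccelerateMixin


class AcceleratedNet(AccelerateMixin, NeuralNetClassifier):
    """NeuralNetClassifier with accelerate support."""


class MyModule(nn.Module):
    """Hypothetical module, only here to make the sketch self-contained."""
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 2))

    def forward(self, X):
        return self.layers(X)


X = np.random.randn(128, 20).astype(np.float32)
y = np.random.randint(0, 2, size=128).astype(np.int64)

accelerator = Accelerator()
net = AcceleratedNet(
    MyModule,
    criterion=nn.CrossEntropyLoss,
    accelerator=accelerator,
    unwrap_after_train=False,  # new option: keep the accelerate wrappers after fit
    max_epochs=2,
)
net.fit(X, y)          # modules stay wrapped, so AMP (if configured) stays active
net.partial_fit(X, y)  # training can be resumed without re-preparing the modules
```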

@thomasjpfan (Member) left a comment:

This makes sense. LGTM

@thomasjpfan merged commit a218ebc into master on May 16, 2023
@thomasjpfan deleted the accelerate-correctly-unwrap-modules branch on May 16, 2023, 14:59
@thomasjpfan mentioned this pull request on May 16, 2023
BenjaminBossan added a commit that referenced this pull request May 17, 2023
Preparation for release of version 0.13.0

Release text:

The new skorch release is here and it has some changes that will be exciting for
some users.

- First of all, you may have heard of the [PyTorch 2.0
  release](https://pytorch.org/get-started/pytorch-2.0/), which includes the
  option to compile the PyTorch module for better runtime performance. This
  skorch release allows you to pass `compile=True` when initializing the net to
  enable compilation (a short usage sketch follows this list).
- Support for training on multiple GPUs with the help of the
  [`accelerate`](https://huggingface.co/docs/accelerate/index) package has been
  improved by fixing some bugs and providing a dedicated [history
  class](https://skorch.readthedocs.io/en/latest/user/history.html#distributed-history).
  Our documentation contains more information on [what to consider when training
  on multiple
  GPUs](https://skorch.readthedocs.io/en/latest/user/huggingface.html#caution-when-using-a-multi-gpu-setup).
- If you have ever been frustrated with your neural net not training properly,
  you know how hard it can be to discover the underlying issue. Using the new
  [`SkorchDoctor`](https://skorch.readthedocs.io/en/latest/helper.html#skorch.helper.SkorchDoctor)
  class will simplify the diagnosis of underlying issues. Take a look at the
  accompanying
  [notebook](https://nbviewer.org/github/skorch-dev/skorch/blob/master/notebooks/Skorch_Doctor.ipynb).
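
As a quick illustration of the `compile=True` option mentioned above, here is a minimal sketch. The classifier module and data are invented for the example, PyTorch >= 2.0 is assumed, and only the `compile` argument itself is taken from the release text:

```python
import numpy as np
from torch import nn
from skorch import NeuralNetClassifier


class MyClassifier(nn.Module):
    """Hypothetical classifier module, just for illustration."""
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(20, 64),
            nn.ReLU(),
            nn.Linear(64, 2),  # raw logits
        )

    def forward(self, X):
        return self.layers(X)


X = np.random.randn(128, 20).astype(np.float32)
y = np.random.randint(0, 2, size=128).astype(np.int64)

# compile=True asks skorch to run the module through torch.compile
net = NeuralNetClassifier(
    MyClassifier,
    criterion=nn.CrossEntropyLoss,  # raw logits, so use cross entropy
    max_epochs=5,
    compile=True,
)
net.fit(X, y)
```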

Apart from that, a few bugs have been fixed and the included notebooks have been
updated to properly install requirements on Google Colab.

We are grateful for external contributors, many thanks to:

- Kshiteej K (kshitij12345)
- Muhammad Abdullah (abdulasiraj)
- Royi (RoyiAvital)
- Sawradip Saha (sawradip)
- y10ab1 (y10ab1)

Find below the list of all changes since v0.12.1:

### Added
- Add support for compiled PyTorch modules using the `torch.compile` function,
  introduced in the [PyTorch 2.0
  release](https://pytorch.org/get-started/pytorch-2.0/), which can greatly
  improve performance on new GPU architectures; to use it, initialize your net
  with the `compile=True` argument; further compilation arguments can be
  specified using the dunder notation, e.g. `compile__dynamic=True` (see the
  sketch after this list)
- Add a class
  [`DistributedHistory`](https://skorch.readthedocs.io/en/latest/history.html#skorch.history.DistributedHistory)
  which should be used when training in a multi-GPU setting (#955)
- `SkorchDoctor`: A helper class that assists in understanding and debugging the
  neural net training, see [this
  notebook](https://nbviewer.org/github/skorch-dev/skorch/blob/master/notebooks/Skorch_Doctor.ipynb)
  (#912)
- When using `AccelerateMixin`, it is now possible to prevent unwrapping of the
  modules by setting `unwrap_after_train=False` (#963)
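
To make the `compile` dunder notation and `SkorchDoctor` entries concrete, here is a small sketch. It reuses the hypothetical `MyClassifier`, `X`, and `y` from the earlier sketch; only the argument names mentioned in the changelog are taken from it, the rest is assumed:

```python
from torch import nn
from skorch import NeuralNetClassifier
from skorch.helper import SkorchDoctor

# extra torch.compile arguments are routed via the dunder notation
net = NeuralNetClassifier(
    MyClassifier,               # hypothetical module from the sketch above
    criterion=nn.CrossEntropyLoss,
    max_epochs=3,
    compile=True,               # enable torch.compile (PyTorch >= 2.0)
    compile__dynamic=True,      # forwarded as torch.compile(..., dynamic=True)
)
net.fit(X, y)

# SkorchDoctor wraps a fresh net and records diagnostics while fitting
# on a small amount of data, which can then be inspected or plotted
doctor = SkorchDoctor(
    NeuralNetClassifier(MyClassifier, criterion=nn.CrossEntropyLoss, max_epochs=2)
)
doctor.fit(X[:64], y[:64])
```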

### Fixed
- Fixed install command to work with recent changes in Google Colab (#928)
- Fixed a couple of bugs related to using non-default modules and criteria
  (#927)
- Fixed a bug when using `AccelerateMixin` in a multi-GPU setup (#947)
- `_get_param_names` returns a list instead of a generator so that subsequent
  error messages return useful information instead of a generator `repr` string
  (#925)
- Fixed a bug that caused modules to not be sufficiently unwrapped at the end of
  training when using `AccelerateMixin`, which could prevent them from being
  pickleable (#963)