Fix issues with unwrapping modules with accelerate #963

Merged: 1 commit merged into master from accelerate-correctly-unwrap-modules on May 16, 2023

Conversation

BenjaminBossan (Collaborator)

There were a few issues at once that are now being fixed:

  • When unwrapping, pass keep_fp32_wrapper=True, or else the modules are not completely unwrapped and cannot be pickled. The test that assumed pickling would fail is now passing.
  • Only net.module_ was unwrapped, but there can be other modules and criteria. Those are now also unwrapped.
  • In some circumstances, unwrapping is undesired, for instance when users want to continue training or to benefit from AMP during inference. Therefore, an option was added to prevent automatic unwrapping: set unwrap_after_train=False (a usage sketch is shown below).

Note: The first fix could also work for bf16, but I cannot test it on my machine, so the test still assumes that a PicklingError is raised.
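
To illustrate the last point, here is a minimal, non-authoritative sketch of how the new option could be used. It assumes the usual `AccelerateMixin` subclassing pattern from the skorch docs (with `AccelerateMixin` in `skorch.hf`); the module and data are made up for the example and are not part of this PR:

```python
import numpy as np
from accelerate import Accelerator
from torch import nn
from skorch import NeuralNetClassifier
from skorch.hf import AccelerateMixin


class AcceleratedNet(AccelerateMixin, NeuralNetClassifier):
    """NeuralNetClassifier with accelerate support."""


class MyModule(nn.Module):
    """Hypothetical module, only here to make the sketch self-contained."""
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 2))

    def forward(self, X):
        return self.layers(X)


X = np.random.randn(128, 20).astype(np.float32)
y = np.random.randint(0, 2, size=128).astype(np.int64)

accelerator = Accelerator()
net = AcceleratedNet(
    MyModule,
    criterion=nn.CrossEntropyLoss,
    accelerator=accelerator,
    unwrap_after_train=False,  # new option: keep the accelerate wrappers after fit
    max_epochs=2,
)
net.fit(X, y)          # modules stay wrapped, so AMP (if configured) stays active
net.partial_fit(X, y)  # training can be resumed without re-preparing the modules
```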

@thomasjpfan (Member) left a comment:

This makes sense. LGTM

@thomasjpfan merged commit a218ebc into master on May 16, 2023
@thomasjpfan deleted the accelerate-correctly-unwrap-modules branch on May 16, 2023, 14:59
@thomasjpfan mentioned this pull request on May 16, 2023
BenjaminBossan added a commit that referenced this pull request May 17, 2023
Preparation for release of version 0.13.0

Release text:

The new skorch release is here and it has some changes that will be exciting for
some users.

- First of all, you may have heard of the [PyTorch 2.0
  release](https://pytorch.org/get-started/pytorch-2.0/), which includes the
  option to compile the PyTorch module for better runtime performance. This
  skorch release allows you to pass `compile=True` when initializing the net to
  enable compilation (a short usage sketch follows this list).
- Support for training on multiple GPUs with the help of the
  [`accelerate`](https://huggingface.co/docs/accelerate/index) package has been
  improved by fixing some bugs and providing a dedicated [history
  class](https://skorch.readthedocs.io/en/latest/user/history.html#distributed-history).
  Our documentation contains more information on [what to consider when training
  on multiple
  GPUs](https://skorch.readthedocs.io/en/latest/user/huggingface.html#caution-when-using-a-multi-gpu-setup).
- If you have ever been frustrated with your neural net not training properly,
  you know how hard it can be to discover the underlying issue. Using the new
  [`SkorchDoctor`](https://skorch.readthedocs.io/en/latest/helper.html#skorch.helper.SkorchDoctor)
  class will simplify the diagnosis of underlying issues. Take a look at the
  accompanying
  [notebook](https://nbviewer.org/github/skorch-dev/skorch/blob/master/notebooks/Skorch_Doctor.ipynb).
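
As a quick illustration of the `compile=True` option mentioned above, here is a minimal sketch. The classifier module and data are invented for the example, PyTorch >= 2.0 is assumed, and only the `compile` argument itself is taken from the release text:

```python
import numpy as np
from torch import nn
from skorch import NeuralNetClassifier


class MyClassifier(nn.Module):
    """Hypothetical classifier module, just for illustration."""
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(20, 64),
            nn.ReLU(),
            nn.Linear(64, 2),  # raw logits
        )

    def forward(self, X):
        return self.layers(X)


X = np.random.randn(128, 20).astype(np.float32)
y = np.random.randint(0, 2, size=128).astype(np.int64)

# compile=True asks skorch to run the module through torch.compile
net = NeuralNetClassifier(
    MyClassifier,
    criterion=nn.CrossEntropyLoss,  # raw logits, so use cross entropy
    max_epochs=5,
    compile=True,
)
net.fit(X, y)
```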

Apart from that, a few bugs have been fixed and the included notebooks have been
updated to properly install requirements on Google Colab.

We are grateful for external contributors, many thanks to:

- Kshiteej K (kshitij12345)
- Muhammad Abdullah (abdulasiraj)
- Royi (RoyiAvital)
- Sawradip Saha (sawradip)
- y10ab1 (y10ab1)

Find below the list of all changes since v0.12.1:

### Added
- Add support for compiled PyTorch modules using the `torch.compile` function,
  introduced in the [PyTorch 2.0
  release](https://pytorch.org/get-started/pytorch-2.0/), which can greatly
  improve performance on new GPU architectures; to use it, initialize your net
  with the `compile=True` argument; further compilation arguments can be
  specified using the dunder notation, e.g. `compile__dynamic=True` (see the
  sketch after this list)
- Add a class
  [`DistributedHistory`](https://skorch.readthedocs.io/en/latest/history.html#skorch.history.DistributedHistory)
  which should be used when training in a multi-GPU setting (#955)
- `SkorchDoctor`: A helper class that assists in understanding and debugging the
  neural net training, see [this
  notebook](https://nbviewer.org/github/skorch-dev/skorch/blob/master/notebooks/Skorch_Doctor.ipynb)
  (#912)
- When using `AccelerateMixin`, it is now possible to prevent unwrapping of the
  modules by setting `unwrap_after_train=False` (#963)
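
To make the `compile` dunder notation and `SkorchDoctor` entries concrete, here is a small sketch. It reuses the hypothetical `MyClassifier`, `X`, and `y` from the earlier sketch; only the argument names mentioned in the changelog are taken from it, the rest is assumed:

```python
from torch import nn
from skorch import NeuralNetClassifier
from skorch.helper import SkorchDoctor

# extra torch.compile arguments are routed via the dunder notation
net = NeuralNetClassifier(
    MyClassifier,               # hypothetical module from the sketch above
    criterion=nn.CrossEntropyLoss,
    max_epochs=3,
    compile=True,               # enable torch.compile (PyTorch >= 2.0)
    compile__dynamic=True,      # forwarded as torch.compile(..., dynamic=True)
)
net.fit(X, y)

# SkorchDoctor wraps a fresh net and records diagnostics while fitting
# on a small amount of data, which can then be inspected or plotted
doctor = SkorchDoctor(
    NeuralNetClassifier(MyClassifier, criterion=nn.CrossEntropyLoss, max_epochs=2)
)
doctor.fit(X[:64], y[:64])
```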

### Fixed
- Fixed install command to work with recent changes in Google Colab (#928)
- Fixed a couple of bugs related to using non-default modules and criteria
  (#927)
- Fixed a bug when using `AccelerateMixin` in a multi-GPU setup (#947)
- `_get_param_names` returns a list instead of a generator so that subsequent
  error messages return useful information instead of a generator `repr` string
  (#925)
- Fixed a bug that caused modules to not be sufficiently unwrapped at the end of
  training when using `AccelerateMixin`, which could prevent them from being
  pickleable (#963)