Migration Guide #123

Merged
merged 25 commits on Oct 11, 2020
Commits
794622d
Start migration guide
araffin Jul 22, 2020
7cd7c0b
Update guide
araffin Jul 22, 2020
3050073
Merge branch 'master' into doc/migration
araffin Jul 31, 2020
b6490bc
Add comment on RMSpropTFLike plus PPO/A2C migrations
Miffyli Aug 3, 2020
5d30f4c
Merge branch 'master' into doc/migration
araffin Aug 3, 2020
c04673e
Merge branch 'master' into doc/migration
araffin Aug 6, 2020
82724ca
Merge branch 'master' into doc/migration
araffin Aug 23, 2020
e2b8d71
Merge branch 'master' into doc/migration
araffin Aug 23, 2020
dd2e2f9
Merge branch 'master' into doc/migration
araffin Aug 27, 2020
7c6c9f8
Merge branch 'master' into doc/migration
araffin Aug 29, 2020
ce8b68b
Merge branch 'master' into doc/migration
araffin Sep 1, 2020
7559c64
Merge branch 'master' into doc/migration
araffin Sep 20, 2020
a0435e3
Add note about set/get-parameters
Miffyli Sep 23, 2020
9087f83
Merge branch 'master' into doc/migration
araffin Sep 24, 2020
e713b40
Merge branch 'master' into doc/migration
araffin Sep 24, 2020
342fc7c
Merge branch 'master' into doc/migration
araffin Sep 26, 2020
002eae3
Merge branch 'master' into doc/migration
araffin Sep 30, 2020
3366e94
Merge branch 'master' into doc/migration
araffin Oct 3, 2020
59b9122
Update migration guide
araffin Oct 3, 2020
7ca9511
Merge branch 'master' into doc/migration
araffin Oct 4, 2020
ef478e5
Merge branch 'master' into doc/migration
araffin Oct 7, 2020
0befe28
Update changelog and readme
araffin Oct 7, 2020
397a52b
Merge branch 'doc/migration' of github.com:DLR-RM/stable-baselines3 i…
araffin Oct 7, 2020
b0c6e9b
Update doc + clean changelog
araffin Oct 9, 2020
0b42647
Address comments
araffin Oct 11, 2020
4 changes: 2 additions & 2 deletions README.md
@@ -50,9 +50,9 @@ Planned features:
- [ ] TRPO


## Migration guide
## Migration guide: from Stable-Baselines (SB2) to Stable-Baselines3 (SB3)

**TODO: migration guide from Stable-Baselines in the documentation**
A migration guide from SB2 to SB3 can be found in the [documentation](https://stable-baselines3.readthedocs.io/en/master/guide/migration.html).

## Documentation

2 changes: 1 addition & 1 deletion docs/guide/install.rst
@@ -29,7 +29,7 @@ To install Stable Baselines3 with pip, execute:

pip install stable-baselines3[extra]

This includes an optional dependencies like Tensorboard, OpenCV or ```atari-py``` to train on atari games. If you do not need those, you can use:
This includes optional dependencies like Tensorboard, OpenCV or ``atari-py`` to train on Atari games. If you do not need those, you can use:

.. code-block:: bash

182 changes: 181 additions & 1 deletion docs/guide/migration.rst
@@ -9,4 +9,184 @@ This is a guide to migrate from Stable-Baselines to Stable-Baselines3.

It also references the main changes.

**TODO**
.. warning::
    This section is still a work in progress (WIP). Things might be added in the future, before the 1.0 release.



Overview
========

Overall, Stable-Baselines3 (SB3) keeps the high-level API of Stable-Baselines (SB2).
Most of the changes are internal ones, made to ensure more consistency.
Because of the backend change from Tensorflow to PyTorch, the internal code is much more readable and easier to debug,
at the cost of some speed (dynamic graph vs static graph, see `Issue #90 <https://github.com/DLR-RM/stable-baselines3/issues/90>`_).
However, the algorithms were extensively benchmarked on Atari games and continuous control PyBullet envs
(see `Issue #48 <https://github.com/DLR-RM/stable-baselines3/issues/48>`_ and `Issue #49 <https://github.com/DLR-RM/stable-baselines3/issues/49>`_),
so you should not expect a performance drop when switching from SB2 to SB3.

Breaking Changes
================

- SB3 requires Python 3.6+ (instead of Python 3.5+ for SB2)
- Dropped MPI support
- Dropped layer normalized policies (e.g. ``LnMlpPolicy``)
- Dropped parameter noise for DDPG and DQN
- PPO is now closer to the original implementation (no clipping of the value function by default), cf the PPO section below
- Orthogonal initialization is only used by A2C/PPO
- The features extractor (CNN extractor) is shared between the policy and the Q-networks for DDPG/SAC/TD3, and only the policy loss is used to update it (much faster)
- Tensorboard legacy logging was dropped in favor of having one logger for the terminal and Tensorboard (cf :ref:`Tensorboard integration <tensorboard>`)
- We dropped ACKTR/ACER support because of their complexity compared to simpler alternatives (PPO, SAC, TD3) that perform as well
- We dropped GAIL support as we are focusing on model-free RL only; you can however take a look at the `Imitation Learning Baseline Implementations <https://github.com/HumanCompatibleAI/imitation>`_,
  which are based on SB3.

TODO: change to deterministic predict for SAC/TD3

TODO: state API breaking changes and implementation differences (e.g. PPO clip range and renaming of parameters)

Moved Files
-----------

- ``bench/monitor.py`` -> ``common/monitor.py``
- ``logger.py`` -> ``common/logger.py``
- ``results_plotter.py`` -> ``common/results_plotter.py``

Utility functions are no longer exported from the ``common`` module; you should import them with their full path, e.g.:

.. code-block:: python

from stable_baselines3.common.cmd_util import make_atari_env, make_vec_env
from stable_baselines3.common.utils import set_random_seed

instead of ``from stable_baselines3.common import make_atari_env``



Parameters Change and Renaming
------------------------------

Base-class (all algorithms)
^^^^^^^^^^^^^^^^^^^^^^^^^^^

- ``load_parameters`` -> ``set_parameters``

- ``get/set_parameters`` return a dictionary mapping object names
  to their respective PyTorch tensors and other objects representing
  their parameters, instead of a simpler mapping of parameter names
  to NumPy arrays. These functions also return PyTorch tensors rather
  than NumPy arrays (see the sketch below).

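A minimal sketch of the new interface (the key names shown, such as ``policy`` and ``policy.optimizer``, are illustrative and may differ between algorithms):

.. code-block:: python

    from stable_baselines3 import PPO

    model = PPO("MlpPolicy", "CartPole-v1")

    # SB3: a dict mapping object names to their PyTorch state dicts,
    # e.g. {"policy": ..., "policy.optimizer": ...}, with torch.Tensor values
    params = model.get_parameters()

    # Load the parameters back (this replaces the old load_parameters)
    model.set_parameters(params, exact_match=True)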

Policies
^^^^^^^^

- ``cnn_extractor`` -> ``feature_extractor``, as ``feature_extractor`` is now used with ``MlpPolicy`` too

A2C
^^^

- ``epsilon`` -> ``rms_prop_eps``
- ``lr_schedule`` is part of ``learning_rate`` (it can be a callable).
- ``alpha`` and ``momentum`` can be modified through the ``optimizer_kwargs`` key of ``policy_kwargs``.

.. warning::

    The PyTorch implementation of RMSprop `differs from Tensorflow's <https://github.com/pytorch/pytorch/issues/23796>`_,
    which leads to `different and potentially more unstable results <https://github.com/DLR-RM/stable-baselines3/pull/110#issuecomment-663255241>`_.
    Use the ``stable_baselines3.common.sb2_compat.rmsprop_tf_like.RMSpropTFLike`` optimizer to match the results
    of Tensorflow's implementation. This can be done through ``policy_kwargs``: ``A2C(policy_kwargs=dict(optimizer_class=RMSpropTFLike))``

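For instance, a minimal sketch of switching the optimizer (other hyperparameters are left at their defaults):

.. code-block:: python

    from stable_baselines3 import A2C
    from stable_baselines3.common.sb2_compat.rmsprop_tf_like import RMSpropTFLike

    # Use the TF-like RMSprop to better reproduce SB2 results
    model = A2C(
        "MlpPolicy",
        "CartPole-v1",
        policy_kwargs=dict(optimizer_class=RMSpropTFLike),
    )
    model.learn(total_timesteps=10_000)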

PPO
^^^

- ``cliprange`` -> ``clip_range``
- ``cliprange_vf`` -> ``clip_range_vf``
- ``nminibatches`` -> ``batch_size``

.. warning::

    ``nminibatches`` gave a different batch size depending on the number of environments: ``batch_size = (n_steps * n_envs) // nminibatches``


- ``clip_range_vf`` behavior for PPO is slightly different: set it to ``None`` (default) to deactivate clipping (in SB2, you had to pass ``-1``; ``None`` meant using ``clip_range`` for the clipping)
- ``lam`` -> ``gae_lambda``
- ``noptepochs`` -> ``n_epochs``

PPO default hyperparameters are the ones tuned for continuous control environments.
We recommend taking a look at the :ref:`RL Zoo <rl_zoo>` for hyperparameters tuned for Atari games.

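As an illustration, a hedged sketch of mapping old PPO2 keyword arguments onto the new names (the values are only an example, not a recommendation):

.. code-block:: python

    from stable_baselines3 import PPO
    from stable_baselines3.common.cmd_util import make_vec_env

    # SB2: PPO2(..., n_steps=128, nminibatches=4, noptepochs=4, cliprange=0.2, lam=0.95)
    # with 8 envs gave batch_size = (128 * 8) // 4 = 256
    env = make_vec_env("CartPole-v1", n_envs=8)
    model = PPO(
        "MlpPolicy",
        env,
        n_steps=128,
        batch_size=256,      # now explicit, no longer derived from nminibatches
        n_epochs=4,          # was noptepochs
        clip_range=0.2,      # was cliprange
        clip_range_vf=None,  # None deactivates value clipping (SB2 used -1 for that)
        gae_lambda=0.95,     # was lam
    )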

DQN
^^^

Only the vanilla DQN is implemented right now but extensions will follow (cf planned features).
Default hyperparameters are taken from the Nature paper, except for the optimizer and learning rate, which were taken from Stable Baselines defaults.

DDPG
^^^^

DDPG now follows the same interface as SAC/TD3.
For state/reward normalization, you should use ``VecNormalize`` as for all other algorithms.

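A possible sketch (the environment and normalization settings are only an illustration):

.. code-block:: python

    from stable_baselines3 import DDPG
    from stable_baselines3.common.cmd_util import make_vec_env
    from stable_baselines3.common.vec_env import VecNormalize

    # Wrap the vectorized env to normalize observations and rewards
    env = VecNormalize(make_vec_env("Pendulum-v0", n_envs=1), norm_obs=True, norm_reward=True)

    model = DDPG("MlpPolicy", env)
    model.learn(total_timesteps=10_000)
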
SAC/TD3
^^^^^^^

SAC/TD3 now accept any number of critics, e.g. ``policy_kwargs=dict(n_critics=3)``, instead of only two before.


.. note::

SAC/TD3 default hyperparameters (including network architecture) now match the ones from the original papers.
DDPG is using TD3 defaults.


New logger API
--------------

- Methods were renamed in the logger (see the sketch after this list):

  - ``logkv`` -> ``record``, ``writekvs`` -> ``write``, ``writeseq`` -> ``write_sequence``,
  - ``logkvs`` -> ``record_dict``, ``dumpkvs`` -> ``dump``,
  - ``getkvs`` -> ``get_log_dict``, ``logkv_mean`` -> ``record_mean``

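A short sketch of the renamed calls (this assumes the module-level logger functions; the values are illustrative):

.. code-block:: python

    from stable_baselines3.common import logger

    # SB2: logger.logkv("train/reward", 3.0); logger.dumpkvs()
    logger.record("train/reward", 3.0)
    logger.record_mean("train/reward_mean", 3.0)
    logger.dump(step=1000)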

Internal Changes
----------------

Please read the :ref:`Developer Guide <developer>` section.


New Features
============

- much cleaner and more consistent base code (and no more warnings =D!) and static type checks
- independent saving/loading/predict for policies
- A2C now supports Generalized Advantage Estimation (GAE) and advantage normalization (both are deactivated by default)
- generalized State-Dependent Exploration (gSDE) is available for A2C/PPO/SAC. It allows using RL directly on real robots (cf https://arxiv.org/abs/2005.05719)
- proper evaluation (using a separate env) is included in the base class (using ``EvalCallback``);
  if you pass the environment as a string, you can pass ``create_eval_env=True`` to the algorithm constructor.
- better saving/loading: optimizers are now included in the saved parameters, and there are two new methods, ``save_replay_buffer`` and ``load_replay_buffer``, for the replay buffer when using off-policy algorithms (DQN/DDPG/SAC/TD3), as shown in the sketch after this list
- you can pass ``optimizer_class`` and ``optimizer_kwargs`` to ``policy_kwargs`` in order to easily
  customize optimizers
- seeding now works properly to have deterministic results
- the replay buffer does not grow: everything is allocated at build time (faster)
- we added a memory efficient replay buffer variant (pass ``optimize_memory_usage=True`` to the constructor), which drastically reduces the memory used, especially when using images
- you can specify an arbitrary number of critics for SAC/TD3 (e.g. ``policy_kwargs=dict(n_critics=3)``)

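A hedged sketch of the new replay buffer helpers (paths and timesteps are arbitrary):

.. code-block:: python

    from stable_baselines3 import SAC

    model = SAC("MlpPolicy", "Pendulum-v0")
    model.learn(total_timesteps=5_000)

    # The replay buffer is saved/loaded separately from the model
    model.save("sac_pendulum")
    model.save_replay_buffer("sac_replay_buffer")

    loaded_model = SAC.load("sac_pendulum")
    loaded_model.load_replay_buffer("sac_replay_buffer")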

How to migrate?
===============

In most cases, replacing ``from stable_baselines`` with ``from stable_baselines3`` will be sufficient.
Some files were moved to the ``common`` folder (cf above), which could result in import errors.
We recommend looking at the `rl-zoo3 <https://github.com/DLR-RM/rl-baselines3-zoo>`_ and comparing the imports
to those of the SB2 `rl-zoo <https://github.com/araffin/rl-baselines-zoo>`_ for a concrete example of a successful migration.

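For example, a minimal before/after sketch (SB2's ``PPO2`` becomes ``PPO`` in SB3):

.. code-block:: python

    # SB2
    # from stable_baselines import PPO2
    # model = PPO2("MlpPolicy", "CartPole-v1").learn(10000)

    # SB3
    from stable_baselines3 import PPO

    model = PPO("MlpPolicy", "CartPole-v1").learn(10_000)
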
Planned Features
================

- Recurrent (LSTM) policies
- DQN extensions (the current implementation is a vanilla DQN)

cf `roadmap <https://github.com/DLR-RM/stable-baselines3/issues/1>`_
1 change: 1 addition & 0 deletions docs/misc/changelog.rst
@@ -26,6 +26,7 @@ Others:

Documentation:
^^^^^^^^^^^^^^
- Added first draft of migration guide


Pre-Release 0.9.0 (2020-10-03)