Skip to content

Commit

Permalink
Merge pull request #12 from GilesStrong/v0.2.1
Browse files Browse the repository at this point in the history
V0.3
  • Loading branch information
GilesStrong authored Aug 1, 2019
2 parents a229eea + ff406a7 commit 5ebdf1b
Show file tree
Hide file tree
Showing 52 changed files with 13,083 additions and 7,809 deletions.
123 changes: 116 additions & 7 deletions CHANGES.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,113 @@
# Master - targeting V0.2
# Targeting V0.3

## Important changes

- `norm_in` default value for `get_pre_proc_pipes` is now `True` rather than `False`
- layer width in dense=True `FullyConnected` now no longer scales with input size to prevent parameter count from exploding
- Biases in `FullyConnected` linear layers are now initialised to zero, rather that default PyTorch init
- Bias in `ClassRegMulti` linear layer is now intitialised to 0.5 if sigmoid output, zero if linear output, and 1/n_out if softmax, unless a bias_init value is specified

## Breaking

- Changed order of arugments in `AMS` and `MultiAMS` and removed some default values
- Removed default for `return_mean` in `RegAsProxyPull` and `RegPull`
- Changed`settings` to `plot_settings` in `rf_rank_features`
- Removed some default parameters for NN blocks in `ModelBuilder`
- `ModelBuilder` `model_args` should now be a dictionary of dictionaries of keyword arguments, one for head, body, and tail blocks,
previously was a single dictionary of keyword arguments
- `Embedder.from_fy` now no longer works: change to `CatEmbedder.from_fy`
- `CatEmbHead` now no longer has a `n_cont_in` argument, instead one should pass a list of feature names to `cont_feats`

## Additions

- Added `n_estimators` parameter to `rf_rank_features` and `get_opt_rf_params` to adjust the number of trees
- Added `n_rfs` parameter to `rf_rank_features` to average feature importance over several random forests
- Added automatic computation of 3-momenta magnitude to `add_mass` if it's missing
- `n_components` to `get_pre_proc_pipes` to be passed to `PCA`
- `Pipeline` configuration parameters to `fit_input_pipe`
- Ability to pass an instantiated `Pipeline` to `fit_input_pipe`
- Callbacks now receive `model_num` and `savepath` in `on_train_begin`
- Random Forest *like* ensembling:
- `BootstrapResample` callback for resampling training and validation data
- Feature subsambling:
- `FeatureSubsample` callback for training on random selection of features
- `Model` now has an `input_mask` to automatically mask inputs at inference time (train-time inputs should be masked at `BatchYielder` level)
- `plot_roc` now returns aucs as dictionary
- growth_rate scaling coefficient to `FullyConnected` to adjust layer width by depth
- `n_in` parameter to `FullyConnected` so it works on arbitray size inputs
- `freeze_tail` to `ModelBuilder` and `ClassRegMulti`
- Abstract blocks for head, body, and tail
- `cont_feats` argument to `ModelBuilder` to allow passing of list of named features, eventually allowing more advanced methods based on named outputs of head blocks.
- `CatEmbHead` now computes a mapping from named input features to their outputs
- body blocks now expect to be passed a dictionary mapping from named input features to the model to the outputs of the head block
- `Model` and `AbsBlock` classes now have a method to compute total number of (trainable) parameters
- `MultiBlock` body, providing possibility for multiple, parallel body blocks taking subsets of input features
- Explicit initialisation paramater for bias in `ClassRegMulti`
- `plot_1d_partial_dependence` now takes `pdp_isolate_kargs` and `pdp_plot_kargs` to pass to `pdp_isolate` and `pdp_plot`, respectively
- `plot_2d_partial_dependence` now takes `pdp_interact_kargs` and `pdp_interact_plot_kargs` to pass to `pdp_interact` and `pdp_interact_plot`, respectively
- `ForwardHook` class
- `plot_multibody_weighted_outputs` an interpration plot for `MultiBlock` models
- Better documentation for methods and classes

## Removals

- Some default values of arugments in `AMS` and `MultiAMS`
- Default for `return_mean` in `RegAsProxyPull` and `RegPull`

## Fixes

- Missing bbox_inches in `plot_embedding`
- Typing for `cont_feats` and `savename` in `fit_input_pipe`
- Typing for `targ_feats` and `savename` in `fit_output_pipe`
- Moved predictions to after callback `on_eval_begin`
- Updated `from_model_builder` class method of `ModelBuilder`to use and `CatEmbedder`
- Hard coded savename in `Model` during save to hopefull solve occaisional permission error during save
- Typing for `val_fold` in `SWA`
- 'lr' to 'momentum' in `Model.set_mom`
- `Model.get_mom` now actually returns momentum (beta_1) rather than lr
- Added catch for infinite uncertainties being passed to `uncert_round`
- Added catch for `plot_roc` with bootstraping when resamples data only contains one class
- Error when attempting to plot categorical feature in `plot_1d_partial_dependence`
- layer width in dense=True `FullyConnected` scaling with input size
- Fixed `lookup_act` for linear function
- `plot_1d_partial_dependence` not using `n_points` parameter
- Errors in `plot_rocs` when passing non-lists and when requesting plot_params and bootsrapping
- Missing `to_device` call when exporting to ONNX on a CUDA device

## Changes

- `to_pt_eta_phi` now infers presence of z momentum from dataframe
- `norm_in` default value for `get_pre_proc_pipes` is now `True` rather than `False`
- `fold_train_ensemble` now always trains `n_models`, and validation fold IDs are cycled through according to `fy.n_folds % model_num`
- `FoldYielder.set_ignore` changed to `FoldYielder.add_ignore`
- Changed `HEPAugFoldYielder.rotate` and `HEPAugFoldYielder.reflect` to private methods
- `compute` method of `RegPull` now private
- Renamed `data` to `fy` in `RegPull.evaluate` and `RegAsProxyPull.evaluate`
- Made `get_layer` in `FullyConnected` private
- Made `get_dense` and `load_embeds` in `CatEmbHead` private
- Made `build_layers` in 'ClassRegMulti` private
- Made parse methods and `build_opt` in `ModelBuilder` private
- Made `get_folds` private
- Changed `settings` to `plot_settings` in `rf_rank_features`
- Dense layer from `CatEmbHead` removed and placed in `FullyConnected`
- Swapped order of continuous and categorical embedding concatination in `CatEmbHead` in order to match input data
- `arr` in `plot_kdes_from_bs` changed to `x`
- weighted partial dependencies in `plot_1d_partial_dependence` are now computed by passing the name of the weight coulmn in the dataframe and normalisation is done automatically
- `data` argument for `plot_binary_class_pred` renamed to `df`
- `plot_1d_partial_dependence` and `plot_2d_partial_dependence` both now expect to be passed a list on training features, rather than expecteing the DataFrame to only contain the trainign features
- rfpimp package nolonger requires manual installation

## Depreciations

- Passing `eta` argument to `to_pt_eta_phi`. Will be removed in v0.4
- `binary_class_cut` renamed to `binary_class_cut_by_ams`. Code added to call `binary_class_cut_by_ams`. Will be removed in v0.4
- `plot_dendrogram` renamed to `plot_rank_order_dendrogram`. Code added to call `plot_rank_order_dendrogram`. Will be removed in v0.4
- `Embedder` renamed to `CatEmbedder`. Code added to call `CatEmbedder`. Will be removed in v0.4
- `n_cont_in` (number of continuous input features) argument of `ModelBuilder` depreciated in favour of `cont_feats` (list of named continuous input features). Code added to create this by encoding numbers as string. Will be removed in v0.4.

## Comments

# V0.2 Bonfire Lit

## Important changes

Expand All @@ -17,7 +126,7 @@
- In cases where an odd number of layers are specified for the body, the number of layers is increased by one
- Batch normalisation now corrected to be after the addition step (previously was set before)
- Dense mode in `FullyConnected` now no longer adds an extra layer to scale down to the original width, instead `get_out_size` now returns the width of the final concatinated layer and the tail of the network is expected to accept this input size
- Initialisation arguments for `CatEmbHead` changed considerably w.r.t. embedding arguments; now expects to receive a `Embedder` class
- Initialisation arguments for `CatEmbHead` changed considerably w.r.t. embedding arguments; now expects to receive a `CatEmbedder` class

## Additions

Expand All @@ -32,7 +141,7 @@
- Added option to turn of realtime loss plots
- Added `from_results` and `from_save` classmethods for `Ensemble`
- Added option to `SWA` to control whether it only updates on cycle end when paired with an `AbsCyclicalCallback`
- Added helper class `Embedder` to simplify parsing of embedding settings
- Added helper class `CatEmbedder` to simplify parsing of embedding settings
- Added parameters to save and configure plots to `get_nn_feat_importance`, `get_ensemble_feat_importance`, and `rf_rank_features`
- Added classmethod for `Model` to load from save
- Added experimental export to Tensorflow Protocol Buffer
Expand Down Expand Up @@ -63,12 +172,12 @@

- Callbacks:
- Added `callback_partials` parameter (a list of partials that yield a Callback object) in `fold_train_ensemble` to eventually replace `callback_args`; Neater appearance than previous Dict of object and kargs
- `callback_args` now depreciated, to be removed in v0.3
- Currently `callback_args` are converted to `callback_partials`, code will also be removed in v0.3
- `callback_args` now depreciated, to be removed in v0.4
- Currently `callback_args` are converted to `callback_partials`, code will also be removed in v0.4
- Embeddings:
- Added `cat_embedder` parameter to `ModelBuilder` to eventuall replace `cat_args`
- `cat_args` now depreciated to be removed in v0.3
- Currently `cat_args` are converted to an `Embedder`, code will also be removed in v0.3
- `cat_args` now depreciated to be removed in v0.4
- Currently `cat_args` are converted to an `Embedder`, code will also be removed in v0.4

## Comments

Expand Down
2 changes: 1 addition & 1 deletion abbr.md
Original file line number Diff line number Diff line change
Expand Up @@ -80,7 +80,7 @@
|Random Forest|rf||
|Percentage|perc||
|Identification number|id||

|Batch Yielder|by||



Loading

0 comments on commit 5ebdf1b

Please sign in to comment.