Releases · GilesStrong/lumin
v0.4 Hypothetically Useful But Of Limited Actual Utility
Important changes
- Moved to Pandas 0.25.0
- Moved to Seaborn 0.9.0
- Moved to Scikit-learn 0.21.0
Breaking
Additions
- `rf_check_feat_removal` method to check whether one of several (correlated) features can safely be ignored
- `rf_rank_features`:
  - `n_max_display` to `rf_rank_features` to adjust number of features displayed in plot
  - `plot_results`, `retrain_on_import_feats`, and `verbose` to control printed outputs of function
  - Can now take preset RF params, rather than optimising each time
- Control over x-axis label in `plot_importance`
- `repeated_rf_rank_features`
- `get_df` function to `LRFinder`
- Ability to use dictionaries for `PlotSettings.style`
- `plot_rank_order_dendrogram`:
  - Added threshold param to control plotting colour and return
  - Returns list of pairs of correlated features
- `FoldYielder`:
  - Method to list columns in foldfile
  - Option to initialise using a string or path for the foldfile
  - Close method to close the foldfile
- New methods to `hep_proc` focussing on vectorised transformations and operations of Lorentz Vectors
- `subsample_df` to sub-sample a data frame (with optional stratification and replacement)
- Callbacks during prediction (see the sketch after this list):
  - `on_pred_begin` and `on_pred_end` methods added to `AbsCallback`, which are called during `Model.predict_array`
  - `Model.predict`, `Model.predict_folds`, and `Model.predict_array` now take a list of instantiated callbacks to apply during prediction
  - `Ensemble.predict`, `Ensemble.predict_folds`, and `Ensemble.predict_array` now take a list of instantiated callbacks to apply during prediction
- `ParametrisedPrediction` callback for setting a single parameterisation feature to a set value during model prediction
- y-axis limit argument to `plot_1d_partial_dependence`
- `auto_filter_on_linear_correlation`
- `auto_filter_on_mutual_dependence`
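A minimal sketch of the new prediction hooks; only the hook names and the list-of-callbacks behaviour come from this release, while the import path, constructor, and the keyword used to pass callbacks are assumptions:

```python
import time

from lumin.nn.callbacks.callback import AbsCallback  # import path assumed

class PredTimer(AbsCallback):
    # Hypothetical callback timing a prediction pass via the new hooks
    def on_pred_begin(self, **kargs) -> None:
        self.start = time.time()  # called as Model.predict_array starts

    def on_pred_end(self, **kargs) -> None:
        print(f'Prediction took {time.time()-self.start:.2f}s')

# model.predict(fold_yielder, cbs=[PredTimer()])  # keyword name assumed
```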
Removals
- Passing `eta` argument to `to_pt_eta_phi`: now inferred from data
- `Embedder` renamed to `CatEmbedder`
- `cat_args` and `n_cont_in` arguments in `ModelBuilder`: use `cat_embedder` and `cont_feats` instead
- `callback_args` argument in `fold_train_ensemble`: use `callback_partials` instead
- `binary_class_cut` renamed to `binary_class_cut_by_ams`
- `plot_dendrogram` renamed to `plot_rank_order_dendrogram`
Fixes
- Removed mutable default parameter for `get_opt_rf_params`
- Missing `n_estimators` in call to `get_opt_rf_params` from `rf_rank_features`
- Added string-interpretation check when loading `ModelBuilder` saved in pre-v0.3.1 versions
- `rf_rank_features` importance cut is now >= threshold; was previously >
- `plot_rank_order_dendrogram` now clusters by absolute Spearman's rank correlation coefficient (see the sketch after this list)
- `feat_map` to `self.feat_map` in `MultiBlock.__init__`
- Bias initialisation for sigmoids in `ClassRegMulti` corrected to zero; was 0.5
- Removed uncertainties from the moments shown by `plot_feat` when plotting with weights; uncertainties were underestimated
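For reference, a rough illustration of the clustering idea using scipy directly (this is not lumin's code): features are clustered on a distance of one minus the absolute Spearman's rank correlation.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr
from scipy.spatial.distance import squareform
from scipy.cluster import hierarchy

df = pd.DataFrame(np.random.randn(1000, 4), columns=['a', 'b', 'c', 'd'])
corr, _ = spearmanr(df)                  # Spearman's rank correlation matrix
dist = 1 - np.abs(corr)                  # distance from absolute correlation
link = hierarchy.linkage(squareform(dist, checks=False), method='average')
hierarchy.dendrogram(link, labels=list(df.columns))
```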
Changes
- Improved `plot_lr_finders`
- Moved to Pandas 0.25.0
- Moved to Seaborn 0.9.0
- Moved to Scikit-learn 0.21.0
- `model_builder.get_model` now returns a 4th object, an `input_mask`
- Feature subsampling:
  - Moved to `ModelBuilder` rather than the `FeatureSubsample` callback: required to handle `MultiBlock` models
  - Now allows a list of features to always be present in the model via `ModelBuilder.guaranteed_feats`
- `plot_1d_partial_dependence` and `plot_2d_partial_dependence` now better handle weighted resampling of data: replacement sampling, and auto fix when `wgt_name` is specified but no `sample_sz` (see the sketch after this list)
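A hypothetical call showing the weighted behaviour; `wgt_name` and `sample_sz` come from this release, while the import path, the remaining argument names, and the `model`, `df`, and `train_feats` placeholders are assumptions:

```python
from lumin.plotting.interpretation import plot_1d_partial_dependence  # import path assumed

plot_1d_partial_dependence(model, df, feat='jet_pt', train_feats=train_feats,
                           wgt_name='gen_weight',  # weights read from this column, normalised automatically
                           sample_sz=None)         # auto-fixed when wgt_name is set without a sample_sz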
Deprecations
- `FeatureSubsample`, in favour of `guaranteed_feats` and `cont_subsample_rate` in `ModelBuilder` (see the sketch below). Will be removed in v0.5.
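A rough sketch of the replacement settings; `guaranteed_feats` and `cont_subsample_rate` come from this release, but the import path, the other arguments, and the values shown are assumptions:

```python
from lumin.nn.models.model_builder import ModelBuilder  # import path assumed

model_builder = ModelBuilder(objective='classification',
                             cont_feats=cont_feats,          # placeholder list of named continuous features
                             guaranteed_feats=['res_mass'],  # hypothetical feature always kept when subsampling
                             cont_subsample_rate=0.8)        # assumed semantics: fraction of features sampled per model
```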
Comments
v0.3.1 Micro update
Important changes
- Online documentation now available at https://lumin.readthedocs.io
Breaking
Additions
- `bin_binary_class_pred`:
  - Ability to only consider classes rather than samples when computing bin edges
  - Ability to add pure signal bins if normalisation uncertainty would be below some value
- `plot_bottleneck_weighted_inputs` method for interpreting bottleneck blocks in `MultiBlock`
- Online documentation: https://lumin.readthedocs.io
- Default optimiser notice
- Can now pass arbitrary optimisers to the 'opt' value in `opt_args` (see the sketch after this list). Optimisers are still interpretable from strings.
- Expanded advanced model-building example to include more interpretation examples and diagrams of network architectures
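A rough sketch of the two forms; whether `opt` accepts a class, a partial, or some other callable in exactly this way is an assumption:

```python
from functools import partial
import torch

opt_args = {'opt': 'adam', 'lr': 1e-3}  # string form, interpreted as before

# Arbitrary optimiser; the exact accepted form is assumed
opt_args = {'opt': partial(torch.optim.AdamW, weight_decay=1e-2), 'lr': 1e-3}
```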
Removals
- weak decorators for losses
Fixes
- `CatEmbedder.from_fy` using features ignored by `FoldYielder`
- `bottleneck_sz_masks` to `bottleneck_masks` in `MultiBlock`
- SWA crashing when evaluating targets of type long when the loss expects a float (`model.evaluate` now converts to float when the objective is not multiclass classification)
- Docstring fixes
- Fixed model being moved to device after instantiating the optimiser (sometimes leads to an error); models are now moved to device in `ModelBuilder.get_model` rather than in `Model.__init__`
Changes
Deprecations
Comments
v0.3 Tears in Rain
Important changes
- `norm_in` default value for `get_pre_proc_pipes` is now `True` rather than `False`
- Layer width in dense=True `FullyConnected` now no longer scales with input size, to prevent the parameter count from exploding
- Biases in `FullyConnected` linear layers are now initialised to zero, rather than the default PyTorch init
- Bias in `ClassRegMulti` linear layer is now initialised to 0.5 if sigmoid output, zero if linear output, and 1/n_out if softmax, unless a bias_init value is specified (see the sketch after this list)
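A minimal PyTorch sketch of the stated rule (not lumin's actual code):

```python
import torch.nn as nn

def init_output_bias(layer: nn.Linear, output: str) -> None:
    # 0.5 for sigmoid outputs, 1/n_out for softmax, zero for linear
    if output == 'sigmoid':
        nn.init.constant_(layer.bias, 0.5)
    elif output == 'softmax':
        nn.init.constant_(layer.bias, 1 / layer.out_features)
    else:
        nn.init.zeros_(layer.bias)
```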
Breaking
- Changed order of arguments in `AMS` and `MultiAMS`, and removed some default values
- Removed default for `return_mean` in `RegAsProxyPull` and `RegPull`
- Changed `settings` to `plot_settings` in `rf_rank_features`
- Removed some default parameters for NN blocks in `ModelBuilder`
- `ModelBuilder` `model_args` should now be a dictionary of dictionaries of keyword arguments, one each for the head, body, and tail blocks; previously it was a single dictionary of keyword arguments (see the sketch after this list)
- `Embedder.from_fy` now no longer works: change to `CatEmbedder.from_fy`
- `CatEmbHead` now no longer has an `n_cont_in` argument; instead one should pass a list of feature names to `cont_feats`
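The dict-of-dicts layout in use; the layout comes from this release, but the individual block keyword arguments shown are hypothetical:

```python
model_args = {'head': {},                          # kwargs for the head block
              'body': {'depth': 4, 'width': 100},  # hypothetical body kwargs
              'tail': {}}                          # kwargs for the tail block
# model_builder = ModelBuilder(..., model_args=model_args)
```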
Additions
- Added `n_estimators` parameter to `rf_rank_features` and `get_opt_rf_params` to adjust the number of trees
- Added `n_rfs` parameter to `rf_rank_features` to average feature importance over several random forests
- Added automatic computation of 3-momenta magnitude to `add_mass` if it's missing
- `n_components` to `get_pre_proc_pipes` to be passed to `PCA`
- `Pipeline` configuration parameters to `fit_input_pipe`
- Ability to pass an instantiated `Pipeline` to `fit_input_pipe`
- Callbacks now receive `model_num` and `savepath` in `on_train_begin`
- Random Forest-like ensembling:
  - `BootstrapResample` callback for resampling training and validation data
- Feature subsampling:
  - `FeatureSubsample` callback for training on a random selection of features
  - `Model` now has an `input_mask` to automatically mask inputs at inference time (train-time inputs should be masked at the `BatchYielder` level)
- `plot_roc` now returns AUCs as a dictionary
- `growth_rate` scaling coefficient to `FullyConnected` to adjust layer width by depth
- `n_in` parameter to `FullyConnected` so it works on arbitrary-size inputs
- `freeze_tail` to `ModelBuilder` and `ClassRegMulti`
- Abstract blocks for head, body, and tail
- `cont_feats` argument to `ModelBuilder` to allow passing of a list of named features, eventually allowing more advanced methods based on named outputs of head blocks
- `CatEmbHead` now computes a mapping from named input features to their outputs
- Body blocks now expect to be passed a dictionary mapping the model's named input features to the outputs of the head block
- `Model` and `AbsBlock` classes now have a method to compute the total number of (trainable) parameters (see the sketch after this list)
- `MultiBlock` body, providing the possibility of multiple, parallel body blocks taking subsets of input features
- Explicit initialisation parameter for bias in `ClassRegMulti`
- `plot_1d_partial_dependence` now takes `pdp_isolate_kargs` and `pdp_plot_kargs` to pass to `pdp_isolate` and `pdp_plot`, respectively
- `plot_2d_partial_dependence` now takes `pdp_interact_kargs` and `pdp_interact_plot_kargs` to pass to `pdp_interact` and `pdp_interact_plot`, respectively
- `ForwardHook` class
- `plot_multibody_weighted_outputs`, an interpretation plot for `MultiBlock` models
- Better documentation for methods and classes
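For reference, the standard PyTorch idiom for such a count (lumin's own method name is not given in these notes):

```python
import torch.nn as nn

def count_trainable_params(module: nn.Module) -> int:
    # Sum of element counts over all parameters that require gradients
    return sum(p.numel() for p in module.parameters() if p.requires_grad)
```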
Removals
- Some default values of arguments in `AMS` and `MultiAMS`
- Default for `return_mean` in `RegAsProxyPull` and `RegPull`
Fixes
- Missing bbox_inches in `plot_embedding`
- Typing for `cont_feats` and `savename` in `fit_input_pipe`
- Typing for `targ_feats` and `savename` in `fit_output_pipe`
- Moved predictions to after the `on_eval_begin` callback
- Updated `from_model_builder` class method of `ModelBuilder` to use `CatEmbedder`
- Hard-coded savename in `Model` during save, to hopefully solve an occasional permission error during save
- Typing for `val_fold` in `SWA`
- 'lr' to 'momentum' in `Model.set_mom`
- `Model.get_mom` now actually returns momentum (beta_1) rather than lr
- Added catch for infinite uncertainties being passed to `uncert_round`
- Added catch for `plot_roc` with bootstrapping when resampled data only contains one class
- Error when attempting to plot a categorical feature in `plot_1d_partial_dependence`
- Layer width in dense=True `FullyConnected` scaling with input size
- Fixed `lookup_act` for linear function
- `plot_1d_partial_dependence` not using the `n_points` parameter
- Errors in `plot_rocs` when passing non-lists and when requesting plot_params and bootstrapping
- Missing `to_device` call when exporting to ONNX on a CUDA device
Changes
- `to_pt_eta_phi` now infers presence of z momentum from the dataframe
- `norm_in` default value for `get_pre_proc_pipes` is now `True` rather than `False`
- `fold_train_ensemble` now always trains `n_models`, and validation fold IDs are cycled through according to `fy.n_folds % model_num`
- `FoldYielder.set_ignore` changed to `FoldYielder.add_ignore`
- Changed `HEPAugFoldYielder.rotate` and `HEPAugFoldYielder.reflect` to private methods
- `compute` method of `RegPull` now private
- Renamed `data` to `fy` in `RegPull.evaluate` and `RegAsProxyPull.evaluate`
- Made `get_layer` in `FullyConnected` private
- Made `get_dense` and `load_embeds` in `CatEmbHead` private
- Made `build_layers` in `ClassRegMulti` private
- Made parse methods and `build_opt` in `ModelBuilder` private
- Made `get_folds` private
- Changed `settings` to `plot_settings` in `rf_rank_features`
- Dense layer from `CatEmbHead` removed and placed in `FullyConnected`
- Swapped order of continuous and categorical embedding concatenation in `CatEmbHead` in order to match input data
- `arr` in `plot_kdes_from_bs` changed to `x`
- Weighted partial dependencies in `plot_1d_partial_dependence` are now computed by passing the name of the weight column in the dataframe; normalisation is done automatically
- `data` argument for `plot_binary_class_pred` renamed to `df`
- `plot_1d_partial_dependence` and `plot_2d_partial_dependence` both now expect to be passed a list of training features, rather than expecting the DataFrame to only contain the training features
- rfpimp package no longer requires manual installation
Deprecations
- Passing `eta` argument to `to_pt_eta_phi`. Will be removed in v0.4
- `binary_class_cut` renamed to `binary_class_cut_by_ams`. Code added to call `binary_class_cut_by_ams`. Will be removed in v0.4
- `plot_dendrogram` renamed to `plot_rank_order_dendrogram`. Code added to call `plot_rank_order_dendrogram`. Will be removed in v0.4
- `Embedder` renamed to `CatEmbedder`. Code added to call `CatEmbedder`. Will be removed in v0.4
- `n_cont_in` (number of continuous input features) argument of `ModelBuilder` deprecated in favour of `cont_feats` (list of named continuous input features). Code added to create this by encoding numbers as strings. Will be removed in v0.4.
Comments
Online documentation to be created soon
v0.2 Bonfire Lit
Important changes
- Residual mode in `FullyConnected` (see the sketch after this list):
  - Identity paths now skip two layers instead of one, to align better with arXiv:1603.05027
  - In cases where an odd number of layers is specified for the body, the number of layers is increased by one
  - Batch normalisation now corrected to be after the addition step (previously was set before)
- Dense mode in `FullyConnected` now no longer adds an extra layer to scale down to the original width; instead `get_out_size` now returns the width of the final concatenated layer, and the tail of the network is expected to accept this input size
- Fixed rule-of-thumb for embedding sizes from max(50, 1+(sz//2)) to max(50, (1+sz)//2)
Breaking
- Changed callbacks to receive `kargs`, rather than logs, to allow for greater flexibility
- Residual mode in `FullyConnected`:
  - Identity paths now skip two layers instead of one, to align better with arXiv:1603.05027
  - In cases where an odd number of layers is specified for the body, the number of layers is increased by one
  - Batch normalisation now corrected to be after the addition step (previously was set before)
- Dense mode in `FullyConnected` now no longer adds an extra layer to scale down to the original width; instead `get_out_size` now returns the width of the final concatenated layer, and the tail of the network is expected to accept this input size
- Initialisation arguments for `CatEmbHead` changed considerably w.r.t. embedding arguments; now expects to receive an `Embedder` class
Additions
- Added wrapper class for significance-based losses (`SignificanceLoss`)
- Added label smoothing for binary classification
- Added `on_eval_begin` and `on_eval_end` callback calls
- Added `on_backwards_begin` and `on_backwards_end` callback calls
- Added callbacks to `fold_lr_find`
- Added gradient-clipping callback
- Added default momentum range to `OneCycle` of .85-.95
- Added `SequentialReweight` classes
- Added option to turn off realtime loss plots
- Added `from_results` and `from_save` classmethods for `Ensemble`
- Added option to `SWA` to control whether it only updates on cycle end when paired with an `AbsCyclicalCallback`
- Added helper class `Embedder` to simplify parsing of embedding settings
- Added parameters to save and configure plots to `get_nn_feat_importance`, `get_ensemble_feat_importance`, and `rf_rank_features`
- Added classmethod for `Model` to load from save
- Added experimental export to Tensorflow Protocol Buffer
Removals
Fixes
- Added missing data download cell for multiclass example
- Corrected type hint for `OneCycle` `lr_range` to `List`
- Corrected `on_train_end` not being called in `fold_train_ensemble`
- Fixed crash in `plot_feat` when plotting non-bulk without cuts, and non-crash bug when plotting non-bulk with cuts
- Fixed typing of `callback_args` in `fold_train_ensemble`
- Fixed crash when trying to load a model trained on a CUDA device for application on a CPU device
- Fixed positioning of batch normalisation in residual mode of `FullyConnected` to after addition
- `rf_rank_features` was accidentally evaluating feature importance on validation data rather than training data, resulting in lower importances than it should
- Fixed feature selection in examples using a test size of 0.8 rather than 0.2
- Fixed crash when no important features were found by `rf_rank_features`
- Fixed rule-of-thumb for embedding sizes from max(50, 1+(sz//2)) to max(50, (1+sz)//2)
- Fixed cutting when saving plots as pdf
Changes
- Moved `on_train_end` call in `fold_train_ensemble` to after loading the best set of weights
- Replaced all mutable default arguments
Deprecations
- Callbacks:
  - Added `callback_partials` parameter (a list of partials that yield a Callback object) in `fold_train_ensemble` to eventually replace `callback_args`; neater appearance than the previous Dict of object and kargs (see the sketch after this list)
  - `callback_args` now deprecated, to be removed in v0.3
  - Currently `callback_args` are converted to `callback_partials`; this conversion code will also be removed in v0.3
- Embeddings:
  - Added `cat_embedder` parameter to `ModelBuilder` to eventually replace `cat_args`
  - `cat_args` now deprecated, to be removed in v0.3
  - Currently `cat_args` are converted to an `Embedder`; this conversion code will also be removed in v0.3
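A minimal sketch of the partial-based form; only the list-of-partials idea comes from this release, while the import path and the `OneCycle` keyword arguments are assumptions:

```python
from functools import partial

from lumin.nn.callbacks.cyclic_callbacks import OneCycle  # import path assumed

callback_partials = [partial(OneCycle, lr_range=[1e-4, 1e-2])]  # kwargs hypothetical
# fold_train_ensemble(..., callback_partials=callback_partials)
```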
Comments
v0.1 micro update
Change log
Breaking
- `binary_class_cut` now returns a tuple of `(cut, mean_AMS, maximum_AMS)` as opposed to just the cut (see the sketch after this list)
- Initialisation lookups now expected to return a callable, rather than a callable and a dictionary of arguments; `partial` is used instead
- `top_perc` in `binary_class_cut` now treated as a percentage rather than a fraction
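The new return shape in use; only the return tuple comes from this release, and the call arguments are placeholders:

```python
# binary_class_cut's arguments are omitted/assumed here
cut, mean_ams, max_ams = binary_class_cut(df)
print(f'Cut at {cut:.3f}: mean AMS = {mean_ams:.2f}, maximum AMS = {max_ams:.2f}')
```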
Additions
- Added PReLU activation
- Added uniform initialisation lookup
Removals
Fixes
- `uncert_round` converts `NaN` uncertainty to `0`
- Correct name of internal embedding dropout layer in `CatEmbHead`: emd_do -> emb_do
- Added missing settings for activations and initialisations to body and tail
- Corrected plot annotation for percentage in `binary_class_cut`
Changes
- Removed the `BatchNorm1d` automatically added in `CatEmbHead` when using categorical inputs; assuming unit-Gaussian continuous inputs, there is no a priori reason to add it, and tests indicated it hurt performance and train time
- Changed weighting factor when not loading cycles only, to n+2 from n+1
Deprecations
Comments
v0.1 PyPI am assuming direct control
1st Beta release
- Now available via pip
- Various bug fixes and improvements
- Zenodo DOI
v0.0 Hello there
0.0.1 Updates and init correction