Create RGDR modules #38

Closed
semvijverberg opened this issue Jun 30, 2022 · 3 comments · Fixed by #68

Comments

@semvijverberg
Member

We want to have a minimal workflow to test our pipeline, and we would like to implement the RGDR method to reduce dimensionality. The approach broadly consists of the following steps (a minimal sketch follows the list); a basic minimum workflow is now implemented in prototype_RGDR.ipynb in the branch prototype_RGDR_usecase. For more info on the method, see this paper.

  1. gridpoint-wise correlation map analysis versus a 1-d target
  2. cluster these regions via DBSCAN
  3. calculate a 1-d timeseries for each label (i.e. precursor region)
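
The sketch below, assuming a (time, lat, lon) precursor field and a 1-d target series, is only meant to illustrate the three steps; all names and parameters are placeholders rather than the package API, and a real implementation would likely cluster on geographical coordinates instead of grid indices.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.cluster import DBSCAN

def rgdr_sketch(field, target, p_threshold=0.05, eps=2.0, min_samples=3):
    """field: array (time, lat, lon); target: array (time,)."""
    n_time, n_lat, n_lon = field.shape

    # 1. gridpoint-wise correlation of the field with the 1-d target
    corr = np.zeros((n_lat, n_lon))
    pval = np.ones((n_lat, n_lon))
    for i in range(n_lat):
        for j in range(n_lon):
            corr[i, j], pval[i, j] = pearsonr(field[:, i, j], target)

    # 2. cluster the significantly correlated gridpoints with DBSCAN
    #    (positive and negative correlations would normally be clustered separately)
    coords = np.argwhere(pval < p_threshold)          # (n_points, 2) lat/lon indices
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(coords)

    # 3. one 1-d timeseries per labelled precursor region (spatial mean)
    series = {}
    for lab in set(labels) - {-1}:                    # -1 is DBSCAN noise
        pts = coords[labels == lab]
        series[lab] = field[:, pts[:, 0], pts[:, 1]].mean(axis=1)
    return corr, labels, series
```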

My vision is to create two separate Python files:

  1. one purely for the (partial) correlation map analysis, with support for regression analysis as well; now called map_analysis.py
  2. a second one for the clustering analysis (people might want to use this functionality stand-alone); now called cluster_regions.py

I already created some NotImplemented functions in both Python files. To a large extent, cluster_regions.py is a refactor of the original find_precursors.py in proto.
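
A rough idea of what those stubs could look like; the signatures here are hypothetical and the actual functions in the branch may differ:

```python
# map_analysis.py -- gridpoint-wise (partial) correlation / regression maps
def correlation_map(field, target):
    """Correlate every gridpoint of `field` with the 1-d `target` series."""
    raise NotImplementedError

def partial_correlation_map(field, target, covariates):
    """Correlation maps while controlling for one or more covariate series."""
    raise NotImplementedError

def regression_map(field, target):
    """Gridpoint-wise regression coefficients of `field` on `target`."""
    raise NotImplementedError


# cluster_regions.py -- stand-alone clustering of (correlation) maps
def cluster_dbscan(corr_map, pval_map, alpha=0.05, eps=None):
    """Group significant gridpoints into labelled precursor regions."""
    raise NotImplementedError
```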

@geek-yang
Member

Thanks for the explanation and for completing the minimum example @semvijverberg. #36 already seems to be a very nice starting point. Now, I think we need to discuss the design of the new module. Going back to the whiteboard discussion we had before, my vision of the modules in the package looks like this:

[image: overview of the modules in the package]

Basically, what we are trying to address here is the module for dimensionality reduction. In this module, we want to have several methods, such as:

  • PCA
  • MCA
  • RGDR (Response guided dimensionality reduction)
  • ...

For most of these methods, the implementations are not difficult and there are packages we can easily make use of (e.g. statsmodels, scipy, scikit-learn); only for RGDR do we need to write the code ourselves. But this method is quite valuable (and essential for s2s), and we want to include it as a minimum example.
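
For example, PCA on a flattened (time, space) precursor matrix is essentially a one-liner with scikit-learn (illustrative only; the variable names below are not from our codebase):

```python
import numpy as np
from sklearn.decomposition import PCA

field = np.random.rand(120, 40 * 60)             # 120 timesteps, 40x60 gridpoints flattened
pcs = PCA(n_components=5).fit_transform(field)   # (120, 5) principal-component timeseries
```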

This is my understanding of what we've done so far. Before diving into the implementation of the RGDR method, we need to think about the general API design of this module: how we want it to work with the modules we already have (e.g. time.py, traintest.py), and with the modules we skip for now but would like to have later (e.g. a preprocess module). Once we decide on the design of this module, we will know which Python files we need and how to organize them. For now, RGDR is just one method (although it carries the largest workload for us), so let's think about the overall design first and then get into the details of the implementation.

My vision is to create two separate Python files:
one purely for the (partial) correlation map analysis, with support for regression analysis as well; now called map_analysis.py
a second one for the clustering analysis (people might want to use this functionality stand-alone); now called cluster_regions.py

To me, it is more logical that we create a new module to do dimensionality reduction (dimensionality.py?). We support multiple methods in it, and RGDR is one of those methods. Based on the code you have, we know which parameters we need. The implementation of RGDR could be placed in an independent Python file. Regarding the correlation map analysis, I think we need to think this through, as I would assume that the user may also want to check correlations between timeseries, as well as the autocorrelation of the timeseries at each data point. These are relevant topics and we would like to group them together.
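
One possible reading of this proposal, sketched below: dimensionality.py stays a thin dispatcher and RGDR lives in its own file (all names here are hypothetical, not an agreed design):

```python
# dimensionality.py (hypothetical) -- thin dispatcher over supported methods
from sklearn.decomposition import PCA


def reduce_dimension(data, target=None, method="pca", **kwargs):
    """Dispatch a (time, space) array to one of the dimensionality-reduction methods."""
    if method == "pca":
        return PCA(**kwargs).fit_transform(data)
    if method == "rgdr":
        # RGDR additionally needs the 1-d target series; delegate to its own module
        raise NotImplementedError("delegate to the stand-alone RGDR implementation")
    raise ValueError(f"unknown method: {method}")
```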

@semvijverberg
Member Author

Dear Yang,

Thanks for the nice workflow image!

Yes, creating one Python script (dimensionality.py) is an option, but I prefer not to group all methods together, to keep things clearer for the users (and ourselves). I do think one overarching package such as dimensionality could/should exist, but it seems inconsistent to 'hide' RGDR in dimensionality.py.

All these packages (see the overview below) are already stand-alone packages, and (in my vision) the task of dimensionality.py is to be a wrapper that enables integration with our pipeline (built upon time.py and traintest.py).

[image: overview of the stand-alone packages]

It seems logical to me that RGDR should become a stand-alone thing too. Maybe this requires an in-person discussion :).

@geek-yang
Member

Thanks for the illustration. The pipeline idea sounds good to me. When we discussed the plan for the next step on the whiteboard, we came up with the thought that maybe we could use sklearn.pipeline to accommodate our workflow, and see how it fits the recipe of the methods we would like to implement. To me, it makes sense to keep RGDR as an individual Python module, since it includes many operations and features. But since it is also a method for dimensionality reduction, we had better fit it into the pipeline (or wrapper) as well, just to provide a consistent interface to the user, unless we have a specific reason to keep it isolated. Let's discuss it the next time we meet. 😄
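
To make that concrete, a rough sketch of how RGDR could expose the scikit-learn estimator interface so it slots into sklearn.pipeline; the class name, parameters, and placeholder bodies are hypothetical, not the agreed API:

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline


class RGDR(BaseEstimator, TransformerMixin):
    def __init__(self, eps=600, alpha=0.05):
        self.eps = eps          # DBSCAN neighbourhood size (hypothetical unit: km)
        self.alpha = alpha      # significance threshold for the correlation map

    def fit(self, X, y):
        # correlation map + DBSCAN clustering would be fitted here, on training data only
        return self

    def transform(self, X):
        # would reduce the gridded field X to one timeseries per precursor region
        return X                # placeholder


# pipe.fit(X_train, y_train) would then fit RGDR on the training fold before the model
pipe = Pipeline([("rgdr", RGDR()), ("model", Ridge())])
```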
