Create RGDR modules #38
Comments
Thanks for the explanation and for completing the minimum example, @semvijverberg. #36 already seems to be a very nice starting point. Now I think we need to discuss the design of the new module. Going back to the whiteboard discussion we had before, my vision of the modules in the package looks like this: Basically, what we are trying to address here is the module for dimensionality reduction. In this module, we want to have some methods, like:
For most of the methods, the implementation is not difficult and there are packages we can easily make use of (e.g. statsmodels, scipy, scikit-learn). Only RGDR needs to be coded by ourselves. But this method is quite valuable (and essential for s2s), and we want to include it as a minimum example. This is my understanding of what we've done so far. Before diving into the implementation of the RGDR method, we need to think about the general API design of this module, how we want it to work with the modules we already have (e.g. time.py, traintest.py), and how it relates to the modules we skip for now but would like to have later (e.g. a preprocess module). Once we decide on the design of this module, we will know which Python files we need and how to organize them. For now, RGDR is just one method (although it carries the largest workload for us), so let's think about the overall design first and then get into the details of the implementation.
To me, it is more logical that we create a new module to do dimensionality reduction ( |
Dear Yang, Thanks for the nice workflow image! Yes, creating one Python script (dimensionality.py) is an option, but I prefer not to group all methods together, to keep things clearer for the users (and ourselves). I do think one overarching package such as dimensionality could/should exist, but it seems inconsistent to 'hide' RGDR in dimensionality.py. All these packages (see overview below) are already stand-alone packages, and (my vision of) the task of dimensionality.py is to be a wrapper that enables integration with our pipeline (built upon time.py and traintest.py). It seems logical to me that RGDR should become a stand-alone thing too. Maybe this requires an in-person discussion :).
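To make the wrapper idea above a bit more concrete, here is a minimal sketch, assuming a scikit-learn-style fit/transform interface. Nothing in it is taken from the repository: `Reducer`, `ColumnMean`, and `reduce_dimensions` are hypothetical names, and `ColumnMean` is only a toy stand-in for a real method such as RGDR.

```python
from typing import List, Protocol, Sequence

# Rows are time steps, columns are features (an assumption for this sketch).
Matrix = Sequence[Sequence[float]]


class Reducer(Protocol):
    """Hypothetical interface that each stand-alone method would satisfy."""

    def fit(self, data: Matrix) -> "Reducer": ...

    def transform(self, data: Matrix) -> List[List[float]]: ...


class ColumnMean:
    """Toy stand-in for a real method: collapses each row to one feature."""

    def fit(self, data: Matrix) -> "ColumnMean":
        return self  # nothing to learn for this toy reducer

    def transform(self, data: Matrix) -> List[List[float]]:
        return [[sum(row) / len(row)] for row in data]


def reduce_dimensions(method: Reducer, train: Matrix, test: Matrix):
    """Hypothetical wrapper: fit on the training split only, transform both."""
    method.fit(train)
    return method.transform(train), method.transform(test)
```

With this split, each method stays usable on its own, while the wrapper is the only place that needs to know about the train/test splits.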
Thanks for the illustration. The pipeline idea sounds good to me. When we discussed the plan for the next step on the whiteboard, we came up with the thought that maybe we could use the |
We want a minimal workflow to test our pipeline, and we would like to implement the RGDR method to reduce dimensionality. The approach broadly consists of the following steps, of which a basic minimum workflow is now implemented in prototype_RGDR.ipynb in the branch prototype_RGDR_usecase. For more info on the method, see this paper.
My vision is to create two separate Python files:
I already created some NotImplemented functions in both Python files. To a large extent, cluster_regions.py is a refactor of the original find_precursors.py in proto.
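For readers unfamiliar with the pattern, a placeholder function of this kind might look like the sketch below. Only the file name cluster_regions.py comes from this issue; the function name and its parameters are hypothetical. Note that such stubs should raise `NotImplementedError` rather than return the `NotImplemented` constant, which is reserved for binary operator methods.

```python
# cluster_regions.py (hypothetical contents; only the file name is from this issue)


def find_precursor_regions(correlation_map, threshold):
    """Placeholder for grouping correlated grid cells into regions.

    Both the name and the parameters are assumptions made for illustration.
    """
    raise NotImplementedError("clustering of precursor regions is not implemented yet")
```

Calling the stub fails loudly, so any pipeline code that depends on it cannot silently continue with missing results.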