Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RGDR class implementation #68

Merged
merged 11 commits into from
Aug 12, 2022
Merged

RGDR class implementation #68

merged 11 commits into from
Aug 12, 2022

Conversation

BSchilperoort
Copy link
Contributor

@BSchilperoort BSchilperoort commented Aug 8, 2022

This PR implements RGDR as a class, with the previously implemented DBSCAN functionality integrated into the class, and with plotting methods available for users to play with the RGDR configuration.

Note that this is a standalone implementation of RGDR, without any knowledge about splits/folds implemented explicitly.


Example:

from s2spy import RGDR
rgdr = RGDR(target_timeseries) # instantiate the class

# Fit & apply the RGDR clustering to the training data
clustered_train_data = rgdr.fit(precursor_field_train_data) 

# Apply the clustering to test data
clustered_test_data = rgdr.transform(precursor_field_test_data)

Closes #38

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@BSchilperoort BSchilperoort marked this pull request as ready for review August 10, 2022 12:30
Copy link
Member

@geek-yang geek-yang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice work! I played with it and checked the code. All look pretty good to me.

But still I have some concerns about the structure of the code. To me it seems too crowded to put RGDR class, all functions doing actual calculations and visualization functions all in one place.

I understand that this fits well with the design if we have a dimensionality folder and put all methods (e.g. rgdr, pca, mca, t-sne...) there, each in a separate file. But if we only want to have RGDR as an independent service and only wrappers for other methods. Then it is better to restructure RGDR module into a few separate files, for readability and maintenance.

We can think about it later when this module grows bigger. For now, just go ahead merge it!

Again, thanks a lot for the endeavor 😄!

Comment on lines +248 to +288
def plot_correlation(
self,
precursor: xr.DataArray,
ax1: Optional[plt.Axes] = None,
ax2: Optional[plt.Axes] = None,
) -> List[Type[mpl.collections.QuadMesh]]:
"""Generates a figure showing the correlation and p-value results with the
initiated RGDR class and input precursor field.

Args:
precursor (xr.DataArray): Precursor field data with the dimensions
'latitude', 'longitude', and 'anchor_year'
ax1 (plt.Axes, optional): a matplotlib axis handle to plot
the correlation values into. If None, an axis handle will be created
instead.
ax2 (plt.Axes, optional): a matplotlib axis handle to plot
the p-values into. If None, an axis handle will be created instead.

Returns:
List[mpl.collections.QuadMesh]: List of matplotlib artists.
"""

if not isinstance(precursor, xr.DataArray):
raise ValueError("Please provide an xr.DataArray, not a dataset")

corr, p_val = correlation(precursor, self.timeseries, corr_dim="anchor_year")

if (ax1 is None) and (ax2 is None):
_, (ax1, ax2) = plt.subplots(ncols=2)
elif (ax1 is None) or (ax2 is None):
raise ValueError(
"Either pass axis handles for both ax1 and ax2, or pass neither."
)

plot1 = corr.plot.pcolormesh(ax=ax1, cmap="viridis") # type: ignore
plot2 = p_val.plot.pcolormesh(ax=ax2, cmap="viridis") # type: ignore

ax1.set_title("correlation")
ax2.set_title("p-value")

return [plot1, plot2]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, it makes more sense to place these "plot" codes in a visual.py module and create a method in the RGDR class to call these "plot" codes. This will reduces the size of rgdr.py file and makes it more readable. It is fine for now. But when the code grows again due to the implementation of new methods (e.g. regression map?). We need to restructure the code a bit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK! Yeah we can do this when it's required. Although moving the plotting methods would only reduce the number of lines by ~25. Most of the lines are the docstrings.

Comment on lines +13 to +14
combining `weighted` and `groupby`. An open PR adds supports for this functionality
(https://github.com/pydata/xarray/pull/5480), but this branch was never merged.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice finding, haha. Maybe we should leave a comment there to push them a bit.

return reduced_data


def geographical_cluster_center(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice implementation. We can add the center in the visualization in next iteration, together the restructuring of all plot codes if necessary.

@BSchilperoort
Copy link
Contributor Author

Hi Yang, thanks for the review. RGDR does feel cluttered, however it currently only contains the following;

  • spherical_area
  • cluster_area
  • remove_small_area_clusters
  • masked_spherical_dbscan
  • _pearsonr_nan
  • correlation
  • RGDR
    • __init__
    • plot_correlation
    • plot_clusters
    • fit
    • transform

I think the size and structure of the file is fine at the moment. We could move out the correlation method along with the pearsonr_nan function, but that would indeed be appropriate to do when implementing different correlation/regression methods.

We could refactor it to something like;

  • spherical_area → utils.py
  • cluster_area → utils.py
  • remove_small_area_clusters → utils.py
  • masked_spherical_dbscan
  • _pearsonr_nan → correlation.py
  • correlation → correlation.py
  • RGDR
    • __init__
    • plot_correlation
    • plot_clusters
    • fit
    • transform

Two docstring corrections

Co-authored-by: Yang <[email protected]>
@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

94.9% 94.9% Coverage
0.0% 0.0% Duplication

@BSchilperoort BSchilperoort merged commit c254e11 into main Aug 12, 2022
@BSchilperoort BSchilperoort deleted the rgdr_class branch August 12, 2022 06:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create RGDR modules
2 participants