
Integrate DA Forcing Module #642

Merged

Conversation

shorvath-noaa
Contributor

Integration of new DA forcing module into t-route. All DA options (streamflow nudging, persistence reservoirs, and RFC reservoirs) can now work with the BMI modules.

Additions

  • Use of reservoir_rfc_da.py functions within mc_reach.pyx if running t-route via BMI functions
  • Creation of reservoir_rfc_df and reservoir_rfc_param_df dataframes in the data_assimilation object
  • Tracking of rfc DA indexing and update times
  • Loading all DA data on initialization of the DAforcing module, performing QA/QC, then passing dataframes of timeslice file info and timeseries file info directly to the t-route model
  • Reading RFC reservoir gage-lake crosswalk info from a CSV file; this is a temporary solution until RFC gages are included in the new hydrofabric

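The temporary CSV crosswalk can be pictured with a minimal sketch; the column names and file contents below are illustrative assumptions, not the actual schema (the gage/lake pair is taken from the screenshot in this PR):

```python
import pandas as pd
from io import StringIO

# Stand-in for the RFC gage-lake crosswalk CSV; column names are
# hypothetical, not the real t-route schema.
csv_text = StringIO(
    "gage_id,lake_id\n"
    "GRRA4,167299819\n"
)

crosswalk = pd.read_csv(csv_text, dtype={"gage_id": str, "lake_id": int})

# Index by lake so reservoir DA code can look up its RFC gage directly.
gage_by_lake = crosswalk.set_index("lake_id")["gage_id"]
print(gage_by_lake.loc[167299819])  # GRRA4
```

Once RFC gages land in the new hydrofabric, this lookup would presumably come from the network table instead of a side file.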
Removals

Changes

  • New method of converting gage info from hydrofabric 'network' table into gage-segment crosswalk and gage-lake crosswalk.
  • When an individual lake can be either USGS or USACE, default to USGS
  • Adjust logic in reservoir_rfc_da() to determine whether to advance to the next index
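
The USGS-over-USACE default for lakes appearing in both crosswalks can be sketched with toy data (the dataframe names and contents here are illustrative, not the project's actual tables):

```python
import pandas as pd

# Toy lake->gage assignments from each agency (illustrative data only).
usgs = pd.DataFrame({"lake_id": [1, 2], "gage_id": ["US-1", "US-2"]})
usace = pd.DataFrame({"lake_id": [2, 3], "gage_id": ["AC-2", "AC-3"]})

# Concatenate with USGS first, then keep the first entry per lake, so a
# lake present in both crosswalks defaults to its USGS gage.
combined = (
    pd.concat([usgs, usace], ignore_index=True)
    .drop_duplicates(subset="lake_id", keep="first")
)
print(sorted(combined["gage_id"]))  # ['AC-3', 'US-1', 'US-2']
```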

Testing

  1. Passed a 5-day simulation via BMI functions
  2. Confirmed by plotting that an individual RFC reservoir's outflows match gage observations (not all reservoirs have been exhaustively checked)

Screenshots

Reservoir 167299819 with gage ID GRRA4
[screenshot: simulated reservoir outflow plotted against gage observations]

Notes

Todos

Checklist

  • PR has an informative and human-readable title
  • Changes are limited to a single goal (no scope creep)
  • Code can be automatically merged (no conflicts)
  • Code follows project standards (link if applicable)
  • Passes all existing automated tests
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future todos are captured in comments
  • Visually tested in supported browsers and devices (see checklist below 👇)
  • Project documentation has been updated (including the "Unreleased" section of the CHANGELOG)
  • Reviewers requested with the Reviewers tool ➡️

Testing checklist

Target Environment support

  • Windows
  • Linux
  • Browser

Accessibility

  • Keyboard friendly
  • Screen reader friendly

Other

  • Is useable without CSS
  • Is useable without JS
  • Flexible from small to large screens
  • No linting errors or warnings
  • JavaScript tests are passing

@shorvath-noaa shorvath-noaa self-assigned this Aug 31, 2023
# Pivot the dataframe from long to wide format. We therefore need to adjust the
# timeseries index and total counts to reflect the new position of t0 in the
# observation array. Minus 1 so that on the first call of reservoir_rfc_da(),
# timeseries_idx will advance 1 position to t0.
new_timeseries_idx = self._reservoir_rfc_df.columns.get_loc(network.t0) - 1
self._reservoir_rfc_param_df['totalCounts'] = (
    self._reservoir_rfc_param_df['totalCounts']
    + (new_timeseries_idx - self._reservoir_rfc_param_df['timeseries_idx'])
)
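
The index arithmetic above can be checked on a toy wide-format dataframe (shapes, names, and timestamps are illustrative, not the real reservoir_rfc_df):

```python
import pandas as pd

# One reservoir, hourly columns after the long-to-wide pivot.
times = pd.date_range("2023-08-01", periods=5, freq="h")
rfc_df = pd.DataFrame([[1.0, 2.0, 3.0, 4.0, 5.0]],
                      index=["lake_1"], columns=times)

t0 = pd.Timestamp("2023-08-01 02:00")

# Minus 1 so the first advance inside reservoir_rfc_da() lands on t0.
new_timeseries_idx = rfc_df.columns.get_loc(t0) - 1
print(new_timeseries_idx)  # 1
```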
Contributor:

Isn't totalCounts the total number of observed + forecasted values in any RFC file, i.e., equal to the file's "totalCounts" attribute? If so, the expression new_timeseries_idx - self._reservoir_rfc_param_df['timeseries_idx'] would always be zero. Do we actually need this line?

return (
    np.asarray(data_idx, dtype=np.intp)[fill_index_mask],
    output.reshape(output.shape[0], -1)[fill_index_mask],
    0,
    (
        np.asarray([data_idx[usgs_position_i] for usgs_position_i in usgs_positions]),
        np.asarray(lastobs_times),
        np.asarray(lastobs_values),
    ),
    (
        usgs_idx,
        usgs_update_time - ((timestep - 1) * dt),
        usgs_prev_persisted_ouflow,
        usgs_prev_persistence_index,
        usgs_persistence_update_time - ((timestep - 1) * dt),
    ),
    (
        usace_idx,
        usace_update_time - ((timestep - 1) * dt),
        usace_prev_persisted_ouflow,
        usace_prev_persistence_index,
        usace_persistence_update_time - ((timestep - 1) * dt),
    ),
    output_upstream.reshape(output.shape[0], -1)[fill_index_mask],
    (rfc_idx, rfc_update_time - ((timestep - 1) * dt), rfc_timeseries_idx),
)
Contributor:

Are the newly returned variables used in computation once they come back in compute.py? I'd guess the computation over the entire time loop takes place in mc_reach.pyx, so these variables' values never get used for any subsequent time steps?

Contributor (author):

Yes, they are passed to the data_assimilation object to update the rfc_param_df for the next loop.
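
A minimal sketch of the round-trip described here, with hypothetical variable and column names standing in for the actual compute.py plumbing:

```python
import pandas as pd

# Hypothetical per-reservoir RFC DA parameter table.
rfc_param_df = pd.DataFrame(
    {"update_time": [3600.0], "timeseries_idx": [2]}, index=["lake_1"]
)

# Values as if returned from the mc_reach routing call, fed back in for
# the next loop iteration.
rfc_idx, rfc_update_time, rfc_timeseries_idx = ["lake_1"], [7200.0], [3]

rfc_param_df.loc[rfc_idx, "update_time"] = rfc_update_time
rfc_param_df.loc[rfc_idx, "timeseries_idx"] = rfc_timeseries_idx
print(int(rfc_param_df.loc["lake_1", "timeseries_idx"]))  # 3
```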

@@ -229,8 +229,8 @@ def reservoir_RFC_da(use_RFC, time_series, timeseries_idx, total_counts, routing
Notes
-----
'''
-if use_RFC and (current_time+routing_period)<=rfc_forecast_persist_seconds:
+if use_RFC and (current_time)<=rfc_forecast_persist_seconds:
     if (current_time+routing_period) >= update_time and timeseries_idx<total_counts:
Contributor:

Is line 539 in DataAssimilation.py the reason for removing "+routing_period" here?

Contributor (author):

Adding routing_period to this made the reservoir module update the index 1 timestep too early.
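
The off-by-one can be seen with toy numbers (illustrative values, not the project's actual configuration):

```python
# The index should advance when the simulation clock itself reaches the
# threshold, not one routing period before it.
routing_period = 300
threshold = 3600

def advances(current_time, include_period):
    t = current_time + routing_period if include_period else current_time
    return t >= threshold

# One routing step before the threshold:
print(advances(3300, include_period=True))   # True  -> one step too early
print(advances(3300, include_period=False))  # False -> waits until 3600
```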

@@ -254,6 +298,7 @@ def compute_nhd_routing_v02(
     waterbody_type_specified,
     subnetwork_list,
     flowveldepth_interorder = {},
+    from_files = True,
Contributor:

At line 306, there is an error: "ValueError: could not convert string to float: '01010000,01010000,01010000,01010000'". When I checked param_df's gages column, I found

(Pdb) test.gages
key
302    01010000,01010000,01010000,01010000
Name: gages, dtype: object

I think a single stream takes at most a single gage, so this seems odd.
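
For reference, one way such a duplicated comma-joined cell could be collapsed before numeric conversion (a sketch, not the project's actual fix):

```python
# A gages cell like this cannot be cast to float directly.
raw = "01010000,01010000,01010000,01010000"

# Collapse duplicates; a well-formed cell should reduce to one gage ID.
unique_gages = sorted(set(raw.split(",")))
print(unique_gages)  # ['01010000']
```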

@shorvath-noaa shorvath-noaa merged commit d06b11d into NOAA-OWP:master Sep 12, 2023