Draft: Analysis plan for investigation into the heterogeneity in wastewater forecast performance #227

Open: wants to merge 1 commit into `prod`
docs/investigation_into_drivers_of_heterogeneity.md (109 additions, 0 deletions)

---
editor_options:
  markdown:
    wrap: 72
---

## Wastewater forecasting evaluation: Investigation into drivers of heterogeneity in the impact of wastewater on forecast performance

This document outlines concretely the analyses requested as a
follow-up to the previously agreed-upon [evaluation
plan](https://github.com/cdcgov/wastewater-informed-covid-forecasting/tree/prod/docs).
That evaluation focused on two major components:

- a retrospective comparison of forecast performance with and without
  wastewater, using vintaged datasets of COVID-19 hospital admissions
  and wastewater at the jurisdiction and wastewater treatment plant
  level across the 2023-24 epidemic season, and
- a comparison of the performance of the wastewater-informed model to
  other models submitting to the COVID-19 Forecast Hub, both in real
  time (for submissions from February to March 2024) and
  retrospectively (using the real-time COVID-19 Forecast Hub
  submissions and our retrospectively produced submission).

Both components will be presented in the manuscript "Bayesian
generative modeling for heterogeneous wastewater data applied to
COVID-19 forecasting". In the interest of maintaining scientific
integrity, we specified the plan for this evaluation prior to
performing the analysis.

The goal of the analyses proposed here is to investigate potential
drivers of the heterogeneity in the relative performance of the
wastewater-informed model compared with the hospital admissions-only
model, using the empirical wastewater data, hospital admissions data,
model performance metrics, and model parameters. This analysis is
intended for hypothesis generation rather than hypothesis testing or
assigning causal relationships. We believe a full-fledged analysis of
the impact of different characteristics of the wastewater and hospital
admissions data on forecast performance is out of scope for this paper
and should be performed as a separate, independent analysis.

We are writing this analysis plan to reach consensus on the scope of
the required additional analysis and on how the results will be
presented before running the analysis, again to promote scientific
integrity and to hold ourselves accountable for presenting the results
in an unbiased manner.

All of the planned analyses will focus on the retrospective model
performance with and without wastewater data. We plan to address the
following questions:

1. How strongly correlated are the recent trends in the wastewater
   signals from different sites, and are those trends consistent with
   the subsequent trend in hospital admissions? How does this affect
   forecast performance?

   - We will quantify the correlation in the recent trend in
     wastewater signals using only data from the 2 weeks prior to the
     forecast date. We will pool data from all sites in those 2 weeks,
     calculate a correlation coefficient across sites, and estimate
     the instantaneous exponential growth rate in the observed data.
     We will compare this to an estimate of the exponential growth
     rate in hospital admissions, using the evaluation data from 1
     week before the forecast date to 2 weeks after it, in an attempt
     to characterize the trend in hospital admissions into the
     forecast period. Next, we will bin forecasts into the following
     categories:

     - high correlation in wastewater signal, wastewater and hospital
       admissions trends in the same direction

     - high correlation in wastewater signal, wastewater and hospital
       admissions trends in opposite directions

     - low correlation in wastewater signal, wastewater and hospital
       admissions trends in the same direction

     - low correlation in wastewater signal, wastewater and hospital
       admissions trends in opposite directions

   We will then display and summarize the distributions of forecast
   performance (average and relative CRPS over the full 38-day horizon
   for an individual forecast date and location) in each of the bins.

2. How did variability in the wastewater data affect forecast
   performance?

   - We will quantify the variability in the wastewater data in two
     ways: (1) by computing the coefficient of variation (CV) across
     the time series for each wastewater treatment plant, yielding a
     distribution of CVs, and (2) by examining the posterior estimate
     of the mean observation error across sites.

   - We will summarize the "average variability in the wastewater
     signal" for each location and forecast date by taking the mean of
     each of the empirical and model-based distributions. We will then
     plot the mean variability from both methods against forecast
     performance (average and relative CRPS over the full 38-day
     horizon for an individual forecast date and location).

3. How much does data latency affect forecast performance?

   - We will bin forecasts according to whether they contain more than
     5 sites with wastewater concentration data within 20-15, 14-11,
     10-8, or 7-0 days before the forecast date.

   - We will present violin plots showing the distribution of forecast
     performance (average and relative CRPS over the full 38-day
     horizon for an individual forecast date and location) in each
     bin.

4. How does relative forecast performance compare with absolute
   forecast performance? For example, does wastewater improve forecasts
   when performance would otherwise have been poor, or vice versa?

   - A scatterplot of the relative CRPS of the wastewater-informed
     model versus the absolute CRPS of the hospital admissions-only
     model.

   - A scatterplot of the absolute CRPS of the wastewater-informed
     model versus the absolute CRPS of the hospital admissions-only
     model.
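
The correlation, growth-rate, and binning steps in question 1 could be
computed roughly as follows. This is a minimal sketch under stated
assumptions, not the project's implementation: it assumes strictly
positive concentrations observed on a shared set of days at every site,
takes the "correlation coefficient" to be the mean pairwise Pearson
correlation of log concentrations between sites, and uses the slope of
a log-linear fit (with a separate intercept per site for the pooled
wastewater fit) as a proxy for the instantaneous growth rate. The 0.5
correlation cut-off and all function names are hypothetical.

```python
import math
from itertools import combinations
from statistics import fmean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (nan if either is constant)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def mean_pairwise_site_correlation(site_series):
    """Mean Pearson correlation of log concentrations over all pairs of sites.

    site_series maps a site ID to its concentrations on a shared set of days
    in the 2 weeks before the forecast date (hypothetical input format).
    Requires at least two sites and strictly positive concentrations.
    """
    logs = {site: [math.log(v) for v in vals] for site, vals in site_series.items()}
    return fmean(pearson(logs[a], logs[b]) for a, b in combinations(logs, 2))


def pooled_growth_rate(days, site_series):
    """Per-day exponential growth rate from a log-linear fit pooled over series.

    Each series gets its own intercept (log values are centered within the
    series), so the shared slope reflects the common trend rather than
    differences in level between sites. With a single series this is the
    ordinary least-squares slope of log(value) on day.
    """
    md = fmean(days)
    num = den = 0.0
    for vals in site_series.values():
        logs = [math.log(v) for v in vals]
        ml = fmean(logs)
        num += sum((d - md) * (lv - ml) for d, lv in zip(days, logs))
        den += sum((d - md) ** 2 for d in days)
    return num / den


def bin_forecast(ww_corr, ww_growth, hosp_growth, corr_threshold=0.5):
    """Assign a forecast to one of the four bins (corr_threshold is hypothetical)."""
    strength = "high" if ww_corr >= corr_threshold else "low"
    same = (ww_growth > 0) == (hosp_growth > 0)
    direction = "same" if same else "opposite"
    return f"{strength} correlation, {direction} direction"


# Toy example: two sites growing ~10%/day while admissions fall ~5%/day.
ww_days = list(range(-14, 0))  # the 2 weeks before the forecast date
sites = {s: [c * math.exp(0.1 * d) for d in ww_days] for s, c in [("A", 10), ("B", 50)]}
hosp_days = list(range(-7, 15))  # 1 week before to 2 weeks after the forecast date
admissions = [100 * math.exp(-0.05 * d) for d in hosp_days]
label = bin_forecast(
    mean_pairwise_site_correlation(sites),
    pooled_growth_rate(ww_days, sites),
    pooled_growth_rate(hosp_days, {"hospital": admissions}),
)
print(label)  # high correlation, opposite direction
```

Fitting a common slope with per-site intercepts keeps a high-concentration
site from dominating the pooled trend simply because of its level.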
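
For question 2, both variability summaries reduce to a per-site
quantity followed by a mean across sites. A minimal sketch, assuming CV
is the sample standard deviation divided by the mean of each site's raw
concentrations, and assuming posterior draws of each site's observation
error are available as plain lists keyed by site (a hypothetical format;
no model parameter names are assumed). Function names are illustrative.

```python
from statistics import fmean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean of one site's series."""
    return stdev(values) / fmean(values)


def mean_empirical_cv(site_series):
    """Empirical summary: mean of the per-site CVs for one location and forecast date."""
    return fmean(coefficient_of_variation(vals) for vals in site_series.values())


def mean_posterior_obs_error(draws_by_site):
    """Model-based summary: mean over sites of each site's posterior-mean
    observation error. draws_by_site maps a site ID to its posterior draws
    (hypothetical format)."""
    return fmean(fmean(draws) for draws in draws_by_site.values())


# Toy example with two sites; site B is constant, so its CV is 0.
print(mean_empirical_cv({"A": [1.0, 2.0, 3.0], "B": [10.0, 10.0, 10.0]}))  # 0.25
```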
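
The latency bins in question 3 can overlap, since a forecast may have
more than 5 sites reporting in several windows. One way to make them
mutually exclusive, assumed here purely for illustration, is to assign
each forecast to the most recent window in which more than 5 sites have
at least one observation. The sketch below takes each site's
observations as days counted back from the forecast date (a hypothetical
input format); the function name is illustrative.

```python
# Latency windows from the plan, in days before the forecast date, most recent first.
LATENCY_WINDOWS = [(0, 7), (8, 10), (11, 14), (15, 20)]


def latency_bin(days_before_by_site, min_sites=5):
    """Return the most recent window in which more than min_sites sites have
    at least one observation, or None if no window qualifies.

    days_before_by_site maps a site ID to the days (counted back from the
    forecast date) on which that site has concentration data.
    """
    for start, end in LATENCY_WINDOWS:
        n_sites = sum(
            any(start <= d <= end for d in days)
            for days in days_before_by_site.values()
        )
        if n_sites > min_sites:
            return f"{end}-{start} days"
    return None


# Six sites whose latest data are 9 days old fall in the 10-8 day bin.
print(latency_bin({f"site{i}": [9, 16] for i in range(6)}))  # 10-8 days
```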
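
The scatterplots in question 4 need one CRPS value per model for each
forecast date and location. A minimal sketch, assuming each forecast is
represented by predictive samples for every day of the horizon, using
the standard sample-based estimator E|X - y| - 0.5 E|X - X'|, and
taking relative CRPS to be the ratio of the wastewater-informed CRPS to
the hospital admissions-only CRPS (values below 1 favor wastewater). The
project's own scoring pipeline may use a different estimator; the
function names here are hypothetical.

```python
from statistics import fmean


def crps_from_samples(samples, observed):
    """Sample-based CRPS estimate, E|X - y| - 0.5 * E|X - X'| (lower is better).

    O(n^2) in the number of samples, which is acceptable for a sketch.
    """
    n = len(samples)
    accuracy = sum(abs(x - observed) for x in samples) / n
    spread = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return accuracy - 0.5 * spread


def horizon_mean_crps(samples_by_day, observed_by_day):
    """Average CRPS over every day of the horizon for one forecast date and location."""
    return fmean(
        crps_from_samples(s, y) for s, y in zip(samples_by_day, observed_by_day)
    )


def scatter_points(crps_ww, crps_hosp):
    """(x, y) pairs for the two scatterplots, one pair per forecast date and location.

    x is always the hospital admissions-only CRPS. Relative CRPS is assumed
    to be wastewater / hospital-only, so values below 1 mean that adding
    wastewater improved the forecast.
    """
    relative_vs_hosp = [(h, w / h) for w, h in zip(crps_ww, crps_hosp)]
    ww_vs_hosp = [(h, w) for w, h in zip(crps_ww, crps_hosp)]
    return relative_vs_hosp, ww_vs_hosp


# A point forecast scores its absolute error: |5 - 2| = 3.
print(crps_from_samples([5.0], 2.0))  # 3.0
```

Plotting relative CRPS against the hospital admissions-only CRPS shows
directly whether wastewater helps most where the baseline forecast was
poor.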

We plan to add the proposed plots and analysis to the supplement of the
manuscript.