Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Randomness conciseness #10

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open

Randomness conciseness #10

wants to merge 6 commits into from

Conversation

sammymuench
Copy link
Collaborator

Pull request for updated code. Changes in annual ipynb, make_xy_data_splits, and intro ipynb. data dir is now included with a couple large files removed.

@kheuton kheuton self-requested a review February 12, 2024 21:14
Copy link
Collaborator

@kheuton kheuton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I commented on things that should change. In brief: I don't love the environment variable system we've got now, but unless we get rid of it completely we should not silently override it. Most changes requested are small

@@ -206,6 +206,7 @@ def calc_score_dict_uncertainty(model, x_df, y_df, split_name,
'INTPTLAT', 'INTPTLON']

tr, va, te = make_xy_data_splits.load_xy_splits(
data_dir = '../cook-county/cleaning-cook-county/data_dir',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be necessary. load_xy_splits uses the data directory set by the users environment variable. I don't love the environment variable either, but by hardcoding this path here it silently breaks how the DATA_DIR environment variable works

@@ -7,7 +7,7 @@
parser = argparse.ArgumentParser()
parser.add_argument('--steps_to_run', default='1,2,3,4,5', type=str)
args = parser.parse_args()

os.environ['DATA_DIR'] = 'data_dir'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can add this as the default on the next line, but we shouldn't overwrite the user's environment here

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-Dont override environment variable, but can proivde ./data_dir as a default if needed

  • Why was the rounding change necessary?

…ch will generate yearly semi and quarterly dfs in one command. This script also has option to add code to easily generate monthly, biweekly, and weekly dfs. The data has been rearranged to do this correct with new "data_dir" in the right spot. Also, this commit added compare_timesteps.py and compare_timesteps_fit_and_predict.py, scripts which can compare the predictive power of 2 timesteps using one as a baseline.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants