[CmdStan 2.31] Add log_prob function to model class #637

Merged
merged 6 commits into from
Dec 15, 2022

Conversation

WardBrian
Member

Submission Checklist

  • Run unit tests
  • Declare copyright holder and open-source license: see below

Summary

Closes #593. As with #634, this relies on CmdStan 2.31 and for now will be tested with release candidates.

I am not entirely sure if/how we want to expose this, but here's what I've done for now:

Signature:

    def log_prob(
        self,
        params: Union[Dict[str, Any], str, os.PathLike],
        data: Union[Mapping[str, Any], str, os.PathLike, None] = None,
    ) -> pd.DataFrame:

This takes in (constrained) parameters in the same format as the inits argument, and data the same as for other functions. It directly returns a dataframe rather than a larger object, since it is exactly one set of outputs and the diagnostic information is not important.
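As a minimal sketch of the two input forms the signature accepts for `params` (a dictionary, or a path to a JSON file), the following builds both. The parameter names and values are hypothetical and stand in for whatever the model's parameters block declares; `write_params_json` is an illustrative helper, not part of CmdStanPy.

```python
import json
import os
import tempfile
from typing import Any, Dict


def write_params_json(params: Dict[str, Any], directory: str) -> str:
    """Write constrained parameter values to a JSON file and return its path."""
    path = os.path.join(directory, "params.json")
    with open(path, "w") as f:
        json.dump(params, f)
    return path


# Hypothetical model with parameters: real mu; real<lower=0> sigma; vector[2] beta;
params = {"mu": 0.5, "sigma": 1.2, "beta": [0.1, -0.3]}

with tempfile.TemporaryDirectory() as tmp:
    params_path = write_params_json(params, tmp)
    # Read the file back to confirm it holds the same values as the dictionary.
    with open(params_path) as f:
        round_tripped = json.load(f)
    # While the file exists, either form could be passed as `params`:
    #   model.log_prob(params)        # dictionary
    #   model.log_prob(params_path)   # path to a JSON file
```

The call itself is left as a comment because it needs a compiled model and a CmdStan 2.31 installation.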

I've added a big warning:

NOTE: This function is NOT an efficient way to evaluate the log
density of the model. It should be used for diagnostics ONLY.
Please, do not use this for other purposes such as testing new
sampling algorithms!

Should we mention things users might want to use, like BridgeStan?

Copyright and Licensing

Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company):
Simons Foundation

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:

@WardBrian WardBrian added feature New feature or request method outputs CmdStan outputs to Python objects labels Nov 8, 2022
@WardBrian WardBrian requested a review from mitzimorris November 8, 2022 19:23
@WardBrian WardBrian linked an issue Nov 8, 2022 that may be closed by this pull request
@WardBrian
Member Author

WardBrian commented Nov 8, 2022

@codecov-commenter

codecov-commenter commented Nov 8, 2022

Codecov Report

Merging #637 (71444c3) into develop (d5ed366) will increase coverage by 0.03%.
The diff coverage is n/a.

@@             Coverage Diff             @@
##           develop     #637      +/-   ##
===========================================
+ Coverage    80.13%   80.16%   +0.03%     
===========================================
  Files           69       69              
  Lines        10335    10392      +57     
===========================================
+ Hits          8282     8331      +49     
- Misses        2053     2061       +8     
Impacted Files Coverage Δ
a/cmdstanpy/cmdstanpy/cmdstanpy/install_cmdstan.py 39.19% <0.00%> (-0.93%) ⬇️
a/cmdstanpy/cmdstanpy/cmdstanpy/model.py 88.78% <0.00%> (-0.18%) ⬇️
cmdstanpy/cmdstanpy/model.py 88.02% <0.00%> (+0.25%) ⬆️
...runner/work/cmdstanpy/cmdstanpy/cmdstanpy/model.py 88.02% <0.00%> (+0.25%) ⬆️

Contributor

@ahartikainen ahartikainen left a comment

There should be a function that outputs the wanted parameters, along with their shapes and dtypes. A validation function is also needed.

Does this need parameters as ndarrays or in an unravelled table format?

@WardBrian
Member Author

There should be a function that outputs the wanted parameters, along with their shapes and dtypes. A validation function is also needed.

Not sure I follow what you're requesting here.

The input is the same as the inits for sampling: a path to a JSON file, or a dictionary that we serialize to JSON first, containing constrained parameters.

@ahartikainen
Contributor

Ok, yeah, maybe what I describe is more like a helper function for users, so they are aware of what kind of input is needed.

Init also works with a subset of parameters? So users are used to getting away with giving it the wrong parameters.

@WardBrian
Member Author

The content isn't necessarily the same as inits, only the format. You will need to specify every parameter or an error will be thrown (see the test test_lp_bad). That said, if you wrote the model you should probably understand how to construct this file, since it is basically 1-1 with the parameters block.

The output is harder to interpret, since it is on the unconstrained space, but I'm not sure if CmdStan exposes the kind of information needed for a helper function which would make it any clearer
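The requirement that every parameter be specified could be checked on the Python side before anything is run. The sketch below assumes the caller already knows the parameter names from the model's parameters block; `check_params` and `expected_names` are hypothetical and not part of CmdStanPy, which (per this thread) has no way to ask the executable for that list.

```python
from typing import Any, Iterable, List, Mapping, Tuple


def check_params(
    params: Mapping[str, Any], expected: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (missing, extra) parameter names relative to the expected set."""
    expected_set = set(expected)
    given = set(params)
    missing = sorted(expected_set - given)
    extra = sorted(given - expected_set)
    return missing, extra


# Names copied by hand from a hypothetical model's parameters block.
expected_names = ["mu", "sigma", "beta"]

# "sigma" is left out on purpose to show what the check reports.
missing, extra = check_params({"mu": 0.5, "beta": [0.1, -0.3]}, expected_names)
if missing:
    print(f"Missing parameters: {missing}")
```

This only compares names; it does not check shapes or constraints.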

@ahartikainen
Contributor

Yeah true.

stanc has --info or something similar, but I don't think the executable has access to this.

@mitzimorris
Member

mitzimorris commented Nov 22, 2022

perhaps we should wait on adding this to CmdStanPy? cf stan-dev/cmdstan#1133

who is using this feature? would bridgestan be an alternative?

@WardBrian
Member Author

There are basically 0 situations where this is preferable to bridgestan except for the fact that it’s easily accessible.

The only legit uses IMO are for debugging and the like. I know Bob and I did some problems where we were optimizing and wanted to check that the gradient at the true solution was really 0.

As for waiting, the only thing that issue would change here is it would allow you to pass a CSV, which doesn’t require any code changes here, just doc.
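The debugging use mentioned above, checking that the gradient is really 0 at a solution, might look like the sketch below. It works on a plain mapping for one output row; with the returned DataFrame `df`, such a mapping could be obtained with `df.iloc[0].to_dict()`. The column names are assumptions: the helper treats a column called `lp__` as the log density and every other column as a gradient component, and the `g_1`, `g_2` names in the example are illustrative only. The helper name and tolerance are hypothetical.

```python
from typing import Dict, Mapping


def nonzero_gradient_components(
    row: Mapping[str, float], tol: float = 1e-6, lp_column: str = "lp__"
) -> Dict[str, float]:
    """Return the gradient components whose magnitude exceeds tol.

    Assumes every column other than lp_column is a gradient component.
    NaN values are reported too, because `abs(nan) <= tol` is False.
    """
    return {
        name: float(value)
        for name, value in row.items()
        if name != lp_column and not abs(value) <= tol
    }


# Illustrative row: column names and values are made up for this example.
row = {"lp__": -7.3, "g_1": 2e-9, "g_2": -0.04}
offenders = nonzero_gradient_components(row)
```

An empty result would indicate that, within the tolerance, the point is stationary on the unconstrained scale.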

Member

@mitzimorris mitzimorris left a comment

Are we only exposing log_prob given inputs on the constrained scale?
The file can be Rdump, JSON, or CSV, correct?

@WardBrian
Member Author

Unless/until we expose a way from CmdStan to constrain and unconstrain, I think it only really makes sense to take inputs on the constrained scale.

Member

@mitzimorris mitzimorris left a comment

LGTM!

@WardBrian WardBrian merged commit 78a7fef into develop Dec 15, 2022
@WardBrian WardBrian deleted the feature/log-prob branch December 15, 2022 19:03
@WardBrian WardBrian mentioned this pull request Jan 10, 2023
Labels
feature New feature or request method outputs CmdStan outputs to Python objects
Development

Successfully merging this pull request may close these issues.

[CmdStan 2.31] Support new method log_prob
4 participants