Added the Dirichlet-Multinomial distribution (issue 54) #2979

chvandorp · 2023-11-28T22:03:18Z

Summary

This PR adds the Dirichlet-Multinomial distribution to the Stan Math library. The Dirichlet-Multinomial (DirMult) distribution generalizes the Beta-Binomial distribution to more than two categories. It can also be seen as an over-dispersed multinomial distribution. It is implemented in other popular frameworks such as Pyro and the Python package SciPy. Issue 54 (first created in 2014) suggests adding this distribution.
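For readers unfamiliar with the distribution, here is a minimal sketch of its log-PMF in plain C++, written from the textbook formula rather than copied from this PR. The function name `dirmult_lpmf_sketch` is hypothetical, and the code uses `std::lgamma` instead of Stan Math's own functions, so it only illustrates the math and says nothing about how the merged implementation is structured.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Log-PMF of the Dirichlet-Multinomial for counts ns and concentration
// parameters alpha (all alpha[k] > 0). Illustrative sketch only; this is
// not the Stan Math implementation.
double dirmult_lpmf_sketch(const std::vector<int>& ns,
                           const std::vector<double>& alpha) {
  assert(ns.size() == alpha.size());
  int n = 0;            // total count
  double alpha0 = 0.0;  // total concentration
  double lp = 0.0;
  for (std::size_t k = 0; k < ns.size(); ++k) {
    assert(ns[k] >= 0 && alpha[k] > 0.0);
    n += ns[k];
    alpha0 += alpha[k];
    // Per-category term:
    // log Gamma(n_k + alpha_k) - log Gamma(alpha_k) - log(n_k!)
    lp += std::lgamma(ns[k] + alpha[k]) - std::lgamma(alpha[k])
          - std::lgamma(ns[k] + 1.0);
  }
  // Normalizing term: log(n!) + log Gamma(alpha0) - log Gamma(n + alpha0)
  lp += std::lgamma(n + 1.0) + std::lgamma(alpha0) - std::lgamma(n + alpha0);
  return lp;
}
```

With two categories and `alpha = (1, 1)` this reduces to a Beta-Binomial that is uniform over the possible counts, so `ns = (2, 1)` should give a log-probability of `-log(4)`.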

I largely based my implementation on the existing multinomial distribution. As such, the DirMult distribution is currently not vectorized, as I could not find an example of a vectorized multivariate discrete distribution in the current code base. However, I would be interested in implementing vectorization in a future PR. I have added Doxygen documentation for the lpmf and rng functions (and the log function).

I added 4 files (the log-PMF, the log function, the PRNG, and a number of tests) and updated the prob.hpp header. I have also prepared PRs for stanc3 and the docs, and tested the new native distribution in action with a Stan model using cmdstanpy.

Tests

I have included 8 unit tests.

test 1 compares LPMF values with pre-computed values (using scipy.stats).
test 2 checks behavior of propto (a log-prob of 0.0)
test 3 checks that the right exceptions are thrown in case of incorrect arguments
test 4 checks that the observation [0, 0, ..., 0] has log-prob 0.0
test 5 checks that the PRNG returns values in the right domain
test 6 checks that the PRNG throws the correct exceptions
test 7 is a goodness-of-fit test
test 8 checks that for two categories, the DirMult coincides with the BetaBinom
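As an illustration of what a domain check like test 5 looks for, the sketch below draws a sample with the standard two-stage construction (p ~ Dirichlet(alpha), then counts ~ Multinomial(N, p)) and verifies that the counts are non-negative and sum to N. The names `dirmult_rng_sketch` and `in_domain` are hypothetical, the generator is `std::mt19937` rather than Stan's RNG type, and the PR's actual PRNG and tests may be written differently.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Draw one Dirichlet-Multinomial sample with N trials. The Dirichlet draw is
// represented by unnormalized Gamma(alpha_k, 1) variates; the multinomial
// draw assigns each trial to a category in proportion to those weights.
// Assumes the weights do not all underflow to zero (alpha not extremely small).
std::vector<int> dirmult_rng_sketch(const std::vector<double>& alpha, int N,
                                    std::mt19937& rng) {
  assert(N >= 0 && !alpha.empty());
  std::vector<double> weights;
  for (double a : alpha) {
    assert(a > 0.0);
    std::gamma_distribution<double> draw_gamma(a, 1.0);
    weights.push_back(draw_gamma(rng));
  }
  // discrete_distribution normalizes the weights internally.
  std::discrete_distribution<int> category(weights.begin(), weights.end());
  std::vector<int> ns(alpha.size(), 0);
  for (int i = 0; i < N; ++i) {
    ++ns[category(rng)];
  }
  return ns;
}

// Domain check: every count is non-negative and the counts sum to N.
bool in_domain(const std::vector<int>& ns, int N) {
  int total = 0;
  for (int n_k : ns) {
    if (n_k < 0) return false;
    total += n_k;
  }
  return total == N;
}
```

With `N = 0` the loop never runs, so the sketch returns the all-zero vector, which is the observation that test 4 assigns log-probability 0.0.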

Side Effects

There should not be any side effects.

Release notes

Added the Dirichlet-Multinomial distribution to the Stan Math library (dirichlet_multinomial_lpmf, dirichlet_multinomial_log, and dirichlet_multinomial_rng).

Checklist

[x ] Copyright holder: Christiaan H. van Dorp

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:
- Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
- Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

[x ] the basic tests are passing
- unit tests pass (to run, use: ./runTests.py test/unit)
- header checks pass, (make test-headers)
- dependencies checks pass, (make test-math-dependencies)
- docs build, (make doxygen)
- code passes the built in C++ standards checks (make cpplint)
[x ] the code is written in idiomatic C++ and changes are documented in the doxygen
[x ] the new changes are tested

SteveBronder · 2023-12-04T16:11:10Z

Thanks! Can you also add a test in test/unit/math/mix/prob/ that uses the expect_ad test framework? The expect_ad() function will check and compare the gradients and higher-order derivatives against a finite difference method to see if they are close. You can see how the other mix test files use the expect_ad() function as examples.
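The idea behind this kind of check can be sketched without the Stan test framework: compute the gradient of the log-PMF with respect to alpha analytically, compute it again with central finite differences, and compare. The code below is a standalone illustration with hypothetical names (`lpmf_sketch`, `grad_analytic`, `grad_finite_diff`, `max_abs_diff`); it does not call expect_ad() and covers first derivatives only. It relies on the identity digamma(x + m) - digamma(x) = sum over i from 0 to m-1 of 1 / (x + i) for integer m, which avoids needing a digamma function.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Compact Dirichlet-Multinomial log-PMF (illustrative, not the Stan Math code).
double lpmf_sketch(const std::vector<int>& ns,
                   const std::vector<double>& alpha) {
  assert(ns.size() == alpha.size());
  int n = 0;
  double alpha0 = 0.0, lp = 0.0;
  for (std::size_t k = 0; k < ns.size(); ++k) {
    n += ns[k];
    alpha0 += alpha[k];
    lp += std::lgamma(ns[k] + alpha[k]) - std::lgamma(alpha[k])
          - std::lgamma(ns[k] + 1.0);
  }
  return lp + std::lgamma(n + 1.0) + std::lgamma(alpha0)
         - std::lgamma(n + alpha0);
}

// Analytic gradient with respect to alpha, using finite sums in place of
// digamma differences (valid because the counts are integers).
std::vector<double> grad_analytic(const std::vector<int>& ns,
                                  const std::vector<double>& alpha) {
  int n = 0;
  double alpha0 = 0.0;
  for (std::size_t k = 0; k < ns.size(); ++k) {
    n += ns[k];
    alpha0 += alpha[k];
  }
  double shared = 0.0;  // digamma(alpha0) - digamma(n + alpha0)
  for (int i = 0; i < n; ++i) shared -= 1.0 / (alpha0 + i);
  std::vector<double> grad(alpha.size(), shared);
  for (std::size_t k = 0; k < ns.size(); ++k) {
    // digamma(alpha_k + n_k) - digamma(alpha_k)
    for (int i = 0; i < ns[k]; ++i) grad[k] += 1.0 / (alpha[k] + i);
  }
  return grad;
}

// Central finite differences, perturbing one coordinate at a time.
std::vector<double> grad_finite_diff(const std::vector<int>& ns,
                                     std::vector<double> alpha,
                                     double h = 1e-6) {
  std::vector<double> grad(alpha.size());
  for (std::size_t k = 0; k < alpha.size(); ++k) {
    const double a = alpha[k];
    alpha[k] = a + h;
    const double up = lpmf_sketch(ns, alpha);
    alpha[k] = a - h;
    const double down = lpmf_sketch(ns, alpha);
    alpha[k] = a;  // restore before moving to the next coordinate
    grad[k] = (up - down) / (2.0 * h);
  }
  return grad;
}

// Largest absolute discrepancy between two gradients of equal length.
double max_abs_diff(const std::vector<double>& a,
                    const std::vector<double>& b) {
  assert(a.size() == b.size());
  double worst = 0.0;
  for (std::size_t k = 0; k < a.size(); ++k) {
    worst = std::max(worst, std::fabs(a[k] - b[k]));
  }
  return worst;
}
```

A check would then assert that `max_abs_diff(grad_analytic(ns, alpha), grad_finite_diff(ns, alpha))` stays below a small tolerance for several inputs, including count vectors that contain zeros.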

chvandorp · 2023-12-05T14:56:23Z

Thanks @SteveBronder! I wrote the AD test. But before I push this, do you know how to fix this continuous integration issue I get? I think it might be because I named my fork "stan_math" instead of "math"

SteveBronder · 2023-12-05T15:39:01Z

Yes, the name looks to be the issue. We can fix it on our side, but it might just be easier to change the name of your fork.

andrjohns

Thanks for the contribution! A couple of minor requests regarding code style and files.

I've also updated the lpmf function to take advantage of our/Eigen's vectorised functions and added analytical gradients. Let me know if there's anything in those changes that you disagree with or similar.

Thanks!

stan/math/prim/prob/dirichlet_multinomial_log.hpp

stan/math/prim/prob/dirichlet_multinomial_rng.hpp

chvandorp · 2023-12-16T20:40:23Z

Thanks @andrjohns for the improvements. I've added a (ns_array > 0).select to the ops_partials definition. It does not change the math, but I found that it gives a significant speed increase when the data contains lots of zeros. I also added a test case to the AD test that contains zeros in the count vector to cover this case.
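The reason a guard on zero counts leaves the result unchanged can be shown in a few lines: the per-category gradient term digamma(alpha_k + n_k) - digamma(alpha_k) is exactly zero when n_k = 0, so it can be replaced by a constant 0 without evaluating digamma at all. The sketch below uses plain loops and a small hand-written digamma approximation instead of Eigen's `select` and Stan Math's `digamma`; the names `digamma_sketch` and `category_terms` are hypothetical, and the merged code may arrange the expression differently.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Small digamma approximation for x > 0: shift x upward with the recurrence
// digamma(x) = digamma(x + 1) - 1 / x, then apply the asymptotic series.
double digamma_sketch(double x) {
  assert(x > 0.0);
  double result = 0.0;
  while (x < 6.0) {
    result -= 1.0 / x;
    x += 1.0;
  }
  const double f = 1.0 / (x * x);
  return result + std::log(x) - 0.5 / x
         - f * (1.0 / 12.0 - f * (1.0 / 120.0 - f / 252.0));
}

// Per-category gradient terms digamma(alpha_k + n_k) - digamma(alpha_k).
// With skip_zeros set, categories with n_k == 0 get 0.0 directly, which
// mirrors the effect of an elementwise select on (ns > 0).
std::vector<double> category_terms(const std::vector<int>& ns,
                                   const std::vector<double>& alpha,
                                   bool skip_zeros) {
  assert(ns.size() == alpha.size());
  std::vector<double> terms(ns.size(), 0.0);
  for (std::size_t k = 0; k < ns.size(); ++k) {
    if (skip_zeros && ns[k] == 0) {
      continue;  // term is zero by definition, no digamma calls needed
    }
    terms[k] = digamma_sketch(alpha[k] + ns[k]) - digamma_sketch(alpha[k]);
  }
  return terms;
}
```

Calling `category_terms` with and without `skip_zeros` on a sparse count vector should give matching results, while the guarded version makes two fewer digamma calls per zero count.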

andrjohns

LGTM! Thanks for adding this!

chvandorp added 5 commits November 26, 2023 18:16

added dirichlet_multinomial files and updated prob.hpp

fe4cb47

correction in documentation for dirichlet_multinomial_lpmf

bdd9a90

added unit test. equivalence of BetaBinom and DirMult when K=2

30c5976

cleaned up tests, added cases for lmpf test

ae20c02

used clang-format to properly format the added files

28a29cb

added AD test for dirichlet_multinomial_lpmf

6176216

WardBrian mentioned this pull request Dec 5, 2023

VerifyChanges fails when user's fork has different name stan-dev/jenkins-shared-libraries#7

Closed

andrjohns and others added 3 commits December 13, 2023 15:50

Update vectorised handling, add analytical gradients

5014ede

Merge branch 'develop' into feature/issue-54-dirichlet-multinomial

2992897

[Jenkins] auto-formatting by clang-format version 10.0.0-4ubuntu1

22d9eb2

andrjohns requested changes Dec 13, 2023

stan/math/prim/prob/dirichlet_multinomial_log.hpp (outdated)

stan/math/prim/prob/dirichlet_multinomial_rng.hpp (outdated)

style changes, removed _log file, optimized lpmf

beb70f3

andrjohns approved these changes Dec 17, 2023

andrjohns merged commit 56d0432 into stan-dev:develop Dec 17, 2023

This was referenced Dec 17, 2023

added documentation for the Dirichlet-multinomial distribution stan-dev/docs#693

Merged

Exposed the dirichlet_multinomial distribution stan-dev/stanc3#1389

Merged

chvandorp deleted the feature/issue-54-dirichlet-multinomial branch December 18, 2023 15:36

WardBrian mentioned this pull request Feb 20, 2024

Dirichlet-multinomial density and (P)RNGs #54

Closed

10 tasks
