Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CategoricalMADE #1269

Merged
merged 24 commits into from
Feb 25, 2025
Merged

Add CategoricalMADE #1269

merged 24 commits into from
Feb 25, 2025

Conversation

jnsbck
Copy link
Contributor

@jnsbck jnsbck commented Sep 5, 2024

What does this implement/fix? Explain your changes

This implements a CategoricalMADE to generelize MNLE to multiple discrete dimensions addressing #1112.
Essentially adapts nflows's MixtureofGaussiansMADE to autoregressively model categorical distributions.

Does this close any currently open issues?

Fixes #1112

Comments

I have already discussed this with @michaeldeistler.

Checklist

Put an x in the boxes that apply. You can also fill these out after creating
the PR. If you're unsure about any of them, don't hesitate to ask. We're here to
help! This is simply a reminder of what we are going to look for before merging
your code.

  • I have read and understood the contribution
    guidelines
  • I agree with re-licensing my contribution from AGPLv3 to Apache-2.0.
  • I have commented my code, particularly in hard-to-understand areas
  • I have added tests that prove my fix is effective or that my feature works
  • I have reported how long the new tests run and potentially marked them
    with pytest.mark.slow.
  • New and existing unit tests pass locally with my changes
  • I performed linting and formatting as described in the contribution
    guidelines
  • I rebased on main (or there are no conflicts with main)
  • For reviewer: The continuous deployment (CD) workflow are passing.

Copy link

codecov bot commented Sep 5, 2024

Codecov Report

Attention: Patch coverage is 93.24324% with 5 lines in your changes missing coverage. Please review.

Project coverage is 78.18%. Comparing base (18f92b1) to head (36aeb51).
Report is 10 commits behind head on main.

Files with missing lines Patch % Lines
sbi/neural_nets/estimators/categorical_net.py 91.11% 4 Missing ⚠️
sbi/neural_nets/net_builders/mnle.py 85.71% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1269       +/-   ##
===========================================
- Coverage   89.31%   78.18%   -11.14%     
===========================================
  Files         119      119               
  Lines        8779     8916      +137     
===========================================
- Hits         7841     6971      -870     
- Misses        938     1945     +1007     
Flag Coverage Δ
unittests 78.18% <93.24%> (-11.14%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
sbi/neural_nets/estimators/__init__.py 100.00% <100.00%> (ø)
.../neural_nets/estimators/mixed_density_estimator.py 94.73% <100.00%> (-1.49%) ⬇️
sbi/neural_nets/net_builders/categorial.py 95.83% <100.00%> (+1.09%) ⬆️
sbi/utils/nn_utils.py 94.44% <100.00%> (+6.20%) ⬆️
sbi/neural_nets/net_builders/mnle.py 96.55% <85.71%> (-3.45%) ⬇️
sbi/neural_nets/estimators/categorical_net.py 93.65% <91.11%> (-4.18%) ⬇️

... and 37 files with indirect coverage changes

@jnsbck
Copy link
Contributor Author

jnsbck commented Sep 16, 2024

Hey @janfb,
would very much appreciate your input at this stage:

Currently the PR adds the CategoricalMADE and builder build_autoregressive_categoricalestimator + some minor modifications to build_mnle and MixedDensityEstimator. This enables multiple discrete variables with different numbers of classes via trainer = MNLE(density_estimator=lambda x,y: build_mnle(y,x,categorical_model="made")) Note that for some reason x and y have to be flipped for mnle.

As far as I can tell all functionalities of CategoricalMADE work for both 1D and ND inputs and running the Example_01_DecisionMakingModel.ipynb with the CatMADE matches the ground truth
image

The question now is: How should I verify this works? / Which tests should I add/modify? Do you have an idea for a good toy example with several discrete variables that I could use?

I have cooked up a toy simulator, for which I am getting good posteriors using SNPE, but for some reason MNLE raises a RuntimeError: probability tensor contains either 'inf', 'nan' or element < 0 Even for the unmodified MNLE. Any ideas why this could be?

This is the simulator

def toy_simulator(theta: torch.Tensor, centers: list[torch.Tensor]) -> torch.Tensor:
    batch_size, n_dimensions = theta.shape
    assert len(centers) == n_dimensions, "Number of center sets must match theta dimensions"
    
    # Calculate discrete classes by assiging to the closest center
    x_disc = torch.stack([
        torch.argmin(torch.abs(centers[i].unsqueeze(1) - theta[:, i].unsqueeze(0)), dim=0)
        for i in range(n_dimensions)
    ], dim=1)

    closest_centers = torch.stack([centers[i][x_disc[:, i]] for i in range(n_dimensions)], dim=1)
    # Add Gaussian noise to assigned class centers
    std = 0.4
    x_cont = closest_centers + std * torch.randn_like(closest_centers)
       
    return torch.cat([x_cont, x_disc], dim=1)

The setup:

torch.random.manual_seed(0)
centers = [
    torch.tensor([-0.5, 0.5]),
    # torch.tensor([-1.0, 0.0, 1.0]),
]

prior = BoxUniform(low=torch.tensor([-2.0]*len(centers)), high=torch.tensor([2.0]*len(centers)))
theta = prior.sample((20000,))
x = toy_simulator(theta, centers)

theta_o = prior.sample((1,))
x_o = toy_simulator(theta_o, centers)

NPE:

trainer = SNPE()
estimator = trainer.append_simulations(theta=theta, x=x).train(training_batch_size=1000)

snpe_posterior = trainer.build_posterior(prior=prior)
posterior_samples = snpe_posterior.sample((2000,), x=x_o)
pairplot(posterior_samples, limits=[[-2, 2], [-2, 2]], figsize=(5, 5), points=theta_o)

and the equivalent MNLE:

trainer = MNLE()
estimator = trainer.append_simulations(theta=theta, x=x).train(training_batch_size=1000)

mnle_posterior = trainer.build_posterior(prior=prior)
mnle_samples = mnle_posterior.sample((10000,), x=x_o)
pairplot(mnle_samples, limits=[[-2, 2], [-2, 2]], figsize=(5, 5), points=theta_o)

Hoping this makes sense. Lemme know if you need clarifications anywhere. Thanks for your feedback.

@jnsbck
Copy link
Contributor Author

jnsbck commented Oct 22, 2024

Hey @janfb,
you might have missed this, but I would be happy about feedback :)

Copy link
Contributor

@janfb janfb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks a lot for tackling this @jnsbck! 👏

Please find below some comments and questions.
There might be some misunderstanding about variables and categories on my side. We can have a call if that's more efficient than commenting here.

@jnsbck
Copy link
Contributor Author

jnsbck commented Nov 4, 2024

Cool, thanks for all the feedback! A quick call would be great, also to discuss suitable tests for this. Will reach out via email and tackle the straight forward things in the meantime.


# outputs (batch_size, num_variables, num_categories)
def log_prob(self, inputs, context=None):
outputs = self.forward(inputs, context=context)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these shapes correct?

@jnsbck
Copy link
Contributor Author

jnsbck commented Nov 14, 2024

After discussion with @janfb I will:

  1. adapt the simulator of Example_01_DecisionMakingModel.ipynb to multiple discrete variables.
  2. Get this to run for 1D and ND
  3. fix remaining comments/issues
  4. possibly refactor (fold 1D into ND code).

@janfb could you still check tho what is up with the simulator above? Do you have a hunch why the SNPE and MNLE posteriors different?

EDIT:

  1. wip
  2. add new tests / update old ones with mutli dim example.

@jnsbck jnsbck force-pushed the jnsbck-categorical_made branch 2 times, most recently from 8407911 to 2e5898b Compare January 9, 2025 16:39
@jnsbck
Copy link
Contributor Author

jnsbck commented Jan 9, 2025

I did a bit more work on this PR, current tests should be passing and I have swapped out all the legacy CategoricalNet code for the CategoricalMADE. See changes and comments above (please close if no longer relevant).
A few things remain:

  • Add a bit more docs and comments
  • add test cases
  • adapt the tutorial to 2d? (Can just add another beta distribution to the prior)
  • make sure it runs for ND.

This last thing has been haunting me in my sleep, as I cannot figure out what is wrong. Maybe you have an idea of what could be causing this.
For 1D it works, but for ND it always gets the first discrete dim wrong, i.e. yields the prior (all other dims are correct, see image). I am not sure if the conditioning for the first dimension is broken somehow, but I am not able to pin down where this would be happening in my code.

image

Help would be much appreciated. @janfb

Copy link
Contributor

@janfb janfb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the update!
Made another round of comments. Happy to have another call to sort them out.

Comment on lines 113 to 153
def _initialize(self):
pass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I am missing something the _initialize() is needed only in MixtureOfGaussiansMADE(MADE):, not in MADE, so it's not needed here?

Copy link
Contributor

@janfb janfb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

had another look and made two suggestion which could be a reason for the missing first dim fit.

@jnsbck
Copy link
Contributor Author

jnsbck commented Feb 5, 2025

Thanks for all the input <3, looking into the remaining ones over the coming days hopefully

@jnsbck
Copy link
Contributor Author

jnsbck commented Feb 6, 2025

Turns out, since the posterior is an MCMCPosterior, only log_prob is used for sampling and not sample. This bug still eludes me.

I have spent some time today and been able to rule a lot of things out (i.e. posterior, sampling, MNLE related things...), which is great, but nonetheless I am still stuck. I have been able to reduce it to the following example of just training a CategoricalMADE, which means that whatever is causing the weird behaviour can be found here. I am sure I am completely missing something probably obvious. I hope the code below makes it easier to identify.
@dgedon , @janfb .

#... snle tutorial (incl in this PR)

from sbi.neural_nets.estimators.categorical_net import CategoricalMADE

# Define independent prior.
prior = MultipleIndependent(
    [
        Gamma(torch.tensor([1.0]), torch.tensor([0.5])),
        Beta(torch.tensor([2.0]), torch.tensor([2.0])),
        Beta(torch.tensor([2.0]), torch.tensor([2.0])),
        # Beta(torch.tensor([2.0]), torch.tensor([2.0])),
    ],
    validate_args=False,
)

torch.manual_seed(42)
theta_o = prior.sample((1,))

# Training data
num_simulations = 10000
batch_size = 1000
num_epochs = 100
theta = prior.sample((num_simulations,))
x = mixed_simulator(theta)

# only pred disc dimensions
x = x[:, 1:]

made = CategoricalMADE(
    num_categories=torch.ones(x.shape[1], dtype=torch.int32)*2,
    hidden_features=20,
    context_features=theta.shape[1],
)

# quick and dirty training loop
in_batches = lambda x: x.reshape(num_simulations // batch_size, batch_size, -1)
optimizer = torch.optim.Adam(made.parameters(), lr=5e-4)
for i in range(num_epochs):
    print(f"\repoch {i+1} / {num_epochs}", end="")
    for theta_batch, x_batch in zip(in_batches(theta), in_batches(x)):
        optimizer.zero_grad()
        loss = -made.log_prob(x_batch, theta_batch).mean()
        loss.backward()
        optimizer.step()

p_true_disc = theta_o[0, 1:] # theta specifies the true probs
num_disc = x.shape[1]

# compute marginal likelihoods p(x)
choices = torch.arange(2**num_disc).unsqueeze(-1).bitwise_and(2**torch.arange(num_disc)).ne(0).unsqueeze(1)
p_est_disc = torch.zeros(num_disc)
for i in range(num_disc):
    ways_of_choosing_i = choices[torch.any(choices[:, :, i], dim=-1)].float()
    log_prob = made.log_prob(ways_of_choosing_i, theta_o)
    p_est_disc[i] = torch.exp(log_prob).sum().detach()

print("\n")
print(f"true: {p_true_disc}")
print(f"est: {p_est_disc}") # <-- dim=0 incorrect for dim_disc > 1 

Copy link
Collaborator

@dgedon dgedon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be the initialisation of the pytorch native MADE. The first dimension in the >1D discrete variables case has constant output over all batches which prohibits any learning. But I couldn't figure out why this happens. Happy to chat though.

@janfb
Copy link
Contributor

janfb commented Feb 10, 2025

Thanks for the updates @jnsbck !

Good catch @dgedon ! I looked into this a big and noticed that actually the first two dimensions in the output of the forward pass remain constant. As a consequence, the log_probs stay constant across the batch as well, and no learning happens.

I dig into into the nflows MADE code to find out why this happens, but I could not find a solution either. I tried changing the initialization e.g., make it a broader uniform to induce more variability in the first autoregressive dimension (which is only influenced by the bias term).
I could not find out why the second dimension remains constant as well.

I noticed that with only two input features (autoregressive_features), the degrees in the MADE are just [1, 1, 1, ...], because

                max_ = max(1, autoregressive_features - 1)
                min_ = min(1, autoregressive_features - 1)
                out_degrees = torch.arange(out_features) % max_ + min_

so min_=max_=1.

changing this to

out_degrees = torch.arange(out_features) % autoregressive_features + 1

fixes this to be [1, 2, 1, 2, ...], but it does not fix the problem.

I also noticed that the problem seems to be located in the final_layer of the MADE, e.g.,

    def forward(self, inputs, context=None):
        temps = self.initial_layer(inputs)
        if context is not None:
            temps += self.context_layer(context)
        for block in self.blocks:
            temps = block(temps, context)
        outputs = self.final_layer(temps)
        return outputs

here, all is fine (no constant values over the batch) until the final line.

I tried to debug the MaskedLinear linear that defines the final layer, but could not find the problem.

Maybe one of you @dgedon or @jnsbck give it a try as well?

@dgedon
Copy link
Collaborator

dgedon commented Feb 10, 2025

Thanks @janfb!

I dig into into the nflows MADE code to find out why this happens

This assumes that the nflows MADE implementation does not work for d>1 dimensions, right? And actually they do not have a test case for MADE in their repository see here, so this might be reasonable to look into with more care.

@jnsbck
Copy link
Contributor Author

jnsbck commented Feb 10, 2025

Thanks a ton to both of you! I also did a bit more digging, but apart from what @janfb also found, I have nothing conclusive yet.

@janfb
Copy link
Contributor

janfb commented Feb 10, 2025

another option would be switching to zuko instead, e.g., use their

https://github.com/probabilists/zuko/blob/master/zuko/flows/autoregressive.py

instead of nflows MADE. They use a MaskedMLP as a conditioning net. I could not figure out the shape handling yet though.

@gmoss13
Copy link
Contributor

gmoss13 commented Feb 13, 2025

Hi all, had a look at this repo and I found the problem. the nflows implementation of MADE doesn't take the context into account correctly. The input to the "first" dimension in MADE gets completely masked out, which is fine for unconditional MADE as the first dimension is generated unconditionally. But as it is currently implemented, nflows masks out the input to the last layer, which includes information from the context. I think the easiest fix for this, which I will also put up as a PR in nflows, is to simply add a dummy first variable in the MADE implementation in nflows, as follows:

class MADE(nn.Module):
    """Implementation of MADE.

    It can use either feedforward blocks or residual blocks (default is residual).
    Optionally, it can use batch norm or dropout within blocks (default is no).
    """

    def __init__(
        self,
        features,
        hidden_features,
        context_features=None,
        num_blocks=2,
        output_multiplier=1,
        use_residual_blocks=True,
        random_mask=False,
        activation=F.relu,
        dropout_probability=0.0,
        use_batch_norm=False,
    ):
        if use_residual_blocks and random_mask:
            raise ValueError("Residual blocks can't be used with random masks.")
        super().__init__()

        self.output_multiplier = output_multiplier
        # Initial layer.
        self.initial_layer = MaskedLinear(
            in_degrees=_get_input_degrees(features+1),
            out_features=hidden_features,
            autoregressive_features=features+1,
            random_mask=random_mask,
            is_output=False,
        )

        if context_features is not None:
            self.context_layer = nn.Linear(context_features, hidden_features)

        # Residual blocks.
        blocks = []
        if use_residual_blocks:
            block_constructor = MaskedResidualBlock
        else:
            block_constructor = MaskedFeedforwardBlock
        prev_out_degrees = self.initial_layer.degrees
        for _ in range(num_blocks):
            blocks.append(
                block_constructor(
                    in_degrees=prev_out_degrees,
                    autoregressive_features=features+1,
                    context_features=context_features,
                    random_mask=random_mask,
                    activation=activation,
                    dropout_probability=dropout_probability,
                    use_batch_norm=use_batch_norm,
                    zero_initialization=True,
                )
            )
            prev_out_degrees = blocks[-1].degrees
        self.blocks = nn.ModuleList(blocks)

        # Final layer.
        self.final_layer = MaskedLinear(
            in_degrees=prev_out_degrees,
            out_features=(features+1) * output_multiplier,
            autoregressive_features=(features+1),
            random_mask=random_mask,
            is_output=True,
        )

    def forward(self, inputs, context=None):
        # add dummy input to ensure all dims conditioned on context.
        dummy_input = torch.zeros((inputs.shape[:-1]+(1,)))
        concat_input = torch.cat((dummy_input,inputs),dim=-1)
        temps = self.initial_layer(concat_input)
        if context is not None:
            temps += self.context_layer(context)
        for block in self.blocks:
            temps = block(temps, context)
        outputs = self.final_layer(temps)
        return outputs[...,self.output_multiplier:] # remove dummy input

As far as I understand it, this is a bug in conditional MADE as a whole, unrelated to whether we are estimating categorical or continuous distributions, so for what it's worth @jnsbck I don't think you did anything wrong :)

@jnsbck
Copy link
Contributor Author

jnsbck commented Feb 14, 2025

Thank you all for taking so much interest in this. This bug was almost literally causing me so many headaches over the last few weeks! And thanks for already making the upstream PR @gmoss13! I also thought about just masking the dimension out in my implementation, but that felt a bit opportunistic w.o. knowing where the problem originated haha.

Question: Should we wait for the upstream PR to be merged, fork nflows for the moment or wrap the MADE and overwrite forward with the dummy variable?

@janfb
Copy link
Contributor

janfb commented Feb 20, 2025

@jnsbck this is now fixed by a patch within sbi, see #1398

so you can proceed finishing this PR 🚀

@jnsbck
Copy link
Contributor Author

jnsbck commented Feb 20, 2025

Amazing!

@jnsbck jnsbck force-pushed the jnsbck-categorical_made branch from 9ebb6c9 to 9a7e7b7 Compare February 21, 2025 15:20
@jnsbck
Copy link
Contributor Author

jnsbck commented Feb 21, 2025

I did it :)
Ready for final send off from my side. Tests passing on my end. Added a comment about option to run with >1D discrete conditions and changed all MNLE tests to pass for discrete_dim ==2.

Hoping this can be merged if tests pass :)

@jnsbck jnsbck requested a review from janfb February 21, 2025 15:39
Copy link
Contributor

@janfb janfb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is converging now! 🎉

I just added a couple of minor comments.

@jnsbck
Copy link
Contributor Author

jnsbck commented Feb 25, 2025

Done :) Thanks for the final check up!

Copy link
Contributor

@janfb janfb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good now. Thanks a lot keep pushing on this 🚀

@janfb janfb merged commit 5914851 into sbi-dev:main Feb 25, 2025
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Change MixedDensityEstimator to AutoregressiveMixedDensityEstimator
4 participants