Add CategoricalMADE #1269
Conversation
Codecov Report

Attention: Patch coverage is …

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #1269       +/-  ##
===========================================
- Coverage   89.31%   78.18%   -11.14%
===========================================
  Files         119      119
  Lines        8779     8916      +137
===========================================
- Hits         7841     6971      -870
- Misses        938     1945     +1007
```
Hey @janfb,

Currently the PR adds the `CategoricalMADE`. As far as I can tell all functionalities of … The question now is: How should I verify this works? Which tests should I add/modify? Do you have an idea for a good toy example with several discrete variables that I could use?

I have cooked up a toy simulator, for which I am getting good posteriors using SNPE, but for some reason MNLE raises a …

This is the simulator:

```python
def toy_simulator(theta: torch.Tensor, centers: list[torch.Tensor]) -> torch.Tensor:
    batch_size, n_dimensions = theta.shape
    assert len(centers) == n_dimensions, "Number of center sets must match theta dimensions"

    # Calculate discrete classes by assigning to the closest center
    x_disc = torch.stack([
        torch.argmin(torch.abs(centers[i].unsqueeze(1) - theta[:, i].unsqueeze(0)), dim=0)
        for i in range(n_dimensions)
    ], dim=1)
    closest_centers = torch.stack([centers[i][x_disc[:, i]] for i in range(n_dimensions)], dim=1)

    # Add Gaussian noise to assigned class centers
    std = 0.4
    x_cont = closest_centers + std * torch.randn_like(closest_centers)

    return torch.cat([x_cont, x_disc], dim=1)
```

The setup:

```python
torch.random.manual_seed(0)
centers = [
torch.tensor([-0.5, 0.5]),
# torch.tensor([-1.0, 0.0, 1.0]),
]
prior = BoxUniform(low=torch.tensor([-2.0]*len(centers)), high=torch.tensor([2.0]*len(centers)))
theta = prior.sample((20000,))
x = toy_simulator(theta, centers)
theta_o = prior.sample((1,))
x_o = toy_simulator(theta_o, centers)
```

NPE:

```python
trainer = SNPE()
estimator = trainer.append_simulations(theta=theta, x=x).train(training_batch_size=1000)
snpe_posterior = trainer.build_posterior(prior=prior)
posterior_samples = snpe_posterior.sample((2000,), x=x_o)
pairplot(posterior_samples, limits=[[-2, 2], [-2, 2]], figsize=(5, 5), points=theta_o)
```

and the equivalent MNLE:

```python
trainer = MNLE()
estimator = trainer.append_simulations(theta=theta, x=x).train(training_batch_size=1000)
mnle_posterior = trainer.build_posterior(prior=prior)
mnle_samples = mnle_posterior.sample((10000,), x=x_o)
pairplot(mnle_samples, limits=[[-2, 2], [-2, 2]], figsize=(5, 5), points=theta_o)
```

Hoping this makes sense. Lemme know if you need clarifications anywhere. Thanks for your feedback.
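For reference, a quick way to make the SNPE-vs-MNLE mismatch above concrete (a sketch only, reusing the variables from the snippets; it assumes `c2st` is importable from `sbi.utils.metrics`):

```python
# Compare the two posterior sample sets quantitatively (sketch, not part of the PR).
from sbi.utils.metrics import c2st  # classifier two-sample test, ~0.5 if indistinguishable

print("SNPE mean/std:", posterior_samples.mean(0), posterior_samples.std(0))
print("MNLE mean/std:", mnle_samples.mean(0), mnle_samples.std(0))
print("C2ST:", c2st(posterior_samples, mnle_samples))
```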
thanks a lot for tackling this @jnsbck! 👏
Please find below some comments and questions.
There might be some misunderstanding about variables and categories on my side. We can have a call if that's more efficient than commenting here.
Cool, thanks for all the feedback! A quick call would be great, also to discuss suitable tests for this. Will reach out via email and tackle the straightforward things in the meantime.
```python
# outputs (batch_size, num_variables, num_categories)
def log_prob(self, inputs, context=None):
    outputs = self.forward(inputs, context=context)
```
are these shapes correct?
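For context, a minimal sketch (not the PR's implementation) of how a per-batch log-prob can be reduced from outputs of the commented shape, assuming the last axis holds unnormalized logits per category:

```python
import torch
import torch.nn.functional as F


def categorical_log_prob(outputs: torch.Tensor, inputs: torch.Tensor) -> torch.Tensor:
    """outputs: (batch_size, num_variables, num_categories) logits;
    inputs: (batch_size, num_variables) integer category indices."""
    log_probs = F.log_softmax(outputs, dim=-1)  # normalize over categories
    picked = log_probs.gather(-1, inputs.long().unsqueeze(-1))  # log-prob of observed category
    return picked.squeeze(-1).sum(dim=-1)  # sum over variables -> (batch_size,)
```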
After discussion with @janfb I will: …

@janfb could you still check though what is up with the simulator above? Do you have a hunch why the SNPE and MNLE posteriors differ?

EDIT: …
force-pushed from 8407911 to 2e5898b
I did a bit more work on this PR, current tests should be passing and I have swapped out all the legacy …

This last thing has been haunting me in my sleep, as I cannot figure out what is wrong. Maybe you have an idea of what could be causing this. Help would be much appreciated. @janfb
thanks for the update!
Made another round of comments. Happy to have another call to sort them out.
```python
def _initialize(self):
    pass
```
Unless I am missing something, the `_initialize()` is needed only in `MixtureOfGaussiansMADE(MADE)`, not in `MADE`, so it's not needed here?
had another look and made two suggestions which could be a reason for the missing fit of the first dimension.
Thanks for all the input <3, hopefully looking into the remaining ones over the coming days.
Turns out, since the posterior is an … I have spent some time today and been able to rule a lot of things out (i.e. posterior, sampling, MNLE-related things...), which is great, but nonetheless I am still stuck. I have been able to reduce it to the following example of just training a `CategoricalMADE`:

```python
# ... snle tutorial (incl. in this PR)
from sbi.neural_nets.estimators.categorical_net import CategoricalMADE
# Define independent prior.
prior = MultipleIndependent(
[
Gamma(torch.tensor([1.0]), torch.tensor([0.5])),
Beta(torch.tensor([2.0]), torch.tensor([2.0])),
Beta(torch.tensor([2.0]), torch.tensor([2.0])),
# Beta(torch.tensor([2.0]), torch.tensor([2.0])),
],
validate_args=False,
)
torch.manual_seed(42)
theta_o = prior.sample((1,))
# Training data
num_simulations = 10000
batch_size = 1000
num_epochs = 100
theta = prior.sample((num_simulations,))
x = mixed_simulator(theta)
# only pred disc dimensions
x = x[:, 1:]
made = CategoricalMADE(
num_categories=torch.ones(x.shape[1], dtype=torch.int32)*2,
hidden_features=20,
context_features=theta.shape[1],
)
# quick and dirty training loop
in_batches = lambda x: x.reshape(num_simulations // batch_size, batch_size, -1)
optimizer = torch.optim.Adam(made.parameters(), lr=5e-4)
for i in range(num_epochs):
    print(f"\repoch {i+1} / {num_epochs}", end="")
    for theta_batch, x_batch in zip(in_batches(theta), in_batches(x)):
        optimizer.zero_grad()
        loss = -made.log_prob(x_batch, theta_batch).mean()
        loss.backward()
        optimizer.step()
p_true_disc = theta_o[0, 1:] # theta specifies the true probs
num_disc = x.shape[1]
# compute marginal likelihoods p(x)
choices = torch.arange(2**num_disc).unsqueeze(-1).bitwise_and(2**torch.arange(num_disc)).ne(0).unsqueeze(1)
p_est_disc = torch.zeros(num_disc)
for i in range(num_disc):
    ways_of_choosing_i = choices[torch.any(choices[:, :, i], dim=-1)].float()
    log_prob = made.log_prob(ways_of_choosing_i, theta_o)
    p_est_disc[i] = torch.exp(log_prob).sum().detach()
print("\n")
print(f"true: {p_true_disc}")
print(f"est: {p_est_disc}")  # <-- dim=0 incorrect for dim_disc > 1
```
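To make the bit-twiddling in the marginalization above more transparent, here is what `choices` enumerates for `num_disc = 2` (illustration only):

```python
import torch

num_disc = 2
choices = (
    torch.arange(2**num_disc)
    .unsqueeze(-1)
    .bitwise_and(2 ** torch.arange(num_disc))
    .ne(0)
    .unsqueeze(1)
)
print(choices.squeeze(1).int())
# -> [[0, 0], [1, 0], [0, 1], [1, 1]], i.e. all 2**num_disc joint outcomes.
# Summing exp(log_prob) over the rows where column i is 1 gives the marginal
# p(x_i = 1 | theta_o).
```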
I think it might be the initialisation of the pytorch native MADE. The first dimension in the >1D discrete variables case has constant output over all batches which prohibits any learning. But I couldn't figure out why this happens. Happy to chat though.
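A minimal way to check this symptom directly on the nflows MADE (a sketch, not part of the PR; it only looks at whether each output column reacts to a varying context):

```python
import torch
from nflows.transforms.made import MADE

# Identical inputs for every batch element, but different contexts: in a conditional
# MADE every output column should then vary across the batch via the context.
made = MADE(features=2, hidden_features=16, context_features=3, output_multiplier=2)
inputs = torch.zeros(8, 2)
context = torch.randn(8, 3)
out = made(inputs, context=context)  # shape (8, features * output_multiplier)

# Columns with (near-)zero std across the batch do not depend on the context at all,
# which is the constant-first-dimension behavior described above.
print(out.std(dim=0))
```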
Thanks for the updates @jnsbck! Good catch @dgedon! I looked into this a bit and noticed that actually the first two dimensions in the output of the forward pass remain constant. As a consequence, the log_probs stay constant across the batch as well, and no learning happens.

I dug into the … I noticed that with only two input features (…)

so changing this to …

fixes this to be …

I also noticed that the problem seems to be located in the …

here, all is fine (no constant values over the batch) until the final line. I tried to debug the …
Thanks @janfb!
This assumes that the nflows MADE implementation does not work for d>1 dimensions, right? And actually they do not have a test case for MADE in their repository (see here), so this might be reasonable to look into with more care.
Thanks a ton to both of you! I also did a bit more digging, but apart from what @janfb also found, I have nothing conclusive yet.
another option would be switching to https://github.com/probabilists/zuko/blob/master/zuko/flows/autoregressive.py instead of `nflows`.
Hi all, had a look at this repo and I found the problem. The patched MADE below adds a dummy input so that all dimensions are conditioned on the context:

```python
class MADE(nn.Module):
    """Implementation of MADE.

    It can use either feedforward blocks or residual blocks (default is residual).
    Optionally, it can use batch norm or dropout within blocks (default is no).
    """

    def __init__(
        self,
        features,
        hidden_features,
        context_features=None,
        num_blocks=2,
        output_multiplier=1,
        use_residual_blocks=True,
        random_mask=False,
        activation=F.relu,
        dropout_probability=0.0,
        use_batch_norm=False,
    ):
        if use_residual_blocks and random_mask:
            raise ValueError("Residual blocks can't be used with random masks.")

        super().__init__()

        self.output_multiplier = output_multiplier

        # Initial layer.
        self.initial_layer = MaskedLinear(
            in_degrees=_get_input_degrees(features + 1),
            out_features=hidden_features,
            autoregressive_features=features + 1,
            random_mask=random_mask,
            is_output=False,
        )

        if context_features is not None:
            self.context_layer = nn.Linear(context_features, hidden_features)

        # Residual blocks.
        blocks = []
        if use_residual_blocks:
            block_constructor = MaskedResidualBlock
        else:
            block_constructor = MaskedFeedforwardBlock
        prev_out_degrees = self.initial_layer.degrees
        for _ in range(num_blocks):
            blocks.append(
                block_constructor(
                    in_degrees=prev_out_degrees,
                    autoregressive_features=features + 1,
                    context_features=context_features,
                    random_mask=random_mask,
                    activation=activation,
                    dropout_probability=dropout_probability,
                    use_batch_norm=use_batch_norm,
                    zero_initialization=True,
                )
            )
            prev_out_degrees = blocks[-1].degrees
        self.blocks = nn.ModuleList(blocks)

        # Final layer.
        self.final_layer = MaskedLinear(
            in_degrees=prev_out_degrees,
            out_features=(features + 1) * output_multiplier,
            autoregressive_features=(features + 1),
            random_mask=random_mask,
            is_output=True,
        )

    def forward(self, inputs, context=None):
        # add dummy input to ensure all dims conditioned on context.
        dummy_input = torch.zeros((inputs.shape[:-1] + (1,)))
        concat_input = torch.cat((dummy_input, inputs), dim=-1)
        temps = self.initial_layer(concat_input)
        if context is not None:
            temps += self.context_layer(context)
        for block in self.blocks:
            temps = block(temps, context)
        outputs = self.final_layer(temps)
        return outputs[..., self.output_multiplier:]  # remove dummy input
```

As far as I understand it, this is a bug in conditional MADE as a whole, unrelated to whether we are estimating categorical or continuous distributions, so for what it's worth @jnsbck I don't think you did anything wrong :)
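If it helps, the earlier constant-output check can be rerun against this patched class (called `PatchedMADE` below purely for illustration; it assumes the class above is defined alongside nflows' `MaskedLinear` and friends):

```python
import torch

# Assumption: PatchedMADE is the patched class from the snippet above.
made = PatchedMADE(features=2, hidden_features=16, context_features=3, output_multiplier=2)
out = made(torch.zeros(8, 2), context=torch.randn(8, 3))

# With the dummy input in place, every output column should now vary with the
# context, i.e. no column should have (near-)zero std across the batch.
print(out.std(dim=0))
```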
Thank you all for taking so much interest in this. This bug was almost literally causing me so many headaches over the last few weeks! And thanks for already making the upstream PR @gmoss13! I also thought about just masking the dimension out in my implementation, but that felt a bit opportunistic without knowing where the problem originated haha. Question: Should we wait for the upstream PR to be merged, fork nflows for the moment, or wrap the …?
Amazing!
force-pushed from 9ebb6c9 to 9a7e7b7
I did it :) Hoping this can be merged if tests pass :)
This is converging now! 🎉
I just added a couple of minor comments.
Done :) Thanks for the final check-up!
Looks good now. Thanks a lot for pushing on this 🚀
What does this implement/fix? Explain your changes

This implements a `CategoricalMADE` to generalize MNLE to multiple discrete dimensions, addressing #1112. Essentially, it adapts `nflows`'s `MixtureOfGaussiansMADE` to autoregressively model categorical distributions.

Does this close any currently open issues?

Fixes #1112

Comments
I have already discussed this with @michaeldeistler.
Checklist
Put an `x` in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.
- … guidelines
- … with `pytest.mark.slow`
- … guidelines
- … `main` (or there are no conflicts with `main`)