
Inconsistency of allowing target+= #979

Closed
WardBrian opened this issue Sep 29, 2021 · 23 comments · Fixed by #1435

Comments

@WardBrian
Member

Consider two stan programs:

prog1:

transformed parameters {
   target += 1;
}

prog2:

functions {
   void foo_lp(real x){
     target += x;
   }
}

transformed parameters {
   foo_lp(1);
}

prog1.stan produces a type error:

Semantic error in 'prog1.stan', line 2, column 3 to column 15:
   -------------------------------------------------
     1:  transformed parameters {
     2:     target += 1;
            ^
     3:  }
   -------------------------------------------------
Target can only be accessed in the model block or in definitions of functions with the suffix _lp.

prog2.stan compiles just fine.

This is because we have this check for target+= statements:

let semantic_check_target_pe_usage ~loc ~cf =
  if cf.in_lp_fun_def || cf.current_block = Model then Validate.ok ()

And this check for the calling of _lp functions:

let semantic_check_fn_target_plus_equals cf ~loc id =
  Validate.(
    if
      String.is_suffix id.name ~suffix:"_lp"
      && not
           ( cf.in_lp_fun_def || cf.current_block = Model
           || cf.current_block = TParam )
    then Semantic_error.target_plusequals_outisde_model_or_logprob loc |> error
    else ok ())

@nhuurre
Collaborator

nhuurre commented Sep 29, 2021

AFAIK this inconsistency was inherited from stanc2, see #767.
Do you have a suggestion for what to do about it?

@WardBrian
Member Author

IMO we should allow target+= in TParams. @bob-carpenter?

@SteveBronder
Contributor

^ I agree

@bob-carpenter
Member

I also agree. My original intention was to allow target += in the transformed parameter block because I thought that'd be the natural place to drop Jacobians. Ideally, this should go along with exposing the boolean template parameter for whether the Jacobian is applied. That would give us enough power to implement transforms that match Stan's behavior in the language.
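A minimal sketch of the intended usage Bob describes, where a Jacobian adjustment accompanies a manual transform in transformed parameters (hypothetical code, assuming the change discussed in this thread is allowed; under the current check this increment would have to go through an _lp function):

```stan
parameters {
  real log_sigma;  // unconstrained scale
}
transformed parameters {
  real sigma = exp(log_sigma);  // manual constraining transform
  // Jacobian adjustment: log |d sigma / d log_sigma| = log_sigma
  target += log_sigma;
}
model {
  sigma ~ lognormal(0, 1);  // density stated on the constrained scale
}
```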

@WardBrian
Member Author

It was a quick change to allow target += (#980). The idea of a boolean to let target adjustments depend on a Jacobian parameter is probably a larger-scope change.

@bob-carpenter would you be willing to look into updating the doc accordingly? I'm not sure the current behavior (where functions let you do this) is even documented.

@bob-carpenter
Member

Sure, I can update the reference manual. I'm not sure I want to start changing programs in the user's guide, but I'll check where we talk about blocks there, too.

@WardBrian
Member Author

Yeah, we don't need to update examples. I'm just thinking that if there are any statements like "You can only use target += in the model block", they will need to change.

@bob-carpenter
Member

I opened stan-dev/docs#413 for this.

@betanalpha

betanalpha commented Oct 1, 2021 via email

@bob-carpenter
Member

Allowing the target to be modified within the transformed parameter block makes Stan programs harder to parse overall.

I take it you mean for a human reader, not stanc3. I think that's the main place we disagree. My intention all along was to allow target increments in the transformed parameters block for Jacobians. Abusing that usage is indeed a way to write a confusing program.

It’s only in the context of trying to assign a density on the output value that a Jacobian is needed, so the proper encapsulation is

My goal was to mirror how our constraints work for parameters. The constraints are applied as soon as they're declared regardless of how the variable is used. For example,

parameters {
  real<lower=0> sigma;
}

always applies the Jacobian correction to make the distribution over sigma improper uniform on (0, infty). And if you apply a constraint down to a compact set, you don't need a sampling statement---just the transform will lead to a proper prior, as we see with

real<lower=0, upper=1> theta;
simplex[K] phi;

Especially given how interfaces will change behavior if the model block is empty.

I don't see how that'd even be possible for an interface. Are you thinking about plans to run the fixed-param sampler when the parameter size is zero? That doesn't even require the parameter block to be empty, because we can declare size-zero parameters. For example, the following complete Stan program would run with HMC if N > 0 and with fixed-param if N = 0 (not that the fixed-param version would do anything).

data {
  int<lower=0> N;
}
parameters {
   vector<lower=0, upper=1>[N] alpha;
}

@WardBrian
Member Author

@betanalpha - would you propose that we disallow the calling of _lp functions in transformed parameters then? I think that would be a rather large breaking change if anyone was relying on the behavior

@betanalpha

betanalpha commented Oct 1, 2021 via email

@betanalpha

betanalpha commented Oct 1, 2021 via email

@SteveBronder
Contributor

@betanalpha - would you propose that we disallow the calling of _lp functions in transformed parameters then? I think that would be a rather large breaking change if anyone was relying on the behavior

Yes, I personally would prefer that, although we've had similar discussions before where few agreed with me.

One reason I like having _lp and target += in the transformed parameters block is that it's the only scheme I can think of that allows user-defined constraints whose results can be saved. For instance, think about something like the code below, where a user wants to write an ordered_simplex_constrain function. Suppose there were a global jacobian keyword (or whatever name/syntax we want for incrementing when we want a Jacobian correction). Then the user writes a _constrain and a _free function and applies the former in transformed parameters:

functions {
  vector ordered_simplex_constrain(...) {
    // constrain stuff
    if (jacobian) {
      target += ...;
    }
  }
  vector ordered_simplex_free(...) {
    // inverse of constrain stuff
  }
}

...

parameters {
  vector[N] blah;
}
transformed parameters {
  vector[N] my_blah = ordered_simplex_constrain(blah);
}

ordered_simplex_constrain() could be done in the model block, but then the transformed parameter would not be saved. If there's some other way to do these user-defined constraints as a function, then I'm open to it. Maybe there's some middle ground for user-defined constraints where these functions can access target when called from the transformed parameters block as long as it's surrounded by if (jacobian) { } (though that's kind of complicated logic and idk how intuitive that would be to users)
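For reference, the inconsistency at the top of the thread means a version of this pattern already compiles today if the increment is routed through an _lp function (a hypothetical sketch; the transform body and adjustment are placeholders, not a real ordered-simplex implementation):

```stan
functions {
  vector ordered_simplex_constrain_lp(vector y) {
    // placeholder body: a real implementation would build an
    // ordered simplex from y and add the matching Jacobian term
    target += sum(y);   // placeholder Jacobian adjustment
    return softmax(y);  // placeholder constraining transform
  }
}
data {
  int<lower=0> N;
}
parameters {
  vector[N] y;
}
transformed parameters {
  vector[N] x = ordered_simplex_constrain_lp(y);  // saved in the output
}
```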

@betanalpha

betanalpha commented Oct 1, 2021 via email

@bob-carpenter
Member

Steve it seems what you’re trying to do is allow for user-defined types.

Yes, something like that. There's an issue of how the Jacobian and inverse can share computation and where it'd get declared. And then how to parameterize transforms (like lower = L) and account for size declarations. It's going to be tricky.

@SteveBronder
Contributor

Steve it seems what you’re trying to do is allow for user-defined types.

Yes I do like that!

And then how to parameterize transforms (like lower = L) and account for size declarations. It's going to be tricky.

I think for what Michael is talking about here, the user would write something like lb_vector.

There's an issue of how the Jacobian and inverse can share computation

I think I'd like a middle ground where if (jacobian) could be used in the constraint so computation can be shared. Though IMO, if a new type becomes very popular, we can always backport it into Stan math / the language and get the more efficient computation then.

@betanalpha

betanalpha commented Oct 18, 2021 via email

@bob-carpenter
Member

Absolutely. I designed the language so that users could focus on the constrained space (1) and algorithms could focus on the unconstrained space (3).

Unfortunately, that abstraction is leaky when it comes to performance. For example, users have to think about the unconstrained space in moving from a centered to non-centered parameterization.

I didn't understand the comments about indirect implementation in terms of types. The key issue for performance is that we want to be able to compute the constraining transform and log absolute Jacobian determinant together. However we do that, we absolutely have to implement the adjoint-Jacobian product form for efficiency. In the math library, our reverse-mode autodiff functions compute values in the forward pass and store whatever intermediates we might need later for the adjoint-Jacobian product computed in the reverse pass.

@betanalpha

betanalpha commented Nov 1, 2021 via email

@bob-carpenter
Member

The centered and non-centered parameterizations are different parameterizations and hence defined by different Stan programs.

Exactly. So I don't see why you say the user doesn't have to think about the unconstrained spaces as those are the spaces over which the Stan programs operate for sampling.

The shift/multiply implementation makes this all implicit, and in my opinion confusing.

I agree that shift/multiply is confusing. The whole transformation from constrained to unconstrained is confusing for our users. I just think it's less confusing and far less error prone than recoding everything oneself. But it's a bit less efficient.

As I mentioned previously it also overloads the <> annotation

I don't see that as an overload so much as thinking about it as a general transform that's not necessarily constraining.

I think that the adjoint-Jacobian comments here are a red herring.

I just think of it as the way autodiff works. It's how our chain() methods get written for efficiency for multivariate or matrix functions.

only a program implementing the target density on the final, unconstrained space has the scope to be able to avoid redundant calculations.

I agree that gives you the most generality. It's one of the reasons I'd like to roll our transforms into the language. But it's not the only way to avoid redundancy. For example, it's better to implement a lognormal distribution on a parameter as a normal distribution on the log of the parameter, then use exp() to transform back to the positive scale. When you use lognormal, there's a bit of extra work making and then undoing the log transform.
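The lognormal point can be sketched as two programs with the same posterior (an illustrative pairing, not code from this thread):

```stan
// Form 1: density on sigma; Stan log-transforms sigma under the hood,
// and lognormal itself undoes that log, so work is duplicated
parameters {
  real<lower=0> sigma;
}
model {
  sigma ~ lognormal(0, 1);
}
```

```stan
// Form 2: work directly on the log scale; no constraint, no Jacobian
parameters {
  real log_sigma;
}
transformed parameters {
  real sigma = exp(log_sigma);  // recover the positive scale for output
}
model {
  log_sigma ~ normal(0, 1);
}
```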

the Stan program specifies only
log_pi(x)
with c(y) and log J_{c}(y) both hidden under the language abstraction.

Yes, that was intentional. But rather than "hidden", I'd say "encapsulated" :-). Of course, you can use only unconstrained types and apply all your own transforms and Jacobian adjustments. (For those following along at home, the J in @betanalpha's notation is the absolute determinant of the Jacobian.)

@betanalpha

betanalpha commented Nov 10, 2021 via email

@bob-carpenter
Member

I believe that the design of the language introduces a fundamental difference between unconstraining transformations, X subset R^N -> R^M and 1-1 transformations R^M -> R^M.

Both the language syntax and the math back-end are agnostic about the content of transforms. There's nothing at all built into the language or the math back end that requires a range that's different than their domain.

why is writing out the unconstrained model directly, where everything is in scope to optimize the implementation, so onerous?

(1) Because it's inconveniently split over blocks and requires double declaration, and (2) the intended log density never shows up directly in the model.

parameters {
  vector<offset=mu, multiplier=sigma>[N] alpha;
}
model {
  alpha ~ normal(mu, sigma);
}

and

parameters {
  vector[N] alpha_std;
}
transformed parameters {
  vector[N] alpha = mu + sigma * alpha_std;
}
model {
  alpha_std ~ normal(0, 1);
}

Then there's the secondary problem of getting double the output in CmdStan or having to explicitly filter it in RStan, etc.
