use immediately invoked lambdas in size and range error checks #2255

SteveBronder · 2020-12-11T19:52:47Z

Summary

See issue #2249 for technical details. This PR uses immediately invoked lambdas in the size and range error checks. That makes the code for the checks themselves smaller, which should help the compiler when it attempts to inline them.
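
As a rough illustration of the pattern (a minimal sketch, not the code in this PR): the check itself stays a single comparison, and building the error message moves into a lambda that is invoked immediately, only on failure. The function name `check_size_match_sketch`, its signature, and the message format are made up for this example.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical size check for illustration; not Stan Math's actual API.
inline void check_size_match_sketch(const char* function, const char* name_i,
                                    std::size_t i, const char* name_j,
                                    std::size_t j) {
  if (i != j) {
    // Immediately invoked lambda: the message-building code lives in the
    // lambda body, so the enclosing function is little more than a compare.
    [&]() {
      std::ostringstream msg;
      msg << function << ": size of " << name_i << " (" << i
          << ") and size of " << name_j << " (" << j << ") must match";
      throw std::invalid_argument(msg.str());
    }();
  }
}
```

Whether this actually improves inlining depends on the compiler and optimization level, so it is worth checking against benchmarks.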

This also adds some useful compiler attribute macros. One to call out in particular is STAN_REMOVE_RANGE_AND_SIZE_CHECKS, which, if the user defines it at compile time, turns off all range and size checks.

Tests

This is a refactor, so there are no new tests.

Side Effects

We should generally use this pattern in all the error checks; it would be good to do that in a future PR.

Release notes

Use immediately invoked lambdas in size and range error checks

Checklist

Math issue Use immedietly invoked lambdas to make error checking less expensive #2249
Copyright holder: Steve Bronder

The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:
- Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
- Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
the basic tests are passing
- unit tests pass (to run, use: ./runTests.py test/unit)
- header checks pass, (make test-headers)
- dependencies checks pass, (make test-math-dependencies)
- docs build, (make doxygen)
- code passes the built in C++ standards checks (make cpplint)
the code is written in idiomatic C++ and changes are documented in the doxygen
the new changes are tested

stan-buildbot · 2020-12-12T03:21:06Z

Name	Old Result	New Result	Ratio	Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan	3.63	3.55	1.02	2.36% faster
low_dim_corr_gauss/low_dim_corr_gauss.stan	0.02	0.02	0.95	-5.81% slower
eight_schools/eight_schools.stan	0.13	0.11	1.1	9.43% faster
gp_regr/gp_regr.stan	0.16	0.15	1.0	0.49% faster
irt_2pl/irt_2pl.stan	5.83	5.96	0.98	-2.16% slower
performance.compilation	88.11	86.33	1.02	2.02% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan	8.41	8.41	1.0	0.0% slower
pkpd/one_comp_mm_elim_abs.stan	29.83	30.01	0.99	-0.63% slower
sir/sir.stan	141.79	132.08	1.07	6.84% faster
gp_regr/gen_gp_data.stan	0.05	0.05	1.02	1.78% faster
low_dim_gauss_mix/low_dim_gauss_mix.stan	2.91	2.92	1.0	-0.17% slower
pkpd/sim_one_comp_mm_elim_abs.stan	0.38	0.39	0.97	-3.25% slower
arK/arK.stan	2.47	1.78	1.39	28.05% faster
arma/arma.stan	0.58	0.74	0.79	-26.53% slower
garch/garch.stan	0.61	0.53	1.14	12.02% faster
Mean result: 1.02979338971

Jenkins Console Log
Blue Ocean
Commit hash: e23b147

Machine information

ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

bbbales2 · 2020-12-12T19:49:45Z

I'm not in for the STAN_NO_RANGE_AND_SIZE_CHECK macros. I see the use for experimenting, but it just seems like another code branch to maintain.

The COLD_PATH stuff... Wow what a speedup. Looks worth it

SteveBronder · 2020-12-13T06:27:27Z

I'm totally fine with removing the no check macro. It was mostly just an idea.

t4c1

On the other hand I like STAN_NO_RANGE_AND_SIZE_CHECK. I believe we already have some functions in which a significant amount of time is spent in checks. Also, this is not much work to maintain. It is just an option for early return, without any special cases. We may be able to find a better name for this macro, though. Maybe STAN_RETURN_IF_NO_CHECKS?
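
To make the "option for early return" idea concrete, here is a minimal sketch under assumptions. The flag `SKETCH_NO_RANGE_AND_SIZE_CHECKS`, the macro `SKETCH_RETURN_IF_NO_CHECKS`, and the function `check_nonzero_size_sketch` are hypothetical names for illustration, not the macros from this PR.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical opt-out flag and macro names, chosen for this sketch only.
#ifdef SKETCH_NO_RANGE_AND_SIZE_CHECKS
#define SKETCH_RETURN_IF_NO_CHECKS return
#else
#define SKETCH_RETURN_IF_NO_CHECKS
#endif

// Hypothetical check. With the flag defined, the body starts with a bare
// `return;`, so the rest is unreachable and can be dropped by the compiler.
inline void check_nonzero_size_sketch(const char* name, std::size_t size) {
  SKETCH_RETURN_IF_NO_CHECKS;
  if (size == 0) {
    throw std::invalid_argument(std::string(name) + " has size 0");
  }
}
```

Without the flag the macro expands to nothing and the check behaves as usual, which is why there are no special cases to maintain in each check.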

stan/math/prim/err/check_matching_dims.hpp

stan/math/prim/err/check_nonzero_size.hpp

stan/math/prim/meta/macros.hpp

wds15 · 2020-12-14T09:33:09Z

What about STAN_NDEBUG ?

SteveBronder · 2020-12-14T15:13:40Z

> What about STAN_NDEBUG ?

Oh wait, why are we reinventing the wheel here? Why don't we just use NDEBUG?

wds15 · 2020-12-14T16:00:54Z

NDEBUG is fine for me as well...

bbbales2 · 2020-12-14T16:56:56Z

> On the other hand I like STAN_NO_RANGE_AND_SIZE_CHECK

It just seems more complicated. Like how is this exposed to the surface? The cold path stuff is for free to the user.

I think this would definitely make sense for the O1/O2/O3 discussion. Like, turn off all nan/inf (the expensive ones) at O3 or something -- but that's just a different thing.

SteveBronder · 2020-12-14T18:25:04Z

It's exposed to the surface by the user defining -DNDEBUG, so that at compile time the compiler knows these checks are no-ops. It's commonly used in things like the standard library to turn off debug information.

https://en.m.wikibooks.org/wiki/C_Programming/assert.h
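
For readers less familiar with NDEBUG, a minimal sketch of the standard `assert` behavior being referred to. The helper `element_at_sketch` is a hypothetical name for illustration; only `assert` and `NDEBUG` themselves are standard.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper. The bounds check is an assert, so it is active by
// default and expands to nothing when NDEBUG is defined before <cassert>
// is included (for example via -DNDEBUG on the compiler command line).
inline double element_at_sketch(const double* data, std::size_t size,
                                std::size_t i) {
  assert(i < size && "index out of range");
  return data[i];
}
```

Note that a failed `assert` aborts the program rather than throwing an exception, which is one way it differs from the error checks discussed in this thread.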

bbbales2 · 2020-12-14T18:41:43Z

Without a need for this outside of benchmarking Math, or without a plan to expose them at the interfaces, they're really not doing anything and I don't see why we'd add them. We can always add them once we've got a good reason. The idea doesn't go away if we don't put them in now.

I wouldn't call this debug information. It is the intended behavior of the Math library to do these checks. The COLD_PATH stuff is different if I'm understanding correctly. Big performance difference and anyone using gcc gets it for free.

wds15 · 2020-12-14T19:16:09Z

While I do like the idea of being able to kill those checks for the sake of speed, I do concur that this is actually a separate thing to sort out. These checks have triggered many debates - heated ones - and it's probably better to do that in a dedicated PR.

(I am in favor of having the knob to turn this off, but still, let's do one thing at a time.)

rok-cesnovar · 2020-12-14T19:21:39Z

I agree this should be a separate thing and should not be NDEBUG. These checks are not really debug information.

SteveBronder · 2020-12-14T19:27:34Z

> Without a need for this outside of benchmarking Math, or without a plan to expose them at the interfaces,

I'm not sure I'm following, what do you mean when you say we don't have a plan to expose these at the user interface? Or maybe a better question is, what does that look like to you? In my mind, the user can just add -DNDEBUG to their Makevars in R, and cmdstan users would put that in make/local.

> they're really not doing anything and I don't see why we'd add them. We can always add them once we've got a good reason. The idea doesn't go away if we don't put them in now.

It does do something, which is that it makes the functions do nothing! Since this makes these functions no-ops, the compiler completely ignores them.

> I wouldn't call this debug information. It is the intended behavior of the Math library to do these checks.

Yes, and the std library treats it the same way: the intended behavior of assert() is to stop the program when something goes wrong, and having NDEBUG defined changes that behavior so the program is not stopped. This only affects things if the user opts into it (i.e., passes -DNDEBUG at compile time).

> The COLD_PATH stuff is different if I'm understanding correctly. Big performance difference and anyone using gcc gets it for free.

There are two things happening here.

- COLD_PATH puts the error code on a code path for which the CPU will never try to branch predict or preload anything. __GNUC__ is actually defined for both clang and gcc, so users of both compilers get this benefit.
- When the user passes -DNDEBUG to the compiler, these checks are completely turned off, so the compiler sees they are no-ops and doesn't even bother adding them to the binary.
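
A minimal sketch of the first point, under assumptions: the macro name `SKETCH_COLD_PATH` and the function `check_range_sketch` are hypothetical, and the 1-based range is an arbitrary choice for the example. The `noinline` and `cold` attributes are GCC/clang extensions, which is why the macro is guarded by `__GNUC__`.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical macro name. On compilers that do not define __GNUC__ the
// macro expands to nothing and the lambda is an ordinary one.
#ifdef __GNUC__
#define SKETCH_COLD_PATH __attribute__((noinline, cold))
#else
#define SKETCH_COLD_PATH
#endif

// Hypothetical range check; not Stan Math's actual signature.
inline void check_range_sketch(const char* function, int max, int index) {
  if (index < 1 || index > max) {
    // The error path is an immediately invoked lambda marked as cold, which
    // hints to the compiler that this call is unlikely to happen.
    [&]() SKETCH_COLD_PATH {
      std::ostringstream msg;
      msg << function << ": index " << index
          << " out of range; expecting index to be between 1 and " << max;
      throw std::out_of_range(msg.str());
    }();
  }
}
```

How much this helps in practice depends on the compiler and the surrounding code, so the benchmark results in this thread are the better guide than the sketch itself.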

NDEBUG is only meant to be used for "release" versions of a program. So the user would only turn this on once they've run this program once and made sure all their sizes/ranges are correct.

bbbales2 · 2020-12-14T20:00:00Z

> So the user would only turn this on once they've run this program once and made sure all their sizes/ranges are correct.

Yeah, the use case makes sense.

Dropping checks has come up before (I've heard @seantalts talk about this, and @bgoodri may have as well, though he can correct me). We could probably optionally turn off nan/infs/positive definite and bounds checks separately.

It's a popular idea; it just needs to be discussed with the interfaces, because that's where it will get used.

> I'm not sure I'm following, what do you mean when you say we don't have a plan to expose these at the user interface?

By interfaces I mean how people will use it from cmdstan, cmdstanr, cmdstanpy, rstan, or pystan. Things like the threading on/off flags haven't been a walk in the park with how make and the precompiled headers work (though I think Rok has some sort of solution there now) even though there is a clear use case. rstanarm and brms and whatnot could probably use it too.

stan-buildbot · 2020-12-15T00:38:19Z

Name	Old Result	New Result	Ratio	Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan	3.75	3.4	1.1	9.45% faster
low_dim_corr_gauss/low_dim_corr_gauss.stan	0.02	0.02	0.99	-1.2% slower
eight_schools/eight_schools.stan	0.11	0.11	1.0	-0.49% slower
gp_regr/gp_regr.stan	0.15	0.16	0.99	-1.07% slower
irt_2pl/irt_2pl.stan	5.4	5.6	0.96	-3.8% slower
performance.compilation	88.56	85.97	1.03	2.92% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan	8.37	8.36	1.0	0.03% faster
pkpd/one_comp_mm_elim_abs.stan	30.45	29.7	1.03	2.48% faster
sir/sir.stan	141.96	133.19	1.07	6.18% faster
gp_regr/gen_gp_data.stan	0.04	0.04	1.0	-0.25% slower
low_dim_gauss_mix/low_dim_gauss_mix.stan	2.92	2.91	1.0	0.26% faster
pkpd/sim_one_comp_mm_elim_abs.stan	0.38	0.39	0.96	-3.65% slower
arK/arK.stan	2.47	1.78	1.39	27.86% faster
arma/arma.stan	0.58	0.74	0.79	-26.49% slower
garch/garch.stan	0.6	0.54	1.12	10.72% faster
Mean result: 1.02824665276

Jenkins Console Log
Blue Ocean
Commit hash: 1e10fc5

SteveBronder · 2020-12-15T02:13:52Z

> Dropping checks has come up before (I've heard @seantalts talk about this, and @bgoodri may have as well, though he can correct me). We could probably optionally turn off nan/infs/positive definite and bounds checks separately.

IMO I'd rather not turn these off, since turning off nan/inf/number bounds checks is usually associated with things like -Ofast and is a bit more of a problem. We could potentially turn those off, but it's much riskier than turning off the size and range checks.

> By interfaces I mean how people will use it from cmdstan, cmdstanr, cmdstanpy, rstan, or pystan. Things like the threading on/off flags haven't been a walk in the park with how make and the precompiled headers work (though I think Rok has some sort of solution there now) even though there is a clear use case. rstanarm and brms and whatnot could probably use it too.

Why wouldn't the user just add -DNDEBUG to their Makevars in R and cmdstan users would put that in make/local? @rok-cesnovar would this mess with the precompiled headers? My only guess is that the precompiled header would not have NDEBUG defined, so worst case this wouldn't do anything.

rok-cesnovar · 2020-12-15T08:05:43Z

> Why wouldn't the user just add -DNDEBUG to their Makevars in R and cmdstan users would put that in make/local?

One fear I have for this is that some more advanced users may already have this flag and expect to still get the "normal" Stan behavior. Though that can be avoided by prominently displaying this in the release notes. I still think we should avoid NDEBUG.

I am a bit less worried if this only concerns indexing bounds checks and not element-wise checks for nan/inf. For those I would be worried.

> would this mess with the precompiled headers?

I don't think so; worst case the user needs to rebuild cmdstan after setting those. That is advised in general to anyone that touches CXXFLAGS directly. We have "quality-of-life" improvements when using STAN_THREADS, STAN_OPENCL and STAN_MPI (no need to rebuild manually) because that is what we expect users to use and that is what we promote or will promote. Anyone setting arbitrary C++ flags that we don't document or publicize/promote is "on their own". We have little to no knowledge of how Stan behaves with various g++/clang++ flags and we also probably don't have the resources to try/test most of them.

For rstan/pystan I have no idea how/if they handle prebuilding main.o and CRTP and all that. I think we should not cause more issues for those tightly-coupled interfaces by suggesting that users try NDEBUG there. If someone is this concerned with speed they should be using cmdstan in the first place, imo.

t4c1 · 2020-12-15T08:25:15Z

I still think adding an option to turn off checks is a good idea, but I think it should not use the NDEBUG macro. Also, it might be even better to enable finer control by having two macros - one for bounds checks and one for value checks. However, turning off checks is a separate topic and should probably be in a separate PR.

SteveBronder · 2020-12-15T17:15:51Z

I've removed the macro that disables checks for now, but we should have something like that in the future.

stan-buildbot · 2020-12-16T00:23:50Z

Name	Old Result	New Result	Ratio	Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan	3.45	3.41	1.01	1.14% faster
low_dim_corr_gauss/low_dim_corr_gauss.stan	0.02	0.02	0.94	-5.97% slower
eight_schools/eight_schools.stan	0.11	0.12	0.96	-4.28% slower
gp_regr/gp_regr.stan	0.15	0.15	1.0	0.39% faster
irt_2pl/irt_2pl.stan	5.39	5.19	1.04	3.69% faster
performance.compilation	87.13	86.25	1.01	1.01% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan	8.38	8.77	0.96	-4.61% slower
pkpd/one_comp_mm_elim_abs.stan	31.13	29.34	1.06	5.75% faster
sir/sir.stan	144.3	129.7	1.11	10.12% faster
gp_regr/gen_gp_data.stan	0.05	0.04	1.04	3.38% faster
low_dim_gauss_mix/low_dim_gauss_mix.stan	2.93	3.06	0.96	-4.2% slower
pkpd/sim_one_comp_mm_elim_abs.stan	0.39	0.44	0.88	-13.38% slower
arK/arK.stan	2.46	1.84	1.34	25.15% faster
arma/arma.stan	0.59	0.62	0.94	-5.97% slower
garch/garch.stan	0.61	0.6	1.03	2.57% faster
Mean result: 1.01858649628

Jenkins Console Log
Blue Ocean
Commit hash: a57c88a

SteveBronder · 2020-12-17T19:05:41Z

Is this good to go?

t4c1 · 2020-12-18T08:12:11Z

No. You have not addressed everything from my last review.

SteveBronder · 2020-12-18T20:11:59Z

Whups sorry missed that, removed the extra compiler attributes

stan-buildbot · 2020-12-19T03:30:13Z

Name	Old Result	New Result	Ratio	Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan	3.37	3.37	1.0	-0.06% slower
low_dim_corr_gauss/low_dim_corr_gauss.stan	0.02	0.02	0.98	-1.67% slower
eight_schools/eight_schools.stan	0.11	0.11	0.98	-1.95% slower
gp_regr/gp_regr.stan	0.16	0.16	1.0	0.29% faster
irt_2pl/irt_2pl.stan	5.42	5.17	1.05	4.59% faster
performance.compilation	89.07	88.42	1.01	0.73% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan	8.4	8.64	0.97	-2.86% slower
pkpd/one_comp_mm_elim_abs.stan	29.94	29.71	1.01	0.78% faster
sir/sir.stan	137.32	133.27	1.03	2.95% faster
gp_regr/gen_gp_data.stan	0.04	0.05	0.92	-9.09% slower
low_dim_gauss_mix/low_dim_gauss_mix.stan	2.91	3.05	0.95	-4.83% slower
pkpd/sim_one_comp_mm_elim_abs.stan	0.4	0.39	1.02	2.03% faster
arK/arK.stan	2.5	2.52	0.99	-0.86% slower
arma/arma.stan	0.88	0.61	1.44	30.66% faster
garch/garch.stan	0.53	0.68	0.79	-27.03% slower
Mean result: 1.00966011654

Jenkins Console Log
Blue Ocean
Commit hash: f9f0dc0

SteveBronder and others added 3 commits December 11, 2020 12:28

use immedietly invoked lambda in error checks to put error creation code in a cold path for better inlining

f0571b9

remove extra semicolon

d0e503d

[Jenkins] auto-formatting by clang-format version 6.0.0-1ubuntu2~16.04.1 (tags/RELEASE_600/final)

e23b147

t4c1 requested changes Dec 14, 2020

stan/math/prim/err/check_matching_dims.hpp

stan/math/prim/err/check_nonzero_size.hpp

stan/math/prim/meta/macros.hpp (outdated)

stan/math/prim/meta/macros.hpp (outdated)

SteveBronder added 2 commits December 14, 2020 11:58

update name of attributes file to compiler_attributes, use ndebug macro to decide to turn on or off range checks

d7baca4

fix includes

1e10fc5

removes no check macro and adds cold path to rest of err

0c69ff6

yashikno and others added 2 commits December 15, 2020 17:16

Merge commit '343a3b7802e0d00f79eecbea87d10539637a458d' into HEAD

01bf3aa

[Jenkins] auto-formatting by clang-format version 6.0.0-1ubuntu2~16.04.1 (tags/RELEASE_600/final)

a57c88a

SteveBronder added 2 commits December 18, 2020 15:10

Merge remote-tracking branch 'origin/develop' into feature/cold-path-range-size-errors

d0d07d5

remove not used compiler attributes

f9f0dc0

t4c1 approved these changes Dec 21, 2020

t4c1 merged commit effd263 into develop Dec 21, 2020

rok-cesnovar deleted the feature/cold-path-range-size-errors branch December 21, 2020 08:50

SteveBronder mentioned this pull request Mar 10, 2021

[FR] No bounds checks when NDEBUG flag is set #2420

Closed
