
Replace sqrt(dot_self()) with norm2 #2648

Closed
wants to merge 4 commits

Conversation

lyndond
Contributor

@lyndond lyndond commented Jan 9, 2022

Summary

I'm hoping this one won't be too controversial. This PR addresses the remaining TODO in #2562.
PR #2636 introduced specialized prim/fwd/rev functions for L1 and L2 norms. This PR tidies some instances in the codebase where r = sqrt(dot_self(x)); can now be replaced with r = norm2(x);. The main ones I found were:

  • unit_vector_constrain in prim/fwd/rev

If there are any instances I may have missed, please let me know.
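
For readers unfamiliar with the functions involved, here is a minimal sketch of the pattern being swapped. The vector and values are illustrative only and are not taken from the diff; norm2 is the L2-norm function introduced in #2636.

    #include <stan/math.hpp>
    #include <Eigen/Dense>
    #include <cmath>
    #include <iostream>

    int main() {
      // Illustrative data only; not from the PR.
      Eigen::VectorXd x(3);
      x << 1.0, 2.0, 3.0;

      // Before: square root of the self dot product.
      double r_old = std::sqrt(stan::math::dot_self(x));

      // After: a single call to the L2 norm.
      double r_new = stan::math::norm2(x);

      // Both compute the Euclidean length of x (sqrt(14) here).
      std::cout << r_old << " " << r_new << "\n";
      return 0;
    }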

Tests

No new tests. Existing tests pass.

Side Effects

N/A

Release notes

Replaces sqrt(dot_self()) with norm2() in the unit_vector_constrain implementations (prim/fwd/rev).

Checklist

  • Math issue: L1 and L2 norms (Euclidean and taxicab lengths) #2562

  • Copyright holder: Lyndon Duong
    The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:
    - Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
    - Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

  • the basic tests are passing

    • unit tests pass (to run, use: ./runTests.py test/unit)
    • header checks pass (make test-headers)
    • dependencies checks pass (make test-math-dependencies)
    • docs build (make doxygen)
    • code passes the built-in C++ standards checks (make cpplint)
  • the code is written in idiomatic C++ and changes are documented in the doxygen

  • the new changes are tested

@stan-buildbot
Contributor


Name Old Result New Result Ratio Performance change (1 - new / old)
gp_pois_regr/gp_pois_regr.stan 3.6 3.64 0.99 -1.08% slower
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 1.0 -0.48% slower
eight_schools/eight_schools.stan 0.08 0.08 1.01 0.91% faster
gp_regr/gp_regr.stan 0.14 0.14 0.99 -0.63% slower
irt_2pl/irt_2pl.stan 5.76 5.71 1.01 0.81% faster
performance.compilation 93.18 91.22 1.02 2.1% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 8.22 8.15 1.01 0.84% faster
pkpd/one_comp_mm_elim_abs.stan 31.78 31.02 1.02 2.41% faster
sir/sir.stan 122.21 121.79 1.0 0.34% faster
gp_regr/gen_gp_data.stan 0.03 0.03 0.99 -1.36% slower
low_dim_gauss_mix/low_dim_gauss_mix.stan 2.97 3.02 0.98 -1.67% slower
pkpd/sim_one_comp_mm_elim_abs.stan 0.42 0.38 1.11 9.92% faster
arK/arK.stan 2.08 2.06 1.01 1.02% faster
arma/arma.stan 0.28 0.27 1.01 0.56% faster
garch/garch.stan 0.6 0.61 0.99 -0.69% slower
Mean result: 1.00954507309

Jenkins Console Log
Blue Ocean
Commit hash: d280811


Machine information
ProductName: Mac OS X
ProductVersion: 10.11.6
BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@andrjohns
Collaborator

andrjohns commented Jan 10, 2022

Thanks for the PR! I'm not sure whether this change is actually necessary though, as it looks like it essentially just replaces sqrt(dot_self()) with calls to square(norm2()), so the number of operations and the overall complexity are unchanged.

@lyndond
Contributor Author

lyndond commented Jan 10, 2022

That's fair -- I was thinking that too as I was making the changes. Perhaps there was something else @bob-carpenter had in mind in issue #2562.

@bob-carpenter
Member

@andrjohns: We want to replace sqrt(dot_self(x)) with norm2(x), not with square(norm2(x)), because norm2(x) = sqrt(dot_self(x)).
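
In symbols, the identity behind the request (standard definitions, written out here for clarity rather than quoted from the thread):

    \[
      \texttt{dot\_self}(x) = x^\top x = \sum_i x_i^2,
      \qquad
      \texttt{norm2}(x) = \sqrt{x^\top x} = \sqrt{\texttt{dot\_self}(x)},
    \]

so norm2(x) is the exact drop-in for sqrt(dot_self(x)).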

@lyndond
Contributor Author

lyndond commented Jan 10, 2022

Right. I replaced instances of sqrt(dot_self()) with norm2(). There were a few instances where the squared norm is required (e.g. line 46 in stan/math/fwd/fun/unit_vector_constrain.hpp), and in those cases I left dot_self() as is.
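
A hedged sketch of the two cases (function and variable names here are illustrative, not the actual file contents): where only the norm is needed, norm2 is a one-for-one replacement; where the squared norm is needed, dot_self stays, since squaring norm2's result would pay for a sqrt and then immediately undo it.

    #include <stan/math.hpp>
    #include <Eigen/Dense>
    #include <cmath>

    // Case 1: only the norm is needed -- norm2 replaces sqrt(dot_self(...)).
    double norm_only(const Eigen::VectorXd& x) {
      return stan::math::norm2(x);  // was: std::sqrt(stan::math::dot_self(x))
    }

    // Case 2: the squared norm is needed -- dot_self is left as is.
    double squared_norm_only(const Eigen::VectorXd& x) {
      return stan::math::dot_self(x);  // norm2(x) * norm2(x) would waste the sqrt
    }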

Member

@bob-carpenter bob-carpenter left a comment


Hmm. These changes are OK semantically, but I think the original versions are slightly faster in all cases, because it's not just the norm2 calculation we need -- we also need its square.

If you want to work on speed here, what we could use is the autodiff implemented analytically, as for any other special function. The twist is that we also need to set lp (right now it just does lp +=, but we should set it to a new var that holds a pointer to lp and to y), and also define the version that doesn't increment lp.

What we'd really like instead of this replacement is analytic gradients for the constraint functions. We should be able to define this just like any other autodiffable function but have it also set lp to a new value.
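
For reference, the derivatives such a specialized implementation would need are standard calculus (a sketch, not taken from the thread or from Stan's docs). Writing r = ||y||_2, the transform returns u = y / r and the lp overload subtracts r^2 / 2, so

    \[
      u_i = \frac{y_i}{r},
      \qquad
      \frac{\partial u_i}{\partial y_j} = \frac{\delta_{ij}}{r} - \frac{y_i\,y_j}{r^3},
      \qquad
      \frac{\partial}{\partial y_j}\!\left(-\tfrac{1}{2}\,r^2\right) = -\,y_j,
    \]

which is what an analytic reverse-mode implementation would accumulate directly instead of tracing dot_self, sqrt, and the division through autodiff.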

-  eig_partial squared_norm = dot_self(y_val);
-  eig_partial norm = sqrt(squared_norm);
+  eig_partial norm = norm2(y_val);
+  eig_partial squared_norm = norm * norm;
Member


These are going to be very similar operationally. I think the original code may require fewer operations and be more stable arithmetically, because the new one has the sqrt inside norm2 and then you have to do the additional squaring. So unless there's something I'm not seeing, I think this should be switched back. If for some reason you do want to keep the new code, line 27 should be square(norm).

@@ -34,7 +34,7 @@ inline auto unit_vector_constrain(const T& y) {
   arena_t<T> arena_y = y;
   arena_t<promote_scalar_t<double, T>> arena_y_val = arena_y.val();

-  const double r = arena_y_val.norm();
+  const double r = norm2(arena_y_val);
Member


For double arguments, it's probably better to use the built-in Eigen functions, as they're more highly optimized for this. Of course, norm2 itself should delegate to this .norm() for double-valued vectors.
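
A minimal sketch of the delegation being suggested, assuming a plain Eigen::VectorXd overload purely for illustration (this is not the library's actual overload set or metaprogramming):

    #include <Eigen/Dense>

    // Hypothetical double-only overload: hand the work to Eigen's optimized .norm().
    inline double norm2(const Eigen::VectorXd& v) {
      return v.norm();  // Eigen computes sqrt(v.squaredNorm()) with vectorized code
    }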

    return y_ref.array() / sqrt(SN);
    value_type_t<T1> r = norm2(y_ref);
    check_positive_finite("unit_vector_constrain", "norm", r);
    lp -= 0.5 * (r * r);
Member


This looks like it's also just paying that squaring cost again here on line 51, so I think the original code may again be faster here.

What we really want here is a specialized function for unit_vector_constrain that does all of this with analytic derivatives for lp and the return type.

@lyndond
Contributor Author

lyndond commented Jan 11, 2022

Thanks for the review @bob-carpenter. It looks like this is a bit more nuanced than initially anticipated. I think I'll close the PR for now until we settle on something that requires a more significant change.

@lyndond lyndond closed this Jan 11, 2022