Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add kernel generator cast operation #2472

Merged
merged 5 commits into from
May 5, 2021
Merged

Conversation

t4c1
Copy link
Contributor

@t4c1 t4c1 commented Apr 16, 2021

Summary

Adds typecast operation to kernel generator and uses it in distributions instead of forcing type promotions by adding zero.

Tests

Added tests for cast. Reminder of this PR is just refactor.

Side Effects

None.

Release notes

Adds typecast operation to kernel generator.

Checklist

  • Math issue #(issue number)

  • Copyright holder: Tadej Ciglarič

    The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to the license the submitted work under the following licenses:
    - Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
    - Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

  • the basic tests are passing

    • unit tests pass (to run, use: ./runTests.py test/unit)
    • header checks pass, (make test-headers)
    • dependencies checks pass, (make test-math-dependencies)
    • docs build, (make doxygen)
    • code passes the built in C++ standards checks (make cpplint)
  • the code is written in idiomatic C++ and changes are documented in the doxygen

  • the new changes are tested

@stan-buildbot
Copy link
Contributor


Name Old Result New Result Ratio Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan 3.37 3.36 1.0 0.4% faster
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 0.99 -1.13% slower
eight_schools/eight_schools.stan 0.12 0.11 1.02 1.76% faster
gp_regr/gp_regr.stan 0.16 0.16 0.97 -2.68% slower
irt_2pl/irt_2pl.stan 6.02 6.04 1.0 -0.44% slower
performance.compilation 91.2 89.52 1.02 1.85% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 8.79 8.64 1.02 1.75% faster
pkpd/one_comp_mm_elim_abs.stan 30.74 30.34 1.01 1.28% faster
sir/sir.stan 123.53 119.69 1.03 3.11% faster
gp_regr/gen_gp_data.stan 0.03 0.04 1.0 -0.5% slower
low_dim_gauss_mix/low_dim_gauss_mix.stan 3.03 2.98 1.02 1.59% faster
pkpd/sim_one_comp_mm_elim_abs.stan 0.41 0.39 1.05 4.68% faster
arK/arK.stan 1.88 1.89 1.0 -0.31% slower
arma/arma.stan 0.75 0.76 1.0 -0.46% slower
garch/garch.stan 0.58 0.56 1.02 2.31% faster
Mean result: 1.00920388803

Jenkins Console Log
Blue Ocean
Commit hash: fcaff77


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@t4c1
Copy link
Contributor Author

t4c1 commented Apr 26, 2021

This is still waiting for a review.

Copy link
Collaborator

@SteveBronder SteveBronder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the code looks good but I'm having a hard time following how the cast happens on the device side. I think for this operation in particular where we are sort of fighting a thing in the OpenCL spec it's very good to have more particular docs. If you can clear that up then I think this is good to go

Side note, not necessary for this PR, but I think this is an example where it would be nice to have something in the tests or an easy scheme available from runTests.py for generating the OpenCL kernel code. It would also make reviewing these things a lot simpler. Sort of like how in the stanc3 compiler we have the generated cpp as part of the testing so we can see what stuff changes in the gen'd code between PRs

Comment on lines +58 to +59
res.body = type_str<Scalar>() + " " + var_name_ + " = ("
+ type_str<Scalar>() + ")" + var_name_arg + ";\n";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this line here generates Scalar var_name = (Scalar)input_var_name;? If so then idt I'm understanding how this works for char?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I don't understand what is the confusion with char. It works exactly the same as for any other supported scalar type.

Side note, not necessary for this PR, but I think this is an example where it would be nice to have something in the tests or an easy scheme available from runTests.py for generating the OpenCL kernel code.

We do have reference kernels in tests, although I am not using them for operations that generate really trivial code, such as typecasting in this PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I'm looking at

auto any_n_negative = colwise_max(cast<char>(n < 0));

And here this is turned into

char var_name = (char)input_var_name;

Apologies if this is silly but I'm not following how that is used in an if etc. Like is it that if input_var_name is 0 then the char is 0 and the if is false?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly. input_var_name is in this case the result of the comparison n < 0, which is either 0 or 1. 0 or 1 converted into char is still 0 or 1. That gets written into global memory and than copied to host. Is there still anything you are not understanding?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aight then I think that makes sense

@t4c1
Copy link
Contributor Author

t4c1 commented May 3, 2021

@SteveBronder This PR is still waiting.

Copy link
Collaborator

@SteveBronder SteveBronder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to have a note on the behavior for this but I think it is fine as is

@t4c1 t4c1 merged commit b994475 into stan-dev:develop May 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants