Commit c3b0c30

change back vari_view constructor
2 parents: e6fca4e + 63618a1

46 files changed: +1072 −55 lines

.github/ISSUE_TEMPLATE.md (+1 −1)

@@ -25,4 +25,4 @@ If this is a **feature request**, show what you expect to happen if the feature
 #### Current Version:
-v4.0.1
+v4.1.0

RELEASE-NOTES.txt (+59)

@@ -1,5 +1,64 @@
 Stan Math Library Release Notes
 
+======================================================================
+v4.1.0 (2 June 2021)
+======================================================================
+
+- Added the Cash-Karp numerical integrator to improve numerical integration of ODEs with semi-stiffness and/or rapid oscillations. (#2336)
+- Added the `quantile` function. (#2398)
+- Added custom reverse mode for the `diag_pre_multiply()` and `diag_post_multiply()` functions. (#2405, #2453)
+- Optimized `multi_normal_cholesky` for non-autodiff covariance. (#2439)
+- Updated Sundials to 5.7.0. (#2441)
+- Improved memory safety of nested parallelism. (#2445)
+- Updated TBB to 2020.3. (#2447)
+- Added the `STAN_NO_RANGE_CHECKS` macro, which turns off bounds and range checks (see the sketch after these notes). (#2423, #2437)
+- Optimized the `gp_*_cov` functions, especially for large amounts of data. (#2464)
+- Fixed compilation errors when using `unsigned` and `long` types with `apply_scalar_unary`. (#2469)
+- Added an implementation of the log-logistic probability density function. (#2477)
+- Added a reverse-mode specialization for `csr_matrix_times_vector(sparse data, dense parameter)`. (#2462)
+- Allowed TBB initialization to set the number of threads by an argument. (#2455)
+- Fixed a bug with expressions in the Poisson distribution functions. (#2414)
+- Fixed the off-by-one error in `set_zero_all_adjoints_nested`. (#2399)
+- Fixed a bug with printing Eigen expressions. (#2436)
+- Refactored operands and partials to avoid extra allocations. (#2418)
+- Tidied up the distributions' C++ code. (#2352)
+- Updated the `integrate_1d` internal interface in preparation for closures. (#2397)
+- Added docs for new contributors, including a getting-started guide and docs for contributing new distributions. (#2350, #2466)
+- Added an ODE testing framework. (#2432)
+- Replaced the finite-difference approximation of the Hessian, which was based on function evaluations, with one based on gradients. (#2348)
+- Updated code generation for expression tests. (#2419)
+- Fixed a bug in expression tests and benchmark generation where downloading `stanc.exe` did not work on Windows. (#2480)
+- Varmat:
+  - Added `rep_*` utility functions for the new matrix type. (#2358)
+  - Added `var<Matrix>` overloads for digamma, distance, Phi, inv_Phi, Phi_approx, sqrt, tail, tgamma, rows_dot_self, fma, offset_multiplier, Bessel functions of the first and second kind, beta, binary log loss, ceil, erf, erfc, exp2, expm1, falling_factorial, and floor. (#2362, #2378, #2396, #2461)
+  - Added `lb`/`ub`/`lub_constrain` specializations. (#2373, #2382, #2387, #2379)
+  - Added a script to automatically check stanc3 signatures for varmat compatibility. (#2434)
+- OpenCL:
+  - Fixed OpenCL implementations of distributions, most of which were not working with row vectors. (#2360)
+  - Added prim and rev OpenCL implementations for `to_matrix`, `to_vector`, `to_row_vector`, `to_array_1d`, `to_array_2d`, `append_array`, `reverse`, `symmetrize_from_lower_tri`, `symmetrize_from_upper_tri` and `trace`. (#2377, #2383, #2388)
+  - Added the OpenCL functions `rep_matrix`, `rep_vector`, `rep_row_vector`, `rep_array` and `identity_matrix`. (#2388)
+  - Added operator `%`. ()
+  - Reorganized how work is distributed between threads in generated kernels that use colwise reductions (including all distributions), significantly improving GPU performance. (#2392)
+  - Removed the `.triangularTranspose()` member function from `matrix_cl` and the `TriangularMapCL` enum; `.triangularTranspose()` is replaced by `symmetrize_from_lower_tri()`. (#2393)
+  - Added support for two-dimensional reductions to the kernel generator. (#2403)
+  - Added OpenCL implementations for the functions `log_mix`, `log_softmax`, `log_sum_exp`, `rank`, `sd`, `softmax` and `variance`. (#2426)
+  - Added OpenCL implementations for `ub_constrain`, `lb_constrain`, `lub_constrain`, `offset_multiplier_constrain` and `unit_vector_constrain`. (#2427)
+  - Added an OpenCL implementation for the `prod` function and kernel generator operations for rowwise, colwise and 2D products. (#2433)
+  - Added OpenCL implementations for the functions `bernoulli_cdf`, `bernoulli_lcdf`, `bernoulli_lccdf`, `cauchy_cdf`, `cauchy_lcdf` and `cauchy_lccdf`. (#2446)
+  - Added OpenCL implementations for the functions `double_exponential_cdf`, `double_exponential_lcdf`, `double_exponential_lccdf`, `exp_mod_normal_cdf`, `exp_mod_normal_lcdf` and `exp_mod_normal_lccdf`. (#2449)
+  - Added OpenCL implementations for the functions `exponential_cdf`, `exponential_lcdf`, `exponential_lccdf`, `frechet_cdf`, `frechet_lcdf` and `frechet_lccdf`. (#2450)
+  - Added OpenCL implementations for the functions `gumbel_cdf`, `gumbel_lcdf`, `gumbel_lccdf`, `logistic_cdf`, `logistic_lcdf` and `logistic_lccdf`. (#2451)
+  - Added a new kernel generator operation that allows writing custom OpenCL code. (#2454)
+  - Added OpenCL implementations for the functions `pareto_cdf`, `pareto_lccdf`, `pareto_lcdf`, `pareto_type_2_cdf`, `pareto_type_2_lccdf` and `pareto_type_2_lcdf`. (#2456)
+  - Added OpenCL implementations for the functions `rayleigh_cdf`, `rayleigh_lccdf`, `rayleigh_lcdf`, `skew_double_exponential_cdf`, `skew_double_exponential_lccdf`, `skew_double_exponential_lcdf` and `skew_double_exponential_lpdf`. (#2457)
+  - Added OpenCL implementations for the functions `lognormal_cdf`, `lognormal_lccdf`, `lognormal_lcdf`, `normal_cdf`, `normal_lccdf` and `normal_lcdf`. (#2458)
+  - Added OpenCL implementations for the functions `std_normal_cdf`, `std_normal_lccdf`, `std_normal_lcdf`, `uniform_cdf`, `uniform_lccdf` and `uniform_lcdf`. (#2459)
+  - Added OpenCL implementations for the functions `weibull_cdf`, `weibull_lccdf` and `weibull_lcdf`. (#2460)
+  - Removed unused OpenCL kernels and checks. (#2463)
+  - Added OpenCL prim implementations for the functions `gp_exponential_cov`, `gp_matern32_cov` and `gp_matern52_cov`, and both prim and rev implementations for `gp_dot_prod_cov`. (#2471)
+  - Added a reference type (`ref_type`) for kernel generator expressions. (#2404)
+  - Added a typecast operation to the kernel generator. (#2472)
+
 ======================================================================
 v4.0.1 (17 February 2021)
 ======================================================================
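
One entry above worth illustrating is `STAN_NO_RANGE_CHECKS`. The following is a minimal conceptual sketch, not the actual Stan Math source, of how such a compile-time switch removes checking code; the helper name is hypothetical, and in practice the macro would typically be supplied on the compiler command line (for example -DSTAN_NO_RANGE_CHECKS):

#include <stdexcept>
#include <string>

// Hypothetical illustration only: with STAN_NO_RANGE_CHECKS defined, the body
// compiles away entirely; without it, an out-of-range index throws.
inline void check_index_in_range(const char* name, int index, int max) {
#ifndef STAN_NO_RANGE_CHECKS
  if (index < 1 || index > max) {
    throw std::out_of_range(std::string(name) + ": index "
                            + std::to_string(index) + " is out of range [1, "
                            + std::to_string(max) + "]");
  }
#endif
}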

doxygen/doxygen.cfg (+1 −1)

@@ -38,7 +38,7 @@ PROJECT_NAME = "Stan Math Library"
 # could be handy for archiving the generated documentation or if some version
 # control system is used.
 
-PROJECT_NUMBER = 4.0.1
+PROJECT_NUMBER = 4.1.0
 
 # Using the PROJECT_BRIEF tag one can provide an optional one line description
 # for a project that appears at the top of each page and should give viewer a

make/tests (+1 −1)

@@ -100,7 +100,7 @@ else
 endif
 
 %.hpp-test : %.hpp test/dummy.cpp
-	$(COMPILE.cpp) $(CXXFLAGS) -O0 -include $^ -o $(DEV_NULL)
+	$(COMPILE.cpp) $(CXXFLAGS) -O0 -include $^ -o $(DEV_NULL) -Wunused-local-typedefs
 
 test/dummy.cpp:
 	@mkdir -p test

stan/math/fwd/fun/mdivide_left.hpp (−1)

@@ -60,7 +60,6 @@ template <typename T1, typename T2,
 inline Eigen::Matrix<value_type_t<T2>, T1::RowsAtCompileTime,
                      T2::ColsAtCompileTime>
 mdivide_left(const T1& A, const T2& b) {
-  using T = typename value_type_t<T2>::Scalar;
   constexpr int S1 = T1::RowsAtCompileTime;
   constexpr int C2 = T2::ColsAtCompileTime;

stan/math/fwd/fun/mdivide_left_tri_low.hpp (−1)

@@ -55,7 +55,6 @@ template <typename T1, typename T2, require_eigen_t<T1>* = nullptr,
 inline Eigen::Matrix<value_type_t<T2>, T1::RowsAtCompileTime,
                      T2::ColsAtCompileTime>
 mdivide_left_tri_low(const T1& A, const T2& b) {
-  using T = typename value_type_t<T2>::Scalar;
   constexpr int S1 = T1::RowsAtCompileTime;
   constexpr int C2 = T2::ColsAtCompileTime;
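
These two removals and the `-Wunused-local-typedefs` flag added to `make/tests` above are connected: the flag makes the header tests warn about exactly this kind of dead local alias. A minimal sketch of the pattern (hypothetical function, compiled with g++ or clang++ and -Wunused-local-typedefs):

#include <vector>

// The compiler warns here: the alias is declared but never used, mirroring the
// `using T = typename value_type_t<T2>::Scalar;` lines removed above.
template <typename Container>
int count_elements(const Container& c) {
  using value_t = typename Container::value_type;  // warning: unused local typedef
  return static_cast<int>(c.size());
}

int main() {
  std::vector<double> v{1.0, 2.0, 3.0};
  return count_elements(v) == 3 ? 0 : 1;
}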

stan/math/opencl/kernel_generator/block_zero_based.hpp (+12 −2)

@@ -272,9 +272,19 @@ class block_
   inline void set_view(int bottom_diagonal, int top_diagonal,
                        int bottom_zero_diagonal, int top_zero_diagonal) const {
     int change = start_col_ - start_row_;
-    this->template get_arg<0>().set_view(
+    auto& a = this->template get_arg<0>();
+    a.set_view(
         bottom_diagonal + change, top_diagonal + change,
-        bottom_zero_diagonal + change, top_zero_diagonal + change);
+        (start_col_ == 0 && start_row_ <= 1 && start_row_ + rows_ == a.rows()
+                 && start_col_ + cols_ >= std::min(a.rows() - 1, a.cols())
+             ? bottom_zero_diagonal
+             : bottom_diagonal)
+            + change,
+        (start_row_ == 0 && start_col_ <= 1 && start_col_ + cols_ == a.cols()
+                 && start_row_ + rows_ >= std::min(a.rows(), a.cols() - 1)
+             ? top_zero_diagonal
+             : top_diagonal)
+            + change);
   }
 
   /**
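
The conditionals added above decide whether the block may forward the caller's zero-diagonal view information to its parent expression, or must fall back to the plain diagonal bounds. As a reading aid, here is a standalone restatement of the lower-triangular condition from the diff (illustrative only; the helper name is hypothetical and the parent sizes are passed explicitly instead of being read from `a`):

#include <algorithm>

// A block may forward bottom_zero_diagonal to its parent only if it starts at
// column 0, starts at row 0 or 1, reaches the parent's last row, and is wide
// enough to cover the parent's subdiagonal; otherwise only bottom_diagonal is
// forwarded. The upper-triangular case in the diff is the mirror image.
inline bool can_forward_bottom_zero_diagonal(int start_row, int start_col,
                                             int rows, int cols,
                                             int parent_rows, int parent_cols) {
  return start_col == 0 && start_row <= 1 && start_row + rows == parent_rows
         && start_col + cols >= std::min(parent_rows - 1, parent_cols);
}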

stan/math/opencl/kernel_generator/elt_function_cl.hpp (+1)

@@ -338,6 +338,7 @@ ADD_BINARY_FUNCTION_WITH_INCLUDES(fmod)
 ADD_BINARY_FUNCTION_WITH_INCLUDES(hypot)
 ADD_BINARY_FUNCTION_WITH_INCLUDES(ldexp)
 ADD_BINARY_FUNCTION_WITH_INCLUDES(pow)
+ADD_BINARY_FUNCTION_WITH_INCLUDES(copysign)
 
 ADD_BINARY_FUNCTION_WITH_INCLUDES(
     beta, stan::math::opencl_kernels::beta_device_function)
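
For context, `ADD_BINARY_FUNCTION_WITH_INCLUDES(copysign)` registers an elementwise `copysign` for use in kernel generator expressions. A hedged usage sketch, assuming the generated `stan::math::copysign` overload for `matrix_cl` behaves like the other binary functions registered in this file and that the library is built with STAN_OPENCL:

#include <stan/math/opencl/prim.hpp>
#include <Eigen/Dense>

int main() {
  Eigen::MatrixXd a(2, 2), b(2, 2);
  a << 1.5, -2.5, 3.5, -4.5;
  b << -1.0, 1.0, -1.0, 1.0;
  stan::math::matrix_cl<double> a_cl(a);
  stan::math::matrix_cl<double> b_cl(b);
  // Elementwise copysign evaluated on the OpenCL device: magnitude of a,
  // sign of b (assumed overload generated by the macro above).
  stan::math::matrix_cl<double> c_cl = stan::math::copysign(a_cl, b_cl);
  Eigen::MatrixXd c = stan::math::from_matrix_cl(c_cl);
  return 0;
}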
stan/math/opencl/kernels/cumulative_sum.hpp (new file, +224)

#ifndef STAN_MATH_OPENCL_KERNELS_CUMULATIVE_SUM_HPP
#define STAN_MATH_OPENCL_KERNELS_CUMULATIVE_SUM_HPP
#ifdef STAN_OPENCL

#include <stan/math/opencl/kernel_cl.hpp>
#include <stan/math/opencl/buffer_types.hpp>
#include <stan/math/opencl/matrix_cl_view.hpp>
#include <string>

namespace stan {
namespace math {
namespace opencl_kernels {

// \cond
static const char *cumulative_sum1_kernel_code = STRINGIFY(
    // \endcond
    /** \ingroup opencl_kernels
     * First kernel of the cumulative sum implementation. Each thread sums the
     * assigned elements and threads within the same work group add their
     * results together.
     *
     * @param[out] out_wgs results from each work group
     * @param[out] out_threads results for each thread
     * @param[in] in input data
     * @param size number of elements in the input
     */
    __kernel void cumulative_sum1(__global SCAL *out_wgs,
                                  __global SCAL *out_threads, __global SCAL *in,
                                  int size) {
      const int gid = get_global_id(0);
      const int lid = get_local_id(0);
      const int lsize = get_local_size(0);
      const int wg_id = get_group_id(0);
      const int gsize = get_global_size(0);

      int start = (int)((long)gid * size / gsize);      // NOLINT
      int end = (int)((long)(gid + 1) * size / gsize);  // NOLINT
      __local SCAL local_storage[LOCAL_SIZE_];

      SCAL acc = 0;
      if (start != end) {
        acc = in[start];
        for (int i = start + 1; i < end; i++) {
          acc += in[i];
        }
      }
      for (int step = 1; step < lsize; step *= REDUCTION_STEP_SIZE) {
        local_storage[lid] = acc;
        barrier(CLK_LOCAL_MEM_FENCE);
        for (int i = 1; i < REDUCTION_STEP_SIZE && step * i <= lid; i++) {
          acc += local_storage[lid - step * i];
        }
        barrier(CLK_LOCAL_MEM_FENCE);
      }
      out_threads[gid] = acc;
      if (lid == LOCAL_SIZE_ - 1) {
        out_wgs[wg_id] = acc;
      }
    }
    // \cond
);
// \endcond

// \cond
static const char *cumulative_sum2_kernel_code = STRINGIFY(
    // \endcond
    /** \ingroup opencl_kernels
     * Second kernel of the cumulative sum implementation. Calculates the
     * prefix sum of the given data in place using a single work group (must be
     * run with a single work group).
     *
     * @param[in, out] data data to calculate cumulative sum of
     * @param size number of elements in the input
     */
    __kernel void cumulative_sum2(__global SCAL *data, int size) {
      const int gid = get_global_id(0);
      const int gsize = get_global_size(0);

      int start = (int)((long)gid * size / gsize);      // NOLINT
      int end = (int)((long)(gid + 1) * size / gsize);  // NOLINT
      __local SCAL local_storage[LOCAL_SIZE_];

      SCAL acc;
      if (start == end) {
        acc = 0;
      } else {
        acc = data[start];
        for (int i = start + 1; i < end; i++) {
          acc += data[i];
        }
      }
      local_storage[gid] = acc;
      barrier(CLK_LOCAL_MEM_FENCE);
      for (int step = 1; step < gsize; step *= REDUCTION_STEP_SIZE) {
        for (int i = 1; i < REDUCTION_STEP_SIZE && step * i <= gid; i++) {
          acc += local_storage[gid - step * i];
        }
        barrier(CLK_LOCAL_MEM_FENCE);
        local_storage[gid] = acc;
        barrier(CLK_LOCAL_MEM_FENCE);
      }
      if (start != end) {
        if (gid == 0) {
          acc = 0;
        } else {
          acc = local_storage[gid - 1];
        }
        for (int i = start; i < end; i++) {
          acc += data[i];
          data[i] = acc;
        }
      }
    }
    // \cond
);
// \endcond

// \cond
static const char *cumulative_sum3_kernel_code = STRINGIFY(
    // \endcond
    /** \ingroup opencl_kernels
     * Third kernel of the cumulative sum implementation. Given the sums of
     * threads and the cumulative sum of those, calculates the cumulative sum
     * of the given array. Must be run with the same number of threads and work
     * groups as the first cumulative sum kernel.
     *
     * @param[out] out cumulatively summed input
     * @param[in] in_data input data
     * @param[in] in_threads summed results from each thread from the first
     * kernel
     * @param[in] in_wgs cumulatively summed results from each work group
     * (calculated by previous two kernels)
     * @param size number of elements in the input
     */
    __kernel void cumulative_sum3(__global SCAL *out, __global SCAL *in_data,
                                  __global SCAL *in_threads,
                                  __global SCAL *in_wgs, int size) {
      const int gid = get_global_id(0);
      const int lid = get_local_id(0);
      const int lsize = get_local_size(0);
      const int wg_id = get_group_id(0);
      const int gsize = get_global_size(0);

      int start = (int)((long)gid * size / gsize);      // NOLINT
      int end = (int)((long)(gid + 1) * size / gsize);  // NOLINT
      __local SCAL local_storage[LOCAL_SIZE_];

      SCAL acc = 0;
      if (wg_id != 0) {
        acc = in_wgs[wg_id - 1];
      }
      if (lid != 0) {
        acc += in_threads[gid - 1];
      }
      for (int i = start; i < end; i++) {
        acc += in_data[i];
        out[i] = acc;
      }
    }
    // \cond
);
// \endcond

/**
 * struct containing cumulative_sum kernels, grouped by scalar type.
 */
template <typename Scalar, typename = void>
struct cumulative_sum {};

template <typename T>
struct cumulative_sum<double, T> {
  static const kernel_cl<out_buffer, out_buffer, in_buffer, int> kernel1;
  static const kernel_cl<in_out_buffer, int> kernel2;
  static const kernel_cl<out_buffer, in_buffer, in_buffer, in_buffer, int>
      kernel3;
};
template <typename T>
struct cumulative_sum<int, T> {
  static const kernel_cl<out_buffer, out_buffer, in_buffer, int> kernel1;
  static const kernel_cl<in_out_buffer, int> kernel2;
  static const kernel_cl<out_buffer, in_buffer, in_buffer, in_buffer, int>
      kernel3;
};

template <typename T>
const kernel_cl<out_buffer, out_buffer, in_buffer, int>
    cumulative_sum<double, T>::kernel1("cumulative_sum1",
                                       {"#define SCAL double\n",
                                        cumulative_sum1_kernel_code},
                                       {{"REDUCTION_STEP_SIZE", 4},
                                        {"LOCAL_SIZE_", 16}});
template <typename T>
const kernel_cl<out_buffer, out_buffer, in_buffer, int>
    cumulative_sum<int, T>::kernel1(
        "cumulative_sum1", {"#define SCAL int\n", cumulative_sum1_kernel_code},
        {{"REDUCTION_STEP_SIZE", 4}, {"LOCAL_SIZE_", 16}});

template <typename T>
const kernel_cl<in_out_buffer, int> cumulative_sum<double, T>::kernel2(
    "cumulative_sum2", {"#define SCAL double\n", cumulative_sum2_kernel_code},
    {{"REDUCTION_STEP_SIZE", 4}, {"LOCAL_SIZE_", 1024}});
template <typename T>
const kernel_cl<in_out_buffer, int> cumulative_sum<int, T>::kernel2(
    "cumulative_sum2", {"#define SCAL int\n", cumulative_sum2_kernel_code},
    {{"REDUCTION_STEP_SIZE", 4}, {"LOCAL_SIZE_", 1024}});

template <typename T>
const kernel_cl<out_buffer, in_buffer, in_buffer, in_buffer, int>
    cumulative_sum<double, T>::kernel3("cumulative_sum3",
                                       {"#define SCAL double\n",
                                        cumulative_sum3_kernel_code},
                                       {{"REDUCTION_STEP_SIZE", 4},
                                        {"LOCAL_SIZE_", 16}});
template <typename T>
const kernel_cl<out_buffer, in_buffer, in_buffer, in_buffer, int>
    cumulative_sum<int, T>::kernel3(
        "cumulative_sum3", {"#define SCAL int\n", cumulative_sum3_kernel_code},
        {{"REDUCTION_STEP_SIZE", 4}, {"LOCAL_SIZE_", 16}});

}  // namespace opencl_kernels
}  // namespace math
}  // namespace stan
#endif
#endif
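
To clarify how the three kernels cooperate, here is a serial C++ sketch, not part of the commit, of the same three-pass structure: pass 1 produces per-thread chunk sums scanned within each work group plus per-group totals, pass 2 scans the group totals, and pass 3 re-walks each chunk with the appropriate offset. The chunking mirrors the `gid * size / gsize` split used in the kernels; the group and thread counts are illustrative parameters:

#include <cstddef>
#include <vector>

std::vector<double> cumulative_sum_reference(const std::vector<double>& in,
                                             std::size_t n_groups,
                                             std::size_t threads_per_group) {
  const std::size_t n_threads = n_groups * threads_per_group;
  const std::size_t size = in.size();
  // Pass 1: per-thread chunk sums, scanned (inclusively) within each group,
  // plus the total of each group.
  std::vector<double> thread_scan(n_threads, 0.0);
  std::vector<double> group_sum(n_groups, 0.0);
  for (std::size_t g = 0; g < n_groups; ++g) {
    double acc = 0;
    for (std::size_t t = 0; t < threads_per_group; ++t) {
      std::size_t gid = g * threads_per_group + t;
      std::size_t start = gid * size / n_threads;
      std::size_t end = (gid + 1) * size / n_threads;
      for (std::size_t i = start; i < end; ++i) acc += in[i];
      thread_scan[gid] = acc;
    }
    group_sum[g] = acc;
  }
  // Pass 2: inclusive prefix sum over the per-group totals.
  for (std::size_t g = 1; g < n_groups; ++g) group_sum[g] += group_sum[g - 1];
  // Pass 3: each thread re-walks its chunk, offset by everything before it.
  std::vector<double> out(size);
  for (std::size_t g = 0; g < n_groups; ++g) {
    for (std::size_t t = 0; t < threads_per_group; ++t) {
      std::size_t gid = g * threads_per_group + t;
      double acc = (g > 0 ? group_sum[g - 1] : 0.0)
                   + (t > 0 ? thread_scan[gid - 1] : 0.0);
      std::size_t start = gid * size / n_threads;
      std::size_t end = (gid + 1) * size / n_threads;
      for (std::size_t i = start; i < end; ++i) {
        acc += in[i];
        out[i] = acc;
      }
    }
  }
  return out;
}

int main() {
  std::vector<double> in{1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
  std::vector<double> out = cumulative_sum_reference(in, 2, 4);
  double expected = 0;
  for (std::size_t i = 0; i < in.size(); ++i) {
    expected += in[i];
    if (out[i] != expected) return 1;
  }
  return 0;
}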

stan/math/opencl/prim.hpp (+7)

@@ -97,6 +97,7 @@
 #include <stan/math/opencl/to_ref_for_opencl.hpp>
 #include <stan/math/opencl/value_type.hpp>
 #include <stan/math/opencl/zeros_strict_tri.hpp>
+#include <stan/math/opencl/qr_decomposition.hpp>
 
 #include <stan/math/opencl/prim/add_diag.hpp>
 #include <stan/math/opencl/prim/append_array.hpp>

@@ -124,6 +125,7 @@
 #include <stan/math/opencl/prim/columns_dot_product.hpp>
 #include <stan/math/opencl/prim/columns_dot_self.hpp>
 #include <stan/math/opencl/prim/crossprod.hpp>
+#include <stan/math/opencl/prim/cumulative_sum.hpp>
 #include <stan/math/opencl/prim/diag_matrix.hpp>
 #include <stan/math/opencl/prim/diag_pre_multiply.hpp>
 #include <stan/math/opencl/prim/diag_post_multiply.hpp>

@@ -161,6 +163,7 @@
 #include <stan/math/opencl/prim/gumbel_lcdf.hpp>
 #include <stan/math/opencl/prim/gumbel_lpdf.hpp>
 #include <stan/math/opencl/prim/head.hpp>
+#include <stan/math/opencl/prim/identity_matrix.hpp>
 #include <stan/math/opencl/prim/inv.hpp>
 #include <stan/math/opencl/prim/inv_chi_square_lpdf.hpp>
 #include <stan/math/opencl/prim/inv_cloglog.hpp>

@@ -210,6 +213,10 @@
 #include <stan/math/opencl/prim/poisson_log_lpmf.hpp>
 #include <stan/math/opencl/prim/poisson_lpmf.hpp>
 #include <stan/math/opencl/prim/prod.hpp>
+#include <stan/math/opencl/prim/qr_Q.hpp>
+#include <stan/math/opencl/prim/qr_R.hpp>
+#include <stan/math/opencl/prim/qr_thin_Q.hpp>
+#include <stan/math/opencl/prim/qr_thin_R.hpp>
 #include <stan/math/opencl/prim/rank.hpp>
 #include <stan/math/opencl/prim/rayleigh_cdf.hpp>
 #include <stan/math/opencl/prim/rayleigh_lccdf.hpp>
