Commit f5bb646

Authored by: ailzhang, erizmr, Morcki, pre-commit-ci[bot], writinwaters

[misc] Rc v1.1.0 patch4 (#5725)
* [autodiff] Fix AdStackAllocaStmt not correctly backup (#5692)
* [Doc] Add introduction to forward mode autodiff (#5680)
* [Doc] Add docs for GGUI's new features (#5647)
* [bug] Support indexing via np.integer for field (#5712)
* [autodiff] Clear all dual fields when exiting context manager (#5716)
* [Doc] [type] Add introduction to quantized types (#5705)
* [test] Fix autodiff test for unsupported shift ptr (#5723)
* [misc] Update version to v1.1.0

Co-authored-by: Mingrui Zhang <[email protected]>
Co-authored-by: Mocki <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Vissidarte-Herman <[email protected]>
Co-authored-by: Yi Xu <[email protected]>
1 parent 6fac8fd, commit f5bb646

17 files changed: +690 −21 lines

docs/lang/articles/advanced/quant.md (+249)
@@ -0,0 +1,249 @@
---
sidebar_position: 3
---

# Using quantized data types

High-resolution simulations can deliver great visual quality, but they are often limited by available memory, especially on GPUs. To save memory, Taichi provides low-precision ("quantized") data types. You can define your own integers, fixed-point numbers, or floating-point numbers with a non-standard number of bits, so that you can choose a setting with minimal memory cost for your application. Taichi also provides a suite of tailored domain-specific optimizations to keep the runtime performance of quantized data types close to that of full-precision data types.

:::note
Quantized data types are only supported on the CPU and CUDA backends for now.
:::

## Quantized data types

### Quantized integers

Modern computers represent integers in the [two's complement](https://en.wikipedia.org/wiki/Two%27s_complement) format. *Quantized integers* in Taichi adopt the same format but can have a non-standard number of bits:

```python
i10 = ti.types.quant.int(bits=10)  # 10-bit signed (default) integer type
u5 = ti.types.quant.int(bits=5, signed=False)  # 5-bit unsigned integer type
```

### Quantized fixed-point numbers

[Fixed-point numbers](https://en.wikipedia.org/wiki/Fixed-point_arithmetic) are a classic way to represent real numbers. The internal representation of a fixed-point number is simply an integer, and its actual value equals the integer multiplied by a predefined scaling factor. Building on its support for quantized integers, Taichi provides *quantized fixed-point numbers* as follows:

```python
fixed_type_a = ti.types.quant.fixed(bits=10, max_value=20.0)  # 10-bit signed (default) fixed-point type within [-20.0, 20.0]
fixed_type_b = ti.types.quant.fixed(bits=5, signed=False, max_value=100.0)  # 5-bit unsigned fixed-point type within [0.0, 100.0]
fixed_type_c = ti.types.quant.fixed(bits=6, signed=False, scale=1.0)  # 6-bit unsigned fixed-point type within [0, 64.0]
```

`scale` is the scaling factor mentioned above. Because fixed-point numbers are especially useful when you know the actual value is guaranteed to lie within a range, Taichi also allows you to simply set `max_value` and calculates the scaling factor accordingly.
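
To make the integer-plus-scale representation concrete, here is a plain-Python sketch of fixed-point encoding and decoding. This is an illustration only, not Taichi's internal code, and the particular scale value used below is an assumption chosen for the example.

```python
# Illustration only (not Taichi internals): a fixed-point value is stored
# as an integer and decoded by multiplying with a fixed scaling factor.
def encode(value, scale):
    return round(value / scale)  # the stored integer


def decode(stored, scale):
    return stored * scale  # actual value = integer * scale


# Assumed scale for this illustration: a 10-bit signed type covering [-20, 20].
scale = 20.0 / 2 ** 9

v = 3.7
roundtrip = decode(encode(v, scale), scale)
# The round-trip error is bounded by half the scaling factor.
assert abs(roundtrip - v) <= scale / 2
```

A coarser scale saves bits but loses precision; `max_value` lets Taichi pick the finest scale that still covers the required range.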

### Quantized floating-point numbers

[Floating-point numbers](https://en.wikipedia.org/wiki/Floating-point_arithmetic) are the standard way to represent real numbers on modern computers. A floating-point number is composed of exponent bits, fraction bits, and a sign bit. There are various floating-point formats:

![image](../static/assets/floating-point_formats.png)

In Taichi, you can define a *quantized floating-point number* with an arbitrary combination of exponent bits and fraction bits (the sign bit is counted as part of the fraction bits):

```python
float_type_a = ti.types.quant.float(exp=5, frac=10)  # 15-bit signed (default) floating-point type with 5 exponent bits
float_type_b = ti.types.quant.float(exp=6, frac=9, signed=False)  # 15-bit unsigned floating-point type with 6 exponent bits
```

### Compute types

All the parameters above specify the *storage type* of a quantized data type. However, most quantized data types have no native hardware support, so a value of a quantized data type needs to be converted to a primitive type (its "*compute type*") when it is involved in computation.

The default compute type for quantized integers is `ti.i32`, while the default compute type for quantized fixed-point/floating-point numbers is `ti.f32`. You can change the compute type by specifying the `compute` parameter:

```python
i21 = ti.types.quant.int(bits=21, compute=ti.i64)
bfloat16 = ti.types.quant.float(exp=8, frac=8, compute=ti.f32)
```

## Data containers for quantized data types

Because their storage types are not primitive types, you may wonder how quantized data types can work with the data containers that Taichi provides. In fact, a few special constructs are introduced to bridge the gap.

### Bitpacked fields

`ti.BitpackedFields` packs a group of fields whose `dtype`s are quantized data types together so that they are stored as one primitive type. You can then place a `ti.BitpackedFields` instance under any SNode as if each member field were placed individually:

```python
a = ti.field(float_type_a)  # 15 bits
b = ti.field(fixed_type_b)  # 5 bits
c = ti.field(fixed_type_c)  # 6 bits
d = ti.field(u5)  # 5 bits
bitpack = ti.BitpackedFields(max_num_bits=32)
bitpack.place(a, b, c, d)  # 31 out of 32 bits occupied
ti.root.dense(ti.i, 10).place(bitpack)
```
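
To build intuition for what such packing means at the bit level, here is a hypothetical plain-Python sketch, not Taichi's actual layout, where each member occupies a contiguous bit range of one 32-bit word, using the widths from the example above (15 + 5 + 6 + 5 = 31 bits):

```python
# Hypothetical sketch (not Taichi's actual layout): four small values
# sharing one 32-bit word via shifts and masks.
def pack(a15, b5, c6, d5):
    # a: bits 0-14, b: bits 15-19, c: bits 20-25, d: bits 26-30
    return a15 | (b5 << 15) | (c6 << 20) | (d5 << 26)


def unpack(word):
    return (word & 0x7FFF,
            (word >> 15) & 0x1F,
            (word >> 20) & 0x3F,
            (word >> 26) & 0x1F)


# Round-trip: packing then unpacking recovers every member value.
assert unpack(pack(12345, 17, 42, 9)) == (12345, 17, 42, 9)
```

Taichi generates the equivalent shift-and-mask code for you, so loads and stores of the member fields stay transparent in kernels.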

#### Shared exponent

When multiple fields with quantized floating-point types are packed together, there is a chance that they can share a common exponent. For example, in a 3D velocity vector, if you know the x-component has a much larger absolute value than the y- and z-components, then you probably do not care about the exact values of the y- and z-components. In this case, using a shared exponent leaves more bits for the components with larger absolute values. You can use `place(x, y, z, shared_exponent=True)` to make fields `x, y, z` share a common exponent.
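
A plain-Python sketch (an illustration, not Taichi's actual encoding) of the shared-exponent idea: all components reuse the exponent of the largest-magnitude component, so smaller components trade precision for bits instead of storing their own exponents.

```python
import math


def shared_exponent_encode(vec, frac_bits=14):
    # Common exponent taken from the largest-magnitude component.
    e = max(math.frexp(abs(v))[1] for v in vec)
    scale = 2.0 ** (e - frac_bits)
    return e, [round(v / scale) for v in vec]


def shared_exponent_decode(e, fracs, frac_bits=14):
    scale = 2.0 ** (e - frac_bits)
    return [f * scale for f in fracs]


e, fracs = shared_exponent_encode([100.0, 0.5, -0.25])
decoded = shared_exponent_decode(e, fracs)
# The dominant component is recovered accurately; the small ones keep
# only as much precision as the shared scale allows.
assert abs(decoded[0] - 100.0) < 1e-6
```

Components whose magnitude is far below the maximum may round to zero under the shared scale, which is exactly the trade-off described above.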

#### Your first program

You probably cannot wait to write your first Taichi program with quantized data types. The easiest way to start is to modify the data definitions of an existing example. Assume you want to save memory in [examples/simulation/euler.py](https://github.com/taichi-dev/taichi/blob/master/python/taichi/examples/simulation/euler.py). Because most data definitions in the example are similar, only the field `Q` is used for illustration here:

```python
Q = ti.Vector.field(4, dtype=ti.f32, shape=(N, N))
```

An element of `Q` currently occupies 4 x 32 = 128 bits. If you can fit it into 64 bits, the memory usage is halved. A direct first attempt is to use quantized floating-point numbers with a shared exponent:

```python
float_type_c = ti.types.quant.float(exp=8, frac=14)
Q_old = ti.Vector.field(4, dtype=float_type_c)
bitpack = ti.BitpackedFields(max_num_bits=64)
bitpack.place(Q_old, shared_exponent=True)
ti.root.dense(ti.ij, (N, N)).place(bitpack)
```
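
As a quick sanity check on the bit budget, assuming the shared-exponent layout stores one common 8-bit exponent plus a 14-bit fraction part per component (an assumption for this arithmetic, not a statement about Taichi's exact layout), the four components fit the 64-bit budget exactly:

```python
# Assumed layout: one shared 8-bit exponent + four 14-bit fraction parts.
exp_bits = 8
frac_bits = 14
components = 4
total_bits = exp_bits + components * frac_bits
assert total_bits == 64  # fits max_num_bits=64, half of the original 128 bits
```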

Surprisingly, you will find no obvious difference in the visual effects after the change, and you have now successfully finished a Taichi program with quantized data types! More experiments are left to you.

#### More complicated quantization schemes

Here comes a more complicated scenario. In a 3D Eulerian fluid simulation, a voxel may need to store a 3D vector for velocity and an integer value for the cell category, which has three possible values: "source", "Dirichlet boundary", and "Neumann boundary". You can store all of this information in a single 32-bit `ti.BitpackedFields`:

```python
velocity_component_type = ti.types.quant.float(exp=6, frac=8, compute=ti.f32)
velocity = ti.Vector.field(3, dtype=velocity_component_type)

# Since there are only three cell categories, 2 bits are enough.
cell_category_type = ti.types.quant.int(bits=2, signed=False, compute=ti.i32)
cell_category = ti.field(dtype=cell_category_type)

voxel = ti.BitpackedFields(max_num_bits=32)
# Place the three components of velocity into the voxel, and let them share the exponent.
voxel.place(velocity, shared_exponent=True)
# Place the 2-bit cell category.
voxel.place(cell_category)
# Create 512 x 512 x 256 voxels.
ti.root.dense(ti.ijk, (512, 512, 256)).place(voxel)
```

The compression scheme above allows you to store 13 bytes (4B x 3 + 1B) in just 4 bytes. Note that you can still use `velocity` and `cell_category` in computation code as if they were `ti.f32` and `ti.u8`.

![image](../static/assets/bitpacked_fields_layout_example.png)
### Quant arrays

Bitpacked fields are laid out in array of structures (AOS) order. However, there are also cases where a single quantized type needs to be laid out as an array. For example, you may want to store 8 x u4 values in a single u32 type to represent the bin values of a histogram:

![image](../static/assets/quant_array_layout_example.png)

A quant array is exactly what you need. A `quant_array` is an SNode that reinterprets a primitive type as an array of a quantized type:

```python
bin_value_type = ti.types.quant.int(bits=4, signed=False)
bin_values = ti.field(dtype=bin_value_type)

# The quant array for 512 x 512 bin values
array = ti.root.dense(ti.ij, (512, 64)).quant_array(ti.j, 8, max_num_bits=32)
# Place the unsigned 4-bit bin values into the array
array.place(bin_values)
```

:::note
1. Only one field can be placed under a `quant_array`.
2. Only quantized integer types and quantized fixed-point types are supported as the `dtype` of the field under a `quant_array`.
3. The size of the `dtype` of the field times the shape of the `quant_array` must be less than or equal to the `max_num_bits` of the `quant_array`.
:::
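
A hypothetical plain-Python sketch of the layout in the histogram figure above (an illustration, not Taichi's implementation): eight 4-bit bins live inside one 32-bit word, with bin `j` occupying bits `4j..4j+3`.

```python
# Illustration only: 8 x u4 bin values packed into one u32 word.
def get_bin(word, j):
    return (word >> (4 * j)) & 0xF


def inc_bin(word, j):
    # The caller must keep each bin below 16 to avoid overflow into bin j+1.
    return word + (1 << (4 * j))


word = 0
for _ in range(5):
    word = inc_bin(word, 3)
assert get_bin(word, 3) == 5
assert get_bin(word, 2) == 0
```

With a `quant_array`, indexing `bin_values[i, j]` performs the equivalent shift-and-mask access for you.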

#### Bit vectorization

For quant arrays of 1-bit quantized integer types ("booleans"), Taichi provides an additional optimization, bit vectorization, which vectorizes operations on such quant arrays inside struct fors:

```python
u1 = ti.types.quant.int(1, False)
N = 512
M = 32
x = ti.field(dtype=u1)
y = ti.field(dtype=u1)
ti.root.dense(ti.i, N // M).quant_array(ti.i, M, max_num_bits=M).place(x)
ti.root.dense(ti.i, N // M).quant_array(ti.i, M, max_num_bits=M).place(y)


@ti.kernel
def assign_vectorized():
    ti.loop_config(bit_vectorize=True)
    for i, j in x:
        y[i, j] = x[i, j]  # 32 bits are handled at a time


assign_vectorized()
```

## Advanced examples

The following examples are from the [QuanTaichi paper](https://yuanming.taichi.graphics/publication/2021-quantaichi/quantaichi.pdf), where you can dig into the details.

### [Game of Life](https://github.com/taichi-dev/quantaichi/tree/main/gol)

![image](https://github.com/taichi-dev/quantaichi/raw/main/pics/teaser_gol.jpg)

### [Eulerian Fluid](https://github.com/taichi-dev/quantaichi/tree/main/eulerian_fluid)

![image](https://github.com/taichi-dev/quantaichi/raw/main/pics/smoke_result.png)

### [MLS-MPM](https://github.com/taichi-dev/taichi_elements/blob/master/demo/demo_quantized_simulation_letters.py)

![image](https://github.com/taichi-dev/quantaichi/raw/main/pics/mpm-235.jpg)

docs/lang/articles/differentiable/differentiable_programming.md (+76)

@@ -448,3 +448,79 @@

Check out [the DiffTaichi paper](https://arxiv.org/pdf/1910.00935.pdf) and [video](https://www.youtube.com/watch?v=Z1xvAZve9aE) to learn more about Taichi differentiable programming.
:::

## Forward-Mode Autodiff

There are two modes of automatic differentiation: forward mode and reverse mode. Forward mode computes a Jacobian-vector product (JVP), i.e., one column of the Jacobian matrix at a time. Reverse mode computes a vector-Jacobian product (VJP), i.e., one row of the Jacobian matrix at a time. Therefore, reverse mode is more efficient for functions with more inputs than outputs, and `ti.ad.Tape` and `kernel.grad()` are built on it. Forward mode is more efficient for functions with more outputs than inputs, and Taichi's autodiff supports it as well.

### Using `ti.ad.FwdMode`

The usage of `ti.ad.FwdMode` is very similar to that of `ti.ad.Tape`. Here we reuse the reverse-mode example above for an explanation:

1. Enable the `needs_dual=True` option when declaring the fields involved in the derivative chain.
2. Use the context manager `with ti.ad.FwdMode(loss=y, param=x):` to capture the kernel invocations that you want to automatically differentiate. `loss` and `param` are the output and the input of the function, respectively.
3. The value of dy/dx at the current x is now available at the function output `y.dual[None]`.

The following code snippet demonstrates the steps above:

```python
import taichi as ti
ti.init()

x = ti.field(dtype=ti.f32, shape=(), needs_dual=True)
y = ti.field(dtype=ti.f32, shape=(), needs_dual=True)


@ti.kernel
def compute_y():
    y[None] = ti.sin(x[None])


with ti.ad.FwdMode(loss=y, param=x):
    compute_y()

print('dy/dx =', y.dual[None], ' at x =', x[None])
```

:::note
The name `dual` here refers to *dual numbers* in math, because forward-mode autodiff is equivalent to evaluating the function with dual numbers.
:::
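
To make the dual-number connection concrete, here is a minimal plain-Python sketch (an illustration, not Taichi code): evaluating `sin` on a dual number whose dual part is seeded with 1.0 produces the derivative alongside the value.

```python
import math


class Dual:
    """A dual number a + b*eps, where eps * eps == 0."""

    def __init__(self, val, dot):
        self.val = val  # primal value
        self.dot = dot  # derivative part


def sin(d):
    # sin(a + b*eps) = sin(a) + cos(a)*b*eps
    return Dual(math.sin(d.val), math.cos(d.val) * d.dot)


x = Dual(0.5, 1.0)  # seed the dual part with 1.0
y = sin(x)
assert abs(y.val - math.sin(0.5)) < 1e-12
assert abs(y.dot - math.cos(0.5)) < 1e-12  # dy/dx at x = 0.5
```

This is exactly what one forward-mode sweep does: the primal and dual parts are carried through every operation together.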

:::note
`ti.ad.FwdMode` automatically clears the dual field of `loss`.
:::

`ti.ad.FwdMode` supports multiple inputs and outputs: `param` can be an N-D field, and `loss` can be an individual N-D field or a list of N-D fields. The `seed` argument is the "vector" in the Jacobian-vector product; it selects the parameter with respect to which the derivatives are computed. In the example below, with `seed=[1.0, 0.0]` or `seed=[0.0, 1.0]`, we compute the derivatives solely with respect to `x_0` or `x_1`:

```python
import taichi as ti
ti.init()
N_param = 2
N_loss = 5
x = ti.field(dtype=ti.f32, shape=N_param, needs_dual=True)
y = ti.field(dtype=ti.f32, shape=N_loss, needs_dual=True)


@ti.kernel
def compute_y():
    for i in range(N_loss):
        for j in range(N_param):
            y[i] += i * ti.sin(x[j])


# Compute derivatives with respect to x_0
with ti.ad.FwdMode(loss=y, param=x, seed=[1.0, 0.0]):
    compute_y()
print('dy/dx_0 =', y.dual, ' at x_0 =', x[0])

# Compute derivatives with respect to x_1
with ti.ad.FwdMode(loss=y, param=x, seed=[0.0, 1.0]):
    compute_y()
print('dy/dx_1 =', y.dual, ' at x_1 =', x[1])
```
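
The seed mechanism generalizes: one forward sweep per basis vector yields one column of the Jacobian. A plain-Python sketch (an illustration, not Taichi code) of the same function, y_i = i * (sin(x_0) + sin(x_1)):

```python
import math


def jvp(x, seed):
    # Mirrors the Taichi kernel above: y_i = i * sum_j sin(x_j),
    # so dy_i = i * sum_j cos(x_j) * seed_j.
    n_loss = 5
    y = [i * sum(math.sin(v) for v in x) for i in range(n_loss)]
    dy = [i * sum(math.cos(v) * s for v, s in zip(x, seed)) for i in range(n_loss)]
    return y, dy


x = [0.3, 0.8]
# One JVP per seed basis vector gives one column of the Jacobian dy/dx.
columns = [jvp(x, seed)[1] for seed in ([1.0, 0.0], [0.0, 1.0])]
assert abs(columns[0][1] - math.cos(0.3)) < 1e-12       # dy_1/dx_0
assert abs(columns[1][2] - 2 * math.cos(0.8)) < 1e-12   # dy_2/dx_1
```

Assembling the full Jacobian therefore takes as many forward sweeps as there are inputs, which is why forward mode pays off when outputs outnumber inputs.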

:::note
The `seed` argument is required if `param` is not a scalar field.
:::

:::tip
Similar to reverse-mode autodiff, Taichi provides an API, `ti.root.lazy_dual()`, that automatically places the dual fields following the layout of their primal fields.
:::