---
sidebar_position: 3
---

# Using quantized data types

High-resolution simulations can deliver great visual quality, but they are often
limited by available memory, especially on GPUs. To save memory, Taichi provides
low-precision ("quantized") data types: you can define your own integers,
fixed-point numbers, or floating-point numbers with a non-standard number of
bits, so that you can choose a proper setting with minimal memory cost for your
application. Taichi also provides a suite of tailored domain-specific
optimizations to keep the runtime performance of quantized data types close to
that of full-precision data types.

:::note
Quantized data types are only supported on the CPU and CUDA backends for now.
:::

## Quantized data types

### Quantized integers

Modern computers represent integers using the [two's complement](https://en.wikipedia.org/wiki/Two%27s_complement)
format. *Quantized integers* in Taichi adopt the same format but can use a
non-standard number of bits:

```python
i10 = ti.types.quant.int(bits=10)              # 10-bit signed (default) integer type
u5 = ti.types.quant.int(bits=5, signed=False)  # 5-bit unsigned integer type
```

### Quantized fixed-point numbers

[Fixed-point numbers](https://en.wikipedia.org/wiki/Fixed-point_arithmetic) are
a classic way to represent real numbers. The internal representation of a
fixed-point number is simply an integer, and its actual value equals that
integer multiplied by a predefined scaling factor. Based on its support for
quantized integers, Taichi provides *quantized fixed-point numbers* as follows:

```python
fixed_type_a = ti.types.quant.fixed(bits=10, max_value=20.0)                # 10-bit signed (default) fixed-point type within [-20.0, 20.0]
fixed_type_b = ti.types.quant.fixed(bits=5, signed=False, max_value=100.0)  # 5-bit unsigned fixed-point type within [0.0, 100.0]
fixed_type_c = ti.types.quant.fixed(bits=6, signed=False, scale=1.0)        # 6-bit unsigned fixed-point type within [0.0, 64.0]
```

`scale` is the scaling factor mentioned above. Because fixed-point numbers are
especially useful when you know that the actual value is guaranteed to stay
within a certain range, Taichi also lets you set `max_value` instead and
calculates the scaling factor for you.

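
For example, judging from the ranges annotated in the snippet above, the scaling
factor appears to follow the relation sketched below. Treat this as an
illustrative assumption; the exact rule is implemented inside Taichi:

```python
# Hypothetical derivation of the scaling factor from max_value.
bits, signed, max_value = 10, True, 20.0   # parameters of fixed_type_a
value_bits = bits - 1 if signed else bits  # one bit is reserved for the sign
scale = max_value / 2 ** value_bits        # 20.0 / 512 = 0.0390625
```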

### Quantized floating-point numbers

[Floating-point numbers](https://en.wikipedia.org/wiki/Floating-point_arithmetic)
are the standard way to represent real numbers on modern computers. A
floating-point number is composed of exponent bits, fraction bits, and a sign
bit. Various floating-point formats exist; they differ in how many bits are
devoted to the exponent and how many to the fraction.

In Taichi, you can define a *quantized floating-point number* with an arbitrary
combination of exponent bits and fraction bits (the sign bit is counted as one
of the fraction bits):

```python
float_type_a = ti.types.quant.float(exp=5, frac=10)               # 15-bit signed (default) floating-point type with 5 exponent bits
float_type_b = ti.types.quant.float(exp=6, frac=9, signed=False)  # 15-bit unsigned floating-point type with 6 exponent bits
```

### Compute types

All the parameters you've seen above specify the *storage type* of a quantized
data type. However, most quantized data types have no native hardware support,
so a value of such a type has to be converted to a primitive type (its
"*compute type*") whenever it is involved in computation.

The default compute type for quantized integers is `ti.i32`, while the default
compute type for quantized fixed-point/floating-point numbers is `ti.f32`. You
can change the compute type by specifying the `compute` parameter:

```python
i21 = ti.types.quant.int(bits=21, compute=ti.i64)
bfloat16 = ti.types.quant.float(exp=8, frac=8, compute=ti.f32)
```

## Data containers for quantized data types

Because the storage types are not primitive types, you may wonder how quantized
data types can work with the data containers that Taichi provides. Taichi
introduces two dedicated constructs, bitpacked fields and quant arrays, to
bridge this gap.

### Bitpacked fields

`ti.BitpackedFields` packs a group of fields whose `dtype`s are quantized data
types so that they are stored together in a single primitive type. You can then
place a `ti.BitpackedFields` instance under any SNode as if each member field
were placed individually.

```python
a = ti.field(float_type_a)  # 15 bits
b = ti.field(fixed_type_b)  # 5 bits
c = ti.field(fixed_type_c)  # 6 bits
d = ti.field(u5)            # 5 bits
bitpack = ti.BitpackedFields(max_num_bits=32)
bitpack.place(a, b, c, d)   # 31 out of 32 bits occupied
ti.root.dense(ti.i, 10).place(bitpack)
```

#### Shared exponent

When multiple fields with quantized floating-point types are packed together,
there is a chance that they can share a common exponent. For example, in a 3D
velocity vector, if you know that the x-component has a much larger absolute
value than the y- and z-components, you probably do not care about the exact
values of the y- and z-components. In this case, a shared exponent leaves more
bits for the components with larger absolute values. You can use
`place(x, y, z, shared_exponent=True)` to make fields `x, y, z` share a common
exponent.

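
The following is a minimal sketch of this idea; the field names, bit widths, and
layout are illustrative and not taken from the examples above (8 shared exponent
bits plus 3 x 8 fraction bits fit exactly into 32 bits):

```python
component_type = ti.types.quant.float(exp=8, frac=8)
x = ti.field(component_type)
y = ti.field(component_type)
z = ti.field(component_type)

pack = ti.BitpackedFields(max_num_bits=32)
pack.place(x, y, z, shared_exponent=True)  # the three fields share one 8-bit exponent
ti.root.dense(ti.i, 1024).place(pack)
```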

#### Your first program

You probably cannot wait to write your first Taichi program with quantized data
types. The easiest way is to modify the data definitions of an existing example.
Assume you want to save memory for
[examples/simulation/euler.py](https://github.com/taichi-dev/taichi/blob/master/python/taichi/examples/simulation/euler.py).
Because most data definitions in the example are similar, here only field `Q` is
used for illustration:

```python
Q = ti.Vector.field(4, dtype=ti.f32, shape=(N, N))
```

An element of `Q` now occupies 4 x 32 = 128 bits. If you can fit it into 64
bits, the memory usage is halved. A direct first attempt is to use quantized
floating-point numbers with a shared exponent:

```python
float_type_c = ti.types.quant.float(exp=8, frac=14)
Q = ti.Vector.field(4, dtype=float_type_c)  # 8 shared exponent bits + 4 x 14 fraction bits = 64 bits
bitpack = ti.BitpackedFields(max_num_bits=64)
bitpack.place(Q, shared_exponent=True)
ti.root.dense(ti.ij, (N, N)).place(bitpack)
```

Surprisingly, you will find no obvious difference in visual effects after the
change, and you have now finished your first Taichi program with quantized data
types! Further experiments are left to you.

#### More complicated quantization schemes

Here comes a more complicated scenario. In a 3D Eulerian fluid simulation, a
voxel may need to store a 3D vector for velocity and an integer value for the
cell category, which has three possible values: "source", "Dirichlet boundary",
and "Neumann boundary". You can store all of this information in a single
32-bit `ti.BitpackedFields`:

```python
velocity_component_type = ti.types.quant.float(exp=6, frac=8, compute=ti.f32)
velocity = ti.Vector.field(3, dtype=velocity_component_type)

# Since there are only three cell categories, 2 bits are enough.
cell_category_type = ti.types.quant.int(bits=2, signed=False, compute=ti.i32)
cell_category = ti.field(dtype=cell_category_type)

voxel = ti.BitpackedFields(max_num_bits=32)
# Place the three components of velocity into the voxel, and let them share the exponent.
voxel.place(velocity, shared_exponent=True)
# Place the 2-bit cell category.
voxel.place(cell_category)
# Create 512 x 512 x 256 voxels.
ti.root.dense(ti.ijk, (512, 512, 256)).place(voxel)
```

The compression scheme above squeezes 13 bytes (4B x 3 + 1B) of full-precision
data into just 4 bytes. Note that you can still use `velocity` and
`cell_category` in computation code as if they were ordinary `ti.f32` and
`ti.i32` values, because all arithmetic happens in their compute types.

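As a hedged illustration, the kernel below reads and writes these packed fields;
the category encoding (`0` standing for a "source" cell) and the damping
operation are made up for this example only:

```python
@ti.kernel
def damp_sources(dt: ti.f32):
    for i, j, k in cell_category:
        if cell_category[i, j, k] == 0:  # hypothetical: 0 marks a "source" cell
            # Loaded as ti.f32, scaled, then stored back in quantized form.
            velocity[i, j, k] *= ti.exp(-dt)
```
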
### Quant arrays

Bitpacked fields are laid out in an array-of-structures (AOS) order. However,
there are also cases where many values of a single quantized type need to be
laid out as an array. For example, you may want to store 8 x u4 values in a
single u32, to represent the bin values of a histogram.

A quant array is exactly what you need here. A `quant_array` is an SNode that
reinterprets a primitive type as an array of a quantized type:

```python
bin_value_type = ti.types.quant.int(bits=4, signed=False)
bin_value = ti.field(dtype=bin_value_type)

# The quant array for 512 x 512 bin values: each 32-bit word holds 8 bins along ti.j.
array = ti.root.dense(ti.ij, (512, 64)).quant_array(ti.j, 8, max_num_bits=32)
# Place the unsigned 4-bit bin value into the quant array.
array.place(bin_value)
```

:::note
1. Only one field can be placed under a `quant_array`.
2. Only quantized integer types and quantized fixed-point types are supported as
the `dtype` of the field under a `quant_array`.
3. The number of bits of the field's `dtype` multiplied by the number of elements
of the `quant_array` must not exceed the `max_num_bits` of the `quant_array`.
:::

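In a kernel, elements of the quant array can then be read and written like any
other field elements. The fill pattern below is purely illustrative and assumes
the `bin_value` field defined above:

```python
@ti.kernel
def fill_bins():
    for i, j in bin_value:
        # Each 4-bit bin can only hold values in [0, 15].
        bin_value[i, j] = (i + j) % 16
```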

#### Bit vectorization

For quant arrays of 1-bit quantized integer types ("booleans"), Taichi provides
an additional optimization, bit vectorization, which vectorizes operations on
such quant arrays under struct-fors:

```python
u1 = ti.types.quant.int(1, False)
N = 512
M = 32
x = ti.field(dtype=u1)
y = ti.field(dtype=u1)
# x and y are 2D fields of shape (N // M, M); each row is stored in one 32-bit word.
ti.root.dense(ti.i, N // M).quant_array(ti.j, M, max_num_bits=M).place(x)
ti.root.dense(ti.i, N // M).quant_array(ti.j, M, max_num_bits=M).place(y)

@ti.kernel
def assign_vectorized():
    ti.loop_config(bit_vectorize=True)
    for i, j in x:
        y[i, j] = x[i, j]  # 32 bits are handled at a time

assign_vectorized()
```

## Advanced examples

The following examples are picked from the
[QuanTaichi paper](https://yuanming.taichi.graphics/publication/2021-quantaichi/quantaichi.pdf);
you can dig into the details there.

### [Game of Life](https://github.com/taichi-dev/quantaichi/tree/main/gol)

### [Eulerian Fluid](https://github.com/taichi-dev/quantaichi/tree/main/eulerian_fluid)

### [MLS-MPM](https://github.com/taichi-dev/taichi_elements/blob/master/demo/demo_quantized_simulation_letters.py)