* Add support for Flash attention
* Fix: attention type can be both sparse and flash
* Updates from running pre-commit on modified files
* Update README.md
Co-authored-by: Stella Biderman <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>
README.md (+5 lines)
@@ -99,6 +99,11 @@ from the repository root.

### Flash Attention
To use [Flash-Attention](https://github.com/HazyResearch/flash-attention), install the additional dependencies in `./requirements/requirements-flashattention.txt` and set the attention type in your configuration accordingly (see [configs](./configs/)). This can provide significant speed-ups over regular attention on certain GPU architectures, including Ampere GPUs (such as A100s); see the repository for more details.
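For example, here is a minimal sketch of the relevant config entry for a 12-layer model; the `attention_config` key and its `[[pattern, repetitions]]` syntax are documented under Model Arguments below, and the surrounding file layout is illustrative:

```yaml
# Illustrative excerpt from a GPT-NeoX .yml config (all other keys omitted).
# Uses Flash attention in each of the model's 12 transformer layers:
{
  "attention_config": [[["flash"], 12]]
}
```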
### Containerized Setup
We also provide a Dockerfile if you prefer to run NeoX in a container. To use this option, first build an image named `gpt-neox` from the repository root directory with `docker build -t gpt-neox -f Dockerfile .`. We also host pre-built images on Docker Hub at `leogao2/gpt-neox`.
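As a sketch of how the built image might then be launched (the `--gpus` flag and the bind mount are assumptions about a typical GPU setup, not commands taken from this README):

```bash
# Build the image from the repository root, as described above:
docker build -t gpt-neox -f Dockerfile .

# Hypothetical run command: pass through all GPUs and mount the checkout.
docker run --rm -it --gpus all -v "$(pwd)":/workspace gpt-neox bash
```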
So a 12-layer network with only global attention could be specified like:

[[[`global`], 12]]
@@ -345,6 +345,8 @@ Model Arguments
If none is specified, this defaults to

[[[`global`], n_layers]]

`"flash"` attention refers to the optimized global attention for Ampere (and some other) generation GPUs described at [Flash-Attention](https://github.com/HazyResearch/flash-attention).
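As a sketch of the same `[[pattern, repetitions]]` syntax with this new type (the alternating pattern here is illustrative, not taken from the diff), a 12-layer network alternating flash and global attention might be specified like:

```yaml
# Hypothetical mix: the two-element pattern repeats 6 times, giving
# flash, global, flash, global, ... across all 12 layers.
{
  "attention_config": [[["flash", "global"], 6]]
}
```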
- **sparsity_config**: dict
@@ -950,7 +952,7 @@ Text Generation arguments
- **eval_results_prefix**: str

    Default =

    Prefix to which to save evaluation results; the final file path will be {eval_results_prefix}_eval_results_yy-mm-dd-HH-MM.json
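A hypothetical setting of this argument (the prefix value is made up for illustration):

```yaml
# With this prefix, results land at a path like
# ./logs/neox_eval_results_23-04-01-12-30.json, following the
# {eval_results_prefix}_eval_results_yy-mm-dd-HH-MM.json pattern above.
{
  "eval_results_prefix": "./logs/neox"
}
```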
@@ -1538,7 +1540,7 @@ Args for deepspeed config
Default = None
@@ -1670,6 +1672,4 @@ Args for deepspeed runner (deepspeed.launcher.runner).
- **comment**: str

    Default = None

    Adds a `--comment` to the DeepSpeed launch command. In DeeperSpeed this is passed on to the SlurmLauncher as well. Sometimes necessary for cluster rules, or so I've heard.
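A hypothetical use of this argument (the comment string is made up for illustration):

```yaml
# Appended to the DeepSpeed launch command as `--comment`; some Slurm
# clusters key accounting or scheduling rules off this field.
{
  "comment": "gpt-neox-dev"
}
```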