# EleutherAI ML Scalability & Performance Reading Group

My annotated papers, slides, and meeting recordings for the EleutherAI ML Scalability & Performance research paper reading group.

## Sessions

### Session 1
- Intro to GPU architecture, CUDA, NCCL, and common ML performance bottlenecks

### Session 2
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

### Session 3
- ZeRO: Memory Optimizations Toward Training Trillion Parameter Models

### Session 4
- Sequence Parallelism: Long Sequence Training from System Perspective
- Blockwise Parallel Transformer for Large Context Models
- Ring Attention with Blockwise Transformers for Near-Infinite Context Length

### Session 5
- Efficient Memory Management for Large Language Model Serving with PagedAttention

### Session 6
- GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism
- PipeDream: Fast and Efficient Pipeline Parallel DNN Training
- Zero Bubble Pipeline Parallelism

### Session 7
- DeepSeek V3
- DeepSeek V2

### Session 8
- Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism