📘 About the Tutorial (NLPCC'2024 and CCMT'2024)
This tutorial, presented by Juntao Li and Zecheng Tang from the OpenNLG Group @ Soochow University, covers recent advances and open challenges in long-context modeling in the era of large language models (LLMs), spanning both the latest research and practical implementations in the field.
📑 NLPCC'2024 Tutorial Slides [PDF] 📑 CCMT'2024 Tutorial Slides [PDF]
📌 Note: We only cover the papers listed in the tutorial slides. For more papers, please refer to https://github.com/Xnhyacinth/Awesome-LLM-Long-Context-Modeling
- Paper List:
- Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
- RoFormer: Enhanced Transformer with Rotary Position Embedding
- Extending Context Window of Large Language Models via Positional Interpolation
- Implementation:
- Blog:
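The positional-encoding papers above (RoFormer's rotary embedding and context extension via positional interpolation) can be illustrated with a minimal NumPy sketch. This is a simplified single-head, half-split-pair version of our own; the function names are not from any of the listed papers:

```python
import numpy as np

def _rotate(x, positions, base=10000.0):
    """Rotate each (x[:, i], x[:, i + dim/2]) pair by positions * theta_i,
    with per-pair frequencies theta_i = base^(-2i/dim) as in RoFormer."""
    _, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) * 2.0 / dim)
    angles = np.outer(positions, freqs)            # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

def rope(x):
    """Rotary Position Embedding: the rotation angle grows with token
    position, so q.k dot products depend only on relative distance."""
    return _rotate(x, np.arange(len(x)))

def rope_interpolated(x, scale=4.0):
    """Positional interpolation: divide positions by `scale` so a model
    trained on length L can read scale*L tokens without the rotation
    angles leaving the range seen during training."""
    return _rotate(x, np.arange(len(x)) / scale)
```

Since each pair undergoes a pure rotation, the embedding preserves vector norms, and `scale=1.0` recovers plain RoPE.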
- Paper List:
- LED: Lightweight and efficient end-to-end speech recognition using low-rank transformer
- Linformer: Self-attention with linear complexity
- Generating Long Sequences with Sparse Transformers
- Big Bird: Transformers for Longer Sequences
- Longformer: The Long-Document Transformer
- Selective Attention Improves Transformer
- RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval
- Efficient Long-range Language Modeling with Self-supervised Causal Retrieval
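Longformer and Big Bird above rely on sparse attention patterns; the shared core is a sliding window in which each query attends only to nearby keys. A toy dense-masked sketch of that component (illustrative only; real implementations use banded kernels to avoid materializing the full score matrix):

```python
import numpy as np

def sliding_window_attention(q, k, v, window=2):
    """Local attention: each query attends only to keys within `window`
    positions on either side (a Longformer-style band mask), reducing
    the effective cost from O(n^2) to O(n * window)."""
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    idx = np.arange(n)
    # Mask out every key more than `window` positions away from the query.
    scores[np.abs(idx[:, None] - idx[None, :]) > window] = -np.inf
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

With `window >= n` this degenerates to standard full softmax attention, which makes the masking easy to sanity-check.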
- Paper List:
- Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
- Efficiently Modeling Long Sequences with Structured State Spaces
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
- Zoology: Measuring and Improving Recall in Efficient Language Models
- MemLong: Memory-Augmented Retrieval for Long Text Modeling
- Revealing and Mitigating the Local Pattern Shortcuts of Mamba
- Implementation:
- Blog:
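The linear-attention line above ("Transformers are RNNs") replaces softmax with a positive feature map so attention can be computed in linear time. A minimal non-causal NumPy sketch using that paper's elu(x)+1 feature map (single head, our own simplification):

```python
import numpy as np

def linear_attention(q, k, v):
    """Kernelized attention: softmax(QK^T)V is replaced by
    phi(Q) (phi(K)^T V) / (phi(Q) sum_j phi(k_j)), costing O(n d^2)
    instead of O(n^2 d) because phi(K)^T V is summarized once."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1 > 0
    Q, K = phi(q), phi(k)
    kv = K.T @ v            # (d, d_v): one-pass key-value summary
    z = Q @ K.sum(axis=0)   # per-query normalizer
    return (Q @ kv) / z[:, None]
```

Because the feature map is strictly positive, each output row is still a convex combination of value rows, mirroring the role of softmax weights.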
- Paper List:
- Implementation:
- Blog:
- Paper List:
- Data engineering for scaling language models to 128k context
- Effective long-context scaling of foundation models
- In-context Pretraining: Language Modeling beyond Document Boundaries
- Extending Llama-3's Context Ten-Fold Overnight
- Quest: Query-centric Data Synthesis Approach for Long-context Scaling of Large Language Model
- Paper List:
- L-Eval: Instituting Standardized Evaluation for Long Context Language Models
- LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
- RULER: What's the Real Context Size of Your Long-Context Language Models?
- Counting-Stars: A Simple, Efficient, and Reasonable Strategy for Evaluating Long-Context Large Language Models
- Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems
- Long-form factuality in large language models
- LongGenBench: Long-context Generation Benchmark
- L-CiteEval: Do Long-Context Models Truly Leverage Context for Responding?
- Implementation:
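Benchmarks such as RULER and Counting-Stars build synthetic probes over long contexts to test whether a model can actually use its claimed window. A toy needle-in-a-haystack sample generator in that spirit (the filler sentence, key format, and question wording are placeholders of our own, not taken from any listed benchmark):

```python
import random

def make_niah_sample(context_len=1000, seed=0):
    """Build a toy needle-in-a-haystack prompt: filler sentences with one
    key-value 'needle' inserted at a random depth, plus the gold answer."""
    rng = random.Random(seed)
    key = f"{rng.randrange(10**6):06d}"
    needle = f"The special magic number is {key}."
    filler = ["The grass is green and the sky is blue."] * context_len
    filler.insert(rng.randrange(len(filler)), needle)
    prompt = " ".join(filler) + " What is the special magic number?"
    return prompt, key

prompt, key = make_niah_sample(context_len=50, seed=3)
```

Scoring is then a simple exact-match check of `key` against the model's answer, and varying `context_len` and the insertion depth maps out where retrieval degrades.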