Efficient Long-context Language Model Training by Core Attention Disaggregation — Paper • arXiv:2510.18121 • Published Oct 20
FastWan — Collection: models trained with video sparse attention (https://arxiv.org/abs/2505.13389) and distillation • 9 items
APE: Faster and Longer Context-Augmented Generation via Adaptive Parallel Encoding — Paper • arXiv:2502.05431 • Published Feb 8