1:00 PM ET
10/23/2025
LION: Linear Attention for Efficient Bidirectional Sequence Modeling
2:00 PM ET
10/22/2025
Causal Attention with Lookahead Keys
4:00 PM ET
10/16/2025
Making orthonormal updates more scalable
2:00 PM ET
10/14/2025
From f(x) and g(x) to f(g(x)): LLMs Learn New Skills in RL by Composing Old Ones
4:30 PM ET
10/10/2025
Pre-training under infinite compute
11:00 AM ET
10/8/2025
Muon Outperforms Adam in Tail-End Associative Memory Learning
2:00 PM ET
9/30/2025
Parallelizing "Inherently Sequential" Processes: Parallel Newton methods for nonlinear state space models
2:00 PM ET
9/16/2025
Cartridges: lightweight and general-purpose language model memory via self-study
6:00 PM ET
9/10/2025
Fantastic Pretraining Optimizers and Where to Find Them
11:00 AM ET
9/5/2025
Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance
10:00 PM ET
8/27/2025
Diffusion Language Models are Super Data Learners
2:00 PM ET
8/26/2025
Diffusion Beats Autoregressive in Data-Constrained Settings
4:00 PM ET
8/21/2025
Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning
2:00 PM ET
8/13/2025
pLSTM: parallelizable Linear Source Transition Mark networks
5:15 PM ET
8/5/2025
Helion: A high-level DSL for ML kernels
2:00 PM ET
8/4/2025
Overflow Prevention Enhances Long-Context Recurrent Models
2:00 PM ET
7/29/2025
Scaling Context Requires Rethinking Attention
2:00 PM ET
7/24/2025
Fast and Simplex: 2-Simplicial Attention in Triton
2:00 PM ET
7/22/2025
On the Transformer-SSM Gap (And the Role of the Gather-and-Aggregate Mechanism)
2:00 PM ET
7/1/2025
Give Me FP32 or Give Me Death? Challenges and Solutions for Reproducible Reasoning
12:00 PM ET
6/27/2025
DeltaFormer: breaking the expressivity of Transformer with delta rule
12:00 PM ET
6/26/2025
The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs
2:00 PM ET
6/24/2025
MesaNet: Sequence Modeling by Locally Optimal Test-Time Training
10:00 PM ET
6/18/2025
Scaling Test-Time Compute of LLMs and PRMs for Mathematical Reasoning
2:00 PM ET
6/18/2025
Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation
2:00 PM ET
6/9/2025
Test-Time Training Done Right
2:00 PM ET
6/5/2025
Your Next-Token Prediction and Transformers Are Biased for Long-Context Modeling
10:00 PM ET
6/3/2025
AI for the open-world: the learning principles
2:00 PM ET
5/28/2025
PENCIL: Long Thoughts with Short Memory
3:00 PM ET
5/22/2025
When Attention Sink Emerges in Language Models: An Empirical View
4:00 PM ET
5/21/2025
CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training
2:00 PM ET
5/6/2025
Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning
10:30 AM ET
4/23/2025
Theoretical benefit and limitation of diffusion language models
2:00 PM ET
4/22/2025
EvaByte: Efficient Byte-level Language Models at Scale
11:00 AM ET
4/17/2025
Remasking Discrete Diffusion Models with Inference-Time Scaling
2:00 PM ET
4/17/2025
Titans: Learning to Memorize at Test Time
2:00 PM ET
4/10/2025
Implicit Language Models are RNNs: Balancing Parallelization and Expressivity
2:00 PM ET
4/9/2025
Hymba: Hybrid Heads, Meta Tokens, and Training of SoTA models
10:00 PM ET
3/27/2025
MoBA: Mixture of Block Attention for Long-Context LLMs
2:00 PM ET
3/20/2025
Forgetting Transformer: Softmax Attention with a Forget Gate
1:00 PM ET
3/18/2025
What's so interesting about models with recurrent depth?
1:30 PM ET
3/12/2025
B'MOJO: Hybrid State Space Realizations of Foundation Models with Eidetic and Fading Memory
1:30 PM ET
3/5/2025
State Tracking in Scalable Linear RNNs
1:30 PM ET
3/3/2025
Scaling Stick-Breaking Attention: An Efficient Implementation and In-depth Study
4:00 PM ET
2/24/2025
GSM-Infinite: How Do Your LLMs Behave over Infinitely Increasing Context Length and Reasoning Complexity?
1:30 PM ET
2/19/2025
Test-time regression: a unifying framework for designing sequence models with associative memory