nanoEBM

A minimal energy-based Transformer

Author: Surya Dantuluri

This post is still being written — please check back later. Posted: May 2026.

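What if next-token prediction wasn't a single forward pass, but a tiny optimization problem? nanoEBM is a ~10M-parameter character-level Transformer with a linear energy head. It learns to "think harder" at inference time: rather than emitting a prediction in one pass, it can spend extra computation refining that prediction by optimization.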
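To make the "tiny optimization problem" framing concrete, here is a minimal pure-Python sketch. It is not taken from the nanoEBM code, and every name in it (`linear_energy`, `refine`, `free_energy`, the toy weights) is hypothetical. It assumes the linear head assigns one scalar energy per candidate token, and that inference runs gradient descent on the logits to minimize expected energy minus a temperature-weighted entropy. The actual objective and update rule in nanoEBM may differ.

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def linear_energy(h, W, b):
    """Assumed linear energy head: one scalar energy per token, e_v = W[v] . h + b[v]."""
    return [sum(w * x for w, x in zip(row, h)) + bias for row, bias in zip(W, b)]

def free_energy(p, energies, temperature=1.0):
    """F(p) = sum_v p_v * e_v + T * sum_v p_v * log p_v (expected energy minus T * entropy)."""
    return sum(pv * (ev + temperature * math.log(pv)) for pv, ev in zip(p, energies))

def refine(energies, steps=200, lr=1.0, temperature=1.0):
    """Gradient descent on logits z, starting from a uniform prediction.

    With p = softmax(z) and g_j = e_j + T * log p_j, the gradient is
    dF/dz_j = p_j * (g_j - sum_k p_k * g_k).
    """
    z = [0.0] * len(energies)
    for _ in range(steps):
        p = softmax(z)
        g = [e + temperature * math.log(pj) for e, pj in zip(energies, p)]
        mean_g = sum(pj * gj for pj, gj in zip(p, g))
        z = [zj - lr * pj * (gj - mean_g) for zj, pj, gj in zip(z, p, g)]
    return softmax(z)

# Toy inputs (made up): a 2-dim hidden state and a 3-token vocabulary.
h = [1.0, -0.5]
W = [[0.5, 1.0], [-1.0, 0.2], [0.3, 0.3]]
b = [0.0, 0.0, 0.0]

energies = linear_energy(h, W, b)
p_start = refine(energies, steps=0)    # uniform: no "thinking"
p_final = refine(energies, steps=200)  # refined prediction
print(free_energy(p_start, energies), free_energy(p_final, energies))
```

Under these assumptions the fixed point of the update is the Boltzmann distribution, p_v proportional to exp(-e_v / T), so more refinement steps move the prediction from uniform toward the lowest-energy tokens. The number of steps is the knob that trades compute for a better prediction.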

nanoEBM is implemented in under 400 lines and runs on a Mac or a GPU. The model uses a 67-token vocabulary, 6 layers, a 384-dimensional embedding, and 6 attention heads, so it is minimal and extensible by design.
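As a rough sanity check on these numbers, the sketch below collects the stated hyperparameters in a config object and estimates the parameter count. The class and function names are hypothetical, not the repo's. The estimate assumes a standard GPT-style block (four attention projections plus an MLP with 4x expansion) and counts weight matrices only, ignoring biases, LayerNorm, positional embeddings, and the energy head.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NanoEBMConfig:  # hypothetical name; values are the ones stated in the post
    vocab_size: int = 67
    n_layer: int = 6
    n_embd: int = 384
    n_head: int = 6

    @property
    def head_dim(self) -> int:
        # Each attention head gets an equal slice of the embedding.
        assert self.n_embd % self.n_head == 0
        return self.n_embd // self.n_head

def approx_param_count(cfg: NanoEBMConfig) -> int:
    """Weight-matrix count under the assumed GPT-style block layout."""
    attn = 4 * cfg.n_embd ** 2  # Q, K, V, and output projections
    mlp = 8 * cfg.n_embd ** 2   # assumed 4x expansion: d -> 4d and 4d -> d
    blocks = cfg.n_layer * (attn + mlp)
    embed = cfg.vocab_size * cfg.n_embd  # token embedding table
    return blocks + embed

cfg = NanoEBMConfig()
print(cfg.head_dim)             # 64
print(approx_param_count(cfg))  # 10642560
```

Under these assumptions the total comes to about 10.6M, which is consistent with the ~10M figure quoted above. Almost all of it sits in the Transformer blocks, since a 67-token vocabulary makes the embedding table tiny.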