99.5k tps on a B200

With Llama 3.1 8B

Author: Surya Dantuluri

This post is still being written; please check back later. Posted: May 2026.

This post reports 99.5k tokens per second (tps) on a single NVIDIA B200 running Llama 3.1 8B, using a Modal notebook that demonstrates near-theoretical-max throughput with optimized inference settings.
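One way to put a throughput number like this in context is a back-of-the-envelope roofline estimate. The sketch below is not the notebook's code and not necessarily how the "theoretical max" in this post was derived. It assumes the common approximation of about 2 FLOPs per parameter per token, and it ignores KV-cache traffic and attention cost. The hardware figures, precision, and batch size are placeholder assumptions, not values from the post, so swap in datasheet numbers and your own settings.

```python
"""Rough throughput ceilings for LLM inference (illustrative sketch only)."""


def compute_bound_tps(peak_flops: float, n_params: float) -> float:
    """Tokens/s ceiling if each token costs ~2 FLOPs per parameter."""
    return peak_flops / (2.0 * n_params)


def bandwidth_bound_tps(
    mem_bw_bytes_per_s: float,
    n_params: float,
    bytes_per_param: float,
    batch_size: int,
) -> float:
    """Decode tokens/s ceiling when each step streams all weights once.

    One decode step reads every weight and yields one token per sequence
    in the batch. KV-cache reads are ignored, so this is optimistic.
    """
    step_seconds = (n_params * bytes_per_param) / mem_bw_bytes_per_s
    return batch_size / step_seconds


def ceiling_tps(
    peak_flops: float,
    mem_bw_bytes_per_s: float,
    n_params: float,
    bytes_per_param: float,
    batch_size: int,
) -> float:
    """The lower of the compute and bandwidth ceilings."""
    return min(
        compute_bound_tps(peak_flops, n_params),
        bandwidth_bound_tps(
            mem_bw_bytes_per_s, n_params, bytes_per_param, batch_size
        ),
    )


# ASSUMED placeholder inputs (not from the post); replace with real values.
N_PARAMS = 8e9          # roughly 8B parameters
PEAK_FLOPS = 2.25e15    # assumed dense 16-bit peak, FLOP/s
MEM_BW = 8e12           # assumed memory bandwidth, bytes/s
BYTES_PER_PARAM = 2.0   # assumes 16-bit weights
BATCH_SIZE = 256        # hypothetical number of concurrent sequences

MEASURED_TPS = 99.5e3   # the figure reported in this post

ceiling = ceiling_tps(PEAK_FLOPS, MEM_BW, N_PARAMS, BYTES_PER_PARAM, BATCH_SIZE)
utilization = MEASURED_TPS / ceiling
print(f"estimated ceiling: {ceiling:,.0f} tok/s, measured/ceiling: {utilization:.2f}")
```

The ratio this prints depends entirely on the assumed inputs. A different precision, batch size, or peak figure moves the ceiling substantially, so treat it as a sanity check and not as a benchmark result.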