99.5k tps on a B200

With Llama 3.1 8B

Author: Surya Dantuluri

Published: November 2025

A Modal notebook that hits 99.5k tokens per second on a single NVIDIA B200 running Llama 3.1 8B, demonstrating near-theoretical-max throughput with optimized inference settings.
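One way to reason about what "near-theoretical-max" could mean is a rough roofline estimate: decode throughput is capped by whichever is lower, the compute ceiling or the memory-bandwidth ceiling. The sketch below is an illustration only, not the notebook's method. The function name and all numeric inputs are hypothetical placeholders rather than B200 datasheet values or settings from the post, and the model ignores KV-cache traffic and attention FLOPs.

```python
def decode_ceiling_tokens_per_s(
    n_params: float,
    peak_flops: float,
    mem_bandwidth: float,
    bytes_per_param: float,
    batch_size: int,
) -> float:
    """Rough upper bound on decode throughput (tokens/s) for a dense model.

    Simplifying assumptions (not from the post):
      - each generated token costs about 2 FLOPs per parameter
      - each decode step streams all weights from memory once
      - KV-cache reads and attention FLOPs are ignored
    """
    # Compute ceiling: tokens/s if the GPU were purely FLOP-limited.
    compute_bound = peak_flops / (2.0 * n_params)

    # Memory ceiling: one pass over the weights yields batch_size tokens.
    step_time_s = n_params * bytes_per_param / mem_bandwidth
    memory_bound = batch_size / step_time_s

    return min(compute_bound, memory_bound)


if __name__ == "__main__":
    # Placeholder hardware numbers chosen for round arithmetic; substitute
    # the real peak FLOP/s and bandwidth for the precision actually used.
    for batch in (1, 64, 1024):
        ceiling = decode_ceiling_tokens_per_s(
            n_params=8e9,          # nominal "8B" parameter count
            peak_flops=2e15,       # hypothetical dense peak, FLOP/s
            mem_bandwidth=8e12,    # hypothetical memory bandwidth, bytes/s
            bytes_per_param=2.0,   # assumes 16-bit weights
            batch_size=batch,
        )
        print(f"batch={batch:5d}  ceiling ~ {ceiling:,.0f} tokens/s")
```

Under this model, small batches are bandwidth-limited and large batches are compute-limited, so a measured figure is best compared against the ceiling for the batch size and precision actually used.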