99.5k tps on a B200

With Llama 3.1 8B

Author: Surya Dantuluri
Published

This Modal notebook reaches 99.5k tokens per second on a single NVIDIA B200 running Llama 3.1 8B, close to the theoretical maximum throughput, using optimized inference settings.
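To put the headline number in context, here is a rough back-of-the-envelope sketch, not the notebook's own calculation. It uses the common approximation of about 2 FLOPs per parameter per token for a forward pass, which ignores attention cost. The parameter count (8e9, taken from the model name) and `ASSUMED_PEAK_FLOPS` are assumptions: the peak is a placeholder to replace with the datasheet figure for whichever precision you run.

```python
# Rough estimate of the compute rate implied by a token throughput.
# Assumption: a forward pass costs ~2 FLOPs per parameter per token.


def implied_flops_per_second(tokens_per_second: float, n_params: float) -> float:
    """Approximate FLOP/s needed to sustain the given throughput."""
    return 2.0 * n_params * tokens_per_second


TOKENS_PER_SECOND = 99_500      # headline figure from this post
N_PARAMS = 8e9                  # nominal size implied by "8B" (assumption)
ASSUMED_PEAK_FLOPS = 2.25e15    # hypothetical peak; substitute your GPU's datasheet value

achieved = implied_flops_per_second(TOKENS_PER_SECOND, N_PARAMS)
utilization = achieved / ASSUMED_PEAK_FLOPS

print(f"implied compute: {achieved:.3e} FLOP/s")
print(f"fraction of assumed peak: {utilization:.1%}")
```

The result is only as good as the assumed peak and the 2-FLOPs-per-parameter rule, so treat it as an order-of-magnitude sanity check rather than a measured utilization.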