Hit 99.5k tokens per second on a single NVIDIA B200 running Llama 3.1 8B. A Modal notebook demonstrating near-theoretical-max throughput with optimized inference settings.
With Llama 3.1 8B