Meta Llama 3.1 8B Instruct
Q4_K - Medium
8.0B params
COMPARE ACCELERATORS
119 accelerators tested
NVIDIA RTX 6000 Ada Generation, 47GB
NVIDIA GeForce RTX 4090, 24GB
NVIDIA GeForce RTX 5090, 31GB
NVIDIA H100 PCIe, 79GB
NVIDIA GeForce RTX 4080, 16GB
Meta Llama 3.1 8B Instruct - Q4_K - Medium
LEADERBOARD
ACCELERATOR     PROMPT (tokens/s)   GENERATION (tokens/s)   TTFT       LOCALSCORE
GPU / 47GB      6808                121                     199 ms     1605
GPU / 48GB      5487                51.3                    252 ms     1038
GPU / 16GB      4222                77.6                    316 ms     1012
GPU / 16GB      4461                54.4                    301 ms     931
GPU / 12GB      3526                56.7                    376 ms     808
GPU / 20GB      2617                56.5                    518 ms     658
(not given)     2457                57.4                    540 ms     639
GPU / 20GB      2013                44.6                    689 ms     507
GPU / 8GB       1460                57.8                    880 ms     458
GPU / 8GB       1223                51.3                    1.04 sec   392
GPU / 16GB      1349                37.4                    1.01 sec   368
GPU / 8GB       790                 27.5                    1.69 sec   234
GPU / 6GB       971                 32.4                    4.69 sec   232
GPU / 128GB     534                 48.9                    2.16 sec   230
(not given)     290                 4.3                     8.73 sec   52
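
The page does not say how LOCALSCORE is derived from the other three figures. As a rough sketch, and purely as an assumption, the listed scores look consistent with ten times the geometric mean of prompt speed (tokens/s), generation speed (tokens/s), and the inverse of TTFT (1000 divided by milliseconds). The function name `local_score` and the formula itself are hypothetical. The formula reproduces the first entries to within a point or two, but not necessarily every entry.

```python
def local_score(prompt_tps: float, gen_tps: float, ttft_ms: float) -> float:
    """Assumed formula (not stated on the page):
    10 x geometric mean of prompt tokens/s, generation tokens/s,
    and 1000 / TTFT in milliseconds."""
    ttft_term = 1000.0 / ttft_ms  # lower TTFT gives a larger term
    return 10.0 * (prompt_tps * gen_tps * ttft_term) ** (1.0 / 3.0)


# (prompt tokens/s, generation tokens/s, TTFT in ms, listed LOCALSCORE),
# copied from three leaderboard entries; 1.69 sec is written as 1690 ms.
entries = [
    (6808, 121, 199, 1605),
    (5487, 51.3, 252, 1038),
    (790, 27.5, 1690, 234),
]

for prompt_tps, gen_tps, ttft_ms, listed in entries:
    estimate = local_score(prompt_tps, gen_tps, ttft_ms)
    print(f"listed={listed}  estimate={estimate:.1f}")
```

A geometric mean keeps any one metric from dominating the score, which may be why a card with fast prompt processing but slow generation can rank below a more balanced one.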