LLaMA v2

Q4_0

6.7B params
COMPARE ACCELERATORS

2 accelerators tested

Apple M4 Max 12P+4E+40GPU (64GB)

Tesla P40 (24GB)

LLaMA v2 - Q4_0

LEADERBOARD

Apple M4 Max 12P+4E+40GPU (64GB)
PROMPT: 692 tokens/s
GENERATION: 62.5 tokens/s
TTFT: 1.73 sec
LOCALSCORE: 293

Tesla P40 (GPU / 24GB)
PROMPT: 833 tokens/s
GENERATION: 40.9 tokens/s
TTFT: 1.52 sec
LOCALSCORE: 282
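
The two result blocks can be compared metric by metric. The sketch below is illustrative only: it copies the figures shown in the leaderboard and prints the ratio of one accelerator to the other. Attributing the first block to the Apple M4 Max is an assumption based on the order in which the accelerators are listed; the second block is labeled "GPU / 24GB", which matches the Tesla P40. The dictionary keys and the `ratio` helper are hypothetical names, not part of any LocalScore tool.

```python
# Figures copied from the leaderboard. Assumption: the first (unlabeled)
# block belongs to the Apple M4 Max; the second is labeled "GPU / 24GB",
# which matches the Tesla P40.
results = {
    "Apple M4 Max": {
        "prompt_tps": 692.0,      # prompt processing, tokens/s
        "generation_tps": 62.5,   # token generation, tokens/s
        "ttft_s": 1.73,           # time to first token, seconds
        "localscore": 293.0,
    },
    "Tesla P40": {
        "prompt_tps": 833.0,
        "generation_tps": 40.9,
        "ttft_s": 1.52,
        "localscore": 282.0,
    },
}


def ratio(metric, a="Apple M4 Max", b="Tesla P40"):
    """Return the value of `metric` for accelerator a divided by that for b."""
    return results[a][metric] / results[b][metric]


for metric in results["Apple M4 Max"]:
    print(f"{metric}: M4 Max / P40 = {ratio(metric):.2f}")
```

For the throughput metrics and LOCALSCORE a ratio above 1 favors the M4 Max. For TTFT lower is better, so a ratio above 1 there means the M4 Max takes longer to produce its first token.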