About the Calculations
Estimates include model weights, KV cache memory (which scales with context length), and runtime overhead.
🟢 SAFE: uses at most 85% of GPU VRAM
🟡 TIGHT: fits in GPU VRAM, using more than 85% and up to 100%
🔵 OFFLOAD: exceeds GPU VRAM and requires CPU/RAM offloading (slower)
🔴 WON'T RUN: exceeds GPU VRAM and system RAM combined
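The four status levels above can be read as a simple threshold check. The following is a minimal sketch of that logic, not the calculator's actual code: the function name `classify_fit` and its parameters are hypothetical, and all sizes are assumed to be in the same unit (GB here).

```python
def classify_fit(required_gb: float, vram_gb: float, system_ram_gb: float) -> str:
    """Map an estimated memory requirement to one of the four status labels.

    Hypothetical helper: thresholds follow the legend (85% and 100% of VRAM,
    then VRAM plus system RAM); everything else is an assumption.
    """
    if required_gb <= 0.85 * vram_gb:
        return "SAFE"        # at most 85% of GPU VRAM
    if required_gb <= vram_gb:
        return "TIGHT"       # fits in VRAM, but with little headroom
    if required_gb <= vram_gb + system_ram_gb:
        return "OFFLOAD"     # needs CPU/RAM offloading (slower)
    return "WON'T RUN"       # exceeds GPU VRAM and system RAM combined


if __name__ == "__main__":
    # Example: a 24 GB GPU paired with 32 GB of system RAM.
    for need in (10, 22, 40, 60):
        print(need, "GB ->", classify_fit(need, vram_gb=24, system_ram_gb=32))
```

In practice, some system RAM is already in use by the operating system, so treating all of it as available for offloading is an optimistic simplification.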
How It Works
Running large language models (LLMs) locally requires enough GPU VRAM to hold the model weights and the key-value (KV) cache. Weight memory grows with parameter count and bits per weight; KV cache memory grows linearly with context length.
This calculator estimates total GPU memory usage from model size, weight precision (4-bit or 8-bit quantization, or FP16), and context length. It helps determine whether a model runs entirely on the GPU, requires offloading to system RAM, or exceeds your hardware limits.
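To make the estimate concrete, here is a rough sketch of how such a total might be computed. It is not the calculator's actual formula: the function names are hypothetical, the KV cache is assumed to be stored in 16-bit values, and the fixed `overhead_gib` default is a placeholder assumption, since the source does not say how runtime overhead is modeled. The bits-per-weight figure is nominal; real quantized formats typically add some per-block metadata on top.

```python
GIB = 1024 ** 3  # bytes per GiB


def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Memory for the weights: one value of `bits_per_weight` bits per parameter."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_value: int = 2) -> int:
    """Memory for the KV cache of a single sequence.

    The leading factor of 2 covers keys and values; each layer stores
    n_kv_heads * head_dim values per token for each of them.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_value


def estimate_total_gib(n_params: float, bits_per_weight: float,
                       n_layers: int, n_kv_heads: int, head_dim: int,
                       context_len: int, overhead_gib: float = 1.0) -> float:
    """Weights + KV cache + a flat overhead allowance (assumed value), in GiB."""
    total = weight_bytes(n_params, bits_per_weight) + kv_cache_bytes(
        n_layers, n_kv_heads, head_dim, context_len)
    return total / GIB + overhead_gib


if __name__ == "__main__":
    # Illustrative inputs resembling an 8B-parameter model with grouped-query
    # attention (32 layers, 8 KV heads, head dimension 128) at 8192 tokens.
    print(f"{estimate_total_gib(8e9, 4, 32, 8, 128, 8192):.2f} GiB")
```

Because the KV cache term is linear in `context_len`, doubling the context doubles that term while the weight term stays fixed, which is why long contexts can push a model that otherwise fits out of VRAM.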
Supported models include Llama 3 (8B, 70B), Mistral 7B, Mixtral, and other popular open-source LLMs used for local inference.