Technical
Quantization
Definition
A technique that reduces model size and increases inference speed by using lower-precision number representations.
In-Depth Explanation
Quantization converts model weights from 32-bit floats to 16-bit floats, or to 8-bit or even 4-bit integers. This dramatically reduces memory requirements and speeds up inference, typically with minimal accuracy loss. Quantization enables running large models on consumer hardware.
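The core idea can be sketched with symmetric 8-bit quantization: scale each weight tensor so its largest absolute value maps to 127, round to integers, and store one float scale per tensor for dequantization. This is a minimal illustration, not the scheme any particular library uses; the function names are hypothetical.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    # Symmetric quantization: map [-max|w|, +max|w|] onto [-127, 127].
    scale = np.max(np.abs(weights)) / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover approximate float weights from integers plus the stored scale.
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.2, 0.03, 0.9], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# int8 storage is 1 byte per weight vs 4 bytes for float32 (4x smaller),
# and the round-trip error is bounded by about half the scale.
```

In practice, libraries quantize per-channel or per-block rather than per-tensor, and 4-bit schemes add tricks like grouped scales, but the size/accuracy trade-off works the same way.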
Real-World Example
A 70B parameter model quantized to 4-bit can run on a high-end consumer GPU that could not handle the full-precision version.