Technical

GPT-Generated Unified Format(GGUF)

Definition

A file format for storing quantized large language models optimized for efficient local inference on consumer hardware.

In-Depth Explanation

GGUF replaced the older GGML format as the standard for running LLMs locally. It stores model weights in a compressed format that can run on CPUs and consumer GPUs. The format supports various quantization levels (Q4, Q5, Q8) trading off model size against quality. Tools like llama.cpp and Ollama use GGUF files.

Real-World Example

Downloading a 4-bit quantized Llama 2 7B model in GGUF format to run on a laptop with 8GB RAM.

2 views0 found helpful

GPT-Generated Unified Format(GGUF)

Definition

In-Depth Explanation

Real-World Example

Related Terms

Ollama

Quantization