Technical
Tokenization
Definition
The process of breaking down text into smaller units, called tokens, that AI models can process.
In-Depth Explanation
Tokens can be words, subwords, or characters, depending on the tokenizer. Most LLMs use subword tokenization (such as byte-pair encoding, BPE), which balances vocabulary size against the ability to handle unknown words. Token count affects both cost (API pricing) and context-window usage.
Real-World Example
The word "unhappiness" might be tokenized as ["un", "happiness"] or ["un", "happi", "ness"].
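The splitting above can be sketched with a greedy longest-match subword tokenizer. This is an illustrative toy, not a real BPE implementation, and the tiny vocabulary here is hypothetical; production tokenizers learn vocabularies of tens of thousands of pieces from data.

```python
# Toy vocabulary (hypothetical) containing the subword pieces.
VOCAB = {"un", "happi", "ness", "happiness"}

def tokenize(word, vocab=VOCAB):
    """Split a word into the longest matching vocabulary pieces, left to right."""
    tokens, i = [], 0
    while i < len(word):
        # Try the longest possible piece first.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            # No piece matched: fall back to a single character.
            tokens.append(word[i])
            i += 1
    return tokens

print(tokenize("unhappiness"))  # -> ['un', 'happiness']
```

With "happiness" removed from the vocabulary, the same word would split into ["un", "happi", "ness"], which is why different tokenizers produce different token counts for identical text.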