Tokenizer

Definition

The component that converts text into tokens that a language model can process.

Tokenizers are often overlooked, yet they directly affect how much text fits in the context window, how much inference costs, and how well a model handles rare words and non-English text.
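
As a quick illustration of the text-to-token round trip, here is a minimal sketch using the tiktoken library (an assumption; any subword tokenizer would work the same way):

```python
# Minimal sketch, assuming tiktoken is installed; "cl100k_base" is one of
# the encodings it ships with.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

text = "Tokenizers split text into subword units."
token_ids = enc.encode(text)       # text -> list of integer token IDs
print(len(token_ids), token_ids)   # how many tokens this sentence costs
print(enc.decode(token_ids))       # IDs -> original text (lossless round trip)
```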

Types

  • BPE (Byte Pair Encoding): starts from characters or bytes and repeatedly merges the most frequent adjacent pairs into subword units, as in the sketch after this list
  • WordPiece: a similar merge-based scheme that chooses merges by likelihood, used by BERT
  • SentencePiece: trains BPE or unigram models directly on raw text without whitespace pre-tokenization, used by T5 and Llama
  • Character-level: every character is its own token; no out-of-vocabulary issues, but sequences get long
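
To make the BPE idea concrete, the toy sketch below (corpus and merge count are invented for illustration) runs the classic merge loop: start from characters and repeatedly merge the most frequent adjacent pair.

```python
# Toy BPE training loop: merge the most frequent adjacent symbol pair,
# repeat. Corpus and number of merges are made up for illustration.
from collections import Counter

def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = pair[0] + pair[1]
    new_vocab = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
                out.append(merged)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        new_vocab[tuple(out)] = freq
    return new_vocab

# Toy corpus: words split into characters, with an end-of-word marker.
vocab = {
    ("l", "o", "w", "</w>"): 5,
    ("l", "o", "w", "e", "r", "</w>"): 2,
    ("n", "e", "w", "e", "s", "t", "</w>"): 6,
    ("w", "i", "d", "e", "s", "t", "</w>"): 3,
}

for step in range(5):                  # learn 5 merges for the demo
    pairs = get_pair_counts(vocab)
    best = max(pairs, key=pairs.get)   # most frequent adjacent pair
    vocab = merge_pair(best, vocab)
    print(f"merge {step + 1}: {best}")
```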

Considerations

  • Vocabulary size: larger vocabularies shorten sequences but enlarge the embedding and output layers
  • Handling of rare words: subword schemes avoid out-of-vocabulary failures by falling back to smaller pieces
  • Multi-language support: the same meaning can cost far more tokens in languages the vocabulary covers poorly (see the sketch after this list)
  • Special tokens: reserved markers such as padding, end-of-sequence, and chat-role delimiters
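
A short sketch of how these trade-offs show up in practice, again assuming tiktoken is installed (the sample sentences are invented):

```python
# Vocabulary size, rare words, and cross-language token costs in practice.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

samples = {
    "English":  "The weather is nice today.",
    "German":   "Das Wetter ist heute schön.",
    "Japanese": "今日はいい天気ですね。",
}

# Vocabulary size: how many distinct tokens the model can emit.
print("vocabulary size:", enc.n_vocab)

# Multi-language support / efficiency: the same meaning can cost a very
# different number of tokens depending on vocabulary coverage.
for language, sentence in samples.items():
    print(f"{language}: {len(enc.encode(sentence))} tokens")

# Rare words: still representable, just as more, smaller pieces.
print("'antidisestablishmentarianism' ->",
      len(enc.encode("antidisestablishmentarianism")), "tokens")
```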
agents, technical, processing