Train a tokenizer

GPU-native Byte Pair Encoding on WebGPU compute shaders. Drop your text files, pick a vocabulary size, and train — everything runs on your GPU.

Drop files here or browse
Multiple files OK · browse folder for recursive scan

Tokenize text

Load a trained vocabulary or train one first, then encode text into BPE tokens to see compression in action.

No vocabulary loaded