@mikebabcock Quick guide to Unicode terminology:
- code units: the in-memory elements of the text encoding, i.e. bytes for UTF-8, 32-bit integers for UTF-32, etc
- codepoints: the numbers in the range 0–0x10FFFF that are mapped to abstract characters
- graphemes: the smallest functional units of a script, formed from one or more codepoints
- grapheme clusters: the things people usually would describe as ”a character” for the purpose of cursor motion, “the number of characters,” etc.