Enumeration of the Unicode "General Category" values, used to roughly classify code points into letters, punctuation, etc.
Counts the number of grapheme clusters (user-perceived characters) in a UTF string.
Retrieves the "General Category" of the first code point in a UTF-8 string. For invalid UTF-8, the property is set to GeneralCategory.__ (0).
A customizable structure providing information on a code point. It consists of a Unicode property in the form of an enum (e.g. GeneralCategory) and the length in bytes of the code point's UTF-8 encoding.
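
To illustrate the pairing of a property with a UTF-8 length, here is a minimal Python sketch. It is not this library's API: `CodepointInfo` and `first_codepoint_info` are hypothetical names, the two-letter category strings come from Python's standard `unicodedata` module, and reporting invalid input as `None` with a length of 0 is an assumption made only for the example (the library itself reports GeneralCategory.__).

```python
import unicodedata
from typing import NamedTuple, Optional


class CodepointInfo(NamedTuple):
    """Hypothetical stand-in for the structure described above."""
    category: Optional[str]  # two-letter General Category, None for invalid UTF-8
    length: int              # bytes used by the code point in UTF-8, 0 if invalid


def first_codepoint_info(data: bytes) -> CodepointInfo:
    """Classify the first code point of a UTF-8 byte string."""
    # A UTF-8 sequence is 1 to 4 bytes long, so try each prefix length in turn.
    for n in range(1, 5):
        try:
            ch = data[:n].decode("utf-8")  # strict decoding rejects malformed input
        except UnicodeDecodeError:
            continue
        if len(ch) == 1:
            return CodepointInfo(unicodedata.category(ch), n)
    # Empty input or no valid leading sequence (assumed convention for this sketch).
    return CodepointInfo(None, 0)
```

Calling `first_codepoint_info("é!".encode("utf-8"))` classifies only the leading "é" and reports its two-byte length; the caller can then skip that many bytes to reach the next code point.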
Functions for working with text encoded in a Unicode Transformation Format (UTF).
Grapheme clusters: A grapheme cluster is, roughly speaking, what the user would perceive as the smallest unit in a writing system. The grapheme cluster count can be thought of as a caret position in a text editor. In particular, at the grapheme cluster level the choice of normalization form (NFC, NFD) becomes transparent: canonically equivalent strings yield the same count. The default definition used here is independent of the user's locale.
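
The effect of normalization forms can be seen in a short Python sketch. Note that `approx_grapheme_count` is a hypothetical helper using a deliberately simplified rule (every code point that is not a combining mark starts a new cluster). It is not the default Unicode segmentation algorithm and ignores cases such as Hangul jamo sequences, emoji joined by ZWJ, regional indicator pairs and CR LF, so treat it as an approximation for illustration only.

```python
import unicodedata


def approx_grapheme_count(text: str) -> int:
    """Approximate the grapheme cluster count of a string.

    Simplification (assumption of this sketch): a cluster is a base code
    point followed by any number of combining marks (categories Mn, Mc, Me).
    """
    return sum(1 for ch in text if not unicodedata.category(ch).startswith("M"))


word = "café"
nfc = unicodedata.normalize("NFC", word)  # "é" is a single precomposed code point
nfd = unicodedata.normalize("NFD", word)  # "é" is "e" plus a combining acute accent

# The code point counts differ between the two forms ...
code_points = (len(nfc), len(nfd))
# ... while the approximate grapheme cluster counts agree.
clusters = (approx_grapheme_count(nfc), approx_grapheme_count(nfd))
```

Here the NFD form is one code point longer than the NFC form, yet both forms report the same number of clusters, which is the count a caret would step through.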