- Add unicode-segmentation dependency for proper grapheme cluster support
- Replace chars() iteration with graphemes(true) for accurate character counting
- Fix counting of complex Unicode characters such as emoji, combining characters, and other multi-codepoint sequences
- Resolves TODO: 'do graphemes?' in document_statistics function

This change provides more accurate character counts for international text,
emoji with skin tones, combining characters, and other multi-codepoint graphemes.

Examples of improved accuracy:
- 👍🏾 now counts as 1 character instead of 2
- é (e + combining acute) counts as 1 character instead of 2
- 🧑‍💻 (technologist: person + zero-width joiner + laptop) counts as 1 character instead of 3
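
The "before" numbers in these examples come from counting Unicode scalar values, which is what chars() iterates over. The sketch below uses only the standard library to reproduce them; `scalar_count` is a hypothetical helper for illustration, not the actual document_statistics code, and the strings are written as escapes so the exact code points are unambiguous.

```rust
/// Hypothetical helper mirroring the old behaviour:
/// counts Unicode scalar values, not user-perceived characters.
fn scalar_count(text: &str) -> usize {
    text.chars().count()
}

fn main() {
    let samples = [
        // thumbs up + medium-dark skin tone modifier
        ("thumbs up, skin tone", "\u{1F44D}\u{1F3FE}"),
        // 'e' + combining acute accent
        ("e + combining acute", "e\u{0301}"),
        // person + zero-width joiner + laptop
        ("technologist", "\u{1F9D1}\u{200D}\u{1F4BB}"),
    ];

    for (label, text) in samples {
        println!("{}: {} scalar values", label, scalar_count(text));
    }
}
```

With the unicode-segmentation crate in scope (`use unicode_segmentation::UnicodeSegmentation;`), `text.graphemes(true).count()` should report 1 for each of these strings, since `true` selects extended grapheme clusters.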