@simontatham Pardon my straying from the joke, but FYI: #musl libc has a super compact representation for finding the rule for case mapping for a particular character. Under 5k for tables plus code. We don't have the car emoji mapping tho.
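musl's actual tables and lookup code aren't quoted in the thread. The sketch below only illustrates the general idea of storing case mapping as a few range *rules* instead of one entry per character. It makes some assumptions: the `RULES` layout, the `to_lower` name and the blocks covered are all hypothetical, and this is not musl's real format.

```python
from bisect import bisect_right

# Hypothetical compact rule table (NOT musl's real layout):
# (first code point, last code point, delta to lowercase, alternating?)
# "alternating" means the range holds upper/lower pairs: even offset = upper.
RULES = [
    (0x0041, 0x005A, 0x20, False),  # Latin A..Z -> a..z
    (0x0100, 0x012F, 1, True),      # Latin Extended-A pairs, e.g. U+0100 -> U+0101
    (0x0391, 0x03A1, 0x20, False),  # Greek Alpha..Rho
    (0x03A3, 0x03A9, 0x20, False),  # Greek Sigma..Omega (U+03A2 is unassigned)
    (0x0410, 0x042F, 0x20, False),  # Cyrillic capital A..YA
]
STARTS = [rule[0] for rule in RULES]  # sorted, so we can binary-search


def to_lower(cp: int) -> int:
    """Find the rule covering cp (if any) and apply its delta."""
    i = bisect_right(STARTS, cp) - 1
    if i < 0:
        return cp  # below the first rule: no mapping
    first, last, delta, alternating = RULES[i]
    if cp > last:
        return cp  # falls in a gap between rules: no mapping
    if alternating and (cp - first) % 2:
        return cp  # odd offset in a paired range is already lowercase
    return cp + delta
```

A lookup is a binary search over a handful of ranges, so the table size grows with the number of distinct patterns, not with the number of characters.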
Rich Felker (dalias@hachyderm.io)'s status on Friday, 31-Jan-2025 21:48:44 JST
Simon Tatham (simontatham@hachyderm.io)'s status on Friday, 31-Jan-2025 21:51:56 JST
@dalias indeed, Unicode does carefully specify its own rule for translating between upper and lower case, which will tell you when there's no such mapping available, and also work correctly when there is one but it doesn't follow the 'xor with 0x20' rule.
(Fun fact: the 'xor with 0x20' rule works for half the Greek alphabet but not the other half. The two cases of Greek are separated by 0x20, but they sit 0x10 off from where ASCII's letters sit relative to multiples of 0x20, so the run of capitals crosses the U+03A0 boundary. E.g. the xor rule maps Γ to γ as you'd like, but Σ and σ each map to an unrelated thing.)
But if I'd used the proper Unicode case mapping rules then the joke wouldn't have worked :-)
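The Greek fun fact is easy to check. Here is a small sketch that compares the 'xor with 0x20' trick against Python's Unicode-aware `str.lower()` for the basic Greek capitals U+0391..U+03A9. The helper name `xor_case` is just for illustration.

```python
import unicodedata


def xor_case(cp: int) -> int:
    """The ASCII trick: flip bit 0x20 to swap between upper and lower case."""
    return cp ^ 0x20


# Basic Greek capitals; U+03A2 is unassigned (there is no capital final sigma).
GREEK_CAPITALS = [cp for cp in range(0x0391, 0x03AA) if cp != 0x03A2]

# Keep the capitals where flipping bit 0x20 agrees with the real lowercase mapping.
works = [cp for cp in GREEK_CAPITALS if xor_case(cp) == ord(chr(cp).lower())]
fails = [cp for cp in GREEK_CAPITALS if cp not in works]

print(f"xor works for {len(works)} capitals: U+{works[0]:04X}..U+{works[-1]:04X}")
print(f"xor fails for {len(fails)} capitals: U+{fails[0]:04X}..U+{fails[-1]:04X}")

# Show where Sigma and sigma end up under the xor rule.
for cp in (0x03A3, 0x03C3):
    wrong = xor_case(cp)
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))} -> U+{wrong:04X} "
          f"{unicodedata.name(chr(wrong), '<unassigned>')}")
```

Below U+03A0, bit 0x20 is clear in the capitals, so xor happens to add 0x20. From Π upward the bit is already set, so xor subtracts 0x20 instead, which is why the proper mapping for this range is "add 0x20" rather than "flip a bit".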