@lanodan @lain I was reading TUHS and :mcilroy: is still on there and posts all the time, and he was talking about very early spelling checkers; you didn't have spare space for lists of all the words in the English language, so it would do something like¹
:mycomputer: tr -cs 'a-zA-Z0-9-' '\n' | awk 'NF{a[$1]++}END{for(i in a)print a[i], i}' | sort -n
The reasoning was that misspellings are mostly one-offs while real words tend to repeat, so a word that appears only once is more likely to be a misspelling, and sort -n puts those at the top. This approach probably works really well for catching misspelled identifiers in codebases.
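Here's a rough sketch of the same idea pointed at source code instead of prose. It assumes an identifier is any run of letters, digits, and underscores, and `rare_words` is a made-up name for illustration, not anything from the original spelling program.

```shell
# Hypothetical helper: read text on stdin and print every token that
# occurs exactly once, one per line, sorted.
rare_words() {
    # Turn each run of non-identifier characters into a single newline,
    # so every token ends up on its own line.
    tr -cs 'A-Za-z0-9_' '\n' |
        # Count each non-empty token and keep only the ones seen once.
        awk 'length($0) > 0 { count[$0]++ }
             END { for (w in count) if (count[w] == 1) print w }' |
        sort
}
```

Something like `cat *.c | rare_words` would then list the one-off tokens. It's only a heuristic: anything legitimately used once (a local variable, a word in a comment, a numeric literal) shows up alongside the typos, so the output is a list of things to eyeball rather than a list of errors.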
¹ This is an approximation; I don't remember the name of the spelling program. I dug up the C from one of the really old Unix source dumps, but it was nearly unreadable and didn't compile. Most of the old code still builds and is legible, like cal and ed, but this was kind of a mess. Apparently it got included in the Unix Writer's Workbench but wasn't written by Lorinda Cherry; statistics guys write really rough code for some reason.