I've been analyzing the public NPD data leak. Not *all* of it, only the most public ~277GB (uncompressed) corpus: https://www.troyhunt.com/inside-the-3-billion-people-national-public-data-breach/
Corporate media headlines claimed 2.9 billion, 2.7 billion, "every American", etc., which raised questions in threads about it: https://noauthority.social/@ned/112962645178289749
Troy Hunt estimated ~899M unique SSNs, though I'm curious whether his "100M samples" were random or sequential. I've noticed formatting differences between sections of the corpus, which suggests different origins, so a sequential sample might not be representative of the whole.
Cont...