I'll post an approximate count of total unique SSNs in this corpus when I have it.
I'd like to do some fraud analysis to determine whether, and how many, "hot" SSNs are being fraudulently used by what appear to be multiple people. However, this will be tricky because of recycled numbers, name changes, etc., so I'll have to experiment a bit to see how feasible it is. Processing "big data" is time-consuming no matter how efficient you are.
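A rough first pass might look like the sketch below. It assumes the corpus can be reduced to (SSN, name) pairs. It groups records by SSN and flags any SSN seen with at least a threshold number of distinct normalized names. The function name `find_hot_ssns`, the input format, and the default threshold of 3 are all hypothetical choices made for illustration. Name changes, typos, and recycled numbers will still produce false positives, so anything this flags is only a candidate for closer inspection.

```python
from collections import defaultdict


def find_hot_ssns(records, min_distinct_names=3):
    """Return {ssn: set_of_names} for SSNs seen with at least
    min_distinct_names distinct normalized names.

    records: iterable of (ssn, name) pairs (hypothetical input format).
    """
    names_by_ssn = defaultdict(set)
    for ssn, name in records:
        # Normalize case and collapse whitespace so "Jane  Doe" and
        # "jane doe" count as the same name.
        normalized = " ".join(name.lower().split())
        names_by_ssn[ssn].add(normalized)
    # Keep only SSNs associated with many distinct names.
    return {
        ssn: names
        for ssn, names in names_by_ssn.items()
        if len(names) >= min_distinct_names
    }


# Placeholder identifiers, not real SSNs.
records = [
    ("ssn_A", "Jane Doe"),
    ("ssn_A", "jane  doe"),
    ("ssn_A", "John Smith"),
    ("ssn_A", "Maria Lopez"),
    ("ssn_B", "Bob Lee"),
]
hot = find_hot_ssns(records)
```

Because each SSN holds a set of names in memory, a very large corpus would probably need to be sorted or partitioned by SSN first so that each group can be processed on its own.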