So, here's where that leaves us: this data is a combination of information sourced from public LinkedIn profiles, fabricated emails address and in part (anecdotally based on simply eyeballing the data this is a small part), the other sources in the column headings above. But the people are real, the companies are real, the domains are real and in many cases, the email addresses themselves are real. There are over 1.8k HIBP subscribers in the data set and this is folks that have double opted-in so they've successfully received an email to that address in the past. Further, when the data was loaded into HIBP there were nearly a million email addresses that were already in the system so evidently, they were addresses that had previously been in use. Which stands to reason because even if every address was constructed by an algorithm, the pattern is common enough that there'll be a bunch of hits.