If you had code on GitHub at any point, it looks like it might be included in a large dataset called “The Stack”. If you want your code removed from this massive “AI” training data, go here:
https://huggingface.co/spaces/bigcode/in-the-stack
I found two of my old GitHub repos in there. Both were deleted last year and both were private. This is a serious breach of trust by GitHub and @huggingface.
Remove all your code from GitHub.
CONSENT IS NOT OPT-OUT.
Edit — thanks for all the replies. More context here: https://hachyderm.io/@joeyh/112105744123363587
Also, I’m sure the repos of mine that I found were private, but even if they were public at some point in the past, for a brief time, that isn’t my consent to use them for purposes beyond their intent.
---
Edit 2 -- I see this made it to HN, which is a level of attention I neither want nor appreciate...
For all those wondering about the private repo issue -- No, I am not 100% sure that these ancient repos weren't at some point public for a split second before I changed that. I do know that they were never meant for this and that one of them didn't even contain any code.
If my accidentally making a repo public for a moment just so happened to overlap with this scraping, then I guess that's possible. But it in no way invalidates the issues, or the anger that I feel about it.