The BigCode project (supported by Hugging Face) created an "AI" dataset with 67 TB of code, a lot of it from GitHub users who did not agree to this. Some even claim that private repositories are included. 91 of my repositories are in it, many without an open-source license, but no private ones. They provide an opt-out link, but only for "future versions", and it simply creates an issue in a GitHub repo. 99.8 % of them are still in "open" state, dating back to March 2023.