Lots of people have worried about CSAM in training sets, including the LAION team themselves, but David actually created a novel mechanism to detect it.
Hopefully this will change the way these training sets are created in the future.
Alex Stamos (alex@cybervillains.com)'s status on Thursday, 21-Dec-2023 11:32:04 JST Alex Stamos -
Alex Stamos (alex@cybervillains.com)'s status on Thursday, 21-Dec-2023 11:32:05 JST Alex Stamos
How does Stable Diffusion 1.5 know how to create CSAM? It turns out it was trained on thousands of illegal images contained in the extremely popular LAION-5B image set.
I’m so incredibly proud of my friend and colleague @det
Story: https://www.404media.co/laion-datasets-removed-stanford-csam-child-abuse/
Paper: https://stacks.stanford.edu/file/druid:kh752sm9123/ml_training_data_csam_report-2023-12-20.pdf
Mike McCue repeated this.