The idea that you shouldn't be expected to know what data you're using to train your systems and that doing so is "an impossible task" is so normalized that its hard to know that this was not always the case even in the field of AI.
Data theft and scraping became completely normalized, along with the exploitation of crowdworkers, with the advent of photo sharing and other platforms and others like amazon mechanical turk.
So now NOT exploiting people and stealing data is the anomaly.