Alexandre Oliva (lxo@gnusocial.jp)'s status on Tuesday, 05-Nov-2024 06:08:18 JST
I'm in no way affiliated with the OSI, and I know very little about current LLM tech, but I've been thinking a lot about this issue from a software freedom philosophical perspective, trying to figure out how essential training data is for users to have the four essential freedoms.
it's not obvious to me whether having access to the training data places users and developers at an advantage or at a disadvantage compared with those who don't have access to it. training data is so massive, and the link from any piece of it to the system's behavior is so subtle, that it seems conceivable to me that probing the system's behavior and relying on incremental training might be more efficient and more reliable than analyzing the training set, for at least some past, current and future technology.
since I don't know enough about current systems to tell, I set out to devise a method to find the answer to that question. I'm thinking of an adversarial setting in which users/developers who have access to the training data compete with users/developers who don't, both trying to answer questions about how the system works and to modify the system so that it does what is requested (both tasks analogous to freedom #1), with the questions and change requests coming from adversarial proponents. this would be a kind of Turing test for whether any given system respects freedom #1 (the other freedoms are much easier to assess), and it could be applied to any future such systems as well. has any such thing been considered? does it seem worth doing, or even thinking more about?
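to make the idea a bit more concrete, here's a rough, purely illustrative sketch of how a single trial of such a test might be structured; all the names (Challenge, Team, run_trial, and so on) are made up for the sketch, not part of any existing proposal:

# hypothetical sketch of one trial of the adversarial "freedom #1 test":
# two teams -- one with access to the training data, one without -- attempt
# the same challenges posed by adversarial proponents, and we compare scores.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Challenge:
    """a question about the system's behavior, or a request to change it."""
    description: str
    # judge returns True if the team's answer or modified system satisfies
    # the proponent's request
    judge: Callable[[object], bool]

@dataclass
class Team:
    name: str
    has_training_data: bool
    # given a challenge, the team returns its answer or its modified system
    solve: Callable[[Challenge], object]

def run_trial(teams: List[Team], challenges: List[Challenge]) -> dict:
    """score each team on the same set of challenges."""
    scores = {team.name: 0 for team in teams}
    for challenge in challenges:
        for team in teams:
            result = team.solve(challenge)
            if challenge.judge(result):
                scores[team.name] += 1
    return scores

# if, over enough trials, the team without the training data does about as
# well as the team with it, access to the training data arguably adds little
# to the exercise of freedom #1 for that system; a large gap would suggest
# the opposite.

of course the hard part is not the scaffolding but choosing the challenges and the judging criteria so that they actually reflect the study and modification that freedom #1 is about.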
cc: @joshuagay @zacchiro @freemo