Conversation

Notices

  1. ⍨ (chaz@burn.capital)'s status on Saturday, 26-Oct-2024 23:38:09 JST

    As the OSI prepares to make official its "open source AI" definition with a glaring lack of any requirement that the actual source (training data) be made available, it's worth noting that their work is funded by google, meta, microsoft, salesforce, etc. What does open source even mean here if the literal source of the model isn't open? These companies are invested in making you think they're on your side while they boil the oceans to avoid paying human beings for labor.

    The idea behind open source, as it grew out of the free software movement, has always been to water down software freedoms, to create something more palatable to corporate interests that *sounds* good but means very little. This continues that work for the current "gen AI" bubble. It's time to ditch open source as an ideal, and the OSI especially.

    https://opensource.org/ai/drafts/the-open-source-ai-definition-1-0-rc2

    #OpenSource #OpenSourceAI #OSI #OpenSourceInitiative #FreeSoftware #AI #GenAI #GenerativeAI

    In conversation about 7 months ago from burn.capital
    • ⍨ (chaz@burn.capital)'s status on Saturday, 02-Nov-2024 04:12:04 JST

      They posit you can still modify (tune) the distributed models without the training source. By that logic, you can also modify a binary executable without its source code. Frankly, that's unacceptable if we actually care about the human beings using the software.

      A key pillar of freedom as it relates to software is reproducibility. The ability to build a tool from scratch, in your own environment, with your own parameters, is absolutely indispensable to both learning how the tool works and changing the tool to better serve your needs, especially if your needs fall on the outskirts of the bell curve.

      There's also the issue of auditability. If you can't run the full build process yourself, producing your own results from scratch in a trusted environment to compare with what's distributed, it becomes exponentially harder to verify any claims about how a tool supposedly works.

      Without the training data, this all becomes impossible for AI models. The OSI knows this. They're choosing to ignore it for the sake of expediency for the companies paying their bills, who want to claim "open" because it sounds good while actually hiding the (largely stolen and fraudulently or non-consensually acquired) source material of their current models.

      Do we want a new definition of "open source" that actively thwarts analysis and tinkering, two fundamental requirements of software that respects human beings today? Reject this nonsense.

      #OpenSource #OpenSourceAI #OSI #OpenSourceInitiative #FreeSoftware #AI #GenAI #GenerativeAI

      In conversation about 7 months ago
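
      One way to make chaz's reproducibility-and-auditability point concrete: a minimal sketch, assuming PyTorch, with a toy linear model and synthetic data standing in as illustrative placeholders for a real training set. With the data in hand, anyone can rebuild the weights deterministically and compare digests against a distributed checkpoint; without it, the audit cannot even begin.

        # A minimal sketch, assuming PyTorch; the model, data, and seed are
        # illustrative stand-ins, not anything from the thread.
        import hashlib
        import torch

        def train_from_scratch(seed: int) -> torch.nn.Module:
            # Deterministic toy training run: same seed and same data in,
            # byte-identical weights out.
            torch.manual_seed(seed)
            model = torch.nn.Linear(4, 1)
            opt = torch.optim.SGD(model.parameters(), lr=0.1)
            x = torch.randn(64, 4)            # stands in for the training data
            y = x.sum(dim=1, keepdim=True)
            for _ in range(100):
                opt.zero_grad()
                loss = torch.nn.functional.mse_loss(model(x), y)
                loss.backward()
                opt.step()
            return model

        def weights_digest(model: torch.nn.Module) -> str:
            # Hash the trained weights so independent rebuilds can be
            # compared against a distributed checkpoint.
            h = hashlib.sha256()
            for tensor in model.state_dict().values():
                h.update(tensor.detach().cpu().numpy().tobytes())
            return h.hexdigest()

        # Two independent "builds" from the same inputs match exactly; without
        # the training data (x and y here), nobody else can run this audit.
        assert weights_digest(train_from_scratch(42)) == weights_digest(train_from_scratch(42))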

    • Alexandre Oliva (moving to @lxo@snac.lx.oliva.nom.br) (lxo@gnusocial.jp)'s status on Tuesday, 05-Nov-2024 06:08:18 JST
      in reply to
      • 🎓 Doc Freemo :jpf: 🇳🇱
      • Josh Gay
      • Stefano Zacchiroli
      I'm in no way affiliated with the OSI, and I know very little of current LLM tech, but I've been thinking a lot about this issue from a software freedom philosophical perspective, trying to figure out how essential training data is for users to have the four essential freedoms.

      It's not obvious to me whether having access to the training data places users and developers at an advantage or at a disadvantage compared with those who don't have access to it. Training data is so massive, and the link from any of it to the system's behavior is so subtle, that it seems conceivable to me that probing the system's behavior and relying on incremental training might be more efficient and more reliable than analysis of the training set, for at least some past, current, and future technology.

      Since I don't know enough about current systems to tell, I set out to devise a method to find the answer to that question. I'm thinking of an adversarial setting in which users/developers who have access to the training data compete with users/developers who don't, both trying to find answers to questions about how the system works and to modify the system so that it does what is requested (these are analogous to freedom #1), with questions and change requests coming from adversarial proponents. This would be a kind of Turing test for whether any given system respects freedom #1 (the other freedoms are much easier to assess), and it could be applied to any future such systems as well. Has any such thing been considered? Does it seem worth doing, or even thinking more about? cc: @joshuagay @zacchiro @freemo
      In conversation about 7 months ago
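
      One way to make lxo's proposed trial concrete: a hypothetical harness, sketched below, in which every name and interface is invented for illustration. A team with the training data and a team without it face the same adjudicated challenges, and the scores indicate whether data access actually helped exercise freedom #1.

        # Hypothetical harness for the adversarial test described above.
        # All names and interfaces are invented for illustration.
        from dataclasses import dataclass
        from typing import Callable

        @dataclass
        class Challenge:
            question: str                       # e.g. "why does the model refuse input X?"
            check: Callable[[str], bool]        # adjudicates a proposed answer

        @dataclass
        class Team:
            name: str
            has_training_data: bool
            answer: Callable[[Challenge], str]  # probing, incremental training, or data analysis
            score: int = 0

        def run_trial(teams: list[Team], challenges: list[Challenge]) -> None:
            # Score each team on the same challenges. If the with-data team
            # does no better, access to the training data did not help
            # exercise freedom #1 in practice.
            for challenge in challenges:
                for team in teams:
                    if challenge.check(team.answer(challenge)):
                        team.score += 1
            for team in teams:
                print(f"{team.name} (data={team.has_training_data}): {team.score}")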
    • Stefano Zacchiroli (zacchiro@mastodon.xyz)'s status on Tuesday, 05-Nov-2024 19:10:20 JST
      in reply to
      • Alexandre Oliva (moving to @lxo@snac.lx.oliva.nom.br)

      @lxo @chaz It's a good approach, but I don't think it's needed.

      If we start from first principles, there's no doubt that to fully exercise the freedoms to study and modify, you need the training data. (You can exercise *some* of those freedoms even without training data, but in a suboptimal way. I can give precise examples if you're curious.)

      The "data is too big" problem is IMO a distraction. There are relevant ML systems that are small enough to make retraining them from scratch viable.

      In conversation about 7 months ago
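
      Zacchiroli's "small enough to retrain" point is easy to demonstrate at the low end. A minimal sketch, assuming scikit-learn and using its bundled digits dataset as a stand-in for a small ML system whose training data ships with it:

        # A minimal sketch, assuming scikit-learn; the digits dataset stands
        # in for a small ML system whose training data ships with it.
        from sklearn.datasets import load_digits
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split

        X, y = load_digits(return_X_y=True)      # the full "training data"
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

        # Retraining from scratch takes seconds at this scale, so the
        # freedom to study and modify via retraining is entirely practical.
        clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        print(f"held-out accuracy: {clf.score(X_te, y_te):.3f}")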
    • Stefano Zacchiroli (zacchiro@mastodon.xyz)'s status on Tuesday, 05-Nov-2024 19:17:00 JST
      in reply to
      • Alexandre Oliva (moving to @lxo@snac.lx.oliva.nom.br)

      @lxo @chaz AFAICT even the OSI recognizes this. Their main arguments against mandating training data in the OSAID are of a pragmatic nature, related to (1) the current state of the industry, and (2) the legal regimes that apply to data, which differ from those that apply to (free) software.

      I disagree with the decision taken based on those arguments, but I understand them. Either way, they don't call into question the fact that having the training data is *better* than not having it for exercising user freedoms.

      In conversation about 7 months ago
    • Alexandre Oliva (moving to @lxo@snac.lx.oliva.nom.br) (lxo@gnusocial.jp)'s status on Thursday, 07-Nov-2024 12:33:42 JST
      in reply to
      • Stefano Zacchiroli
      I am curious, and I'd welcome both precise and imprecise examples ;-) Thanks in advance.
      In conversation about 6 months ago
    • mangeurdenuage :gnu: :trisquel: :gondola_head: 🌿 :abeshinzo: :ignucius: (mangeurdenuage@shitposter.world)'s status on Thursday, 07-Nov-2024 23:28:52 JST
      @chaz
      AI isn't an issue imo; proprietary AI is, though.
      https://www.fsf.org/news/fsf-is-working-on-freedom-in-machine-learning-applications
      In conversation about 6 months ago

      Attachments

      1. FSF is working on freedom in machine learning applications — Free Software Foundation
         The FSF is a charity with a worldwide mission to advance software freedom.
    • ⍨ (chaz@burn.capital)'s status on Saturday, 09-Nov-2024 00:54:04 JST

      You don't have to take my word for it; here's Schneier himself saying this open source AI definition is "terrible":

      https://www.schneier.com/blog/archives/2024/11/ai-industry-is-trying-to-subvert-the-definition-of-open-source-ai.html

      In conversation about 6 months ago
    • fu (fu@libranet.de)'s status on Saturday, 09-Nov-2024 01:10:19 JST
      @chaz Yet another example of why the open source development model isn't what is important. Respecting freedom is what matters. Open source is antithetical to Free Software.
      In conversation about 6 months ago


GNU social JP is a social network, courtesy of GNU social JP管理人 (the site administrator). It runs on GNU social, version 2.0.2-dev, available under the GNU Affero General Public License.

All GNU social JP content and data are available under the Creative Commons Attribution 3.0 license.