Stack Overflow bans users en masse for rebelling against OpenAI partnership — users banned for deleting answers to prevent them being used to train ChatGPT

lemmyreader@lemmy.ml · 3 months ago

Skull giver@popplesburger.hilciferous.nl · 3 months ago

AI companies are hoping for a ruling that says content generated from a model trained on content is not a derivative work. So far, the Sarah Silverman lawsuit seems to be going that way, at least; the claimants were set back because they’ve been asked to prove the connection between AI output and their specific inputs.

If this does become jurisprudence or law in one or more countries, licenses don’t mean jack. You can put the AGPL on your stuff and AI could suck it up into their model and use it for whatever they want, and you couldn’t do anything about it.

The AI training sets for all common models contain copyrighted works like entire books, movies, and websites. Don’t forget that most websites don’t even have a license, and that unlicensed work, including internet comments, is as illegal to replicate as any book or movie normally would be. If AI data sets need to comply with copyright, all current AI will need to be retrained (except maybe for that image AI by that stock photo company, which is exclusively trained on licensed work).

verassol@lemmy.ml · 3 months ago

the claimants were set back because they’ve been asked to prove the connection between AI output and their specific inputs

I mean, how do you do that for a closed-source model with secretive training data? As far as I know, OpenAI has admitted to using large amounts of copyrighted content, numberless books, newspaper material, all on the basis of fair use claims. Guess it would take a government entity actively going after them at this point.

Skull giver@popplesburger.hilciferous.nl · 3 months ago

The training data set isn’t the problem. The data set for many open models is actually not hard to find, and it’s quite obvious that works by the artists were included in the data set. In this case, the lawsuit was about the Stable Diffusion dataset, and I believe that’s just freely available (though you may need to scrape and download the linked images yourself).

For research purposes, this was never a problem: scientific research is exempted from many limitations of copyright. This led to an interesting problem with OpenAI and the other AI companies: they took their research models, the output of research, and turned them into a business.

The way things are going, I expect the law to be like this: datasets can contain copyrighted work as long as they’re only distributed for research purposes, AI models are derivative works, and the output of AI models is not a derivative work, and therefore the output AI companies generate is exempt from copyright. It’s definitely not what I want to happen, but the legal arguments that I thought would kill this interpretation don’t seem to hold water in court.

Of course, courts only apply law as it is written right now. At any point in time, governments can alter their copyright laws to kill or clear AI models. On the one hand, copyright lobbyists have a huge impact on governance, as much as big oil it seems, but on the other hand, banning AI will just give countries that don’t care about copyright an economic advantage. The EU has set up AI rules, which I appreciate as an EU citizen, but I cannot deny that this will inevitably lead to a worse environment to do business in compared to places like the USA and China.

verassol@lemmy.ml · 3 months ago

Thank you for sharing. Your perspective broadens mine, but I feel a lot more negative about the whole “must benefit business” side of things. It is fruitless to hold any entity whatsoever accountable when a whole worldwide economy is in a free-for-all nuke-waving doom-embracing realpolitik vibe.

Frankly, not sure what would be worse, economic collapse and the consequences to the people, or economic prosperity and… the consequences to the people. Long term, and from a country that is not exactly thriving in the scheme side of things, I guess I’d take the former.

bitfucker@programming.dev · 3 months ago

Yep. Can’t wait to overfit an LLM on a lot of copyrighted work and release it into the public domain. Let’s see if OpenAI gets pushback from copyright owners down the road.