mcc

I'm really concerned about the effect "generative AI" is going to have on the attempt to build a copyleft/commons.

As artists/coders, we saw that copyright constrains us. So we decided to make a fenced-off area where we could make copyright work for us in a limited way, with permissions for derivative works within the commons according to clear rules set out in licenses.

Now OpenAI has made a world where rules and licenses don't apply to any company with a valuation over $N billion dollars.

mcc

(The exact value of "N" is not known yet; I assume it will be solidly fixed by some upcoming court case.)

mcc

In a world where copyleft licenses turn out to restrict only the small actors they were meant to empower, and don't apply to big bad-actor "AI" companies, what is the incentive to put your work out under a license that will only serve to make it a target for "AI" scraping?

With NFTs, we saw people taking their work private because putting something behind a clickwall/paywall was the only way to keep it from being stolen for NFTs. I assume the same process will accelerate in an "AI" world.

pinkdrunkenelephants

@mcc They should just make a license that explicitly bans AI usage then.

mcc

@pinkdrunkenelephants The licenses already bar this, because they govern derivative works. If they can make a derivative work non-derivative just by defining it as "AI", then adding a clause banning "AI" is useless: the AI companies can simply rename "AI" to "floopleflorp" and say "Ah, but your license only bans 'AI'— it doesn't ban 'floopleflorp'!"

pinkdrunkenelephants

@mcc They can rename it Clancy for all it matters. AI is still AI and actions don't just lose meaning because of evil people playing with language.

mcc

@pinkdrunkenelephants But AI is not AI. The things that they're calling "AI" are just some machine learning statistical models. Ten years ago this wouldn't have been considered "AI".

pinkdrunkenelephants

@mcc Doesn't matter, what matters is the definition behind the word. That is what licenses ought to ban outright.

It's like saying rape is perfectly legal so long as we call it forced sex. Who would believe that argument who wasn't already predisposed to rape?

Don't fall for other people's manipulative mindgames.

mcc

@pinkdrunkenelephants The definition behind the law is, again, decided by humans, who are capable of inconsistency or poor decisions. Rape is legal in New York because rape there is legally defined by the use of certain specific genitals. See E. Jean Carroll v. Donald J. Trump.

pinkdrunkenelephants replied to mcc

@mcc And no one accepts that because of what I'm saying. A rose by any other name would smell as sweet. People need to start recognizing that fact. That's the only way things will change.

mcc replied to pinkdrunkenelephants

@pinkdrunkenelephants Well, per my belief as to the meaning of words, ML statistical models are derivative works like any other, and my licenses which place restrictions on derivative works already apply to the ML statistical models

datarama

@pinkdrunkenelephants @mcc That doesn't work if copyright *itself* doesn't apply to AI training, which is what all those court cases are about. Licenses start from the assumption that the copyright holder reserves all rights, and then the license explicitly waives some of those rights under a set of given conditions.

But with AI, it's up in the air whether a copyright holder has any rights at all.

pinkdrunkenelephants

@datarama @mcc I don't see how it would be up in the air. Humans feed that data into AI and use the churned remains so it's still a human violating the copyright.

mcc

@pinkdrunkenelephants @datarama Because humans also are the ones who interpret and enforce laws, and if the government does not enforce copyright against companies that market their products as "AI", then copyright does not apply to those companies.

pinkdrunkenelephants

@mcc @datarama I guess that's more of a bribery problem than a legal precedent one, then.

datarama

@pinkdrunkenelephants @mcc In the EU, there actually is some legislation. Copyright explicitly *doesn't* protect works from being used in machine learning for academic research, but ML training for commercial products must respect a "machine-readable opt-out".

But that's easy enough to get around. That's why, e.g., Stability funded an "independent research lab" that did the actual data gathering for them.

mcc replied to datarama

@datarama I consider this illegitimate and fundamentally unfair because I have already released large amounts of work under Creative Commons/open source licenses. I can't retroactively add terms to some of them just because their plain language somehow no longer applies. If I add such opt-outs now, it would be like admitting the licenses previously didn't apply to statistics-based derivative works.

datarama replied to mcc

@mcc I consider it illegitimate and fundamentally unfair because it's opt-out.

pinkdrunkenelephants replied to datarama

@datarama @mcc I wonder why it is people don't just revolt and destroy their servers then. Or drag them into jail.

Why do people delude themselves into accepting atrocities?

datarama replied to pinkdrunkenelephants

@pinkdrunkenelephants @mcc I think if there was a simple clear-cut answer to that, the world would be a *very* different place.

datarama

@mcc There is no such incentive. There is a very, very strong incentive (namely, not wanting to empower the worst scumbags in tech) to *not* share your work publicly anymore.

This, to me, is the most harmful effect so far of generative AI.

Graham Spookyland🎃/Polynomial

@mcc it's kinda gross that the only (current) way to meaningfully and tangibly refuse to be exploited by the mass commercialised theft of the commons is to, well, commercialise the commons.

Graham Spookyland🎃/Polynomial

@mcc although if there's an angstrom-thick silver lining to this whole thing, it's that it has proved incontrovertibly that copyright law was only ever intended to be used as a cudgel by the wealthy and powerful, and never to protect the rights of the individual artist.

Hugo Mills

@gsuberland @mcc The artists occasionally tried using the cudgel, but the opponents brought an AK47 to the courtroom...

Markus Hofer

@mcc maybe we've all been wrong about NFTs and it's the future after all? 😉

margot

@mcc im wondering if the broader art and design worlds will end up in a similar situation to where industries like fashion and jewelry already are, where plagiarism is essentially an expectation for the designers working there

margot

@mcc i guess this is less a solution or an endgame so much as a window into an area where copyright has been hurting small creators while being completely flouted by others worth multi-billions

past oral no mad

@mcc Literally zero. I have a thing I've been hacking on for a while, niche shit, probably not interesting to many others. I was planning on releasing it, but once I realized it'd probably have 0-1 other human users but end up in every LLM training set, I decided not to.

JP

@mcc "the legal system is ultimately a weapon wielded by those with more capital against those with less" is of course the punchline after every movement that has tried to use legal mechanisms like licenses to enact social change. it'd be nice if there were some deep pan-institutional awareness of and correction for this.

mcc

Did you see this? The whole thing with "the stack".

post.lurk.org/@emenel/11211101

Some jerks did mass scraping of open source projects, putting them in a collection called "the stack" which they specifically recommend other people use as machine learning sources. If you look at their "Github opt-out repository" you'll find just page after page of people asking to have their stuff removed:

github.com/bigcode-project/opt

(1/2)

mcc

…but wait! If you look at what they actually did (correct me if I'm wrong), they aren't actually doing any machine learning in the "stack" repo itself. The "stack" just collects zillions of repos in one place. Mirroring my content as part of a corpus of open source software, torrenting it, putting it on microfilm in a seedbank is the kind of thing I want to encourage. The problem becomes that they then *suggest* people create derivative works of those repos in contravention of the license. (2/2)

mcc

So… what is happening here? All these people are opting out of having their content recorded as part of a corpus of open source code. And I'll probably do the same, because "The Stack" is falsely implying people have permission to use it for ML training. But this means "The Stack" has put a knife in the heart of publicly archiving open source code at all. Future attempts to preserve OSS code will, if they base themselves on "the stack", not have any of those opted-out repositories to draw from.

mcc

Like, heck, how am I *supposed* to rely on my code getting preserved after I lose interest, I die, BitBucket deletes every bit of Mercurial-hosted content it ever hosted, etc? Am I supposed to rely on *Microsoft* to responsibly preserve my work? Holy crud no.

We *want* people to want their code widely mirrored and distributed. That was the reason for the licenses. That was the social contract. But if machine learning means the social contract is dead, why would people want their code mirrored?

Graham Spookyland🎃/Polynomial

@mcc I have generally come to the conclusion that this is an intended effect. All the things you feel compelled to do for the good of others, in an ordinarily altruistic sense, are essentially made impossible unless you accept that your works and your expressions will be repackaged, sold, and absorbed into commercialised datasets.

The SoaD line "manufacturing consent is the name of the game" has been in my head a lot lately.

Mark T. Tomczak

@gsuberland @mcc One almost wonders if the end-game is to stop pulling and try pushing.

Maybe instead of trying to claw back data we've made publicly crawlable because "I wanted it visible, but not like that" we ask why any of these companies get to keep their data proprietary when it's built on ours?

Would people be more okay with all of this if the rule were "You can build a trained model off of publicly-available data, but that model must itself be publicly-available?"

margot

@mcc have we considered starting a secret society with arcane rites devoted to preserving and protecting open source code

Mark T. Tomczak

@emaytch @mcc So there's a lot of stuff that Paul Graham says that I don't agree with (these days; used to be pretty bought in), but I think the point he made about the nature of copyright and patent protection ages ago rings true.

Paraphrasing without citation because I'm not going to go crawling around to find it right now: the alternative to IP protection isn't a magical utopia of shared ideas... It's guilds and secret knowledge protected with violence. We already tried society without intellectual property protection.

Aedius Filmania ⚙️🎮🖊️

@mcc

Please don't opt out all your repositories; leave the ones that didn't work, or didn't compile, or are full of security holes.

josh

@mcc i feel like we need llm opt out considerations in foss licenses tbh, then host code off github and nothing changes? Hard to enforce idk unlikely politicians will get it right, maybe the ftc will get lucky?

mcc

@josh I don't like this because (1) it means GPL2 is dead, and (2) it feels like admitting that an AI opt-out is something we specifically needed. Meanwhile, machine transformation of my work is something I generally want, I just want the license to be observed.

josh

@mcc yeah plus if the opt out for the stack stands it means they got everything in the past at least once so anyone with every version of it can combine them to get all the old stuff anyways. I HOPE that someone can get lucky and stop companies from shittifying everything, but it does kinda feel like this is the break in case of emergency that the clause in the gpl about adhering to future versions was made for

clacke: looking for something 🇸🇪🇭🇰💙💛

@josh @mcc Either copyright doesn't apply and then whatever you put in your license doesn't matter, or copyright does apply and then the existing copyleft licenses are enough.

datarama

@mcc That's also basically how LAION made the dataset for Stable Diffusion. They collected a bunch of links to images with descriptive alt-text.

(Are you taking time to write good alt-text because you respect disabled people? Congratulations, your good work is being exploited by the worst assholes in tech. Silicon Valley never lets a good deed go unpunished.)

Glyph

@mcc Did copyleft licenses ever meaningfully restrict the behavior of large corporations? Licenses are effectively a statement of intent with respect to future litigation, and if the copyright holder is not willing or able to actually *perform* that litigation, everyone gradually understands that this is a Mexican standoff where one side's guns aren't loaded.

mcc

@glyph Red Hat establishing that its GPLed software is no longer available under the GPL seems to indicate the answer to your question is "no".

Glyph

@mcc IIRC, Sony did it much earlier. I cannot even find any record of this, but as I recall, Sony distributed a modified version of GCC as part of their early Playstation SDKs, in a way which clearly violated the GPL. FSF found out somehow, and the result was just that Sony said "oops, our bad, we forgot to contractually forbid members of our SDK program from talking to you" and then later switched to LLVM.

Glyph

@mcc Searching for this today only finds stories about the GPL code included in the Sony-published game ICO, which also didn't result in litigation.

henriquelalves

@mcc one thing that gives me hope is that we are reaching an inflection point on internet curatorship, in which AI is so pervasive that you have to actually dig into internet archives to find valuable information.

This gives me the impression that we are looking into a future where, yes, AI will be very pervasive, but niche communities built on trust and self-curatorship (stuff like Web Rings) will be more common. Users will look for stuff written by humans, not AI.

Morten Hilker-Skaaning

@mcc you'd think big companies need both copyright and IP ownership. Otherwise they'd just keep stealing each others work once the free content was exhausted...

Carlos Solís

@mcc I've already seen the general public considering the concepts of copyleft and open-source as "failed", and clamoring to return to a world where every commit and transfer of source code is approved by, and paid to, the original author.

Megan Fox

@mcc I'm waiting for the programmer equivalent of nightshade.

"This is a perfectly functional library for X, I've just made it delete your entire C drive, but DON'T WORRY you can remove the offending code in file ev6.cpp, line 45"

Irenes (many)

@mcc you're right to flag that, for sure

Leon

@mcc i've been working on a pretty chunky article about this but i think there's a massive disconnect between what people think public licences say and what they actually say

the former is 'this is to contribute back to my community of likeminded nerds who supported me as long as they share the wealth tee hee’

the latter is ‘i am permanently and publicly donating this to literally anyone who wants to murder me or anyone else with it with my blessing and without recourse, provided they share the modifications they made with the rest of the murderer community and display my name prominently’

May Likes Toronto

@mcc I feel like the only way we win is if we poison all of the training data with Disney trademarks.

bignose

@mcc
> Now OpenAI has made a world where rules and licenses don't apply to any company with a valuation over $N billion dollars.

This happens even in the face of *explicitly* setting license conditions to prevent exploitation.

#StackOverflow owners decided to hand over all the freely-given community contributions to #OpenAI, over the long and loud protests of that community. And, IMO, in violation of the #CreativeCommons #Attribution #ShareAlike license on all those contributions.

Apparently, with enough money on offer, #copyright violation doesn't matter.

theverge.com/2024/5/8/24151906
