(The exact value of "N" is not known yet; I assume...

mcc's posts Post Back to profile

Top-level

mcc

(The exact value of "N" is not known yet; I assume it will be solidly fixed by some upcoming court case.)

Like 3 April at 20:15 | Open on mastodon.social

126 comments

mcc

In a world where copyleft licenses turn out to restrict only the small actors they were meant to empower, and don't apply to big bad-actor "AI" companies, what is the incentive to put your work out under a license that will only serve to make it a target for "AI" scraping?

With NFTs, we saw people taking their work private because putting something behind a clickwall/paywall was the only way to not be stolen for NFTs. I assume the same process will accelerate in an "AI" world.

3 April at 20:18 | Open on mastodon.social

pinkdrunkenelephants

@mcc They should just make a license that explicitly bans AI usage then.

3 April at 20:21 | Open on mastodon.social

mcc

@pinkdrunkenelephants The licenses already bar this because they govern derivative works. If you can make the derivative work non-derivative by defining it "AI", then if we add a nonsense clause banning "AI", the AI companies can simply rename "AI" to "floopleflorp" and say "Ah, but your license only bans 'AI'— it doesn't ban 'floopleflorp'!"

3 April at 20:22 | Open on mastodon.social

pinkdrunkenelephants

@mcc They can rename it Clancy for all it matters. AI is still AI and actions don't just lose meaning because of evil people playing with language.

3 April at 20:24 | Open on mastodon.social

mcc

@pinkdrunkenelephants But AI is not AI. The things that they're calling "AI" are just some machine learning statistical models. Ten years ago this wouldn't have been considered "AI".

3 April at 20:24 | Open on mastodon.social

pinkdrunkenelephants

@mcc Doesn't matter, what matters is the definition behind the word. That is what licenses ought to ban outright.

It's like saying rape is perfectly legal so long as we call it forced sex. Who would believe that that wasn't already predisposed to rape?

Don't fall for other people's manipulative mindgames?

3 April at 20:26 | Open on mastodon.social

mcc

@pinkdrunkenelephants The definition behind the law is, again, decided by humans, who are capable of inconsistency or poor decisions. Rape is legal in New York because rape there is legally defined by the specific use of certain specific genitals. See E. Jean Carroll v. Donald J. Trump

3 April at 20:28 | Open on mastodon.social

pinkdrunkenelephants replied to mcc

@mcc And no one accepts that because of what I'm saying. A rose by any other name would smell as sweet. People need to start recognizing that fact. That's the only way things will change.

3 April at 20:30 | Open on mastodon.social

mcc replied to pinkdrunkenelephants

@pinkdrunkenelephants Well, per my belief as to the meaning of words, ML statistical models are derivative works like any other, and my licenses which place restrictions on derivative works already apply to the ML statistical models

3 April at 20:32 | Open on mastodon.social

pinkdrunkenelephants replied to mcc

@mcc The situation is sad all-around.

3 April at 20:32 | Open on mastodon.social

Show 1 more reply

datarama

@pinkdrunkenelephants @mcc That doesn't work if copyright *itself* doesn't apply to AI training, which is what all those court cases are about. Licenses start from the assumption that the copyright holder reserves all rights, and then the license explicitly waives some of those rights under a set of given conditions.

But with AI, it's up in the air whether a copyright holder has any rights at all.

3 April at 20:23 | Open on hachyderm.io

pinkdrunkenelephants

@datarama @mcc I don't see how it would be up in the air. Humans feed that data into AI and use the churned remains so it's still a human violating the copyright.

3 April at 20:24 | Open on mastodon.social

mcc

@pinkdrunkenelephants @datarama Because humans also are the ones who interpret and enforce laws and if the government does not enforce copyright against companies which market their products as "AI", then copyright does not apply to those companies.

3 April at 20:25 | Open on mastodon.social

pinkdrunkenelephants

@mcc @datarama I guess that's more of a bribery problem than a legal precedent one, then.

3 April at 20:26 | Open on mastodon.social

datarama

@pinkdrunkenelephants @mcc In the EU, there actually is some legislation. Copyright explicitly *doesn't* protect works from being used in machine learning for academic research, but ML training for commercial products must respect a "machine-readable opt-out".

But that's easy enough to get around. That's why eg. Stability funded an "independent research lab" who did the actual data gathering for them.

3 April at 20:29 | Open on hachyderm.io

mcc replied to datarama

@datarama I consider this illegitimate and fundamentally unfair because I have already released large amounts of work under creative commons/open source licenses. I can't retroactively add terms to some of them because the plain language somehow no longer applies. If I add such opt-outs now, it would be like I'm admitting the licenses previously didn't apply to statistics-based derivative works

3 April at 20:31 | Open on mastodon.social

datarama replied to mcc

@mcc I consider it illegitimate and fundamentally unfair because it's opt-out.

3 April at 20:33 | Open on hachyderm.io

pinkdrunkenelephants replied to datarama

@datarama @mcc I wonder why it is people don't just revolt and destroy their servers then. Or drag them into jail.

Why do people delude themselves into accepting atrocities?

3 April at 20:31 | Open on mastodon.social

datarama replied to pinkdrunkenelephants

@pinkdrunkenelephants @mcc I think if there was a simple clear-cut answer to that, the world would be a *very* different place.

3 April at 20:36 | Open on hachyderm.io

Show 1 more reply

datarama

@mcc There is no such incentive. There is a very, very strong incentive (namely, not wanting to empower the worst scumbags in tech) to *not* share your work publicly anymore.

This, to me, is the most harmful effect so far of generative AI.

3 April at 20:21 | Open on hachyderm.io

Graham Spookyland🎃/Polynomial

@mcc it's kinda gross that the only (current) way to meaningfully and tangibly refuse to be exploited by the mass commercialised theft of the commons is to, well, commercialise the commons.

3 April at 20:22 | Open on chaos.social

Graham Spookyland🎃/Polynomial

@mcc although if there's an angstrom-thick silver lining to this whole thing, it's that it has proved incontrovertibly that copyright law was only ever intended to be used as a cudgel by the wealthy and powerful, and never to protect the rights of the individual artist.

3 April at 20:25 | Open on chaos.social

Morten Grøftehauge

@gsuberland @mcc Single atom silver-lining.

3 April at 20:28 | Open on sigmoid.social

Graham Spookyland🎃/Polynomial

@drgroftehauge @mcc yes, an angstrom.

3 April at 20:32 | Open on chaos.social

Hugo Mills

@gsuberland @mcc The artists occasionally tried using the cudgel, but the opponents brought an AK47 to the courtroom...

3 April at 20:29 | Open on mstdn.social

Markus Hofer

@mcc maybe we've all been wrong about NFTs and it's the future after all? 😉

3 April at 20:22 | Open on mastodon.gamedev.place

margot

@mcc im wondering if the broader art and design worlds will end up in a similar situation to where industries like fashion and jewelry already are, where plagarism is essentially an expectation for the designers working there

3 April at 20:24 | Open on mastodon.social

margot

@mcc i guess this is less a solution or an endgame so much as a window into an area where copyright has been hurting small creators while being completely flaunted by others worth multi-billions

3 April at 20:26 | Open on mastodon.social

past oral no mad

@mcc Literally zero. I have a thing I've been hacking on for a while, niche shit, probably not interesting to many others. I was planning on releasing it, but once I realized it'd probably have 0-1 other human users, but end up in every LLM training set, I decided not to.

3 April at 20:26 | Open on retro.social

@mcc "the legal system is ultimately a weapon wielded by those with more capital against those with less" is of course the punchline after every movement that has tried to use legal mechanisms like licenses to enact social change. it'd be nice if there were some deep pan-institutional awareness of and correction for this.

3 April at 20:35 | Open on mastodon.social

mcc

Did you see this? The whole thing with "the stack".

https://post.lurk.org/@emenel/112111014479288871

Some jerks did mass scraping of open source projects, putting them in a collection called "the stack" which they specifically recommend other people use as machine learning sources. If you look at their "Github opt-out repository" you'll find just page after page of people asking to have their stuff removed:

https://github.com/bigcode-project/opt-out-v2/issues

(1/2)

3 April at 20:36 | Open on mastodon.social

mcc

…but wait! If you look at what they actually did (correct me if I'm wrong), they aren't actually doing any machine learning in the "stack" repo itself. The "stack" just collects zillions of repos in one place. Mirroring my content as part of a corpus of open source software, torrenting it, putting it on microfilm in a seedbank is the kind of thing I want to encourage. The problem becomes that they then *suggest* people create derivative works of those repos in contravention of the license. (2/2)

3 April at 20:36 | Open on mastodon.social

mcc

So… what is happening here? All these people are opting out of having their content recorded as part of a corpus of open source code. And I'll probably do the same, because "The Stack" is falsely implying people have permission to use it for ML training. But this means "The Stack" has put a knife in the heart of publicly archiving open source code at all. Future attempts to preserve OSS code will, if they base themselves on "the stack", not have any of those opted-out repositories to draw from.

3 April at 20:39 | Open on mastodon.social

mcc

Like, heck, how am I *supposed* to rely on my code getting preserved after I lose interest, I die, BitBucket deletes every bit of Mercurial-hosted content it ever hosted, etc? Am I supposed to rely on *Microsoft* to responsibly preserve my work? Holy crud no.

We *want* people to want their code widely mirrored and distributed. That was the reason for the licenses. That was the social contract. But if machine learning means the social contract is dead, why would people want their code mirrored?

3 April at 20:39 | Open on mastodon.social

Graham Spookyland🎃/Polynomial

@mcc I have generally come to the conclusion that this is an intended effect. All the things you feel compelled to do for the good of others, in an ordinarily altruistic sense, are essentially made impossible unless you accept that your works and your expressions will be repackaged, sold, and absorbed into commercialised datasets.

The SoaD line "manufacturing consent is the name of the game" has been in my head a lot lately.

3 April at 20:48 | Open on chaos.social

Mark T. Tomczak

@gsuberland @mcc One almost wonders if the end-game is to stop pulling and try pushing.

Maybe instead of trying to claw back data we've made publicly crawlable because "I wanted it visible, but not like that" we ask why any of these companies get to keep their data proprietary when it's built on ours?

Would people be more okay with all of this if the rule were "You can build a trained model off of publicly-available data, but that model must itself be publicly-available?"

3 April at 20:53 | Open on mastodon.fixermark.com

Show 10 replies

Show 8 more replies

margot

@mcc have we considered starting a secret society with arcane rites devoted to preserving and protecting open source code

3 April at 20:49 | Open on mastodon.social

Hugo Mills

@emaytch @mcc I propose "The IlluminFTP".

3 April at 20:51 | Open on mstdn.social

Mark T. Tomczak

@emaytch @mcc So there's a lot of stuff that Paul Graham says that I don't agree with (these days; used to be pretty bought in), but I think the point he made about the nature of copyright and patent protection ages ago rings true.

Paraphrasing without citation because I'm not going to go crawling around to find it right now: the alternative to IP protection isn't a magical utopia of shared ideas... It's guilds and secret knowledge protected with violence. We already tried society without intellectual property protection.

Expand text...

3 April at 20:55 | Open on mastodon.fixermark.com

Show 1 reply

Foone🏳️‍⚧️

@emaytch @mcc why would we need two of those?

3 April at 20:57 | Open on digipres.club

Show 3 more replies

Aedius Filmania ⚙️🎮🖊️

@mcc

Please don't opt out all your repositories, leave the ones that didn't work or didnt compile or are full of security hole.

3 April at 20:40 | Open on lavraievie.social

josh

@mcc i feel like we need llm opt out considerations in foss licenses tbh, then host code off github and nothing changes? Hard to enforce idk unlikely politicians will get it right, maybe the ftc will get lucky?

3 April at 20:42 | Open on wetdry.world

mcc

@josh I don't like this because (1) it means GPL2 is dead, and (2) it feels like admitting that an AI opt-out is something we specifically needed. Meanwhile, machine transformation of my work is something I generally want, I just want the license to be observed.

3 April at 20:43 | Open on mastodon.social

josh

@mcc yeah plus if the opt out for the stack stands it means they got everything in the past at least once so anyone with every version of it can combine them to get all the old stuff anyways. I HOPE that someone can get lucky and stop companies from shittifying everything, but it does kinda feel like this is the break in case of emergency that the clause in the gpl about adhering to future versions was made for

3 April at 20:48 | Open on wetdry.world

clacke: looking for something 🇸🇪🇭🇰💙💛

@josh @mcc Either copyright doesn't apply and then whatever you put in your license doesn't matter, or copyright does apply and then the existing copyleft licenses are enough.

3 April at 20:47 | Open on libranet.de

Show 1 more reply

datarama

@mcc That's also basically how LAION made the dataset for Stable Diffusion. They collected a bunch of links to images with descriptive alt-text.

(Are you taking time to write good alt-text because you respect disabled people? Congratulations, your good work is being exploited by the worst assholes in tech. Silicon Valley never lets a good deed go unpunished.)

3 April at 20:39 | Open on hachyderm.io

Show 2 more replies

Glyph

@mcc Did copyleft licenses ever meaningfully restrict the behavior of large corporations? Licenses are effectively a statement of intent with respect to future litigation, and if the copyright holder is not willing or able to actually *perform* that litigation, everyone gradually understands that this is a Mexican standoff where one side's guns aren't loaded.

3 April at 20:51 | Open on mastodon.social

mcc

@glyph Redhat establishing that its GPLed software is no longer available under the GPL seems to indicate the answer to your question is "no".

3 April at 20:51 | Open on mastodon.social

Glyph

@mcc IIRC, Sony did it much earlier. I cannot even find any record of this, but as I recall, Sony distributed a modified version of GCC as part of their early Playstation SDKs, in a way which clearly violated the GPL. FSF found out somehow, and the result was just that Sony said "oops, our bad, we forgot to contractually forbid members of our SDK program from talking to you" and then later switched to LLVM.

3 April at 20:55 | Open on mastodon.social

Glyph

@mcc Searching for this today only finds stories about the GPL code included in the Sony-published game ICO, which also didn't result in litigation.

3 April at 20:56 | Open on mastodon.social

Show 26 replies

Show 1 more reply

Show 7 more replies

Show 8 more replies

henriquelalves

@mcc one thing that gives me hope is that we are reaching an inflection point on internet curatorship, in which AI is so pervasive that you have to actually dig into internet archives to find valuable information.

This gives me the impression that we are looking into a future that yes, AI will be very pervasive, but niche communities built on trust and self-curatorship (stuff like Web Rings) will be more common. Users will look for stuff written by humans, not AI.

3 April at 20:54 | Open on mastodon.gamedev.place

Show 3 more replies

Go Up