mcc

Like, heck, how am I *supposed* to rely on my code getting preserved after I lose interest, I die, BitBucket deletes every bit of Mercurial-hosted content it ever hosted, etc? Am I supposed to rely on *Microsoft* to responsibly preserve my work? Holy crud no.

We *want* people to want their code widely mirrored and distributed. That was the reason for the licenses. That was the social contract. But if machine learning means the social contract is dead, why would people want their code mirrored?

31 comments
Graham Sutherland / Polynomial

@mcc I have generally come to the conclusion that this is an intended effect. All the things you feel compelled to do for the good of others, in an ordinarily altruistic sense, are essentially made impossible unless you accept that your works and your expressions will be repackaged, sold, and absorbed into commercialised datasets.

The SoaD line "manufacturing consent is the name of the game" has been in my head a lot lately.

Mark T. Tomczak

@gsuberland @mcc One almost wonders if the end-game is to stop pulling and try pushing.

Maybe instead of trying to claw back data we've made publicly crawlable because "I wanted it visible, but not like that" we ask why any of these companies get to keep their data proprietary when it's built on ours?

Would people be more okay with all of this if the rule were "You can build a trained model off of publicly-available data, but that model must itself be publicly-available?"

mcc replied to Mark T. Tomczak

@mark @gsuberland In my opinion, a trapdoor like "okay, well if copyright doesn't apply to the training data you stole, your model isn't copyrightable either" is no good. The US Gov has already said GenAI images and text are not copyrightable. It doesn't help. The thing about generative AI is it inherently takes heavy computational resources (disk space, CPU time, often-unacknowledged low-wage tagging work). Therefore, as a tool, it is inherently biased toward capital and away from individuals.

mcc replied to mcc

@mark @gsuberland If we say "AI is a new class of thing that is outside the copyright regime entirely", that is not a level playing field. The tool is designed in a way it inherently serves the powerful. "Machine learning models are inherently open" is the exact model I am afraid of— a world where copyright is something that applies to actors who have less than some specific amount of money, and anyone with more than that specific amount of money is liberated from it.

Graham Sutherland / Polynomial replied to mcc

@mcc @mark yes. the only real push back solution that levels the playing field would be to say that you are not allowed to unilaterally make money off it, which essentially just falls back to enforcing copyright law against the rich, which... yeah, exactly the problem.

jbaggs replied to mcc

@mcc @mark @gsuberland Every x number of years we get business people trying to circumvent law by claiming the old laws don't apply, because computers.

datarama replied to mcc

@mcc @mark @gsuberland Exactly.

Even if, say, GPT-4 wasn't covered by copyright, so what? Even if you could get it out of OpenAI's data centres in the first place, you couldn't run it with reasonable performance. And you *certainly* couldn't retrain it.

Oblomov replied to datarama

@datarama @mcc @mark @gsuberland there is one upside to forcing these models to be open and it's that it removes one of the, if not the primary, incentives in developing them in the first place. Yes, they could still sell its execution as a service, but if they lose control of the model itself, it becomes a considerably less profitable endeavor.

datarama replied to Oblomov

@oblomov @mcc @mark @gsuberland How, though?

Let's say that tomorrow, a judge rules that GPT-4 is not covered by copyright. What has actually changed? OpenAI isn't compelled to share it with anyone, and it's too big for anyone except large and wealthy corporations to actually do anything with.

Sure, you couldn't get sued if you got a bittorrent of it somehow. But you're not getting a bittorrent of a 1.76 trillion parameter neural network anyway.

Graham Sutherland / Polynomial replied to datarama

@datarama @oblomov @mcc @mark and you sure as shit can't afford a whole rack of H200 cards to make use of it, even if you and all your friends pitch in. it's only useful to people who have the capital to wield it.

crzwdjk ✅ replied to datarama

@datarama @oblomov @mcc @mark @gsuberland 1.76 trillion parameters is about a hard drive's worth of data, no?

datarama replied to crzwdjk ✅

@crzwdjk @oblomov @mcc @mark @gsuberland It is, but that's *still* beside the point. You can't actually do anything with it unless you have the resources of a large corporation.

And my other point was that just because it isn't copyrighted, they can still keep it secret.

datarama

@gsuberland @mcc This isn't why the AI craze has made me anxious, but it *is* why I have become terribly depressed.

I like writing code and making various weird computer programs, and sharing them with people for mutual entertainment and occasional enlightenment. Now I can't do that without accepting that everything I do will be appropriated and commoditized by some of the most horrible people in tech, unless I do it in secret.

And then what's the point?

asmaloney (Andy) 🌎 replied to datarama

@datarama @gsuberland @mcc > And then what's the point?

💯 I'm feeling exactly the same way and I'm really struggling with it.

Not just code but blog posts/tutorials as well. I've "lost" my main creative outlets.

datarama replied to asmaloney (Andy) 🌎

@asmaloney @gsuberland @mcc That's where I'm at too.

And I have never been as depressed as I have this last year. For every other awful period in my life, I always had creative computer things to fall back on - literally, that has been how I kept from going too crazy in the entire story from "tiny bullied autistic kid" to "middle-aged guy holed up all alone during a pandemic". There was always coding and writing.

datarama replied to datarama

@asmaloney @gsuberland @mcc Coding feels especially meaningless now. I try to convince myself that even after we all get fired and replaced with shitty AI, we could still do it for fun - but it's not fun when you know all you're really doing is providing more free training data for the same assholes who are actively working to destroy your life.

asmaloney (Andy) 🌎 replied to datarama

@datarama "middle-aged guy holed up all alone during a pandemic"

I feel seen (as the kids say these days). 😆

datarama replied to asmaloney (Andy) 🌎

@asmaloney I sometimes think about how much that particular experience has coloured the rest of my experience of this bleak, bleak decade. I sat at home with nearly no social contact for 1½ years (except what came in through Teams), and even if I'm a bit of an introvert, I'm sure it made me a bit crazy.

asmaloney (Andy) 🌎 replied to datarama

@datarama Me too. Really struggling to "dig out" of that and then all this other shit ("AI", wars, climate, politics, layoffs for even more profit, the shoddy state of software in general, etc.) just piles on.

I think we're very much in the same situation, so you aren't alone. I hope you find some peace or at least some outlet to move things in a positive direction.

I'm still lookin'... 😀

datarama replied to asmaloney (Andy) 🌎

@asmaloney I've been looking for a long, long time too. And I don't know the way out.

Every crisis is immediately followed by the next, without any of them being resolved. I am so tired.

margot

@mcc have we considered starting a secret society with arcane rites devoted to preserving and protecting open source code

Mark T. Tomczak

@emaytch @mcc So there's a lot of stuff that Paul Graham says that I don't agree with (these days; used to be pretty bought in), but I think the point he made about the nature of copyright and patent protection ages ago rings true.

Paraphrasing without citation because I'm not going to go crawling around to find it right now: the alternative to IP protection isn't a magical utopia of shared ideas... It's guilds and secret knowledge protected with violence. We already tried society without intellectual property protection.

✧✦✶✷Catherine✷✶✦✧ replied to Mark T. Tomczak

@mark @emaytch @mcc if this was true I could get documentation for any of the ASICs Broadcom sells and I can't

Peter Linss

@emaytch @mcc where each member chooses a repo to memorize. At the secret meetings in the woods we take turns reciting them back to each other…

StaringAtClouds

@emaytch @mcc Ossiris

Just 'cos it sounds fun & it's got OSS in it

Sorry it's a bit late here & brain isn't up to working out a proper acronym

bob

@mcc there would only be a cost to you as an open source author if LLM code generation worked, though

mcc

@bob Depends on what "works" means. I believe that LLMs are capable of substantially reproducing entire paragraphs, code functions or images from their training set under circumstances where the origin is not disclosed or easily traced back.

bob replied to mcc

@mcc in a world where people were already copy/pasting from stackoverflow all day does that make a difference?
