@gsuberland @mcc One almost wonders if the end-game is to stop pulling and try pushing.

Maybe instead of trying to claw back data we've made publicly crawlable because "I wanted it visible, but not like that" we ask why any of these companies get to keep their data proprietary when it's built on ours?

Would people be more okay with all of this if the rule were "You can build a trained model off of publicly-available data, but that model must itself be publicly-available?"

3 April at 20:53 | Open on mastodon.fixermark.com

10 comments

mcc replied to Mark T. Tomczak

@mark @gsuberland In my opinion, a trapdoor like "okay, well if copyright doesn't apply to the training data you stole, your model isn't copyrightable either" is no good. The US Gov has already said GenAI images and text are not copyrightable. It doesn't help. The thing about generative AI is it inherently takes heavy computational resources (disk space, CPU time, often-unacknowledged low-wage tagging work). Therefore, as a tool, it is inherently biased toward capital and away from individuals.

3 April at 20:59 | Open on mastodon.social

mcc replied to mcc

@mark @gsuberland If we say "AI is a new class of thing that is outside the copyright regime entirely", that is not a level playing field. The tool is designed in a way it inherently serves the powerful. "Machine learning models are inherently open" is the exact model I am afraid of— a world where copyright is something that applies to actors who have less than some specific amount of money, and anyone with more than that specific amount of money is liberated from it.

3 April at 20:59 | Open on mastodon.social

Graham Spookyland🎃/Polynomial replied to mcc

@mcc @mark yes. the only real push back solution that levels the playing field would be to say that you are not allowed to unilaterally make money off it, which essentially just falls back to enforcing copyright law against the rich, which... yeah, exactly the problem.

3 April at 21:04 | Open on chaos.social

jbaggs replied to mcc

@mcc @mark @gsuberland Every x number of years we get business people trying to circumvent law by claiming the old laws don't apply, because computers.

3 April at 21:10 | Open on infosec.exchange

datarama replied to mcc

@mcc @mark @gsuberland Exactly.

Even if, say, GPT-4 wasn't covered by copyright, so what? Even if you could get it out of OpenAI's data centres in the first place, you couldn't run it with reasonable performance. And you *certainly* couldn't retrain it.

3 April at 21:15 | Open on hachyderm.io

Oblomov replied to datarama

@datarama @mcc @mark @gsuberland there is one upside to forcing these models to be open and it's that it removes one of the, if not the primary, incentives in developing them in the first place. Yes, they could still sell its execution as a service, but if they lose control of the model itself, it becomes a considerably less profitable endeavor.

3 April at 21:23 | Open on sociale.network

datarama replied to Oblomov

@oblomov @mcc @mark @gsuberland How, though?

Let's say that tomorrow, a judge rules that GPT-4 is not covered by copyright. What has actually changed? OpenAI isn't compelled to share it with anyone, and it's too big for anyone except large and wealthy corporations to actually do anything with.

Sure, you couldn't get sued if you got a bittorrent of it somehow. But you're not getting a bittorrent of a 1.76 trillion parameter neural network anyway.

3 April at 21:28 | Open on hachyderm.io

Graham Spookyland🎃/Polynomial replied to datarama

@datarama @oblomov @mcc @mark and you sure as shit can't afford a whole rack of H200 cards to make use of it, even if you and all your friends pitch in. it's only useful to people who have the capital to wield it.

3 April at 21:31 | Open on chaos.social

crzwdjk ✅ replied to datarama

@datarama @oblomov @mcc @mark @gsuberland 1.76 trillion parameters is about a hard drive's worth of data, no?

4 April at 0:23 | Open on mastodon.social
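
For a rough sense of the "hard drive's worth" estimate, here is a back-of-the-envelope sketch. It takes the 1.76 trillion parameter figure quoted in the thread at face value (it is not a confirmed number) and assumes the weights are stored as plain arrays at a few common precisions. The per-card memory value is also an assumption for illustration; check the vendor spec before relying on it.

```python
import math

# Parameter count quoted in the thread; an unconfirmed figure, used as-is.
PARAMS = 1.76e12

# Bytes per parameter at some common storage precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

# Assumed memory per accelerator card in GB (illustrative value only).
ASSUMED_CARD_MEMORY_GB = 141


def weights_size_tb(params: float, bytes_per_param: int) -> float:
    """Size of the raw weights in decimal terabytes (1 TB = 1e12 bytes)."""
    return params * bytes_per_param / 1e12


def cards_to_hold(size_tb: float, card_memory_gb: float) -> int:
    """Minimum number of cards whose combined memory fits the weights alone."""
    return math.ceil(size_tb * 1000 / card_memory_gb)


sizes = {name: weights_size_tb(PARAMS, b) for name, b in BYTES_PER_PARAM.items()}

for name, tb in sizes.items():
    cards = cards_to_hold(tb, ASSUMED_CARD_MEMORY_GB)
    print(f"{name}: {tb:.2f} TB on disk, at least {cards} cards just to hold the weights")
```

Under these assumptions the weights come to a few terabytes, so storing them on a single large drive is plausible. Holding them in accelerator memory takes dozens of cards before counting anything needed to actually run the model.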

datarama replied to crzwdjk ✅

@crzwdjk @oblomov @mcc @mark @gsuberland It is, but that's *still* beside the point. You can't actually do anything with it unless you have the resources of a large corporation.

And my other point was that just because it isn't copyrighted, they can still keep it secret.

4 April at 5:33 | Open on hachyderm.io