Duco

@yabellini SciHub makes papers public that are behind paywalls. I agree that they shouldn't be behind paywalls, but that's completely different from what OpenAI did.
I think they mostly used sources that are public anyway, like Wikipedia. They also didn't publish the texts; they trained an AI on them that creates new texts. So in a way they made a remix, and remixes are handled differently in copyright law.
"The corpus [GPT-2] was trained on, […] 40 [GB] of text from URLs shared in Reddit" en.wikipedia.org/wiki/OpenAI

8 comments
Duco

@yabellini I can't read the article because it's behind a paywall, and the other document is 69 pages long. I'm not going to read that. If you want to make a point with it, say what you want to say. Depending on what you say, I'll decide whether I want to check it against the sources you provided.

Skylarking Mullet

@duco @yabellini We can bypass paywalls by prepending "archive.is/" to the URL.

archive.is/YOFMJ

Duco

@yabellini @skylarkingmullet At least under German law, the author of any text holds the rights to it. Every Wikipedia article has authors who hold copyright in it, but they licensed it under a free license, so everyone can use it. Since every text written by a human, every photo taken by a human, and every image painted by a human is copyrighted, OpenAI is correct that they cannot train the AI without copyrighted material. That does not mean the texts are behind a paywall.

ink

@duco @yabellini @skylarkingmullet That's inaccurate. When you write on Wikipedia you release any rights; you write under CC0. Wikimedia, however, has CC BY-SA 4.0 on all Wikipedia content.

Duco

@skylarkingmullet @yabellini So they sued OpenAI. Well, people also sued the government over the mask rules during Corona. Just because someone sues someone doesn't mean they are right. Let's wait for what the judges say. The second part seems to be about data protection, not copyright.
