

@yabellini SciHub makes papers that are behind paywalls public. I agree that they shouldn't be behind paywalls, but that is completely different from what OpenAI did.
I think they mostly used sources that are public anyway, like Wikipedia. They also didn't publish those sources; they trained an AI on them that creates new texts. So, in a way, they made a remix, and remixes are handled differently in copyright law.
"The corpus [GPT-2] was trained on, […] 40 [GB] of text from URLs shared in Reddit" https://en.wikipedia.org/wiki/OpenAI

10 January at 0:06 | Open on norden.social

8 comments

Yani Bellini Saibene

@duco

https://www.nytimes.com/2023/12/27/business/media/new-york-times-open-ai-microsoft-lawsuit.html#:~:text=1.3k-,The%20Times%20Sues%20OpenAI%20and%20Microsoft%20Over%20A.I.,with%20it%2C%20the%20lawsuit%20said.

I recommend reading the lawsuit. It was written by lawyers who know the law, and it is also very clear:

https://nytco-assets.nytimes.com/2023/12/NYT_Complaint_Dec2023.pdf

10 January at 0:15 | Open on fosstodon.org

Duco

@yabellini I cannot read the article because it's behind a paywall, and the other document is 69 pages long. I will not read that. If you want to make a point with it, say what you want to say. Depending on what you say, I will decide whether to check it against the provided sources.

10 January at 0:34 | Open on norden.social

Skylarking Mullet

@duco @yabellini We can bypass paywalls by prepending "archive .is/" to the URL.

https://archive.is/YOFMJ

10 January at 1:42 | Open on piaille.fr

Skylarking Mullet

@duco @yabellini

In July 2023, OpenAI was sued for copyright infringement by authors Sarah Silverman, Matthew Butterick, Paul Tremblay and Mona Awad.[214][215][216] The New York Times had also considered a lawsuit, which it followed through with in late December 2023.[215][217] In September 2023, 17 authors, including George R. R. Martin, John Grisham, Jodi Picoult and Jonathan Franzen, joined the Authors Guild in filing a class action lawsuit against OpenAI, alleging that the company's technology was illegally using their copyrighted work.[218][219]

In August 2023, OpenAI was sued for violating the EU General Data Protection Regulation (GDPR).[220][221] In April 2023, the EU's European Data Protection Board (EDPB) formed a dedicated task force on ChatGPT "to foster cooperation and to exchange information on possible enforcement actions conducted by data protection authorities" based on the "enforcement action undertaken by the Italian data protection authority against Open AI about the Chat GPT service".[222]

10 January at 0:17 | Open on piaille.fr

Yani Bellini Saibene

@skylarkingmullet @duco And here is where they also say it's “impossible” to create useful AI models without copyrighted material:

https://arstechnica.com/information-technology/2024/01/openai-says-its-impossible-to-create-useful-ai-models-without-copyrighted-material/

10 January at 0:22 | Open on fosstodon.org

Duco

@yabellini @skylarkingmullet At least under German law, the author of any text holds the rights to it. Every Wikipedia article has authors who hold the copyright on it, but they licensed it under a free licence, so everyone can use it. Since every text written by a human, every photo taken by a human and every image painted by a human is copyrighted, OpenAI is correct that it cannot train the AI without copyrighted material. That does not mean that the texts are behind a paywall.

10 January at 0:47 | Open on norden.social

ink

@duco @yabellini @skylarkingmullet That's inaccurate. When you write on Wikipedia, you release any rights; you write under CC0. Wikimedia, however, applies CC BY-SA 4.0 to all Wikipedia content.

10 January at 10:14 | Open on gts.turtle.garden

Duco

@skylarkingmullet @yabellini So they sued OpenAI. Well, people also sued the government over the mask mandates during the coronavirus pandemic. Just because someone sues someone doesn't mean they are right. Let's wait and see what the judges say. The second part seems to be about data protection, not copyright.

10 January at 0:40 | Open on norden.social