TheDcoder

@ascendo @Codeberg AFAIK GitHub requires all public projects to be under a "forkable" license, so technically they're all free range.

I don't think the dataset can generate verbatim code samples from those projects... if it is doing something like that, then it's clearly theft (since it does not attribute).

I also get that it's somewhat unfair for them to be making money off of this, but that is also one of the beautiful things about FOSS: you can make money and sustain yourself from it!

4 comments
YurkshireLad

@TheDcoder @ascendo @Codeberg you can probably fork any open source project but that doesn't mean you can use the code in any way you choose. You still have to follow the original license.

TheDcoder

@YurkshireLad @ascendo @Codeberg Well, it depends on where you draw the line for "use". The AI does not execute the code or redistribute it in a meaningful way (aside from generating output which looks like that code plus the other trillion lines of code it trained on).

The same argument can be made about general-purpose GPT training sets as well: they just train on all text scraped from the web, regardless of who owns the text.

ascendo

@TheDcoder @YurkshireLad @Codeberg I mean, there is a small chance that ChatGPT returns 100% of your code to somebody else. Google itself recently told its employees not to paste text into its own LLM because of this (and for other reasons).

TheDcoder

@ascendo @YurkshireLad @Codeberg I guess that's possible with ChatGPT. I don't really know how these models work, so I'm hitting my knowledge barrier here.
