TheDcoder

@Codeberg #hottake time: I do not oppose GitHub using open-source code to train their AI Copilot. After all, it's open-source code!

We humans learn to code in a very similar way, by looking at other people's code, so why can't AI do the same?

I understand the sentiment behind the AI model being closed source, however I hope FOSS innovations in the future allow us to develop a similar AI which is trained on the same dataset... now that's freedom!

6 comments
ascendo

@TheDcoder @Codeberg I think the main reason people oppose the training of LLMs on GitHub repos is that they totally ignored the licences under which these projects were published. They just took all they could get and now make piles of money on the back of the community.

TheDcoder

@ascendo @Codeberg AFAIK GitHub requires all public projects to be under a "forkable" license, so technically they're all free range.

I don't think the model can generate verbatim code samples from those projects... if it is doing something like that, then it's clearly theft (since it does not attribute).

I also get that it's somewhat unfair for them to be making money off of this, but it is also one of the beautiful things about FOSS: you can make money and sustain yourself from it!

YurkshireLad

@TheDcoder @ascendo @Codeberg you can probably fork any open source project but that doesn't mean you can use the code in any way you choose. You still have to follow the original license.

TheDcoder

@YurkshireLad @ascendo @Codeberg Well, it depends on where you draw the line for "use". The AI does not execute the code or redistribute it in a meaningful way (aside from generating output which looks like that code plus the other trillion lines of code it trained on).

The same argument can be used for general-purpose GPT training sets as well: they just train on all the text scraped from the web, regardless of who owns it.

ascendo

@TheDcoder @YurkshireLad @Codeberg I mean, there is a small chance that ChatGPT returns 100% of your code to somebody else. Google itself recently ordered its employees not to paste text into its own LLM due to this (and other reasons).

TheDcoder

@ascendo @YurkshireLad @Codeberg I guess that's possible with ChatGPT. I don't really know how these models work, so I'm hitting my knowledge barrier here.
