TheDcoder

@Codeberg Honestly I think the reason why people are still using GitHub is due to the centralization and the amount of handy features it offers for free.

Plus it was one of the pioneering platforms so everyone is super familiar with it!

Ideally we'll have a fediverse-integrated GitHub-type solution... until then GitHub works just fine for me.

10 comments
TheDcoder

@Codeberg #hottake time, I do not oppose GitHub for using open-source code for training their AI co-pilot, after all it's open source code!

We humans learn to code in a very similar way, by looking at other people's code, so why can't AI do the same?

I understand the sentiment behind the AI model being closed source, however I hope FOSS innovations in the future allow us to develop a similar AI which is trained on the same dataset... now that's freedom!

ascendo

@TheDcoder @Codeberg I think the main reason people oppose the training of LLMs on GitHub repos is that GitHub totally ignored the licences under which these projects were published. They just took all they could get and now make piles of money on the back of the community.

TheDcoder

@ascendo @Codeberg AFAIK GitHub requires all public projects to be under a "forkable" license, so technically they're all free range.

I don't think the model can generate verbatim code samples from those projects... if it is doing something like that, then it's clearly theft (since it does not attribute).

I also get that it's somewhat unfair for them to be making money off of this, but it is also one of the beautiful things about FOSS: you can make money and sustain yourself from it!

YurkshireLad

@TheDcoder @ascendo @Codeberg you can probably fork any open source project but that doesn't mean you can use the code in any way you choose. You still have to follow the original license.

TheDcoder

@YurkshireLad @ascendo @Codeberg Well, it depends on where you draw the line for "use", the AI does not execute the code or redistribute it in a meaningful way (aside from generating output which looks like that code plus the other trillion lines of code it trained on).

The same argument can be used for general-purpose GPT training sets as well, they just train on all text scoured from the web regardless of who owns the text.

ascendo

@TheDcoder @YurkshireLad @Codeberg I mean, there is a small chance that ChatGPT returns 100% of your code to somebody else. Google itself recently ordered its employees not to paste text into its own LLM due to this (and other reasons).

TheDcoder

@ascendo @YurkshireLad @Codeberg I guess that's possible with ChatGPT, I don't really know how they work so I'm hitting my knowledge barrier here.

this.ven

@elmanu @TheDcoder @Codeberg Yeah, #federation for #CodeRepository platforms. 👍 As of release 1.17.0 @gitea is under heavy development for implementing a foundation for this: blog.gitea.io/2022/07/gitea-1.

TheDcoder

@thisven @elmanu @Codeberg @gitea Wow, that's awesome! I didn't know that someone was already working on it. This might be a GitHub killer :D
