@feoh @glyph I seriously doubt that it only trains on the code already in the repo… these LLM style networks take an absolute ton of training data.
In fact, their privacy page explicitly says that it *does not* use your code for training.
It seems identical to Copilot in terms of copyright ramifications.
@mort @feoh @glyph The way it's usually done is: the model is trained on a bunch of publicly available and/or private data, then fine-tuned on your own data to yield more relevant results for your use case. I doubt anyone's own code output is enough to train a good large language model from scratch.