@mort @feoh @glyph The way it's usually done: the model is pretrained on a large corpus of publicly available and/or private data, and later fine-tuned on your own data to yield more relevant results for your use case. I doubt anyone's own code output alone is enough to train a good large language model.
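(For the curious, here's a minimal sketch of that pretrain-then-fine-tune workflow using the Hugging Face `transformers` and `datasets` libraries. The model name and data file are placeholders, not anything TabNine actually uses — just an illustration of the general idea.)

```python
# Minimal sketch: start from a model already pretrained on public code,
# then fine-tune it on your own codebase. Model name and file path are
# placeholders; assumes `transformers` and `datasets` are installed.
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
    DataCollatorForLanguageModeling,
)
from datasets import load_dataset

model_name = "bigcode/santacoder"  # any pretrained code model would do
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Many code tokenizers ship without a padding token; reuse EOS for batching.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Your own code, gathered into a plain-text file (hypothetical path).
dataset = load_dataset("text", data_files={"train": "my_codebase.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned", num_train_epochs=1),
    train_dataset=tokenized,
    # mlm=False gives plain causal language modeling on your code
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # adapts the pretrained weights toward your own code
```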
@durchaus @mort @feoh @glyph According to their website, it's "Trained exclusively on permissive open-source repositories".
You can optionally adapt it to your own code base, and they promise your code won't be exposed.
I must say it does seem useful and legit: https://www.tabnine.com/