Feoh

@glyph Agreed. We're using Tabnine - tabnine.com/ which trains only on the code in your repository and doesn't treat your code like a publicly exploitable commodity.

Also? I think it produces vastly more useful, if less audacious, results.

I've found it saves me probably around 30-40m a day in boilerplate I don't have to type.

mort

@feoh @glyph I seriously doubt that it only trains on the code already in the repo… these LLM-style networks take an absolute ton of training data.

In fact, their privacy page explicitly says that it *does not* use your code for training.

It seems identical to Copilot in terms of copyright ramifications.

Stephan

@mort @feoh @glyph The way it's usually done is: the model is trained on a bunch of publicly available and/or private data, and later fine-tuned on your own data to yield more relevant results for your own use case. I doubt that anyone's own code output is enough to train a good large language model.

Gerbrand van Dieyen

@durchaus @mort @feoh @glyph according to their website "Trained exclusively on permissive open-source repositories"
You can optionally adapt it with your own code base, and they promise the code won't be exposed.

I must say it does seem useful and legit tabnine.com/

mort

@gerbrand @durchaus @feoh @glyph As was pointed out already (mastodon.gamedev.place/@Doomed), “permissively licensed” doesn’t mean public domain. Permissive licenses still have terms, such as the requirement to include a copyright notice.

Callie

@feoh "Tabnine models only train on open source code with permissive licenses"

Daniel Gibson

@pidgeon_pete @feoh
Even most "permissive" licenses require you to keep the copyright header in the code intact (e.g. zlib license, Boost license), and often also in the documentation (BSD, MIT, ...).
Or is it exclusively trained on public domain/CC0/Unlicense/WTFPL/... code?
