@mattb @dalias i asked an LLM engineer about this and he basically said nobody cares about it because it requires a lot of work (domain expertise). so i'm vaguely confident that if you define a generative model not in terms of crass next-token prediction but via existing program-synthesis methods over a parse tree, or ideally an IR of some sort, you could build a significantly better form of autocomplete trained on e.g. just the code in a small monorepo, or just all the code checked out on your own machine. i think part of the reason copilot didn't release tiered versions according to license (would have been so. fucking. easy. but their goal is to destroy copyright enforcement, not to build anything useful) is that it really sucks unless it has a ridiculous amount of data
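rough sketch of what i mean, in python against the stdlib `ast` module (every name here is made up, and a real system would want a proper IR rather than bare node counts): mine structure-level patterns out of the code already checked out on your machine instead of a raw token stream.

```python
import ast
from collections import Counter
from pathlib import Path

def call_patterns(repo_root: str) -> Counter:
    """Count (enclosing function, callee) pairs at the parse-tree level,
    rather than raw token n-grams."""
    counts = Counter()
    for path in Path(repo_root).rglob("*.py"):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue
        for fn in ast.walk(tree):
            if not isinstance(fn, ast.FunctionDef):
                continue
            for node in ast.walk(fn):
                # only plain-name calls, to keep the toy small
                if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                    counts[(fn.name, node.func.id)] += 1
    return counts

# completion candidates drawn from code you already have the rights to
for pair, n in call_patterns(".").most_common(10):
    print(pair, n)
```

the training signal there is grammatical structure from your own monorepo, not a scraped planet of tokens.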
@hipsterelectron @mattb @dalias this here.
Hand-waving away some of the infra improvements and some reasoning capabilities: LLMs are just Markov chains with all their transition probabilities pre-computed into lookup tables and loaded into memory.
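A toy version, just to make "lookup table" concrete (a literal n-gram Markov chain in Python; a transformer computes its next-token distribution rather than storing it, but this is the shape of the claim):

```python
import random
from collections import defaultdict

def train(tokens: list[str], order: int = 2) -> dict:
    """The 'lookup table': context tuple -> observed next tokens."""
    table = defaultdict(list)
    for i in range(len(tokens) - order):
        table[tuple(tokens[i:i + order])].append(tokens[i + order])
    return table

def generate(table: dict, seed: tuple, n: int = 10) -> list[str]:
    """Next-token 'prediction' is just sampling from the precomputed table."""
    out = list(seed)
    for _ in range(n):
        choices = table.get(tuple(out[-len(seed):]))
        if not choices:
            break
        out.append(random.choice(choices))
    return out

corpus = "the model predicts the next token and the next token only".split()
print(" ".join(generate(train(corpus), ("the", "next"))))
```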
This is why they will never beat expert systems at reasoning: next-token prediction simply isn't reasoning.
Side note: I love the idea of building an AST-based model to query rather than a token-based one.
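For instance (another stdlib `ast` toy; a real model would condition on far more than a single parent node), the structural analogue of next-token prediction is asking which child node types usually appear under a given parent:

```python
import ast
from collections import Counter, defaultdict

def node_bigrams(source: str) -> dict:
    """Map each parent AST node type to a Counter of its child node types."""
    model = defaultdict(Counter)
    for parent in ast.walk(ast.parse(source)):
        for child in ast.iter_child_nodes(parent):
            model[type(parent).__name__][type(child).__name__] += 1
    return model

src = "def f(xs):\n    return [x + 1 for x in xs if x]\n"
model = node_bigrams(src)
# query: what usually sits under a list comprehension?
print(model["ListComp"].most_common())
```

A completer built over that structure can only ever suggest things that parse, which a token model can't guarantee.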