Alaric Snell-Pym

@carbontwelve @david_chisnall this also matches my expectations, and I've seen people mention studies in teams showing no productivity gain, too.

So I'm intrigued by the few people who DO report that LLMs help them code, though (e.g. @simon). Is there something different about how their brains work so LLMs help? Or (cynically) are they jumping on the bandwagon and trying hard to show the world they've cracked how to use them well, to sell themselves as consultants or something?

jt7d

@kitten_tech @carbontwelve @david_chisnall @simon Something I've found LLMs useful for, and I've seen Simon say something similar, is writing code in a language or situation that I _kinda_ know. I might not have bothered writing it if I had to climb boilerplate mountain first, but the LLM serves as guardrails and crack-filler. And since I _kinda_ know the thing, I don't fall for appalling hallucinations, but I get a chance to learn more about the thing fairly painlessly.

Dana Fried

@jt7d @kitten_tech @carbontwelve @david_chisnall @simon this is a really good point - if you treat it as an intern who knows a little about the topic but might be wrong, it can still point you to the correct solution.

This is, perhaps, the meatspace equivalent of the caching application described elsewhere in the replies to this thread.

This is how I also use the (mandatory 🤷🏻‍♀️) LLM results in Google search - as a way to find links and/or search terms that will get me to the actual answer.

Martijn Faassen

@tess

@jt7d @kitten_tech @carbontwelve @david_chisnall @simon

Yeah, earlier I mentioned Copilot can help me with the flow. It generates some bs and I think: that's not what I want, I want this, and then I do it. Or it gives me ideas I hadn't had before. (Besides autocompleting shorter or simpler bits I can easily approve.)

Kris Hardy 🐧

@kitten_tech @carbontwelve @david_chisnall @simon This matches my experience, and I gave up after trying on four small nontrivial projects. The people I personally know using it successfully only use it in limited cases as a kind of hint system to help remind them of things or point out new ways of doing things.

Simon Willison

@kitten_tech @carbontwelve @david_chisnall I'm actually getting more coding work done directly in the Claude and ChatGPT web interfaces and apps vs using Copilot in my editor

The real magic for me at the moment is Claude Artifacts and ChatGPT Code Interpreter - I wrote a bunch about Artifacts here: simonwillison.net/tags/claude-

Here are all of my general notes on AI-assisted programming: simonwillison.net/tags/ai-assi

Stephen J. Anderson

@simon @kitten_tech @carbontwelve @david_chisnall How would you avoid or deal with the issues that David encountered? Specifically, subtle bugs where the debugging makes the whole process less efficient than writing the code yourself. Is there one of your notes that deals with that already?

Alaric Snell-Pym

@utterfiction @carbontwelve @david_chisnall

He has a few examples where he felt something in the output didn't look right, or ran it and found bugs, and had the LLM try again.

Most of his examples are relatively simple things of the form "I didn't want to spend time reading API docs for this quick task", though. I don't find that sort of thing a bottleneck in what I do - and I quite enjoy reading docs, and building a mental model of a tool I can then use to know what its...

Alaric Snell-Pym

@utterfiction @carbontwelve @david_chisnall ... limitations and capabilities are.

The bits of programming that eat my time, which I'd love a tool to help with, are usually understanding a bug in an undocumented and under-commented ball of hundreds of kloc of code, too big for an LLM's context window, and where going and quizzing the people who wrote bits of it is essential to success.

The bits Simon gets LLMs to do look like the tasks I do to cheer myself up after that :-)

Stephen J. Anderson

@kitten_tech @carbontwelve @david_chisnall Yeah. A lot of my professional time is spent extending logic, adding new features that follow an existing pattern, refactoring when re-usable abstractions are discovered… so far, they’re just not very good at that. And I don’t think pure LLMs ever will be - limited token windows and no genuine symbolic representation of knowledge.

Martijn Faassen

@kitten_tech

@utterfiction @carbontwelve @david_chisnall

If you can suddenly create small throwaway applications far more quickly than before, applications that might be too boring or bothersome to create otherwise, that might allow new ways of working altogether.

Simon Willison

@utterfiction @kitten_tech @carbontwelve @david_chisnall you have to assume that the LLM will make weird mistakes all the time, so your job is all about code review and meticulous testing

I still find that a whole lot faster than writing all the code myself

Here's just one of many examples where I missed something important: simonwillison.net/2023/Apr/12/

Simon Willison

@utterfiction @kitten_tech @carbontwelve @david_chisnall but honestly, the disappointing answer is that most of this comes down to practice and building intuition for tasks the models are likely to do well vs mess up

Manipulating some elements in the HTML DOM with JavaScript? They'll nail that every time

Implementing something involving MDIO registers? My guess is there are FAR fewer examples relating to that in the (undocumented, unlicensed) training data, so they're much more likely to make mistakes

Major Denis Bloodnok

@kitten_tech @carbontwelve @david_chisnall Wouldn't touch it with a bargepole myself, but I think a third possibility is that at least some people reporting that haven't had it write a sufficiently hilarious bug _yet_. After all, the OP hit one every four months - one could easily get lucky if that's a typical frequency.

Martijn Faassen

@denisbloodnok

@kitten_tech @carbontwelve @david_chisnall

I am a couple of years in with Copilot. No such bugs yet. Context: Rust. I write lots of tests along with my code (as OP appears to do too), and currently can rely on a massive external test suite.

The one ridiculously hard-to-debug bug I got was when I had to debug a codebase I had ported from Java to Rust: I, the human, had transliterated bits wrong, and had no incremental tests built up along with the code. No LLM to blame.

Moof is on Sabbatical

@kitten_tech @carbontwelve @david_chisnall @simon so, I’ve found a 10-20% productivity boost with Copilot, mostly when dealing with boilerplate and small stuff. I mostly code Python and AL (an ERP-specific language, which doesn’t get much if any boost).

What does work: ending statements when you start them, it infers enough for me to want to let it finish the line, maybe the next two-three lines. Sometimes it gets the logic completely wrong, but then you don’t accept the suggestion. Sometimes it comes up with edge cases I hadn’t considered.

What is more dodgy: explaining what you want and getting it to write the code. That can get quite dodgy, and I rarely accept those suggestions, unless it’s boilerplate.
1/3

Moof is on Sabbatical

That being said, I have some experience working with code submitted by less skilled programmers who blindly copy and paste from Stack Exchange for a living, from before the prevalence of LLMs, and I am somewhat used to reviewing code of that standard. I find that longer LLM-built code is similar to review, and in some cases is approaching that level of code quality.

I am tempted to try one of these “code your own mobile app” demo things, as it’s a platform I’m unfamiliar with, and I have some itches to scratch.

I believe both my coding style and my speed have been affected by using Copilot, both with modest boosts to productivity.

Could I work without Copilot? Absolutely! Would I want to? I think I’d miss the speed boost in a long Python project

2/3

Moof is on Sabbatical

One place where my colleagues (and to a lesser extent, myself) have found LLMs to be useful is in multilingual situations.

When English is not your first language, but your coding standard requires things to be programmed in English, sometimes you can struggle to use the correct name for a variable, especially when those words are false friends, or are concepts that aren’t one word in English. The LLM can make sensible suggestions for variable and function names and the like. I’ve had to do fewer refactorings of colleagues’ work due to inappropriate name use since they started with Copilot.

Similarly, pasting a description in Spanish into one of these things and asking for an outline in English onto which to hang your code has helped, with proper code review.

This stuff is not a panacea, but it can help when applied with a healthy dose of scepticism. @kitten_tech’s OP conclusion is valid, as the benefits are still marginal.

3/3

David Chisnall (*Now with 50% more sarcasm!*)

@moof

I am tempted to try one of these “code your own mobile app” demo things, as it’s a platform I’m unfamiliar with, and I have some itches to scratch.

I wrote my first Android app a couple of months ago. I did it in Android Studio, which didn’t have Copilot set up. It took half a day (having not touched Java for 6-8 years, and then mostly only to write test cases when hacking on the internals of a JVM). I went from nothing to a working app in under a day.

The things that took time were:

- Google’s CADT problem meant that a lot of things in the build system had changed since the tutorials were written, and figuring out the differences was always annoying.
- The MQTT library I was using needed some extra things for compatibility with older SDKs and they were enabled by default. The instructions for turning them off were documented, but figuring out that this was the problem took time.
- I spent ages debugging a connection problem that I assumed was a permissions issue. It turned out that the MQTT server was down (but its status page was not).

I don’t think an LLM would have helped with any of these problems.

Android development is so much worse than OpenStep development in 1992 (iOS is a cleaned up version of OpenStep tuned for touchscreens and systems with more than 8 MiB of RAM, so I presume it’s better). Adding LLMs won’t fix that; thinking about APIs before you ship a thing that you likely have to support for a decade or so would. In spite of it being a truly terrible platform for developers, it was pretty easy to build something that worked.

Twenty years ago, we were building minimal-code platforms where you could build CRUD web and desktop apps with a few dozen lines of code for your business logic. A lot of frameworks seem to have massively regressed since then. If anything, relying on LLMs to fill in the code that shouldn’t be necessary in the first place will make this worse.

Moof is on Sabbatical

@david_chisnall I do miss the era when you could just code up an app with minimal thinking about the common cases that were covered by frameworks. The idea that everyone needs to have their own interface developed is something that the new web era has foisted on us, and is definitely a step back. Electron and company have just made it worse. And don’t get me started on my thoughts on WASM.

I agree that LLMs will not help there. Or if they do, it shouldn’t be like that.

Either way, I feel that the best way to get a feel for a tool is to use it. You have done so, and come to valid conclusions, and I thank you for sharing.

I expect my next job to be the sort where I will have to battle pressure both from above and below for use of LLMs as a way to accelerate or replace developers. I need to have arguments that sound authoritative in order to battle the massive propaganda^Wmarketing effort being made to sell this as the best thing since Jesus fed the 5k with sliced bread

David Chisnall (*Now with 50% more sarcasm!*)

@moof

I need to have arguments that sound authoritative

If you're looking for plausible and authoritative-sounding pronouncements, you've come to the right place!

jincy quones

@moof What kind of boilerplate are you having to write so often that any decent snippet engine couldn't handle it perfectly well, without the litany of issues of an LLM? We *already have* tools for boilerplate; I don't understand why people are so entranced by LLMs' ability to deal with it in an absurdly, grossly inefficient way.

Martijn Faassen

@kitten_tech

@carbontwelve @david_chisnall .

Note that how @simon reports using this to generate little projects is an entirely different mode of working with them. I have used Copilot, which is mostly context-sensitive autocomplete, for a few years now and like it myself.

A Q&A session to create code for a CLI tool or web app is a very different way of working I started exploring more recently. It's surprisingly capable for little projects and requires a different approach.
