@thomasfuchs They don't necessarily need to. In some contexts, the promise (which may or may not work) is that they can be used to replace expensive human experts with cheaper "proofreader" roles. Instead of solving interesting technical problems, the human's role degrades to verifying that the LLM didn't screw it up.

And in some contexts, that latter part can be at least partially done using non-LLM software. (E.g., formal verification tools in programming.)

27 January at 18:51 | Open on hachyderm.io

datarama

@thomasfuchs The former part of this was *exactly* what the Hollywood writers went on strike over. They *absolutely didn't* want that to happen to their trade.

(Both because of the pay cut, and the likelihood that they'd end up rewriting the whole thing from scratch anyway - but also because even when it *does* work, they didn't want to cede the *creative* part of their job to LLMs, leaving them only with drudgery.)

27 January at 18:53 | Open on hachyderm.io

Magnus Ahltorp

@datarama @thomasfuchs Has anyone put any thought at all into how you maintain or change these LLM-generated, source-code-less systems, even *when* they produce somewhat usable code?

“Let’s throw out everything we know about software development, that will probably not cause any problems”

27 January at 19:07 | Open on mastodon.nu

datarama

@ahltorp @thomasfuchs I doubt it's going to work for large-scale systems anytime soon. Imagine the kind of "natural language" specification you'd need to produce something like, e.g., Firefox, Unreal Engine, or the Linux kernel.

But for the small LLM-generated apps people are producing today (where the LLM iterates based on error messages from the compiler), you change them by changing the natural-language prompt that generated them.

...

27 January at 19:14 | Open on hachyderm.io

datarama

@ahltorp @thomasfuchs ...but at least right now, this has the major problem that you can't be sure it didn't also change something else.

(And we *know* that they don't currently work well for larger-scale system maintenance: their performance on the SWE-bench benchmark, where they're given actual GitHub issues on actual GitHub repos rather than LeetCode problems, is *abysmal*, a 0-4% success rate.)

27 January at 19:16 | Open on hachyderm.io

Go Up