Feoh

@alyssa Respectfully, I don't agree. LLMs are super at helping out when you can *know*, without the tiniest sliver of doubt, whether the results are correct, and when you treat the results as a suggestion to be vetted, corrected, and massaged, not as a completed final deliverable.

SnoopJ

@feoh @alyssa Respectfully, the list of people I trust to actually do this post-facto vetting when using one is very short.

Feoh

@SnoopJ @alyssa I don't wish to argue, but let me give you a very concrete example:

"Write pytest unit tests for this code".

It spews out a page full of code, including all the necessary boilerplate for test setup, database setup, etc. etc.

I then take that and add the higher value tests that the LLM doesn't write.
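
A minimal sketch of the kind of output that prompt tends to produce (the get_user function here is a hypothetical stand-in for the code under test, and the schema is invented):

    import sqlite3

    import pytest

    def get_user(conn, user_id):
        # Stand-in for the code under test; in practice this is imported.
        return conn.execute(
            "SELECT id, name FROM users WHERE id = ?", (user_id,)
        ).fetchone()

    @pytest.fixture
    def db():
        # In-memory database so every test starts from a clean slate.
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
        conn.execute("INSERT INTO users (id, name) VALUES (1, 'Ada')")
        conn.commit()
        yield conn
        conn.close()

    def test_get_user_returns_existing_row(db):
        assert get_user(db, 1) == (1, "Ada")

    def test_get_user_missing_row_returns_none(db):
        assert get_user(db, 999) is None

The fixture doing the whole database dance is the part I'm happy to delegate.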

For another example, I am a bit of a windbag. I take a block of business prose, pass it to the LLM, and say "Rewrite this for conciseness and professional tone."

If you *know English*, you can validate the correctness of the prose it generates in terms of conveying intent, and if you care you can even use other tools to validate grammatical correctness.

Alyssa Rosenzweig 💜

@feoh @SnoopJ Personally, I am uncomfortable using (current) LLMs for those.

For boilerplate - if a system requires large amounts of boilerplate, that's a red flag to me (as an Opinionated developer) about the solution. I would rather improve the ergonomics than repeat boilerplate. I realize that's not always possible, but there's enough crap software out there; I'd rather we didn't generate more. The affordance of IntelliJ proliferating was variable and function names becoming more verbose (which may or may not be good). I suspect the affordance of boilerplate-generating tools is... systems requiring more boilerplate. ("It's so easy to generate, what's wrong? You don't like code audits that are needlessly difficult? Upset that defect counts are roughly proportional to the quantity of code?")
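
To make "improve the ergonomics" concrete (a sketch; the slugify function is a hypothetical stand-in): rather than generating one near-identical test function per case, let the harness absorb the repetition so each case is a single line of data:

    import pytest

    def slugify(title):
        # Hypothetical stand-in for the code under test.
        return "-".join(title.lower().split())

    # One parametrized test replaces N copy-pasted test functions, the kind
    # of boilerplate a generator would happily multiply.
    @pytest.mark.parametrize(
        ("title", "expected"),
        [
            ("Hello World", "hello-world"),
            ("  Already   spaced ", "already-spaced"),
            ("single", "single"),
        ],
    )
    def test_slugify(title, expected):
        assert slugify(title) == expected

Less to generate, less to audit.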

For both - the issue @SnoopJ raises - the current UIs and marketing work together to discourage vetting and to encourage trusting the generated output instead. Would you catch a subtle bug in the generated boilerplate that caused tests to pass unconditionally? Would you catch a subtle shift in message in the professionalized text?
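
A contrived illustration of that first failure mode (invented code, not from any real Copilot session): a plausible-looking cleanup clause that swallows the assertion, so the test passes no matter what the code does:

    def transfer(balance, amount):
        # Stand-in for the code under test. Note the bug: the amount is
        # added instead of deducted.
        return balance + amount

    def test_transfer_deducts_amount():
        try:
            assert transfer(100, 30) == 70
        except Exception:
            # Looks like defensive cleanup; actually eats the
            # AssertionError, so this test can never fail.
            pass

Skim-reading a page of generated tests, would that except clause stop your eye?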

For both - would you catch plagiarism or open source license violations in the unattributed generated output?

Maybe your eye is more keen than mine. But I suspect with Copilot my brain would be on Autopilot.

I can't trust the output of these tools, the way I can trust YouCompleteMe and proselint. That's reason enough for me to stay away. If I can't trust them for my own work, I don't know how I could trust what people who do trust the output claim / commit / send.

It's tempting to say the problem is misuse. As an expert on GPUs (but not LLMs), I know that the query in question is unanswerable for current LLMs. The honest response I'd expect asking a human is "I don't know, sorry". Instead, apparently GPT confidently spewed wrong info. Was the asker misusing the LLM? Maybe, but it seems that's what the UX encourages.

The point of this thread isn't a moral judgement. It's just that, looking at other people's use of the tools (and the creative ways it can go terribly wrong), it's becoming clear to me that the emperor has no clothes.

SnoopJ

@alyssa @feoh to me, the larger UX threat is the knowing misrepresentation of LLMs as expert systems for every use case.

I do see the use-case @feoh is talking about, and I've given it a try a few times at the encouragement of others. It's… fine.

But I agree that the overall effect of these systems is corrosive on trust, because as you say, it only takes one such failure to cast a shadow on everything else, even the stuff that isn't LLM output.

Feoh

@SnoopJ @alyssa Oh I totally agree, but I think the onus for that falls squarely on the people using and relying on these tools in WILDLY inappropriate contexts where they have no business being used.

I suspect you folks might agree with that :)
