Simon Willison

We accidentally invented computers that can lie to us and we can't figure out how to make them stop

Simon Willison

(If you don't think it's possible for a computer to deliberately lie, take a look at "sycophancy" and "sandbagging" in the field of large language models! simonwillison.net/2023/Apr/5/s )

Glyph

@simon a computer cannot deliberately lie because a computer cannot form intent. Sycophancy and sandbagging as you describe them here are emergent properties of an ML training regimen: things you are training the model to do without the *humans* having the intent to do so, despite doing all the steps that predictably result in this behavior of the system.

Carl M. Johnson

@glyph @simon “Lying” implies an intent to deceive and it seems more like the model is always bullshitting but sometimes it gets stuck in a semantic pocket that isn’t true. “Sycophancy” is probably fine as a jargon term, but “lie” has too much real world use to wash out the unintended connotations.

Simon Willison

@carlmjohnson @glyph I'm actually considering doubling down on "lying" as a term that's useful to use

"ChatGPT lies to you" is a clear and important message for people learning to use these systems

I'm not convinced the semantic debates over intent are genuinely helpful in getting this important message across

Simon Willison

@carlmjohnson @glyph "ChatGPT can hallucinate" is, I think, a much less useful message to people just starting to explore these tools

Carl M. Johnson

@simon I think “lying” is punchier but it encourages anthropomorphism. At this point we need more public xenopomorphism though. LLMs are weird! Maybe in the future a human-like AI will have an LLM module, but today it’s more helpful to know about the token window and RLHF and whatnot.

Simon Willison

@carlmjohnson much as I dislike the anthropomorphism - I really wish ChatGPT didn't use "I" or answer questions about its own opinions - I feel like that's a lost battle at this point

I'm happy to tell people "it has a bug where it will convincingly lie to you" while also emphasizing that it's just a mathematical language emulation, not an "AI"

Glyph

@simon @carlmjohnson I guess I also object to this term because it doesn’t really have a bug—it isn’t really “malfunctioning” as I put it either. The goal that it’s optimizing towards is “believability”. Sycophancy and sandbagging are not *problems*, they’re a logical consequence and a workable minimum-resource execution of the target being optimized. It bugs me that so much breathless prose is being spent on describing false outputs as defects when bullshit is *what LLMs produce by design*

Glyph

@simon @carlmjohnson if it accidentally wastes resources telling the truth where a more-compressible lie or error would have satisfied the human operator, that’s a failure mode! It will eventually be conditioned out in future iterations, although an endless game of whack-a-mole will ensue as they try to pin it down to *particular* “test” truths (which is exactly what “sycophancy” is) while all others decay

Glyph

@simon @carlmjohnson it worries me a little bit that I, with just like, a passing familiarity with what gradient descent is and how ML model training works, can easily predict each new PR catastrophe and “misbehavior” of these models, while the people doing the actually phenomenally complex and involved work of building them seem to be constantly blindsided and confused by how the tools that *they are making* behave

An Inhabitant of Carcosa

@simon @carlmjohnson @glyph Other people have pointed out that "bullshit" is more accurate than "lies", and honestly I think it's just as punchy and to the point.

sketchyTech

@simon I'm currently at the stage where I think claims that AI is potentially dangerous to humans are hype designed to sell it to the masses as being all-powerful. What I believe is dangerous to humans is laziness and letting AI do the thinking for us. There seems to be this innate assumption that ChatGPT's answers are better than any others we'll find on the internet, in places where we might have to read and think a little. But if we don't use our thinking muscles we'll lose the ability to problem-solve, and that's when we'll be at its command.

VivSmythe

@sketchytech @simon it seems that these LLM applications are now plausibly hallu-citating nonsense-generators, so they'll be putting political speechwriters out of business soon.

swi

@simon this feels related to the inner alignment problem Robert Miles described a few years back. I get the feeling it may be intractable. youtu.be/bJLcIBixGj8

Stuart Gray

@simon LLMs can’t lie; they can only ever output tokens according to statistical probability derived from their training.

An LLM responds to its input exactly as it was trained to, with zero understanding or agency. Please don’t fall into the anthropomorphism trap like so many others.

This is a great, clear read on the differences between the ways in which humans think and LLMs predict: a short paper by Murray Shanahan arxiv.org/pdf/2212.03551.pdf
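
(A minimal sketch of the mechanism Stuart describes, using a made-up four-token vocabulary and invented scores rather than anything from a real model: the model's scores become a probability distribution and the next token is drawn from it, so "intent" never enters the process.)

    # Toy illustration, not real model code: scores over a tiny vocabulary are
    # softmaxed into probabilities and the next token is sampled from them.
    import numpy as np

    rng = np.random.default_rng(0)
    vocab = ["the", "cat", "lied", "purred"]   # hypothetical vocabulary
    logits = np.array([2.0, 1.0, 0.3, 1.5])    # made-up scores from a "trained" model

    def sample_next_token(logits, temperature=0.8):
        """Convert scores to probabilities (softmax) and draw one token index."""
        scaled = logits / temperature
        probs = np.exp(scaled - scaled.max())
        probs /= probs.sum()
        return rng.choice(len(logits), p=probs)

    print(vocab[sample_next_token(logits)])    # picks a token; no understanding, no agency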

Simon Willison

@StuartGray I'm not convinced by that

I think it's possible to use the term "lying" while also emphasizing that these are not remotely human-like entities

fedi.simonwillison.net/@simon/
