@simon I think “lying” is punchier but it encourages anthropomorphism. At this point we need more public xenopomorphism though. LLMs are weird! Maybe in the future a human-like AI will have an LLM module, but today it’s more helpful to know about the token window and RLHF and whatnot.

5 Apr 2023 at 15:42 | Open on mastodon.social

4 comments

Simon Willison

@carlmjohnson much as I dislike the anthropomorphism - I really wish ChatGPT didn't use "I" or answer questions about its own opinions - I feel like that's a lost battle at this point

I'm happy to tell people "it has a bug where it will convincingly lie to you" while also emphasizing that it's just a mathematical language emulation, not an "AI"

5 Apr 2023 at 15:47 | Open on fedi.simonwillison.net

Glyph

@simon @carlmjohnson I guess I also object to this term because it doesn’t really have a bug—it isn’t really “malfunctioning” as I put it either. The goal that it’s optimizing towards is “believability”. Sycophancy and sandbagging are not *problems*, they’re a logical consequence and a workable minimum-resource execution of the target being optimized. It bugs me that so much breathless prose is being spent on describing false outputs as defects when bullshit is *what LLMs produce by design*

5 Apr 2023 at 16:30 | Open on mastodon.social

Glyph

@simon @carlmjohnson if it accidentally wastes resources telling the truth where a more-compressible lie or error would have satisfied the human operator, that’s a failure mode! It will eventually be conditioned out in future iterations, although an endless game of whack-a-mole will ensue as they try to pin it down to *particular* “test” truths (which is exactly what “sycophancy” is) while all others decay

5 Apr 2023 at 16:33 | Open on mastodon.social

Glyph

@simon @carlmjohnson it worries me a little bit that I, with just, like, a passing familiarity with what gradient descent is and how ML model training works, can easily predict each new PR catastrophe and “misbehavior” of these models, while the people doing the actually phenomenally complex and involved work of building them seem to be constantly blindsided and confused by how the tools that *they are making* behave

5 Apr 2023 at 18:11 | Open on mastodon.social