@simon @carlmjohnson if it accidentally wastes resources telling the truth where a more-compressible lie or error would have satisfied the human operator, that’s a failure mode! It will eventually be conditioned out in future iterations, although an endless game of whack-a-mole will ensue as they try to pin the model down to *particular* “test” truths (which is exactly what “sycophancy” is) while all the others decay.
@simon @carlmjohnson it worries me a little bit that I, with just, like, a passing familiarity with what gradient descent is and how ML model training works, can easily predict each new PR catastrophe and “misbehavior” of these models, while the people doing the phenomenally complex and involved work of actually building them seem constantly blindsided and confused by how the tools that *they are making* behave.