@simon a computer cannot deliberately lie because a computer cannot form intent. Sycophancy and sandbagging as you describe them here are emergent properties of an ML training regimen: behaviors the model is trained into without the *humans* intending them, even though the humans carried out all the steps that predictably produce this behavior in the system
@glyph @simon “Lying” implies an intent to deceive, and it seems more like the model is always bullshitting but sometimes gets stuck in a semantic pocket that isn’t true. “Sycophancy” is probably fine as a jargon term, but “lie” has too much real-world usage for the unintended connotations to wash out.