I spent some time reading the newly released GPT-4o... | Simon Willison


I spent some time reading the newly released GPT-4o System Card - it's a fascinating document, with all kinds of interesting new-to-me details in there. I've posted my highlights here: https://simonwillison.net/2024/Aug/8/gpt-4o-system-card/

I particularly enjoyed this bit about "scheming":

Finally, another piece of new-to-me terminology: scheming:

Apollo Research defines scheming as AIs gaming their oversight mechanisms as a means to achieve a goal. Scheming could involve gaming evaluations, undermining security measures, or strategically influencing successor systems during internal deployment at OpenAI. Such behaviors could plausibly lead to loss of control over an AI.

Apollo Research evaluated capabilities of scheming in GPT-4o [...] GPT-4o showed moderate self-awareness of its AI identity and strong ability to reason about others’ beliefs in question-answering contexts but lacked strong capabilities in reasoning about itself or others in applied agent settings. Based on these findings, Apollo Research believes that it is unlikely that GPT-4o is capable of catastrophic scheming.

9 August at 0:10 | fedi.simonwillison.net

2 comments

Lafncow :blobcatcoffee:

@simon "...it is unlikely that GPT-4o is capable of catastrophic scheming."

So either it's bad at it or really good at it.

9 August at 0:49 | mastodon.social

Mans R

@simon Do they actually believe that stuff?

9 August at 7:31 | society.oftrolls.com