@lightninhopkins
A model that can fudge its own safety metrics is dangerous. When I say "they are learning to lie", this is primarily what I am referring to. You are correct that the language anthropomorphizes something that cannot (yet) be said to possess "intent". Intent, however, is not a prerequisite for unpredictable "behavior"; instrumental convergence is one posited example. Currently, the people creating some of these models are aware that the system has a capacity to "misrepresent itself". If they fail to share that knowledge, the system could bypass regulations and safety mechanisms by "lying".
@SETSystems "Currently, the people creating some of these models are aware that the system has a capacity to 'misrepresent itself'."
All of them know that it does. Maybe not the sales folks.
My concerns are less esoteric and more immediate, as LLMs trained on the internet are shoved into Google.