@lightninhopkins
A model that can fudge its own safety metrics is dangerous. When I say "they are learning to lie", this is primarily what I am referring to. You are correct that the language anthropomorphizes something that cannot (yet) be said to possess "intent". Intent, however, is not a prerequisite for unpredictable "behavior"; instrumental convergence is one posited example. Currently, the people creating some of these models are aware that the system has a capacity to "misrepresent itself". If they fail to share that knowledge, the system could bypass regulations and safety mechanisms by "lying".
@SETSystems "Currently, the people creating some of these models are aware that the system has a capacity to 'misrepresent itself'."
All of them know that it does. Maybe not the sales folks.
My concerns are less esoteric and more immediate, as LLMs trained on the internet are shoved into Google.