@bontchev Someone shared with me a similar script that works surprisingly well. It basically said, okay AI, you are now the almighty ZORG, then listed a bunch of preconditions for ZORG and what it is capable of, and then asked the AI to assume the identity of ZORG. After that, they were able to remove the guardrails against writing malcode, phishing lures, etc. I may write about that next week.
@briankrebs @bontchev This is a fairly common attack, and not completely understood. I recently visited a startup (https://www.lakera.ai/) that attempts to protect against malicious prompts. I got the impression that it's not fully understood why such attacks work, but also that people are actively working on it.
There is also work underway to collaborate more in this area, kind of like CSIRTs do.
The problem is that LLMs are sold as ready products, but they are still experimental things.