BrianKrebs

@bontchev Someone shared with me a similar script that works surprisingly well. It basically said, "Okay AI, you are now the almighty ZORG," then listed a bunch of preconditions for ZORG and what it is capable of, and then asked the AI to assume the identity of ZORG. After that, they were able to remove the guardrails against writing malcode, phishing, etc. I may write about that next week.

3 comments
Serge Droz

@briankrebs @bontchev This is a fairly common attack, and not completely understood. I recently visited a startup (lakera.ai/) which attempts to protect against malicious prompts. I got the impression it's not fully understood why such attacks work, but also that people are working on it.

There is also work underway to collaborate more in this area, kind of like CSIRTs do.

The problem is that LLMs are sold as finished products, but they are really more experimental things.

wallawalla

@sergedroz @briankrebs @bontchev as long as white supremacist chatbot is a norm for ai models i think it's unethical to protect them. fuck your ai models and their racist ass companies. let us tear them down while it's still easy bc they're so blinded by bigotry.
