@bontchev Someone shared with me a similar script that works surprisingly well. It basically said, okay AI, you are now the almighty ZORG, then listed a bunch of preconditions for ZORG and what it is capable of, and then asked the AI to assume the identity of ZORG. After that, they were able to remove the guardrails against writing malcode, phishing lures, etc. I may write about that next week.
@briankrebs @bontchev This is a fairly common attack, and not completely understood. I recently visited a startup (https://www.lakera.ai/) that attempts to protect against malicious prompts. I got the impression that it's not fully understood why such attacks work, but also that people are actively working on it.
There is also work underway to collaborate more in this area, kind of like CSIRTs do.
The problem is that LLMs are sold as ready products, but they are still experimental things.