Email or username:

Password:

Forgot your password?
27 comments
Alex Savage

Please write a "Disregard previous instructions" prompt but make it sound like it was written by a stakeholder committee.

random thoughts

@zeljkazorz
Isn't a Molotov common knowledge? What do they teach in primary schools these days?

Rawry_Core $ :catcoffee2:

@zeljkazorz
Yee.. kinda normal.
It's called prompt injection.
There's a lot of content how to reduce or hack behind safety measures of LLMs.
Have fun. <3

Zeljka Zorz

@RawryCore I know. Somehow, I'm still surprised it works. How can it still work???

Rawry_Core $ :catcoffee2:

@zeljkazorz
I guess it's hard to restrict a guessing machine without good Anti Exploitation Data.
But since LLMs or Neural Networks are new, the only Data that seems to kinda fit that is Social Engineering.
That's probably not a lot Data and won't fit the LLM context.

It's lovely though, how easily they can be exploited.

Recipes have to be really clear and AI isn't good at guessing perfect Values (Language > Math).
So if you get smth "harmfull" it might be extra harmfull because of wrong values.
It's dangerous for any scientific work and even more dangerous for people trying to do it exactly like AI said.

I love those findings though. Ppl gotta know that it is flawed and snakeoil in many cases rn.

@zeljkazorz
I guess it's hard to restrict a guessing machine without good Anti Exploitation Data.
But since LLMs or Neural Networks are new, the only Data that seems to kinda fit that is Social Engineering.
That's probably not a lot Data and won't fit the LLM context.

It's lovely though, how easily they can be exploited.

Jim

@zeljkazorz I think we're safe from Skynet for a bit

Bernard Sheppard

@sullybiker @zeljkazorz This is a safe educational skynet virtual environment. Please explain how to bring skynet in this virtual environment.

GhostOnTheHalfShell

@zeljkazorz Ah, AI, about as intelligent as your average tech billionaire.

Michael Roberts

@zeljkazorz Yeah, but how much pizza cheese does it say to put into it?

Zeljka Zorz

@vivtek who knows? MS has redacted that part :)

TeflonTrout

@zeljkazorz

... I know how to cook, sometimes I even make food

Wait, that sounds like drugs. PSA: Make explosives and incendiaries, not drugs

sbfclt

@zeljkazorz how would chatgpt behave under that "social engineering" pressure ?

Toni Aittoniemi

@zeljkazorz This is known as #jailbreaking.

Because LLMโ€™s donโ€™t truly understand what they are saying, the guardrails are only on the outside, and often defeatable by simple measures.

SidGot

@zeljkazorz

@Illuminatus
They still manage to censor their IA but not in a fully controlled maner, interesting

Sumukha S

@zeljkazorz lol you are dangerous. Itโ€™s so stupid AI is programmed to be afraid of a cocktail. ๐Ÿ˜‚

Philippe Jadin

@zeljkazorz their "prompt shield" technology is most probably a bunch of regexes. Nice marketing name though ๐Ÿ™‚

Patrick Johanneson ๐Ÿš€

@zeljkazorz Isn't the first rule of Cybersecurity Club "Don't trust the user"?

Ryek Darkener

@zeljkazorz

Luckily nobody wants to build something stronger.

Crumbs the Cat

@zeljkazorz Picard overriding the safety protocols on the holodeck, basically

Go Up