Sophie Schmieg

In case you do not know how GenAI works, here is a very abridged description:
First you train your model on some inputs. This uses some very fancy linear algebra, but can mostly be seen as a regression of sorts, i.e. a lower-dimensional approximation of the input data.
Once training is completed, you have your model predict the next token of your output. It does so by producing a list of candidate tokens, each with a score for how good a fit the model considers that token to be. You then randomly select from that list, with a bias toward higher-scored tokens. How strong that bias is depends on the "temperature" parameter, with a higher temperature corresponding to a less biased, i.e. more random, selection.

Now obviously, this process consumes a lot of randomness, and the randomness does not need to be cryptographically secure, so you usually use a statistical random number generator like the Mersenne Twister at this step.
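For concreteness, the sampling step described above can be sketched in a few lines of Python (the function name and logit values are illustrative, not from any real model); conveniently, CPython's `random` module is itself a Mersenne Twister:

```python
import math
import random

# CPython's Mersenne Twister: statistical quality only,
# NOT cryptographically secure.
rng = random.Random(0)

def sample_token(logits, temperature=1.0):
    """Pick the next token index from a list of raw model scores."""
    # Softmax with temperature: a high temperature flattens the
    # distribution (more random); a low one sharpens it toward the
    # top-scored token (less random).
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - m) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

# Three candidate tokens; token 0 is the model's favorite.
next_token = sample_token([2.0, 1.0, 0.1], temperature=0.7)
```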

So when they write "using a Gen AI model to produce 'true' random numbers", what they're actually doing is taking a cryptographically insecure random number generator and applying a bias to the numbers it generates, making it even less secure. It's amazing that someone can trick anyone into investing in that shit.

20 comments
SnoopJ

@sophieschmieg "what if we did statistics, but poorly?"

ask

@sophieschmieg there's also the noise introduced by the GPU scheduler doing the matrix multiplies in a different order, which produces different results because floating-point addition is not associative.

Surely they meant that... Right?...

But also probably that isn't true random either.

Sophie Schmieg

@ask that noise would be considered true random noise, but I don't know how many bits it has. While float isn't associative, it's like "mostly" associative, so depending on the condition of the matrix, it should be fairly low.

In any case, if you wanted to use that noise for cryptographic purposes, you'd first have to debias it by running it through a DRBG, and at that point you could just harvest it directly from the GPU for higher quality and performance.

Or query your stupid hardware RNG that literally every modern CPU has built-in.
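A rough sketch of that contrast (the conditioning function is a toy stand-in, not a NIST-approved DRBG): raw physical noise must be debiased before cryptographic use, whereas the OS CSPRNG is already seeded from hardware entropy for you.

```python
import hashlib
import os

def condition(raw_noise: bytes, out_len: int = 32) -> bytes:
    # Toy hash-based conditioner standing in for a real DRBG seeding
    # step: it squeezes biased input bits into uniform-looking output,
    # but the output can never hold more entropy than the input had.
    return hashlib.sha256(raw_noise).digest()[:out_len]

# In practice, skip all of that: the OS CSPRNG is already seeded from
# hardware entropy sources (e.g. the RNG built into modern CPUs).
key = os.urandom(32)
```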

Joris Meys

@sophieschmieg oh fck. I thought they were joking, but they're actually serious??

entronid

@sophieschmieg also quantum computers can't fucking break rngs*

*before the end of the universe

Yuri Arabadji

He clearly stated "if interested".

Which means if you're not interested there's no world-first quantum-proof random generator.

Quite obvious.

Cassander

@sophieschmieg If LLMs are snake oil, this "AI RNG" is meta-snake oil. It's like expecting a homeopathy distillation of horse dewormer will cure Covid.

It's so obviously fake that I can't even find a good metaphor to explain how bad it is.

niconiconi

@sophieschmieg@infosec.exchange Ironically, most GenAI implementations have trouble producing deterministic output due to floating point errors, inconsistent batching, etc. Not random enough for crypto, but random enough to create replication problems. It's what I call Murphy's Duality Law: in engineering, when a system can show both the property "A" and its negation "not A" depending on the specific context, it's always the opposite of what your application needs.

Rich Felker

@sophieschmieg LMAO what??? There are ppl trying to use LLM output as RNG??? And thinking "I'm too stupid to understand how it works so that means it's secure!!!111" ??? 🤦

🤏 🎻 when they get pwned. I'm out of patience for the LLM fan 🤡 🚗

Rich Felker

@sophieschmieg BTW not criticizing your choice of MT as an illustration, because it's exactly the sort of thing these bozos would know by name, but it's utterly the worst choice of deterministic PRNG: gratuitously large state, poor output quality. Even a 128-bit or possibly even 64-bit LCG throwing away the lower bits is better.
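For comparison, a truncated 64-bit LCG along the lines Rich describes fits in a couple of lines (the constants are Knuth's MMIX parameters, used here purely as an illustration):

```python
MASK64 = (1 << 64) - 1

class TruncatedLCG:
    """64-bit linear congruential generator emitting only the high 32 bits."""

    def __init__(self, seed: int):
        self.state = seed & MASK64

    def next32(self) -> int:
        # Multiplier and increment from Knuth's MMIX LCG.
        self.state = (self.state * 6364136223846793005
                      + 1442695040888963407) & MASK64
        # The low-order bits of a power-of-two-modulus LCG have short
        # periods, so discard them and keep only the high 32 bits.
        return self.state >> 32
```

Tiny state and a cheap update step, with the truncation discarding the statistically weakest bits — which is the contrast being drawn with the Mersenne Twister's multi-kilobyte state.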

Beggar Midas

@sophieschmieg Claude has a response for ya. "You're oversimplifying. While language models do use probabilistic token selection, reducing them to 'fancy RNGs' is like calling a brain 'just electrical signals.' The learned probability distributions capture complex semantic relationships and patterns from human knowledge. That said, your skepticism about AI hype is fair - there are plenty of overinflated claims worth challenging."
Not bad for a bucket of bolts 'rando number generator', eh?

Olivier

@sophieschmieg "Let's generate low quality random numbers about as fast as a grandma knitting socks using terra-watts of power in billion dollar data centers." - said no one ev... Oh wait.

Shannon Persists🌈

@sophieschmieg It doesn't look random at all. It looks like a crude airplane.

Elias Mårtenson

@sophieschmieg When you said the term, I just assumed they meant that they use an LLM code generator to create a program that generates "cryptographically secure random numbers". I'm sure your standard LLM can give you something that resembles this.

It'll take about 5 minutes, and then you can spend the investment capital on more interesting things (like private jets).

John Ripley

@sophieschmieg All of GenAI is the thought experiment "What if we did a shitty version that doesn't work and needs as much power as a city" except some bro did it for real, so this seems like a natural application.
