David Chisnall (*Now with 50% more sarcasm!*)

@carbontwelve I used machine learning in my PhD. The use case there was data prefetching. This was an ideal task for ML, because the benefits of a correct answer were high and the cost of an incorrect answer was low. In the worst case, your prefetching evicts something from the cache that you need later, but even 60% prediction accuracy is a big overall improvement.

Programming is the opposite. The benefits of being able to generate correct code faster 80% of the time are small, but the costs of generating incorrect code even 1% of the time are high. The entire shift-left movement is about finding and preventing bugs earlier.
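
The asymmetry can be made concrete with a back-of-the-envelope expected-value calculation. A minimal Python sketch; the costs and savings are illustrative numbers, not figures from the post:

```python
# Prefetching: a correct prediction saves a little; a wrong one costs a little.
# All values are hypothetical, chosen only to illustrate the asymmetry.
prefetch_saving = 1.0    # benefit of a correct prefetch
eviction_cost = 0.5      # cost of a useless eviction
accuracy = 0.6
prefetch_ev = accuracy * prefetch_saving - (1 - accuracy) * eviction_cost
print(f"prefetching, per prediction: {prefetch_ev:+.2f}")  # +0.40: worth doing

# Code generation: a correct snippet saves a little typing; a shipped bug is costly.
snippet_saving = 1.0     # benefit of one correct generated snippet
bug_cost = 200.0         # cost of one incorrect snippet that ships
correct_rate = 0.99
codegen_ev = correct_rate * snippet_saving - (1 - correct_rate) * bug_cost
print(f"code generation, per snippet: {codegen_ev:+.2f}")  # -1.01: a net loss
```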

13 comments
Mx Autumn :blobcatpumpkin:

@david_chisnall that’s a nicely eloquent way to put both into perspective.

David Clarke

@david_chisnall @carbontwelve this is what has been gnawing at the back of my brain. The purveyors of LLMs have been talking up the latest improvements in reasoning. A calculator that isn't 100% accurate at returning correct answers to inputs is 100% useless. We're being asked to conflate the utility of LLMs with that of a calculator. Would we choose to drive over a bridge designed using AI? How will we know?

David Chisnall (*Now with 50% more sarcasm!*)

@zebratale @carbontwelve Calculators do make mistakes. Most pocket calculators do arithmetic in binary and so propagate errors when converting decimal to binary floating point, for example being unable to represent 0.1 exactly. They use floating point to approximate rationals, and so accumulate rounding errors for things like 1/3.

The difference is that you can create a mental model of how they fail and make sure that the inaccuracies are acceptable within your problem domain. You cannot do this with LLMs. They will fail in exciting and surprising ways. And those failure modes will change significantly across minor revisions.
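
The calculator errors described in the first paragraph are easy to reproduce. A minimal Python demonstration of standard IEEE 754 double-precision behaviour:

```python
from decimal import Decimal
from fractions import Fraction

# 0.1 has no exact binary representation; what is stored is a nearby value.
print(Decimal(0.1))      # 0.1000000000000000055511151231257827021181583404541015625
print(0.1 + 0.2 == 0.3)  # False: the tiny representation errors propagate

# 1/3 is rational, but binary floating point can only approximate it.
print(Fraction(1 / 3) == Fraction(1, 3))  # False: the stored double is not 1/3
print(f"{1 / 3:.20f}")                    # 0.33333333333333331483
```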


Glitzersachen.de

@david_chisnall @zebratale @carbontwelve

"do make mistakes" I wouldn't call that a mistake. The calculator does what it should do according to the spec how to approximate real numbers with a finite number of bits.

It's (as you explain) a rounding error. A "mistake" is what Pentiums with the famous Pentium bug made.

But maybe it's my understanding of English (as a second language) that is at fault here.

Pendell

@glitzersachen @david_chisnall @zebratale @carbontwelve the calculator /is/ doing exactly what it's been programmed to... and it is programmed to make specific, well-defined "mistakes" or errors in predictable, clear-cut ways in order to let the pocket calculator run on as little power as possible.

An LLM, likewise, is also doing exactly what it was programmed to do... and that is to spew regurgitated nonsense it read off the internet.

pasta la vida

@pendell @glitzersachen @david_chisnall @zebratale @carbontwelve programmers and CPU designers are just a tad sensitive and insecure when someone points out that the calculator makes mistakes and isn't mathematically perfect 😅

Martijn Faassen

@david_chisnall @zebratale @carbontwelve

I do find myself building up intuitions for what an LLM does. It's far less reliable than a calculator, but humans can build intuitions for other unreliable things that can fail excitingly.

Haelwenn /элвэн/ :triskell:

@david_chisnall @carbontwelve Well, one thing where LLMs can make sense is spam filtering (sadly also for generating it, as we probably all know by now…).

For example, rspamd tried the GPT-3.5 Turbo and GPT-4o models against Bayes and got pretty interesting results: https://rspamd.net/misc/2024/07/03/gpt.html

Although, as the conclusion puts it, one should use local LLMs for data privacy reasons and likely performance reasons (elapsed time of ~300s for GPT vs. 12s and 30s for Bayes), which would also likely change the results.
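
For contrast with the GPT side of that comparison, the Bayes side is a decades-old technique. A toy naive Bayes spam filter in Python; the training data is made up and this is nothing like rspamd's actual implementation, just a sketch of the idea:

```python
import math
from collections import Counter

# Made-up training corpora, purely for illustration.
spam_docs = ["win free money now", "free offer click now", "cheap pills free"]
ham_docs = ["meeting notes attached", "lunch tomorrow?", "draft of the paper attached"]

def train(docs):
    counts = Counter(word for doc in docs for word in doc.split())
    return counts, sum(counts.values())

spam_counts, spam_total = train(spam_docs)
ham_counts, ham_total = train(ham_docs)
vocab = set(spam_counts) | set(ham_counts)

def log_likelihood(message, counts, total):
    # Laplace smoothing so unseen words don't zero out the probability.
    return sum(math.log((counts[w] + 1) / (total + len(vocab)))
               for w in message.split())

def classify(message):
    # Equal class priors in this toy example.
    spam_score = log_likelihood(message, spam_counts, spam_total)
    ham_score = log_likelihood(message, ham_counts, ham_total)
    return "spam" if spam_score > ham_score else "ham"

print(classify("free money offer"))        # spam
print(classify("notes from the meeting"))  # ham
```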
PR ☮ ♥ ♬ 🧑‍💻

@david_chisnall

This 👉 “The benefits of being able to generate correct code faster 80% of the time are small, but the costs of generating incorrect code even 1% of the time are high.”

Arttu Iivari

@david_chisnall @carbontwelve This sounds like a classic quality management problem: the cost of the shit.

nebulossify

@david_chisnall @carbontwelve part of the AI hype brainrot seems to be a complete disregard for the possibility of failure. Or, if failure is considered at all, the assumption is that the next bigger, hungrier model will fix it...

jincy quones

@nebulos @david_chisnall @carbontwelve That's capitalism 101: Create a problem that didn't exist before; conjure a solution you can build a business model around; kneecap whatever regulatory mechanisms would eliminate the problem for good; leech off society for life!
