David Chisnall (*Now with 50% more sarcasm!*)

@carbontwelve I used machine learning in my PhD. The use case there was data prefetching. This was an ideal task for ML, because the benefits of a correct answer were high and the cost of an incorrect answer was low. In the worst case, your prefetching evicts something from the cache that you need later, but even 60% prediction accuracy is a big overall improvement.

Programming is the opposite. The benefits of being able to generate correct code faster 80% of the time are small, but the costs of generating incorrect code even 1% of the time are high. The entire shift-left movement is about finding and preventing bugs earlier.
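
The asymmetry can be made concrete with a back-of-the-envelope expected-value calculation. A minimal Python sketch; the costs and savings are illustrative numbers, not figures from the post:

```python
# Prefetching: a correct prediction saves a little; a wrong one costs a little.
# All values are hypothetical, chosen only to illustrate the asymmetry.
prefetch_saving = 1.0    # benefit of a correct prefetch
eviction_cost = 0.5      # cost of a useless eviction
accuracy = 0.6
prefetch_ev = accuracy * prefetch_saving - (1 - accuracy) * eviction_cost
print(f"prefetching, per prediction: {prefetch_ev:+.2f}")  # +0.40: worth doing

# Code generation: a correct snippet saves a little typing; a shipped bug is costly.
snippet_saving = 1.0     # benefit of one correct generated snippet
bug_cost = 200.0         # cost of one incorrect snippet that ships
correct_rate = 0.99
codegen_ev = correct_rate * snippet_saving - (1 - correct_rate) * bug_cost
print(f"code generation, per snippet: {codegen_ev:+.2f}")  # -1.01: a net loss
```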

13 comments
Mx Autumn :blobcatpumpkin:

@david_chisnall that’s a nicely eloquent way to put both into perspective.

David Clarke

@david_chisnall @carbontwelve this is what has been gnawing at the back of my brain. The purveyors of LLMs have been talking up the latest improvements in reasoning. A calculator that isn't 100% accurate at returning correct answers to inputs is 100% useless. We're being asked to conflate the utility of LLMs with that of a calculator. Would we choose to drive over a bridge designed using AI? How will we know?

David Chisnall (*Now with 50% more sarcasm!*)

@zebratale @carbontwelve Calculators do make mistakes. Most pocket calculators do arithmetic in binary and so propagate errors when converting decimal to binary floating point, for example being unable to represent 0.1 exactly. They use floating point to approximate rationals, and so accumulate rounding errors for things like 1/3.

The difference is that you can create a mental model of how they fail and make sure that the inaccuracies are acceptable within your problem domain. You cannot do this with LLMs. They will fail in exciting and surprising ways. And those failure modes will change significantly across minor revisions.
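
The calculator errors described in the first paragraph are easy to reproduce. A minimal Python demonstration of standard IEEE 754 double-precision behaviour:

```python
from decimal import Decimal
from fractions import Fraction

# 0.1 has no exact binary representation; what is stored is a nearby value.
print(Decimal(0.1))      # 0.1000000000000000055511151231257827021181583404541015625
print(0.1 + 0.2 == 0.3)  # False: the tiny representation errors propagate

# 1/3 is rational, but binary floating point can only approximate it.
print(Fraction(1 / 3) == Fraction(1, 3))  # False: the stored double is not 1/3
print(f"{1 / 3:.20f}")                    # 0.33333333333333331483
```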


Glitzersachen.de

@david_chisnall @zebratale @carbontwelve

"do make mistakes" I wouldn't call that a mistake. The calculator does what it should do according to the spec how to approximate real numbers with a finite number of bits.

It's (as you explain) a rounding error. A "mistake" is what Pentiums with the famous Pentium bug made.

But maybe it's my understanding of English (as a second language) that is at fault here.

Pendell

@glitzersachen @david_chisnall @zebratale @carbontwelve the calculator /is/ doing exactly what it's been programmed to... and it is programmed to make specific, well-defined "mistakes" or errors in predictable, clear-cut ways in order to let the pocket calculator run on as little power as possible.

An LLM, likewise, is also doing exactly what it was programmed to do... and that is to spew regurgitated nonsense it read off the internet.

pasta la vida

@pendell @glitzersachen @david_chisnall @zebratale @carbontwelve programmers and CPU designers are just a tad sensitive and insecure when someone points out that the calculator makes mistakes and isn't mathematically perfect 😅

Martijn Faassen

@david_chisnall @zebratale @carbontwelve

I do find myself building up intuitions for what an LLM does. It's far less reliable than a calculator, but humans can build intuitions for other unreliable things that can fail excitingly.

Haelwenn /элвэн/ :triskell:

@david_chisnall @carbontwelve Well, one thing where LLMs can make sense is spam filtering (sadly also for generating it, as we probably all know by now…).

For example, rspamd tried the GPT-3.5 Turbo and GPT-4o models against Bayes and got pretty interesting results: https://rspamd.net/misc/2024/07/03/gpt.html

Although, as the conclusion puts it, one should use local LLMs for data privacy reasons and likely performance reasons (elapsed time of ~300s for GPT vs. 12s and 30s for Bayes), which would also likely change the results.
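
For contrast with the GPT side of that comparison, the Bayes side is a decades-old technique. A toy naive Bayes spam filter in Python; the training data is made up and this is nothing like rspamd's actual implementation, just a sketch of the idea:

```python
import math
from collections import Counter

# Made-up training corpora, purely for illustration.
spam_docs = ["win free money now", "free offer click now", "cheap pills free"]
ham_docs = ["meeting notes attached", "lunch tomorrow?", "draft of the paper attached"]

def train(docs):
    counts = Counter(word for doc in docs for word in doc.split())
    return counts, sum(counts.values())

spam_counts, spam_total = train(spam_docs)
ham_counts, ham_total = train(ham_docs)
vocab = set(spam_counts) | set(ham_counts)

def log_likelihood(message, counts, total):
    # Laplace smoothing so unseen words don't zero out the probability.
    return sum(math.log((counts[w] + 1) / (total + len(vocab)))
               for w in message.split())

def classify(message):
    # Equal class priors in this toy example.
    spam_score = log_likelihood(message, spam_counts, spam_total)
    ham_score = log_likelihood(message, ham_counts, ham_total)
    return "spam" if spam_score > ham_score else "ham"

print(classify("free money offer"))        # spam
print(classify("notes from the meeting"))  # ham
```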
PR ☮ ♥ ♬ 🧑‍💻

@david_chisnall

This 👉 “The benefits of being able to generate correct code faster 80% of the time are small, but the costs of generating incorrect code even 1% of the time are high.”

Arttu Iivari

@david_chisnall @carbontwelve This sounds like a classic quality management problem: the cost of the shit.

nebulossify

@david_chisnall @carbontwelve part of the AI hype brainrot seems to be a complete disregard for the possibility of failure. Or, if failure is considered at all, the assumption is that the next bigger, hungrier model will fix it...

jincy quones

@nebulos @david_chisnall @carbontwelve That's capitalism 101: Create a problem that didn't exist before; conjure a solution you can build a business model around; kneecap whatever regulatory mechanisms would eliminate the problem for good; leech off society for life!
