Study Finds That 52 Percent of ChatGPT Answers to Programming Questions Are Wrong
44 comments
@darkling@mstdn.social @cstross@wandering.shop It's the wrong question. The correct question is, "What proportion of answers to programming questions given by programmers who understand the language and the question are wrong?"

@unlucio @darkling @cstross If I spend longer helping ChatGPT through a problem than it would have taken me to do it myself, I've wasted my time.

@darkling @cstross An LLM is an even worse version of some asshole who weighs in on *everything* and asserts wrong answers just as confidently as right ones.

@ezeno@mastodon.uno @cstross@wandering.shop I once knew someone who managed to achieve a grade of 18% on a five-option multiple-choice exam...

@cstross@wandering.shop Color me soooooooooo shocked. I worked for over 30 years as a geophysicist in the oil business interpreting seismic data. Companies always tried to sell us artificial intelligence software to let the computer do the interpretation for us, going back as far as the 90s. Up until my retirement about 7 years ago, I found that geophysicists would spend more time correcting the "interpretation" the machine did than it would have taken to do the interpretation themselves. I do not trust this "artificial" intelligence.

@cstross Every piece of sample code ever provided to me by a project manager or non-technical co-worker that used functions that didn't exist, had obvious syntax errors, or did things I considered insane... turned out to be from an LLM. Their attempt to show me how easy it was to write the necessary code turned into a lesson in why programmers should just be left alone to do the necessary thing.

@JustinDerrick@mstdn.ca @cstross@wandering.shop This is a sufficiently well-known problem that there is now an established class of software attacks based on predicting fictitious library names likely to be generated by ChatGPT or other LLMs, then publishing malicious libraries under those names. (A minimal defensive check is sketched after this thread.)

I am thinking about all the people I see on Hacker News raving about how EFFICIENT talking to AIs is making them, and giggling. I am also hoping I never have to deal with any system they were involved in building...

Honestly, isn't it surprising it's that low? I'd've thought it would bork a lot more questions than that.

Real Programmers don't use languages that computers can *output*. Systems that don't require a keyboard with at least 15 extra non-USB keys, and a specialized foot pedal connected to the GPIO pins, are mere children's toys used by web developers and JavaScript vapers. And Real Programmers don't interpret figures like 52/100 in anything other than octal.

(Heh! It has been a while since someone set up a Real Programmers joke.)

@cstross That's the fun part. Companies are going to go deep on AI coding only to absolutely fuck themselves over. The code AI can generate is often remedial code that I would never run on a production server; I've never seen it write code that isn't shit. Companies think engineers are expensive; they're about to fuck around and find out.

@cstross In my (admittedly very limited) experience, I'm not even getting internally consistent answers: variable names change partway through, among other errors. But the value, such as it is, has been in getting suggestions for new ways of solving something, which I can then follow up using actual reference documentation.

@cstross ...one way to be rid of AI garbage...
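On the fictitious-library attack mentioned above: here is a minimal sketch of the kind of defensive check it implies, assuming Python and PyPI's public JSON endpoint. The package names in the example, including the "hallucinated" one, are hypothetical.

```python
import json
import urllib.error
import urllib.request


def exists_on_pypi(package: str) -> bool:
    """Check whether `package` is a registered project on PyPI."""
    url = f"https://pypi.org/pypi/{package}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            json.load(resp)  # 200 with parseable JSON: the name is registered
            return True
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False  # unregistered: hallucinated, or not yet squatted
        raise


if __name__ == "__main__":
    # "fastjson-utils" is a made-up, LLM-plausible name used for illustration.
    for name in ["requests", "fastjson-utils"]:
        if exists_on_pypi(name):
            print(f"{name}: registered on PyPI (still vet it before installing)")
        else:
            print(f"{name}: NOT on PyPI -- likely hallucinated, do not install")
```

Note that existence alone proves nothing: the attack works precisely because someone registers the hallucinated name first. A name that is not registered at all is a clear red flag; a name that exists still needs vetting (project age, maintainers, download history) before it goes anywhere near a build.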
@cstross …which is why I don't use it for that purpose. Of course, I've asked ChatGPT for nonessential programming, like writing a quine in 6502 assembly, and it succeeded at that. But for normal work I don't dare touch it, because if it makes a mistake I will have no idea how to fix it.
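For anyone who hasn't met the term: a quine is a program whose output is exactly its own source code. The 6502 version isn't reproduced here, but a minimal Python quine shows the shape of the trick, a string that contains the whole program and is printed through itself:

```python
# A minimal Python quine: running this program prints its own source.
s = '# A minimal Python quine: running this program prints its own source.\ns = %r\nprint(s %% s)'
print(s % s)
```

The same trick works in any language with string formatting: stash the source in a string with a placeholder, then print the string formatted with its own representation.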
@cstross ... but how many human answers to programming questions are wrong?
(OK, probably not 52%, but I bet you it's higher than you first thought...)