165 comments
A Person

@atoponce probably getting messed up by the Yosemite Decimal System.

Locksmith

@timdesuyo @atoponce
And this statement must be said while making a clever frown and scratching the chin with the fingers.

A Person

@locksmithprime @atoponce I don't know that I'm capable of looking clever, but I swear the beard scratch increases processing capability by at least 10%

Kierkrampusgaanks regretfully

@atoponce This sounds like talking about the economics of cybertruck with fanboys

Estarriol, Cat owned Dragon

@atoponce that is dangerous garbage, next it will apply BODMAS...

Becky

@atoponce CWGAAS: Confident White Guy As A Service.

Shiri Bailem

@atoponce ... so you prove "someone" lacks all intelligence by challenging their math disability?

It only has a handful of cognitive functions, none of which are good at math. Honestly, it does better than I expected.

DELETED

@atoponce

Aha! But ChatGPT is playing 4D chess, you see.

Because in Semantic Versioning, 9.11 IS bigger than 9.9

:blobfoxsmug:

NEVER FORGET
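(An illustrative aside, not part of the thread: the versioning joke checks out. A minimal Python sketch, assuming the third-party packaging library is installed, shows the same two strings ordering opposite ways as decimals and as version numbers.)

```python
from packaging.version import Version  # third-party: pip install packaging

a, b = "9.11", "9.9"

# As decimal numbers: 9.11 < 9.9, because 0.11 < 0.90
print(float(a) > float(b))            # False

# As version numbers: 9.11 > 9.9, because the component 11 > 9
print(Version(a) > Version(b))        # True

# Dependency-free equivalent: compare dot-separated components as integers
print(tuple(map(int, a.split("."))) > tuple(map(int, b.split("."))))  # True
```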

L

@ethanjstark @atoponce Ah! This is why I was confused for a moment.

I knew the right answer but my brain derp-hesitated for half a second, now I understand why 😅

Gersonzao

@atoponce@fosstodon.org And then ChatGPT said "it's bullshitting time" and bullshitted all over the place, generating smooth-sounding garbage

Alex The Silly Kitty :neocat_happy_blep:

@atoponce@fosstodon.org LMAO what's next, a quarter pounder being larger than a third? 💀

ENIGMATICO :heartbleed:

@atoponce@fosstodon.org You should have asked to express both of them as a fraction and compare them, but I guess that too would fail catastrophically.

TSource Engine Query
@atoponce of course that glorified Markov chain can't do math
janet_catcus

@atoponce I think it does exactly what it is supposed to do and mimics our intelligence to a T.

DELETED

@janet_catcus Hah, I've said this, too. The most "human" thing ChatGPT does is never admit it doesn't have an answer.

Violet Rose

@atoponce
And an answer from Copilot just now:

A screenshot of Microsoft Copilot answering the question "9.9 and 9.11 - which is bigger". It says 9.11 is bigger, the logic being "The decimal portion in 9.11 (the 0.11) is larger than the decimal portion in 9.9 (which is just 0.9)."

It also describes 9.11 as "nine point eleven."

It appears to be comparing numerical value, and doing so incorrectly.
Violet Rose

@atoponce
This time I asked which number has the greater numerical value. It gives a correct explanation, but stalls out before concluding that 9.9 is greater, almost as if it doesn't want to contradict its previous answer.

A screenshot of Microsoft Copilot answering whether 9.9 or 9.11 has the greater numerical value. It provides the equivalent values in terms of fractions of 100, but stops before giving a final answer.
GhostOnTheHalfShell

@violet @atoponce

Had to pause before burning down another forest.

xyhhx :PunkFelix:

@GhostOnTheHalfShell @violet @atoponce this thread evaporated two liters of water in ai queries alone

Mᴀʀᴋ VᴀɴᴅᴇWᴇᴛᴛᴇʀɪɴɢ

@violet @atoponce Truly, we have been freed from the labor of demonstrating our own ignorance, biases, and stupidity, and the end must be near.

Philip Wilson

@violet @atoponce

I asked Gemini. It was wrong too.
I asked it for its drafts. It got it right in 1 of 3.

Image of Google's Gemini AI answering the question, "9.11 and 9.9 -- which is bigger?" and showing 3 draft answers, of which only the 3rd is correct.
Iwillyeah

@PhilipVWilson @violet @atoponce I see it's offering no explanation for why the correct answer is correct though. That's interesting.

Ian McGecKobold (Octobold)

@violet
Ah, the classic "Ah, the classic" that instantly gives away AI slop

tocisz

@atoponce I agree, 9.11 is a big deal. Never heard of 9.9. You haven't specified which metric to use when comparing.

tocisz

@atoponce but it got lost in the explanation for sure

tocisz

@atoponce but at least it's politically correct

johnaldis

@atoponce It’s not “garbage” exactly. It’s not correct maths, but for example Michael Rosen’s “Hairy Tales and Nursery Crimes” is full of “factual errors”, and no-one should think that is “garbage”. The only issue here is if you try to use generative AI to give you correct answers, which is like trying to get an oboe to tell you who composed Handel’s Messiah.

DearFox

@atoponce
I'm also trash, I'm bad at math and don't know the multiplication table, let alone multiplying and dividing fractions :blobfoxcomfysmirk: :blobfoxgooglymlem:

DearFox

@atoponce
Well, for the sake of truth, they write it themselves "ChatGPT can make mistakes. Check important info."
:blobfox_w_:

TheClothesHaveNoEmperor

@atoponce
Thinks exactly like a human. But with massively more power consumption per error.

Alessandro Corazza

@atoponce maybe one day we can invent a machine that's good at doing math, but until then this is the state of the art!

Michael ☕️

@atoponce
I don't know what you're talking about.
9.11 is clearly bigger than 9.9.
9.11 is 4 characters, and 9.9 is only 3.
#EverythingIsAString
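(Illustrative aside: the string view of the joke, in plain Python with nothing beyond the standard library.)

```python
a, b = "9.11", "9.9"

print(len(a) > len(b))      # True  - "9.11" has more characters
print(a > b)                # False - lexicographically '1' < '9' at the third position
print(float(a) > float(b))  # False - the actual numeric comparison
```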

Xe :verified:

@atoponce This is actually a tokenization error. 9.11 looks larger than 9.9 because 11 tokenizes as a single unit and 11 is usually larger than 9.
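(Illustrative aside: the split can be inspected with OpenAI's tiktoken library. The exact pieces depend on the model's vocabulary, so the expected output in the comments is an assumption rather than a guarantee.)

```python
import tiktoken  # third-party: pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-3.5/GPT-4 era models

for s in ("9.9", "9.11"):
    pieces = [enc.decode([t]) for t in enc.encode(s)]
    print(s, "->", pieces)

# Plausibly something like:
#   9.9  -> ['9', '.', '9']
#   9.11 -> ['9', '.', '11']
# i.e. the fractional parts surface as the standalone tokens "9" and "11",
# which fits the "11 looks bigger than 9" confusion described above.
```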

Shiri Bailem
@cadey @atoponce I'm really curious about this because frankly I'm surprised it was even conceptually close
Gustavo

@cadey @atoponce In other words, despite all efforts to make math work better with LLMs, like adding Python support, it's still bad at it. Also it inherited the overconfidence from the dataset, which should include Reddit.

Andrew Benedict-Nelson

@atoponce hey look it’s a word calculator not a calculator calculator

Kelsey Jordahl

@albnelson @atoponce yeah, why would we expect a huge, optimized linear algebra machine to be able to do arithmetic?

Andrew Benedict-Nelson

@kajord @atoponce see, algebra is where we went wrong in the first place. Never should have mixed up letters and numbers.

Darrin West

@atoponce They want to hook this up to gene sequencing machines. Why do these tech bros exist? Nature may try to eliminate them by eliminating us all.

Lewis Edwards

@atoponce "Gee, I wonder why our probe slammed into the surface of the planet?"

ry_

@atoponce it’s not garbage, it’s not intelligent. It has useful applications, but apparently arithmetic isn’t one of them. It’s not wholly surprising: LLMs model natural language, and arithmetic isn’t natural language.

Tor Iver Wilhelmsen

@ry_ @atoponce That would basically mean Copilot in Excel is nearly pointless, unless it does something smarter than the rest of Copilot for M365

ry_

@toriver @atoponce I’ve not used Copilot, but I assume its numerical output is not produced directly by the attention mechanism of an LLM. E.g. it could use an LLM to extract context from the data, which is then fed into conventional routines, or use an LLM to offer code suggestions. None of these is an LLM directly doing maths.
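(Illustrative aside: a minimal, entirely hypothetical sketch of the "LLM picks the operation, a routine does the arithmetic" pattern described above; this is not how Copilot is actually implemented.)

```python
import operator

# The model only emits a structured call; deterministic code does the math.
OPS = {"add": operator.add, "sub": operator.sub, "gt": operator.gt}

def run_tool_call(call: dict):
    """call is what the model might emit, e.g. {"op": "gt", "args": [9.9, 9.11]}."""
    return OPS[call["op"]](*call["args"])

model_output = {"op": "gt", "args": [9.9, 9.11]}  # hypothetical model output
print(run_tool_call(model_output))  # True - computed by Python, not predicted by the LLM
```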

Noah Cook

@atoponce I wonder: you know how virtual assistants are given feminine names and voices (Siri, Alexa)? And you know how there is a persistent false belief that women are somehow worse at math than men?

I have to wonder whether that combination of biases has any influence on the programmers who create these LLMs? I mean on top of all of the other biases and misunderstandings they already have about neuroscience and language? Are they creating their own stereotype of a ditzy secretary?

Bill Zaumen

@UncivilServant @atoponce It would help if "virtual assistants" used a name and voice appropriate to a 5 year old child.
There was an old TV program called "Kids Say the Darndest Things", with clips of it shown on The Bill Cosby Show (well before he was arrested):
youtube.com/watch?v=G1voLZyI0S

flere-imsaho

@UncivilServant this has nothing to do with biases; LLMs don't produce correct answers, they produce statistically probable text completions. @atoponce

masukomi

@atoponce i had my head in Semantic Versioning land just before reading this so i was like "yup. 9.11 is bigger than 9.9" while simultaneously thinking "something's not right here"

flere-imsaho

@Hexa there's always one promptfondler in the thread who doesn't understand that you can't get fully repeatable answers from the confabulation engine, and that any answer to that question is a valid answer within the LLM paradigm, no matter whether it's correct or not.

(there's also another promptfondler who thinks that the problem is just in one particular LLM, not in the way LLMs work)

@atoponce

Ryan Dormanesh

@mawhrin @atoponce fair point. I apologize. Also “promptfondler” 😆 I’ve never heard that one.

Emma (has_many_books of old)

@mawhrin @Hexa @atoponce came for the promptfondler, stayed for the confabulation engine

squalouJenkins

@atoponce well, of course, if it was trained on students' papers ...

lp0 on fire :unverified:

Its initial response is ‘correct’, but only if the items being compared are version strings.

DELETED

@atoponce

Funny that a lot of people are trying to justify this as "well, it's not made for that"...

But other people are relying on it for these things, so it *IS* a problem. We have real-world evidence that the majority of people do not understand any of this, plus it's being marketed as "a tool that you can use for anything" (in your OS, in your phone, in your browser, etc.).

unusual zone of infecundity
@atoponce i read this entire thing thinking it was going to be a funny joke about versioning :menheraSob:
Bill Zaumen

@atoponce For the example - ChatGPT botching arithmetic - it actually passed the Turing test. Once in a store, I ordered 2.2 lb of some deli item, and the scale registered 2.02. The guy behind the counter called 2.20 "two point twenty" and 2.02 "two point two". The scale always showed two digits past the decimal point. This guy basically made the same mistake as ChatGPT.

flere-imsaho

@bzdev nah, don't anthropomorphise a statistical engine.

AVerRoA

@atoponce

I think the economists who tell us that LFI's programme will be a catastrophe got their information from ChatGPT.

Johnny Peligro 🍅
@atoponce in this case maybe, but as a chatbot and as an art generator it's pretty good
Ritschi Alpenstern :antifa:

@atoponce Oh great, we have a tool that uses energy to simulate stupid. As if we hadn’t enough.

echopapa ☑️

@RAlpenstern @atoponce

in the meantime, they have fixed this issue. But I think we only have to dig a little bit deeper now.

Ritschi Alpenstern :antifa:

@echopapa @atoponce I just tried it on ChatGPT yesterday, asking it to calculate 9.9 - 9.10, and it tried to convince me the answer was -0.2

Stefan Baur 6 * 💉

@RAlpenstern @atoponce This is a frackin' IL-Series Cylon from the old Battlestar Galactica series.

Mathieu Génois

@atoponce
This is exactly what I got when some students of mine tried to do their math homework with ChatGPT.
@dlbasile

shironeko
@atoponce this has been a hot meme in China apparently; lots of people can't tell which is bigger, 13.11 or 13.8
qwertzalotl

@atoponce

This is gonna be soo much fun....

br00t4c

@atoponce Pseudo-profound bullshit is the scourge of our age.

Erik Jonker

@atoponce why would anybody think a large language (!) model should be able to do math?

flere-imsaho

@ErikJonker the authors, apparently, and it is being sold to the public as a universal answering and search engine. @atoponce

Erik Jonker

@mawhrin @atoponce By using ChatGPT-4 a lot you quickly find out what it's good for and what not; in my experience it's limited, but it does have added value.

youen

@trzyglow @atoponce I'm pretty sure it does not interface with Python. There must just be some content about subtraction in Python in its training data, maybe with these numbers, or maybe it is able to replace the numbers in an example with other numbers and redo the math.

If it was actually executing python code (or any language?), I'm sure someone would have already broken it by asking for the result of malicious code.

magic💫

@keithpjolley @atoponce well yeah how else would 1.9 be greater than 9.1?

:blahaj: Why Not Zoidberg? 🦑

@atoponce As someone else on Mastodon pointed out: companies have spent BILLIONS to make a program that can simulate a computer that cannot do math.

sharky5740

@atoponce Well, Python version 3.11 is a later version than 3.9.
Q.E.D.

Lina

@atoponce no logical rules, no mathematical rules, just stupid statistics for the result :blobcatfacepalm:

Jim Carr

@atoponce I'm seeing lots of "but that's not what generative AI is good at" rebuttals, but AI is marketed as a be-all, end-all tool, so this is a real danger. I use it quite a bit for programming assistance, but I always double-check my results.

Jim Carr

@atoponce And for what it's worth, I tried the same exercise in Claude 3 (via the DDG AI Chat) and I got the same wrong answer.

Screenshot of AI chatbot being bad at math.
Leonard

@atoponce well, Linux 6.10 is newer than Linux 6.9.

So that's probably where the confusion stems from, somewhat? That sometimes it does get counted up that way?

I mean in the end it's somewhat of a parrot just evaluating the most likely to be reasonable answer and responding with that. Doesn't have to make sense to you or me, just gotta make a good way through the neural net.

The python-thing is pretty hilarious. I know I'm right, even if I'm wrong... and here's why

Nazo

@atoponce I think the problem here is it may be considering that like a program version number. Which is one thing I *HATE* about the way version numbers are done. In program versions that's actually 9 and 11 versus 9 and 9 rather than just over 9.1 versus 9.9. I wish that would go away as a thing.

That said, yeah, it's still messing that up and should have figured it out better than that anyway. (There are some models that try to produce better math results, but since they're not actually doing math, results still vary.)

I find it incredibly ironic that 300+ watts are being spent to do things a calculator could do at 0.1 watts and still not getting it right a huge percentage of the time when all they truly need to do is pass it through a simple mathematical function.

Jona Joachim

@atoponce
It's a large *language* model, not a math model
@otto

otto@openbsd

@jaj @atoponce it's a large language model that claims to understand the language of math

Jona Joachim

@otto
Does it? 🤔 I thought that was a well-known limitation. It does know about theorems of course, but numerical calculation I wouldn't think so
@atoponce

otto@openbsd

@jaj @atoponce Look at the example. It claims to understand the computation and even "explains" the Python result.

Jona Joachim

@otto
I'm not sure you can say it claims anything; it's just an algorithm that spits out numbers which get mapped to letters and words, and neural networks can be very wrong when used to extrapolate outside of the training domain. We have to stop treating these algorithms as if they were sentient. And we need to find a way to put confidence intervals on the outputs of the model
@atoponce

marinheiro

@atoponce The 'like' button is not enough, I really want a 'lol' button, cos I did

Ken S

@atoponce … until AI masters methodology, it should not be responsible for generating.

Erik

@atoponce I don't understand that python bit, but I'm quite sure chatty doesn't either

Acvaristul_Lenes

@atoponce it certainly doesn't have math talent. Like many people I know.
It helps me a lot with nutrition, sports physiology and performance.
It also saves me time on some sysadmin and PC technician tasks.
It has spotless grammar in all 3 languages.
Aaron, had you not known who you were chatting with, what would your reaction have been?

Melroy van den Berg

@atoponce fun fact: when devs are creating releases with numbers, 9.11 is actually newer / bigger than 9.9. Yeah, I know, right? Often a patch number is included as well, which would make it 9.11.0 and 9.9.0.

Mike Farley :verified:

@atoponce well done, OpenAI, you've developed a computer that cannot compute.

KielKontrovers

@atoponce AI is just Mr. Know-it-all. He has a lot of mansplaining to do. Which sometimes is cute.

Dan O'Ginnec

@atoponce

hessen.social/@atoponce@fossto

If one uses an LLM (like #chatgpt) to do math (or any other "logic"-based problem) it will fail. It generates answers not by thinking (or #logic generally) but by searching a text database.

Wrong tool.

Look at this: I let ChatGPT create a #fantasy #map for a roleplaying game:

While it knows a lot about maps, and it does create a map, the mountains are in the sea, the river flows through the sea, and there's also a marine forest.

Surreal Map Created by using ChatGPT to make a fantasy Map
Dan O'Ginnec

It can, by the way, create a simple #house (as svg-xml), and even isn't THAT bad at creating a #human figure (a #venus) if prompted correctly:

#chatgpt #xml #svg #graphic #design #art

@atoponce

fedithom

@atoponce

Kewl. Now, can we collectively stop using this shit that's burning our planet?

social.saarland/@fedithom/1124

Gianluca

@atoponce Generative AI (in this case an LLM) is not garbage, it is what it is: a large language model (LLM), not a calculator. It does what it is supposed to do - create text.

Gianluca

@atoponce It's like using a calculator hoping to get a letter done and then calling it garbage because it does not spit out words and sentences.

Fardels Bear

@atoponce

This AI is quite human in the way it simultaneously rationalises and doubles down on mistakes.

Jochen Garcke

@atoponce the answer also differs if one uses an "or" instead of "and". With the "or", the system seemingly corrects itself in the explanation

Simon Brooke

@atoponce sadly, the whole of #ChatGPT has been deleted, after someone forgot to turn off the garbage collector.

Abhinav Tushar

@atoponce I gave your screenshot to GPT4o.

Fun aside, I think it's nice of them to write 'ChatGPT can make mistakes. Check important info.' at the bottom.

Peter Bloem

@atoponce This is most likely an issue with tokenization, rather than something fundamental. That is, the model can't see the structure of the numbers the way we do.

If you write the numbers more unconventionally, they don't get tokenized the same way, and the model can perform the task.

The same task, but written in emoji.
The same problem written out in words.
The same problem, with the numbers written out as lists.
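(Illustrative aside: a rough sketch of what the "numbers written out as lists" framing forces the comparison into - aligning the fractional digits before comparing - not code from the original post.)

```python
def compare_decimals(a: str, b: str) -> str:
    """Return the larger of two decimal strings by comparing aligned digits."""
    ai, af = (a.split(".") + [""])[:2]
    bi, bf = (b.split(".") + [""])[:2]
    width = max(len(af), len(bf))          # pad so 9.9 is treated as 9.90
    ka = (int(ai), af.ljust(width, "0"))
    kb = (int(bi), bf.ljust(width, "0"))
    return a if ka > kb else b

print(compare_decimals("9.9", "9.11"))  # 9.9
```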
Maarten Sneep

@atoponce @Marrekoo Apple “threatening” to delay or not make available “Apple Intelligence” and ChatGPT in the EU is not the threat it thinks it is.
I can do without a bullshit generator in my pocket.

JDS

@atoponce @ai6yr gigawatt-powered magic 8 balls that are less accurate than 50-year-old portable calculators 👏

Taylor R Campbell

@atoponce You can tell ChatGPT is right because it set the math in Computer Modern! I bet Python doesn't do that.

ToucanIan

@atoponce oddly enough initially I was thinking of the definition/theorem numbering format where 9.11 is bigger than 9.9.

Max (Lyrenhex)

@atoponce I love that it justifies it as a "small precision error".

Floating point isn't that bad!

dbread

"Much to learn, you still have." #ai

@atoponce

tom jennings

@atoponce @tante

strlen ("9.11") is .GT. strlen ("9.9") however.

Uddelhexe

@atoponce

So they made an algorithm with the www at its disposal to solve problems, and it is not able to function as a calculator
😂😂😂

sollat

@atoponce
I want to gouge my eyes out so I never see anything that stupid again.
