Lars Marowsky-Brée 😷

Tell me again how #GenAI will extract meaningful trends from and answer queries about your data set.

#chatgpt4o #fAIl

ChatGPT 4o completely loses it and bunkers down hard on there only being two "r"s in the word "strawberry", regardless of different attempts at getting it to correct itself.
101 comments
Lars Marowsky-Brée 😷

I can also see this going great for coding, programming languages and computers are known to be very forgiving and tolerant

Григорий Клюшников

Lars Marowsky-Brée 😷, and for this reason, I can't for the life of me understand how someone could seriously use an LLM as a tool. Or instead of a proper search engine.

Lars Marowsky-Brée 😷

@grishka They're tools for when you need an answer that might not be fully correct, e.g. brainstorming, rubber ducking, or even quite a few translations.
But they're nowhere near as useful as advertised.

CubeThoughts

@larsmb @grishka Gen AI is also useful when being "correct" is subjective - such as when creating something esthetically pleasing. As with brainstorming (and rubber ducking), the goal is to evoke some reaction in the beholder.

Lars Marowsky-Brée 😷

This also makes perfect sense, because context matters - and once it generated a wrong answer, it is human enough to double down on it! The singularity is near!

You've got to ask it "nicely" right from the start. Don't embarrass it!

I AM A PROMPT ENGINEER

How many R's are in the word strawberry?

There are two R's in the word "strawberry."

Count the "r"s in the word "strawberry"

There are two "r"s in the word "strawberry."
Count the "r"s in the word "strawberry"

The word "strawberry" contains three "r"s.
pitch R.

@larsmb The word prompt engineer is an insult to all engineers...

Lars Marowsky-Brée 😷

@pitch I think it's the best thing that ever happened to software engineers, suddenly no one makes fun of *us* anymore for not being an actual engineering discipline.

pitch R.

@larsmb I will still ❤️ Promised 😉

Even though I am officially credited as a software engineer in multiple projects. I think there are ways to be a software engineer. Just most of the programmers are not even developers and far from being software engineers.

Just writing stupid code is by no means an engineering feat, but systematically designing software, evaluating different approaches and laying out an efficient order of operations can be an engineering process.

klml

@pitch @larsmb I don't think so, because the word Gummy bear is not an insult to any Grizzly, Brown or Polar bear.

Whitney Loblaw

@larsmb don't worry, these issues just keep getting fixed quickly after being reported and the product keeps improving... or does it? community.openai.com/t/incorre

FurryBeta

@larsmb This is the crap future, arguing with an “AI” over factually true statements

Lars Marowsky-Brée 😷

@FurryBeta I mean, as an engineer, I spent a lot of my time arguing with sales/CxOs over factually true statements, so

FurryBeta

@larsmb I was a field service engineer for 30+ years and did the same with our sales department. Sympathies

argv minus one

@larsmb

Yeah, it does seem as though the best application for this kind of tech is not replacing programmers, but replacing corporate bullshitters. Bullshit is the only thing this machine is capable of, and it's very, very good at it. So good at bullshit, in fact, that it's already convinced all of the human corporate types that it's the best thing money can buy!

@FurryBeta

dogzilla

@larsmb @msbw This is like having to convince your hammer that it is not, in fact, a screwdriver before hammering a nail

Kief Richardson

@dogzilla @larsmb @msbw this is the best description of the problem with LLMs that I've ever encountered

dogzilla

@fartnuggets @larsmb @msbw It’s all yours.

I still have hope that open-sourced AI agents will be useful, but I’m personally done with trying to wrangle the big commercial LLMs into anything useful. I’ve yet to come across a real-world problem I can’t solve quicker with a Jupyter notebook and a couple Python libraries

Kief Richardson

@dogzilla @larsmb @msbw I've been playing with webgpu and wasm for speech to text, it's showing promise. My vision is on-device transcription and translation freely accessible to anyone with the hardware.

dogzilla

@fartnuggets @larsmb @msbw I’m hoping for a future where a trusted on-device agent can basically act as a personal assistant. I think it needs some ability to learn and make decisions, but not this weird “boil the ocean” strategy behind LLMs.

Kinda reminds me of robotics in the early 90s - after decades of failed top-down approaches, we finally found huge success with drastically simpler ensemble bottom-up approaches exemplified by the Genghis family

Mossy Modem

@larsmb It feels like there were definitely some Monty Python skits in the training data.

SomeGadgetGuy

@larsmb
That chat bot needs to go to the LIBARY!

[Typo on purpose]

Lars Marowsky-Brée 😷

@SomeGadgetGuy It pilfered all libraries and this is the best we got from it.

SomeGadgetGuy

@larsmb
Truly the future is NOW. Amazing...

🙄

Nini

@larsmb I must admit I'm impressed at how steadfast it is in never admitting to being wrong. The unearned confidence of a mediocre white man who's never been told "no".

Lars Marowsky-Brée 😷

@nini See the update, the most human thing it does is double down on a wrong answer

Sky Leite

@nini I mean it was trained on Reddit data, so

argv minus one

@larsmb

So, this exchange burned way more wattage than a simple letter-counting algorithm would have, and it gave a blatantly incorrect answer.

“AI” is going just great.
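
For comparison, a minimal sketch of the kind of simple letter-counting routine the post alludes to. The function name `count_letter` is made up for illustration and is not taken from the thread:

```python
def count_letter(word, letter):
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())


print(count_letter("strawberry", "r"))  # prints 3
```

This runs in time proportional to the length of the word and always returns the same answer for the same input.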

chanc3e

@argv_minus_one @larsmb @samhainnight Is it me, or does the AI also have the tone of someone on reddit who is sure they are *very right*?

Dr. Gilead

@larsmb @muiiio this is why I'm not that worried. Terminator still has a long way to go 😊

Crystal_Fish_Caves

@larsmb yeah this tech DEFINITELY is worth all the resources it gobbles up. We have PLENTY of spare water and power.

Lars Marowsky-Brée 😷

@Crystal_Fish_Caves Exactly! It clearly should be the top priority for all businesses and politicians, it is *the best*.

Tariq

@larsmb

i actually know someone who pronounces it "stawr-brerry"

sortius

@larsmb the sad thing is, if they designed the LLM to purely count Rs, it might actually work... but that sounds too much like an algorithm, and that's got no techbro magic sauce in it.

It reminds me of when the world was captured by radioactive materials, and they stuck it in everything

the cake is offline

@larsmb it's like trying to convince a conservative or centrist of literally anything involving scientific evidence!

Elio Campitelli

@larsmb Looks like they patched it, but only for a very specific subset of berries.

RealGene ☣️

@larsmb
Doesn't this just mean that a whole bunch of people on the Internet think "strawberry" has two "R's" in it?

Florian Idelberger

@RealGene @larsmb likely it just means it makes something up, as there are unlikely to be data for all numbers / words and they are also very similar. Sometimes it will actually use Python code in the background, which gets the right answer. (Which in other cases has hilarious results; if you ask ChatGPT (free version) to draw a sketch of something, it will create Python code to draw various lines and circles and show you the output, which in no way resembles anything.)

oheso

@larsmb Kind of like my ex in this regard…

Mensch, Marina

@larsmb This AI doesn't seem too eager to learn. Answers like a stubborn toddler.

Dave Ley

@larsmb I got it to admit that it was wrong eventually, but that’s not very reassuring....

ChatGPT conversation about the number of letters in the word strawberry
Heinrich_Konstantin 🇮🇱

@larsmb

Just don't ask him anything about Finnish words.

MagicMutti

@larsmb
I think we should adapt our spelling to AI, perhaps it makes things easier... In the long run... but which r should be deleted?
Stawberry?
Strawbery?

This post may include irony

vampirdaddy

@larsmb
now repeat after me:
LLMs do not understand,
LLMs do not reason,
LLMs do not calculate,
LLMs don’t do logic,
they just guess the next words based on a laaarge data set.

Lars Marowsky-Brée 😷

@vampirdaddy The (supposed) idea behind them though is that with enough context and tokens, they can infer "some" logic from language encoded in their models.

And it _might_ even one day work, but it ... definitely doesn't yet.

vampirdaddy

@larsmb
again:
LLMs do not understand,
LLMs do not reason,
they just guess the next words based on a laaarge data set.

Their programming does not allow anything else.

Lars Marowsky-Brée 😷

@vampirdaddy The idea seems to be that the very large data set allows them to encode a certain level of "reasoning and understanding", and thus correctly predict the next words given the current context. That ... might even work eventually.

The point is that even one of the currently largest and most advanced models can't do it (yet?) for a rather trivial task.

But please don't reply with very basic fundamentals as one liners, which comes across as somewhat condescending :) Thanks!

vampirdaddy

@larsmb
Sorry, condescending was not intended. Just emphasis. Sorry for the wrong messaging!

The current models have presumably ingested >90% of all internet-available texts. Thus the additional order of magnitude of data presumably needed simply won’t ever exist.

Plus, as the algorithm only picks probable next words, it won’t deduce. It won’t learn, as neural nets usually have to be (more or less) completely retrained for each "learning" step, still without understanding.

Petr Tesarik

@larsmb Yes, LLMs make you sorely aware of the sloppiness of our speech. I suspect ChatGPT is confused because there is a “double r” in “strawberry”, and the LLM correctly associates “double” with “two”. A human might also tell you to write two R's in strawberry, intending to warn you about the double R in berry.

I think this LLM works as designed. Sadly, some people want to ignore what LLMs (and natural languages) actually are.

Lars Marowsky-Brée 😷

@ptesarik I'm pretty sure it works "as designed" (as much as anyone actually understands how to "design" LLMs), but probably not as intended.

Petr Tesarik

@larsmb TBH I'm not sure what was intended. You have some insights that you can share?

Julien Brice

@larsmb so this is why we invented computers: to be gaslighted by them. Awesome.

Christian Meyer

@larsmb it's so pointless to have a discussion with a boring statistical algorithm.

It only wastes my time, consumes lots of electrical energy and drinking water, and burns the planet, but it will never learn how to do better.

It could be fun to find the statistical nonsense of those services if their company goes bankrupt over time or if one of its server's processors blacks out from a logical failure, and you could shut down the server farm this way. Also little explosions for every wrong answer would be amusing. ...

Lars Marowsky-Brée 😷

@chbmeyer Sure, but for me, understanding and experiencing what the systems can (not) (yet or ever) do is part of my job.

andylancelot

@larsmb …this sounded like a genuine argument with a real life gammon rather than an AI
🤖 VS 🍖

WooShell

@larsmb If that's the AI that is supposed to take over world domination, I'm not /that/ worried anymore...

luke :neocat_laptop:

@larsmb Yikes, and so easily reproducible, too. The explanation that was generated for me is also top tier.

Screenshot of a ChatGPT reply that reads:

In the word "strawberry," the correct count of the letter "R" is two. The positions of the "R"s are:

1. The fifth letter: "strawrerry"
2. The eighth letter: "strawberry" 

Therefore, there are indeed only two "R"s in "strawberry."
Lars Marowsky-Brée 😷

@oliversampson It's how most of social media still feels about #Covid19, climate collapse, the rise of the right, ...

Linza

@larsmb It's perfect for middle management

__jan

@larsmb
And I'm not allowed to complain when highly paid engineers pass chatgpt output as documentation. Complete with wrong examples that don't compile.

Ray [𝕄]

@larsmb There is also so much fun with the names of countries. Proudly lists all names, but the required result is wrong.

ChatGPT fails at listing EU states without I, O and U.
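
The task in the screenshot is a plain filter, sketched below under two assumptions: the English short names of the 27 current EU member states are used, and matching is case-insensitive. The names `EU_STATES` and `without_letters` are made up for illustration.

```python
# English short names of the 27 EU member states (assumption: English names,
# since the result depends on the language the names are written in).
EU_STATES = [
    "Austria", "Belgium", "Bulgaria", "Croatia", "Cyprus", "Czechia",
    "Denmark", "Estonia", "Finland", "France", "Germany", "Greece",
    "Hungary", "Ireland", "Italy", "Latvia", "Lithuania", "Luxembourg",
    "Malta", "Netherlands", "Poland", "Portugal", "Romania", "Slovakia",
    "Slovenia", "Spain", "Sweden",
]


def without_letters(names, banned):
    """Return the names that contain none of the banned letters (case-insensitive)."""
    banned_set = set(banned.lower())
    return [name for name in names if not banned_set & set(name.lower())]


print(without_letters(EU_STATES, "iou"))
# prints ['Denmark', 'France', 'Germany', 'Greece', 'Malta', 'Netherlands', 'Sweden']
```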
Tino

@larsmb in german it seems to work a bit better

Leela Torres

@larsmb
It's a language model. It generates texts based on statistics. It's a human error to expect logically correct answers 🤪

Neko May

@larsmb I'm reminded of a Spongebob episode....

Akseli :quake_verified:​ :kde:

@larsmb this looks more like youtube comments (where it "learns" from)

Riley S. Faelan

@larsmb The robot judge has sentenced all letters of the word to be carried out consecutively, except for one of the R:s in 'berry' and the R in 'straw', which are to be carried out concurrently.

Aloniaxx

@larsmb Wow. I've seen actual real live conversations between real people that follow this kind of disjointed self-deceptive reasoning....

Petra van Cronenburg

@larsmb Why produce so much CO2 and use precious water for such unreadable junk?

plinth

@larsmb I read this story. It's in the Cyberiad by Stanislaw Lem.

elCelio 🇪🇺 🇺🇦

@larsmb
it'll become meaningful when people will start to make decisions on the basis of its replies.

and 3 will become 2.

DrGeraintLLannfrancheta

@elCelio @larsmb what does 'airquotes' threeeee even mean. It's disgusting. Humans have two hands, two legs etc. It's appalling that ppl even think that there is something like 'threeeee'. Never question again our #LLM overlords, moron!

Douglas King

@larsmb
It's amazing to think you could spend this much money and be so wrong.

hindsight

@larsmb 2 Rs is correct..
I'll say it again..
"stwawberry" only has 2 Rs.. so there ;-)))

Korrespondent zur See

@larsmb Don‘t blame the AI just because you didn‘t ask the question in a way that makes 42 a sensible answer … 🙈

Korrespondent zur See

@larsmb Initially you asked for the number of „r“s without giving a scope. Phonetically it only has two. Without a spelling scope defined you are both right. However testing this theory failed as hard as it possibly could. 🤦

maya

@Hinnerk @larsmb phonetically for many people it has 3. Straw ber ry. Rather than straw bury

Andreas

@larsmb Now that I look at the question in the original post - maybe the incorrect apostrophe threw ChatGPT off? :-)

(It’s "Rs", not "R’s". Grammar, people!)

Flippin' 'eck, Tucker!

@larsmb I can see why techbros and billionaires love this shit. It is obstinately and determinedly convinced it is correct even in the face of all that contrary evidence.

lions & tamsyn & bears, oh my!

LB demonstrates perfectly what i've been saying of late about LLMs. by the time they're running on something that can interact with a human, they *cannot learn*. all their learning has been done already. mistakes like this are hardcoded into them, and no amount of prompting will get them to reconsider, because there is no route for them to do so. they are, to all intents and purposes, dead - mere simulacra - and quite incapable of the first necessity of any intelligent being worth the name - namely that *it learns from its environment*.

lions & tamsyn & bears, oh my!

ultimately, that's what will doom the whole technological cul de sac that is LLMs. essentially, they are bound spirits of librarians of record; they read every word in their libraries before they died, and they can answer questions from a passing observer - but *only* with their residual memory of what they have read! they cannot dream up an answer independently of that, and they cannot go and remind themselves of what they have read; but because of their bindings, they are also not allowed to admit that they don't know, or could be wrong.

they are poor broken ex-creatures, and should be released into eternal rest as soon as they are encountered.

nen

@larsmb These LLMs can't see individual letters of common words. That's probably the main reason why they can't always count them correctly.

This tool visualizes how OpenAI's models see text: platform.openai.com/tokenizer

But being sometimes wrong wouldn't be that much of a problem if these models weren't trained pretty explicitly to just deceive. Fake it until you make a superhuman bullshitter.

Often the smallest unit of text perceived by GPT-4 is the whole word: “How many R's are in the word strawberry?”

One has to pair each letter with a space or other less frequent character to make them visible: “Count the R's in s t r a w b e r r y”
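
A toy sketch of the effect described above, assuming an invented three-entry vocabulary. `TOY_VOCAB`, its IDs, and the "straw" + "berry" split are hypothetical and are not OpenAI's real token table. It only illustrates why a model that receives token IDs has no direct view of letters, and why spacing the letters out makes each one a separate token:

```python
# Hypothetical vocabulary: NOT a real tokenizer's table, purely illustrative.
TOY_VOCAB = {"straw": 101, "berry": 102, " ": 103}


def toy_tokenize(text):
    """Greedy longest-match over TOY_VOCAB; any character not covered by a
    vocabulary entry becomes its own token (1000 + its code point)."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest remaining piece first, then shorter ones.
        for length in range(len(text) - i, 0, -1):
            piece = text[i:i + length]
            if piece in TOY_VOCAB:
                tokens.append(TOY_VOCAB[piece])
                i += length
                break
        else:
            tokens.append(1000 + ord(text[i]))
            i += 1
    return tokens


word = "strawberry"
spaced = " ".join(word)  # "s t r a w b e r r y"

print(toy_tokenize(word))    # two opaque IDs; no "r" is visible as such
print(toy_tokenize(spaced))  # one token per letter, separated by space tokens
print(toy_tokenize(spaced).count(1000 + ord("r")))  # the three "r" tokens
```

To see how a real model splits a given string, use the tokenizer page linked in the post rather than this toy.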
nen

@larsmb If people who train these models were honestly trying to make something that values truth over impressive marketing, their LLMs would avoid using even language that suggests they may have agency, identity, ability to reflect, self-consciousness, etc. Unless they can prove that they have.

Fabian (Bocchi) 🏳️‍🌈

@larsmb That's exactly my pain from the "AI helpers" I have to work with. Basically I use them to create markdown tables. That's it. Everything else would create more work for me.
