Lars Marowsky-Brée 😷

Tell me again how #GenAI will extract meaningful trends from and answer queries about your data set.

#chatgpt4o #fAIl

ChatGPT 4o completely loses it and bunkers down hard on there only being two "r"s in the word "strawberry", regardless of different attempts at getting it to correct itself.
Lars Marowsky-Brée 😷

I can also see this going great for coding; programming languages and computers are known to be very forgiving and tolerant

Григорий Клюшников

Lars Marowsky-Brée 😷, and for this reason, I can't for the life of me understand how someone could seriously use an LLM as a tool. Or instead of a proper search engine.

Lars Marowsky-Brée 😷

@grishka They're tools for when you need an answer that might not be fully correct, e.g. brainstorming, rubber-ducking, or even quite a few translations.
But they're nowhere near as useful as advertised.

CubeThoughts

@larsmb @grishka Gen AI is also useful when being "correct" is subjective - such as creating something aesthetically pleasing. As with brainstorming (and rubber-ducking), the goal is evoking some reaction in the beholder.

Lars Marowsky-Brée 😷

This also makes perfect sense, because context matters - and once it generated a wrong answer, it is human enough to double down on it! The singularity is near!

You've got to ask it "nicely" right from the start. Don't embarrass it!

I AM A PROMPT ENGINEER

How many R's are in the word strawberry?

There are two R's in the word "strawberry."

Count the "r"s in the word "strawberry"

There are two "r"s in the word "strawberry."
Count the "r"s in the word "strawberry"

The word "strawberry" contains three "r"s.
pitch R.

@larsmb The term "prompt engineer" is an insult to all engineers...

Lars Marowsky-Brée 😷

@pitch I think it's the best thing that ever happened to software engineers, suddenly no one makes fun of *us* anymore for not being an actual engineering discipline.

pitch R.

@larsmb I will still ❤️ Promised 😉

Even though I am officially credited as a software engineer in multiple projects, I think there are ways to be a software engineer. But most programmers are not even developers, and are far from being software engineers.

Just writing stupid code is by no means an engineering feat, but systematically designing software, evaluating different approaches, and laying out an efficient order of operations can be an engineering process.

klml

@pitch @larsmb I don't think so, because the word "gummy bear" is not an insult to any grizzly, brown, or polar bear.

Whitney Loblaw

@larsmb don't worry, these issues just keep getting fixed quickly after being reported and the product keeps improving... or does it? community.openai.com/t/incorre

FurryBeta

@larsmb This is the crap future, arguing with an “AI” over factually true statements

Lars Marowsky-Brée 😷

@FurryBeta I mean, as an engineer, I spent a lot of my time arguing with sales/CxOs over factually true statements, so

FurryBeta

@larsmb I was a field service engineer for 30+ years and did the same with our sales department. Sympathies

argv minus one

@larsmb

Yeah, it does seem as though the best application for this kind of tech is not replacing programmers, but replacing corporate bullshitters. Bullshit is the only thing this machine is capable of, and it's very, very good at it. So good at bullshit, in fact, that it's already convinced all of the human corporate types that it's the best thing money can buy!

@FurryBeta

dogzilla

@larsmb @msbw This is like having to convince your hammer that it is not, in fact, a screwdriver before hammering a nail

Mid-sized Ackman

@dogzilla @larsmb @msbw this is the best description of the problem with LLMs that I've ever encountered

dogzilla

@fartnuggets @larsmb @msbw It’s all yours.

I still have hope that open-sourced AI agents will be useful, but I’m personally done with trying to wrangle the big commercial LLMs into anything useful. I’ve yet to come across a real-world problem I can’t solve quicker with a Jupyter notebook and a couple Python libraries

Mid-sized Ackman

@dogzilla @larsmb @msbw I've been playing with webgpu and wasm for speech to text, it's showing promise. My vision is on-device transcription and translation freely accessible to anyone with the hardware.

dogzilla

@fartnuggets @larsmb @msbw I’m hoping for a future where a trusted on-device agent can basically act as a personal assistant. I think it needs some ability to learn and make decisions, but not this weird “boil the ocean” strategy behind LLMs.

Kinda reminds me of robotics in the early 90s - after decades of failed top-down approaches, we finally found huge success with drastically simpler ensemble bottom-up approaches exemplified by the Genghis family

Mossy Modem

@larsmb It feels like there were definitely some Monty Python skits in the training data.

SomeGadgetGuy

@larsmb
That chat bot needs to go to the LIBARY!

[Typo on purpose]

Lars Marowsky-Brée 😷

@SomeGadgetGuy It pilfered all libraries and this is the best we got from it.

SomeGadgetGuy

@larsmb
Truly the future is NOW. Amazing...

🙄

Nini

@larsmb I must admit I'm impressed at how steadfast it is in never admitting to being wrong. The unearned confidence of a mediocre white man who's never been told "no".

Lars Marowsky-Brée 😷

@nini See the update, the most human thing it does is double down on a wrong answer

Sky Leite

@nini I mean it was trained on Reddit data, so

argv minus one

@larsmb

So, this exchange burned way more wattage than a simple letter-counting algorithm would have, and it gave a blatantly incorrect answer.

“AI” is going just great.
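(The "simple letter-counting algorithm" mentioned above really is as trivial as it sounds; a minimal sketch in Python:)

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())

print(count_letter("strawberry", "r"))  # 3
```

A few nanojoules per call, and it never doubles down on a wrong answer.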

chanc3e

@argv_minus_one @larsmb @samhainnight Is it me, or does the AI also have the tone of someone on reddit who is sure they are *very right*?

Dr. Gilead

@larsmb @muiiio this is why I'm not that worried. Terminator still has a long way to go 😊

Crystal_Fish_Caves

@larsmb yeah this tech DEFINITELY is worth all the resources it gobbles up. We have PLENTY of spare water and power.

Lars Marowsky-Brée 😷

@Crystal_Fish_Caves Exactly! It clearly should be the top priority for all businesses and politicians, it is *the best*.

Tariq

@larsmb

i actually know someone who pronounces it "stawr-brerry"

sortius

@larsmb the sad thing is, if they designed the LLM to purely count Rs, it might actually work... but that sounds too much like an algorithm, and that's got no techbro magic sauce in it.

It reminds me of when the world was captivated by radioactive materials, and they stuck them in everything

the cake is offline

@larsmb it's like trying to convince a conservative or centrist of literally anything involving scientific evidence!

Elio Campitelli

@larsmb Looks like they patched it, but only for a very specific subset of berries.

RealGene ☣️

@larsmb
Doesn't this just mean that a whole bunch of people on the Internet think "strawberry" has two "R's" in it?

Florian Idelberger

@RealGene @larsmb Likely it just means it makes something up, as there is unlikely to be data for all numbers/words, and they are also very similar. Sometimes it will actually use Python code in the background, which gets the right answer. (Which in other cases has hilarious results: if you ask ChatGPT (free version) to draw a sketch of something, it will create Python code to draw various lines and circles and show you the output, which in no way resembles anything.)

oheso

@larsmb Kind of like my ex in this regard…

Mensch, Marina

@larsmb This AI doesn't seem too eager to learn. Answers like a stubborn toddler.

Dave Ley

@larsmb I got it to admit that it was wrong eventually, but that’s not very reassuring....

ChatGPT conversation about the number of letters in the word strawberry

Heinrich_Konstantin 🇮🇱

@larsmb

Just don't ask him anything about Finnish words.

MagicMutti

@larsmb
I think we should adapt our spelling to AI, perhaps it makes things easier... In the long run... but which r should be deleted?
Stawberry?
Strawbery?

This post may include irony

vampirdaddy

@larsmb
now repeat after me:
LLMs do not understand,
LLMs do not reason,
LLMs do not calculate,
LLMs don’t do logic,
they just guess the next words based on a laaarge data set.

Lars Marowsky-Brée 😷

@vampirdaddy The (supposed) idea behind them though is that with enough context and tokens, they can infer "some" logic from language encoded in their models.

And it _might_ even one day work, but it ... definitely doesn't yet.

vampirdaddy

@larsmb
again:
LLMs do not understand,
LLMs do not reason,
they just guess the next words based on a laaarge data set.

Their programming does not allow anything else.

Lars Marowsky-Brée 😷

@vampirdaddy The idea seems to be that the very large data set allows them to encode a certain level of "reasoning and understanding", and thus correctly predict the next words given the current context. That ... might even work eventually.

The point is that even one of the currently largest and most advanced models can't do it (yet?) for a rather trivial task.

But please don't reply with very basic fundamentals as one liners, which comes across as somewhat condescending :) Thanks!

vampirdaddy

@larsmb
Sorry, condescending was not intended. Just emphasis. Sorry for the wrong messaging!

The current models have ingested presumably >90% of all internet-available text. Thus the additional order of magnitude of data presumably needed simply will never exist.

Plus, as the algorithm only picks probable next words, it won't deduce. It won't learn, as neural nets usually have to be (more or less) completely re-trained for each "learning" step, still without understanding.

Petr Tesarik

@larsmb Yes, LLMs make you sorely aware of the sloppiness of our speech. I suspect ChatGPT is confused because there is a “double r” in “strawberry”, and the LLM correctly associates “double” with “two”. A human might also tell you to write two R's in strawberry, intending to warn you about the double R in berry.

I think this LLM works as designed. Sadly, some people want to ignore what LLMs (and natural languages) actually are.

Lars Marowsky-Brée 😷

@ptesarik I'm pretty sure it works "as designed" (as much as anyone actually understands how to "design" LLMs), but probably not as intended.

Petr Tesarik

@larsmb TBH I'm not sure what was intended. You have some insights that you can share?

Julien Brice

@larsmb so this is why we invented computers: to be gaslighted by them. Awesome.

Christian Meyer

@larsmb it's so pointless to have a discussion with a boring statistical algorithm.

It only wastes my time and consumes lots of electrical energy and drinking water, burns the planet, and it will never learn how to do better.

It could be fun to find the statistical nonsense of those services if their company goes bankrupt over time, or if one of its servers' processors blacks out from a logical failure and you could shut down the server farm this way. Also, little explosions for every wrong answer would be amusing. ...


Lars Marowsky-Brée 😷

@chbmeyer Sure, but for me, understanding and experiencing what the systems can (not) (yet or ever) do is part of my job.

andylancelot

@larsmb …this sounded like a genuine argument with a real life gammon rather than an AI
🤖 VS 🍖

WooShell

@larsmb If that's the AI that is supposed to take over world domination, I'm not /that/ worried anymore...

luke :neocat_laptop:

@larsmb Yikes, and so easily reproducible, too. The explanation that was generated for me is also top tier.

Screenshot of a ChatGPT reply that reads:

In the word "strawberry," the correct count of the letter "R" is two. The positions of the "R"s are:

1. The fifth letter: "strawrerry"
2. The eighth letter: "strawberry" 

Therefore, there are indeed only two "R"s in "strawberry."

Lars Marowsky-Brée 😷

@oliversampson It's how most of social media still feels about #Covid19, climate collapse, the rise of the right, ...

Linza

@larsmb It's perfect for middle management

__jan

@larsmb
And I'm not allowed to complain when highly paid engineers pass chatgpt output as documentation. Complete with wrong examples that don't compile.

Ray [𝕄]

@larsmb There is also so much fun with the names of countries. It proudly lists all the names, but the required result is wrong.

ChatGPT fails at listing EU states without I, O and U.

Tino

@larsmb in German it seems to work a bit better

Leela Torres

@larsmb
It's a language model. It generates text based on statistics. It's a human error to expect logically correct answers 🤪

Neko May

@larsmb I'm reminded of a Spongebob episode....

Akseli :quake_verified:​ :kde:

@larsmb this looks more like youtube comments (where it "learns" from)

Riley S. Faelan

@larsmb The robot judge has sentenced all letters of the word to be carried out consecutively, except for one of the R's in 'berry' and the R in 'straw', which are to be carried out concurrently.

Aloniaxx

@larsmb Wow. I've seen actual real live conversations between real people that follow this kind of disjointed self-deceptive reasoning....

Petra van Cronenburg

@larsmb Why produce so much CO2 and use precious water for such unreadable junk?

plinth

@larsmb I read this story. It's in the Cyberiad by Stanislaw Lem.

elCelio 🇪🇺 🇺🇦

@larsmb
it'll become meaningful when people will start to make decisions on the basis of its replies.

and 3 will become 2.

DrGeraintLLannfrancheta

@elCelio @larsmb what does 'airquotes' threeeee even mean. It's disgusting. Humans have two hands, two legs etc. It's appalling that ppl even think that there is something like 'threeeee'. Never question our #LLM overlords again, moron!

Douglas King

@larsmb
It's amazing to think you could spend this much money and be so wrong.

hindsight

@larsmb 2 Rs is correct..
I'll say it again..
"stwawberry" only has 2 Rs.. so there ;-)))

Korrespondent zur See

@larsmb Don't blame the AI just because you didn't ask the question in a way where 42 being the answer makes sense … 🙈

Korrespondent zur See

@larsmb Initially you asked for the number of "r"s without giving a scope. Phonetically it only has two. Without a spelling scope defined, you are both right. However, testing this theory failed as hard as it possibly could. 🤦

maya

@Hinnerk @larsmb phonetically for many people it has 3. Straw ber ry. Rather than straw bury

Andreas

@larsmb Now that I look at the question in the original post - maybe the incorrect apostrophe threw ChatGPT off? :-)

(It’s "Rs", not "R’s". Grammar, people!)

👻👻 Flippin' spook, Tucker!

@larsmb I can see why techbros and billionaires love this shit. It is obstinately and determinedly convinced it is correct even in the face of all that contrary evidence.

lions & tamsyn & bears, oh my!

LB demonstrates perfectly what i've been saying of late about LLMs. by the time they're running on something that can interact with a human, they *cannot learn*. all their learning has been done already. mistakes like this are hardcoded into them, and no amount of prompting will get them to reconsider, because there is no route for them to do so. they are, to all intents and purposes, dead - mere simulacra - and quite incapable of the first necessity of any intelligent being worth the name - namely that *it learns from its environment*.


lions & tamsyn & bears, oh my!

ultimately, that's what will doom the whole technological cul de sac that is LLMs. essentially, they are bound spirits of librarians of record; they read every word in their libraries before they died, and they can answer questions from a passing observer - but *only* with their residual memory of what they have read! they cannot dream up an answer independently of that, and they cannot go and remind themselves of what they have read; but because of their bindings, they are also not allowed to admit that they don't know, or could be wrong.

they are poor broken ex-creatures, and should be released into eternal rest as soon as they are encountered.


nen

@larsmb These LLMs can't see individual letters of common words. That's probably the main reason why they can't always count them correctly.

This tool visualizes how OpenAI's models see text: platform.openai.com/tokenizer

But being sometimes wrong wouldn't be that much of a problem if these models weren't trained pretty explicitly to just deceive. Fake it until you make a superhuman bullshitter.

Often the smallest unit of text perceived by GPT-4 is the whole word: “How many R's are in the word strawberry?”

One has to pair each letter with a space or other less frequent character to make them visible: “Count the R's in s t r a w b e r r y”
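(nen's workaround is just a mechanical transformation of the prompt; a minimal sketch in Python of producing the spaced-out form:)

```python
def space_out(word: str) -> str:
    """Join the letters with spaces so a BPE tokenizer is more likely to
    treat each letter as its own token, instead of swallowing the whole
    word as a single token the model can't see inside."""
    return " ".join(word)

print(space_out("strawberry"))  # s t r a w b e r r y
```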
nen

@larsmb If people who train these models were honestly trying to make something that values truth over impressive marketing, their LLMs would avoid using even language that suggests they may have agency, identity, ability to reflect, self-consciousness, etc. Unless they can prove that they have.

Fabian (Bocchi) 🏳️‍🌈

@larsmb That's exactly my pain from the "AI helpers" I have to work with. Basically I use them to create markdown tables. That's it. Everything else would create more work for me.
