jonny (good kind)

Helping someone debug something; he said he'd asked chatgpt what a series of bit shift operations were doing. He thought it was actually evaluating the code, yno, like it presents itself as doing. Instead its example was a) not the code he put in, with b) incorrect annotations, and c) even more incorrect sample outputs. He'd been doing this all day and had just started considering that maybe chatGPT was wrong.

I was like first of all never do that again, and explained how chatGPT wasnt doing anything like what he thought it was doing. We spent 2 minutes isolating that code, printing out the bit string after each operation, and he immediately understood what was going on.
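
A minimal sketch of that 2-minute exercise (made-up values and Python, not his actual code): print the bit string after every operation and watch the bits move.

x = 0b0010_0110
print(f"start          {x:08b}")

x <<= 2                    # shift left by 2: multiply by 4, low bits fill with 0
print(f"after x <<= 2  {x:08b}")

x >>= 3                    # shift right by 3: integer-divide by 8
print(f"after x >>= 3  {x:08b}")

x &= 0b0001_1111           # mask: keep only the low 5 bits
print(f"after x &= 1f  {x:08b}")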

I fucking hate these LLMs. Empowerment is learning how to figure things out, how to make tools for yourself and how to debug problems. These things are worse than disempowering, teaching people to be dependent on something that teaches them bullshit.

Edit: too many ppl reading this as "this person bad at programming" - not what I meant. Criticism is of deceptive presentation of LLMs.

87 comments
DELETED

@jonny everyone needs to understand one important thing about LLMs:

garbage in, garbage out

answering questions with beautiful soothing words and content free bs

Level 98

@accretionist @jonny In this sense, I feel like LLM is short for "politician".

DELETED

@level98 @jonny they talk about all these careers that will be replaced by chatgpt. Why don't we just replace politicians with chatgpt? It would be an upgrade in the sense that being lied to is coming from a soulless machine rather than a... soulless machine. Nevermind.

TheZoq2

@accretionist @jonny isn't it more like well formed question in, garbage out in this case?

takin' a break

@jonny I had the great joy of seeing some folks at MIT present on using LLMs as they're meant to be used, which is to say as interfaces to more complex computational processes that just, convert otherwise difficult-to-parse data into more human-friendly language

god i hope the popular consciousness realizes that's what they're supposed to be sooner rather than later

Bee O'Problem

@juliana @jonny in theory the OP's scenario is just that. Taking data (code) and expressing it in something more understandable for a person.

And it apparently failed quite badly.

takin' a break

@beeoproblem @jonny sorry, you don't seem to have understood what I'm saying. this isn't a situation where an LLM is acting on language alone and manipulating it to sound like it's producing human utterances; it's a situation where an LLM is being handed very carefully-controlled data that has been processed elsewhere and is simply being asked to express that in a more human-friendly way. so in this case rather than trying to convolute code itself, the LLM would instead pass it off to some sort of specially-trained neural network designed to, say, perform and analyze bitwise operations, then the LLM would present the results and explain what that network did

difficult to understand without the diagrams, i guess

also advice for the future: don't come into people's replies and condescend to them. that's rude.
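
For the curious, a rough sketch of the split juliana describes, in plain Python with hypothetical names and no real LLM API: a deterministic tool does the computation and records a trace, and the language-model step only rephrases that finished trace.

def trace_bit_ops(x: int) -> list[str]:
    """Deterministic tool: run the operations and record each intermediate value."""
    steps = [f"start          x = {x:08b}"]
    x <<= 2
    steps.append(f"after x <<= 2  x = {x:08b}")
    x >>= 3
    steps.append(f"after x >>= 3  x = {x:08b}")
    return steps

def rephrase_for_humans(trace: list[str]) -> str:
    """Stand-in for the language-model step: it never computes anything,
    it only gets the verified trace and is asked to phrase it in friendlier terms."""
    return "Here is what the code did, step by step:\n" + "\n".join(trace)

print(rephrase_for_humans(trace_bit_ops(0b0010_0110)))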

Elizabeth

@juliana @jonny out of interest, if use of LLMs were limited to this, would the energy demand per request be less, or is it just the number of requests would substantially decrease? (Does that question even make sense?)

takin' a break

@luminesce @jonny the energy consumption per demand would likely increase because there's more overall computation involved in the process. the LLM itself would be only one step in an arbitrarily complex chain of queries depending on the precise application

from what i understand, the primary energy consumption of LLMs comes from the training process, but i don't know the full details

the specific application i was seeing discussed was to create a way for doctors to use human language to query medical records and collate data to facilitate diagnosis. so while it is important to keep in mind the environmental impact of using LLMs, it's also important to weigh that against potential benefits as well.

frankly, i'd rather we weren't using LLMs at all, but i figure this is one of the less bad ways they can be used

jonny (good kind)

@juliana
@luminesce
Yes totally^ it depends on the application re: energy use. Chaining a bunch of models together would probably use more energy, but google et al want to use smaller models hooked up to knowledge graphs to make runtime inference feasible as a consumer product too, so that kind of application would be designed to use less.

The medical case has its own set of fun complications 💓
jon-e.net/surveillance-graphs/

takin' a break

@jonny bestie how sad are you that xanadu never happened lol

jonny (good kind)

@juliana
Like on the scale of 0 to semantic web, about an 8 lol

jonny (good kind)

@juliana
Anytime Xanadu comes up I also think of @beka_valentine 's threads where I first learned why it would have been so good.

Elizabeth

@juliana @jonny yes, the key to limiting the environmental impacts is limiting use cases to those that are truly beneficial, rather than generating demand for trivial or misleading/chaos-inducing uses

jonny (good kind)

@luminesce
@juliana
Thats one of the reasons they're pursuing it, yes. Smaller models that can be conditioned on factual information/formal, domain specific models. Eg. See
arxiv.org/abs/2203.05115

Unfortunately this kind of application has its own kind of Really Bad Outcomes that the AI critical folks largely have not caught up to yet :(, see

jon-e.net/surveillance-graphs/

The tech could be really cool. The problem, as with everything, is capitalism.

PiTau

@jonny My experience with ChatGPT as an aid in programming is that you end up tutoring ChatGPT no matter the language. Also ChatGPT is exquisitely bad at SQL. Maybe I should check how bad it is at VHDL. There is potential for some very bad code here.

jonny (good kind)

@PiTau
Plz do show me any extremely cursed FPGA code u come up with lol

Bornach

@PiTau @jonny
#LargeLanguageModels are exquisitely bad at anything for which there is very little human curated training data. I asked GPT-4 via #BingChat to generate Vult-DSP for a fairly basic MIDI synthesizer and it very confidently spat out nonsense

shrimp eating mammal 🦐

@jonny how did he ever come to believe that chatgpt was capable of doing something like that? it's absurd to think that!

shrimp eating mammal 🦐

@jonny OK but you don't know that till after you've observed it awhile and i guess what I'm wondering is where one would get the idea in the first place? from someone else? from the internet? there's a stunningly wide gap in understanding if a person thinks chatgpt is evaluating code, both of chatgpt and of coding! how does that happen to a person? I'm fully failing to understand 😭

jonny (good kind)

@walruslifestyle
Their mental model is "I can talk to this thing; when I give it some code it knows what that code is and can tell me about it, the same way it seems to tell me about lots of things." That's not so naïve in my opinion: products like copilot do advertise themselves as understanding code, so thinking the LLM is actually parsing it and reasoning about it, rather than generating plausible text from some seed vector in its latent space, is reasonable enough to me.

jonny (good kind)

@walruslifestyle
I dont disagree if u know a little bit about how these things work its ridiculous, but he is just following everything he's been told about what they can do!

Bornach

@jonny @walruslifestyle
Especially given the flood of YouTube videos demonstrating ChatGPT solving coding problems in minutes
youtu.be/jwpja9fcqaM

The implied assumption is that humans with their very limited ability to memorize answers would have to understand code in order to arrive at the correct answer. We apply that same assumption to LLMs at our own peril. Surely it couldn't have simply memorized all the answers and is simply applying pattern matching to generate the answer, right?

Alexander The 1st

@walruslifestyle @jonny There was a LinkedIn post, which as I recall was a promoted post, where someone was measuring ChatGPT versions by the metric of "Code sample received from asking ChatGPT compiled on first try."

Which is a pretty terrifying metric, even before you remember that most programmers panic, or joke about panicking, when their code works on the first try or doesn't give an obvious error.

Cavyherd

@jonny

Oh dear god. I have a programmer coworker who just loves these LLMs for "coding assistance." I wonder how much time he spends banging his head against the wall fighting AI hallucinations.

Alistair K

@jonny so ... how did one become a code debugger, yet somehow come under the impression that ChatGPT isn't an LLM?

There seems to be an interesting story to uncover there. It genuinely intrigues me that a programmer would think that an LLM evaluates code for you, or that it isn't an LLM.

If we could figure out why a programmer thinks like that, maybe we could find a clue that'll help to rescue the rest of us.

Alistair K

@jonny It is indeed reasonable! But somehow the illusion is outweighing the facts about how LLMs work internally.

I've been concerned that much of the debunking is also targeted not at the LLM, but at the anthropomorphism – I have colleagues who warn about how it "hallucinates" and "fabricates" and "lies", for instance. But it's not capable of any of that in the usual meanings of those words. And thus I worry that their language choices are making the problem worse.

jonny (good kind)

@libroraptor
I asked them about this! They do indeed see it as a tool, and thought it was doing a semantic code analysis, not running it per se, but something like static code analysis. Which again is I think reasonable because their IDE was showing tooltips with the value of the variable, at least the initial assignment, so why wouldnt the chatbot be able to do that?

Alistair K

@jonny I think that it's a brilliant tool. (And my colleagues do not like me to say this.)

But what does your programmer think LLMs do?

I offered a different conceptualisation to my colleagues by giving them Markov chains to play with, but they seemed to think even random prose generators were still creative, thinking agents, albeit of a less intelligent form.

I've been finding also that hardly anyone who complains about AI knows what a huge class of things it is. Language is troubling.
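
For anyone who wants the same toy to play with, this is roughly the kind of Markov chain meant above (a word-level sketch in Python with a made-up training sentence): it strings words together purely from "what followed what", with no model of meaning at all.

import random
from collections import defaultdict

# Record which words followed each word in the (tiny, made-up) training text.
text = ("the model predicts the next word and the next word is whatever "
        "usually follows the last word in the training text")
words = text.split()
follows = defaultdict(list)
for current, nxt in zip(words, words[1:]):
    follows[current].append(nxt)

# Generate by repeatedly sampling a plausible next word; no understanding involved.
state = random.choice(words)
output = [state]
for _ in range(15):
    nexts = follows.get(state)
    state = random.choice(nexts) if nexts else random.choice(words)
    output.append(state)

print(" ".join(output))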

David Gerard

@libroraptor @jonny the term "artificial intelligence" has been marketing jargon since it was coined in 1955, it's never referred to any specific technology - it's selling the dream of your plastic pal who's fun to be with, especially when you don't have to pay him

Cheeseness

@jonny I had an interaction a few weeks back where somebody had asked in a public channel how to do something with ffmpeg. Another person gave a solution with a list of flags that was garbage, and only admitted that they'd "just fed that to the ol' GPT" after being told it didn't work.

When I called them out on it maybe not being a good idea to dispense LLM synthesised text as advice, they seemed surprised that a) the output wasn't helpful, and b) that what they'd done was objectionable.

jonny (good kind)

@Cheeseness I think there's a genuine mental model disconnect, where a (surprisingly to me) reasonably large number of people who aren't in any sort of "mega pro-AI" ideological camp, just regular tool-using ppl, don't see it as being any different from any other information source like stackexchange or wikipedia.

jonny (good kind)

@Cheeseness like, they would never imagine themselves actually logging in and writing anything down on any of those other websites, so their social reality is totally meaningless to them. they just are always there and have always been there. having to type a question into chatGPT is about as different as typing a question form into ask jeeves was from typing an abbreviated keyword imperative form into google

Cheeseness

@jonny 100% agree.

It also makes me a little uncomfortable with the role of Wikipedia or StackExchange within culture in terms of critical thinking vs just taking whatever at face value without any thought or consideration.

jonny (good kind)

@Cheeseness for the public record in case of any federation weirdness i typed this at the exact same time, mind twinning neuromatch.social/@jonny/11132

Cheeseness

@jonny Both messages show the exact same timestamp at my end :D

jonny (good kind)

@Cheeseness oh wait i forgot i mentioned stackexchange and wikipedia in the first post and thought we had both come up with those as examples of informational institutions we were uncomfortable with for this exact reason, so not as one in a million as i was thinking but still yes mind twins

Selena

@Cheeseness @jonny
There is something like overdoing critical thinking: 'question everything' sounds good in theory, but I'd much rather work with someone who believes wikipedia than someone who wants to constantly weigh it against other evidence (evidence from Facebook or Quora)

ChatGPT is a bit like StackOverflow in that it will turn up a lot of bullshit and half-truths, and it's up to the user to sift through that and find the thing that's potentially useful: you usually can't just copy-paste

Cheeseness

@Selena @jonny To be clear about where I'm coming from, I'd be wary of working with anybody who doesn't even think of glancing over cited sources for further reading when processing the content of a Wikipedia article. Wikipedia exists to summarise knowledge, and observing others assume that there isn't more to think about/learn is what makes me uncomfortable.

It's not quite "question everything" so much as "be interested/engaged with the stuff one is discovering."

Cheeseness

@Selena @jonny For synthesised text/images/whatever else, I can't imagine finding interest or value in it if I can't delve into the training corpus and think about how the nuances of that are reflected in the output.

Bornach

@Cheeseness @jonny
Giving people the ChatGPT answer without verifying it is a bit like doing the LMGTFY thing in total ignorance of filter bubbles and SEO spam

Jeremy List

@jonny on one of my projects I ended up using one of the unit testing frameworks that ChatGPT named when I asked it for suggestions but beyond that it's been about as helpful to me as a rubber duck or possibly an initially blank notes.txt.

000panther

@jonny also shows that naming these models AI suggests they do more than they can. My father, working at a biotech company, also asked an LLM to name the top 5 plasma products his company produces. He was astonished at how wrong the answer was: 2 non-plasma products, 3 from another company, one correct. Tried to explain to him that this stuff basically works like autocomplete - not fully correct but the analogy fits IMHO.

DeManiak 🇿🇦 🐧

@jonny my only regret is not being able to boost more than once

DELETED

@jonny If he doesn't understand what the bit shift operations are doing, why does he program! Let him go back to school.

gaytabase

@jonny well i can believe they're bad at programming tbh if they're learning off chatgpt

jonny (good kind)

@dysfun
It presents itself as being able to explain code! Its a reasonable assumption to make

gaytabase

@jonny oh i agree. the blame is with chatgpt obviously.

gaytabase

@jonny yes. even friends of mine are using it for stuff that matters and i'm cringeing hard for them

Nyek London

@jonny I've used LLM a handful of times when I've been absolutely, positively stuck on an issue, and it's helped a bit. But the thing is, you *have* to go in knowing there's a very, very high chance it's going to be either slightly or completely wrong. Therefore you should only use it to help reframe the issue in your mind, even if once in a while it does happen across a reasonably correct answer.

It's basically akin to rubberducking with an overly confident friend.

Nyek London

@jonny But yeah, you need to have the skill to identify the potential issues, which makes it a horrible, HORRIBLE tool for people who are just learning.

We need to teach people it's a last resort, not a first port of call.

tante

@jonny Even people who should know better underestimate just how bad the results of those systems are: dl.acm.org/doi/10.1145/3558489

Quote: "Our results suggest that GitHub Copilot was able to generate valid code with a 91.5% success rate. In terms of code correctness, out of 164 problems, 47 (28.7%) were correctly, while 84 (51.2%) were partially correctly, and 33 (20.1%) were incorrectly generated."

More than 70% of the code generated is at least partly flawed (84 partially correct + 33 incorrect = 117 of the 164 problems, about 71.3%).

Agonio

@jonny now, sure, people rely on it, so clearly there is an issue, but there's a disclaimer as you open it: "ChatGPT may give inaccurate information. It's not intended to give advice." Then under the message box: "ChatGPT can make mistakes. Verify important information."

What else should they do, in your opinion, to make users understand that they shouldn't rely on it this way?

David Gerard

@Agonio @jonny stop hyping up "AI" the way they do. the hype is a media barrage of egregious lies, and a couple of disclaimers don't cut it.

Agonio

@davidgerard @jonny yeah the media also mistakes AI for robotics and vice versa

Still, research cannot depend on what the media does, and people use chatgpt because it's fascinating, not simply because the media pointed them to it; otherwise people would be playing those triple-A videogames that get high media ratings and that players then review as shit

Bornach

@Agonio @jonny
How about releasing a #SciComm Youtube video on a channel with over 600K subscribers that relies on ChatGPT to make some key calculation
youtu.be/5lDSSgHG4q0?t=15m18s
and then including its answer without verification even though it is out by more than 12%

And maybe a follow-up video where placing unquestioning trust in the #LargeLanguageModel to generate the correct engineering parameters results in the project failing

Agonio

@bornach @jonny again, what should OpenAI do about its tools being used wrongly according to their own disclaimer?
Another person using them wrong doesn't really answer my question, as I agreed many use it wrongly

Bornach

@Agonio @jonny
OpenAI should collaborate with #SciComm content creators such as Plasma Channel to produce videos that highlight how using their LLM in such a laissez-faire manner could result in disaster.

But they wouldn't do that as that would negatively affect the valuation of their company in any future IPO.

Richard Kay

@jonny When programming anything interesting I always seem to spend much more time testing than coding. If this gets too repetitive, I have to figure out how to automate some of the tests. Eventually the whole thing needs rewriting due to changes in language support e.g. the change from Python 2 to Python 3. ChatGPT seems more suited to kicking out the bottom rungs of the learning ladder in this respect than a genuine productivity improvement.

immibis
@jonny the rise of LLMs for programming will be great for hackers
Nazo

@jonny It's sad really. LLMs do have a lot of potential. For example, the basic mechanism was just used as part of a multi-pronged effort to figure out how some bacteria are able to go into a sort of hibernation and survive extreme conditions for decades or longer. This may come in handy for finding a way to stop harmful ones. They can be useful when used right. It's really sad watching them be used wrongly instead. ChatGPT and similar just aren't AI and need to stop pretending to be.

серафими многоꙮчитїи

@jonny I'm reminded of a principle I heard in the field of technical translation: you shouldn't try to translate something you couldn't have written (I can't remember the exact phrasing).

Perhaps the corollary is that there's no point in asking an LLM a question that you couldn't answer yourself. Which puts a pretty good upper bound on the usefulness of LLMs, tbh.

Purple

@jonny Playing with ChatGPT in the past, it even began to make up words that didn't exist. I never told it to do so.

Sesquipedality

@jonny LLMs can really help when you are trying to get up to speed with a particular interface, so long as you're aware they lie to you. If you are able to identify the lies and tell them about them, then sometimes they might even generate usable code. The problem comes as soon as you start trying to do something a little bit unusual, and the LLM steps up the pace from "bullshit" to "delusional fantasy".

Absolutely agree with the problem being the dishonest presentation of results.

Lars Wirzenius

@jonny I want to raise this point in particular: "Empowerment is learning how to figure things out, how to make tools for yourself and how to debug problems."

neuromatch.social/@jonny/11132

Benny

@jonny I have been developing software professionally for about 15 years, and this summer was the first time I consulted ChatGPT to suggest what was needed to implement a desired feature.

After that I set it aside and worked out for myself how to use and incorporate the suggested parts.

My intention was not to copy and paste a solution but to find a starting point more quickly.

It's just a tool.
You still need to figure things out and validate its outputs, like banter talk.

Dave T-W

@jonny I think I'm not using it much as results are rarely useful straight away. There's extra work required to verify the results, or work out what's missing from the solution offered... so I might as well have not bothered wasting my time and the CPU cycles.

Bela Lugosi's Dad

@jonny it's incredible to me that Certified Smart People I work with (you know, PhDs, experience in research, scientific publications, distinguished academic careers) have utterly bought into LLMs and seem to have no understanding of what they actually do.

AdeptVeritatis

@jonny

Thanks for your post, and special thanks for calling out the gatekeeping (in the edit).

Androcat

@jonny I hate these LLMs, and I hate the in-industry hypemonkeys that lend their clout to the absurd misconception that LLMs are any kind of intelligence, or that they are actually doing anything other than autocorrect applied at scale.

Trip

@jonny My evaluation has ultimately been that the fundamental problem with these LLMs, at least in terms of the output they give, is that they are designed to give a satisfying answer to whatever is posed to them, even if they can't. So rather than say "I can't answer that", they will instead just invent something that sounds good. Because they may not know the answer, but they damn well know what an answer *looks like*, and appearing to answer is preferable to giving a disappointing result.
