mkj

@eniko

"If we can just get our LLM to stop hallucinating, then we could <whatever>..."

"<whoever>, do you have any idea how a LLM works?"

Yeah. And I get it; it's an easy trap to fall into. Generative AI certainly has a lot of properties that invite it. I might have fallen into that trap myself at some point. Then I spent some time reading an article on how generative AI (specifically LLMs, in that case) works.

dingodog

@mkj @eniko
Yes! I don't see how an LLM can reason.

I tried feeding a logic puzzle from a grocery store puzzle book into Copilot. I thought that might be a good minimum threshold to show "reasoning," assuming that puzzle doesn't exist online.

It did very poorly; the response didn't actually make sense. A friend put it into GPT-4, and it did better -- solving two categories but getting the third wrong.

What do you think of logic puzzles as a test?

mkj

@dingodog19 @eniko Logic puzzles as a test for what?

Apple recently concluded (in a report that got some media coverage, at least in the tech/IT press) that LLMs *cannot* reason logically. Plenty of additional anecdotal examples illustrate the same thing, and there are arguments for the same lack of logical reasoning capability grounded in how LLMs function. There should be little need to repeat the experiment unless you have reason to believe you'll get a significantly different result.

dingodog

@mkj @eniko
Oh, I agree with all that.
I'm suggesting that, because of that, they will never be able to do grocery store logic puzzles -- which is a good indication to non-experts that AI is not all-knowing.

And if they *did* get good at them, it would be an indication that I should not carry on dismissing AI and should see what the heck was going on.

David Nash

@dingodog19 @mkj @eniko I did precisely that, early in the hype cycle. I gave ‘em simple logic puzzles of a form like “all A’s are B’s and some C’s are D’s. You see an E which is not a B. What else can you tell me about it?” — but with the placeholder names replaced by either fake English words (phonotactically reasonable but not meaningful otherwise) or strings of emoji.

A human with any knowledge of logic could answer “this particular E is not an A” for *any* replacement of the placeholder names, without missing a beat.
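
To spell out the inference being tested: it's plain modus tollens over the first premise, and the "some C's are D's" clause is a distractor. As a minimal sketch (in Lean 4; the predicate names A and B and the constant e are placeholders, just like the nonsense words were):

-- From "all A's are B's" and "e is not a B", conclude
-- "e is not an A" (modus tollens). The "some C's are D's"
-- premise is never needed.
example {α : Type} (A B : α → Prop) (e : α)
    (allAB : ∀ x, A x → B x) (notB : ¬ B e) : ¬ A e :=
  fun hA => notB (allAB e hA)

The proof goes through for *any* substitution of the names, which is exactly why a human can answer "this E is not an A" without caring what the placeholders look like.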

ChatGPT 3.5 just babbled nonsense, and when it did get the logic right, it was obviously due purely to chance.
ChatGPT 4 -- you know, the one that gullible journalists gushed over as “so incredibly smart” -- wrote somewhat better nonsense, but still often failed to get the logic right.
A recent (a few months ago) Gemini spent a lot of time trying to analyze the emoji for significance, completely losing the thread (consistent with observations that feeding these silicon buffoons lookalike puzzles with a twist causes them to get stuck badly).
