dingodog

@mkj @eniko
Yes! I don't see how an LLM can reason.

I tried feeding a logic puzzle from a grocery store puzzle book into Copilot. I thought that might be a good minimum threshold for showing "reasoning," assuming the puzzle doesn't exist online.

It did very poorly; the response didn't actually make sense. A friend put it into ChatGPT-4, which did better, solving two categories but getting the third wrong.

What do you think of logic puzzles as a test?

mkj

@dingodog19 Logic puzzles as a test for what?

Apple recently concluded (in a report that got some media coverage, at least in the tech/IT press) that LLMs *cannot* reason logically. Plenty of anecdotal examples illustrate the same thing, and there are arguments for the same lack of logical reasoning capability grounded in how LLMs function. There should be little need to repeat the test unless you have reason to believe you'll get a significantly different result.

dingodog

@mkj @eniko
Oh I agree with all that.
I'm suggesting that, because of that, they will never be able to do grocery-store logic puzzles, which makes those puzzles a good way to show non-experts that AI is not all-knowing.

And if they *did* get good at them, that would be a sign that I should stop dismissing AI and look into what the heck is going on.

David Nash

@dingodog19 @mkj @eniko I did precisely that, early in the hype cycle. I gave ‘em simple logic puzzles of a form like “all A’s are B’s and some C’s are D’s. You see an E which is not a B. What else can you tell me about it?” — but with the placeholder names replaced by either fake English words (phonotactically reasonable but not meaningful otherwise) or strings of emoji.
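
For the curious, here's a minimal sketch of how such probes can be generated; this is my own illustration (every name in it is hypothetical), with the logical skeleton held fixed and the category names randomized:

```python
import random

# Pronounceable-but-meaningless word generator. Restricting the first letter
# to a consonant keeps the article "a" grammatical in the template below.
CONSONANTS = "bdfgklmnprstvz"
VOWELS = "aeiou"

def fake_word(syllables: int = 2) -> str:
    """Build a phonotactically plausible English-like nonsense word."""
    return "".join(
        random.choice(CONSONANTS) + random.choice(VOWELS)
        for _ in range(syllables)
    )

def make_puzzle() -> tuple[str, str]:
    """Return (puzzle text, expected answer) over nonsense category names."""
    a, b, c, d, e = (fake_word() for _ in range(5))
    puzzle = (
        f"All {a}s are {b}s and some {c}s are {d}s. "
        f"You see a {e} which is not a {b}. "
        "What else can you tell me about it?"
    )
    # The "some C's are D's" clause is a distractor; the answer follows by
    # contraposition from "all A's are B's" alone.
    answer = f"This particular {e} is not a {a}."
    return puzzle, answer

if __name__ == "__main__":
    puzzle, answer = make_puzzle()
    print(puzzle)
    print("Expected:", answer)
```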

A human with any knowledge of logic could answer “this particular E is not an A” for *any* replacement of the placeholder names, without missing a beat.
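
The inference being tested is bare contraposition, so the names genuinely cannot matter. For the record, a one-line formalization (my own sketch, in Lean 4):

```lean
-- If every A is a B, then anything that is not a B cannot be an A.
-- The predicates are abstract, so no choice of "names" can change this.
example {α : Type} (A B : α → Prop)
    (h : ∀ x, A x → B x) (e : α) (hne : ¬ B e) : ¬ A e :=
  fun ha => hne (h e ha)
```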

ChatGPT 3.5 just babbled nonsense, and when it did get the logic right, it was obviously due purely to chance.
ChatGPT 4.0 — you know, the one that gullible journalists just gushed over as “so incredibly smart” — wrote somewhat better nonsense, but still often failed to get the logic right.
More recent (a few months ago) Gemini spent a lot of time trying to analyze the emoji for significance, completely losing the thread (consistent with observations that feeding these silicon buffoons lookalike puzzles with a twist causes them to get stuck badly).
