Apple did the research: LLMs cannot do formal reasoning. Results change by as much as 10% if you change something as basic as the names.

garymarcus.substack.com/p/llms