@sinedpick

sinedpick@awful.systems · edited 5 months ago

Thanks for the suggestions. The LLM is free to use (for now) so I thought I’d poke it and see how much I should actually be paying attention to these things this time around.

Here are its answers. I can’t figure out how to share chats from this god-awful garbage UI so you’ll just have to trust me or try it yourself.

It gives the correct but unnecessary answer: “If I were to ask you which door leads to freedom, which door would you point to?” It mentions a lying guard but acknowledges that one is absent from this specific problem.
“A table or a chair”
Completely fails on this one; it missed the sentence “Everyone knows the color of their eyes”
Not sure what to do with this
“While a Hadamard matrix of order 2672 might exist, its existence isn’t immediately provable using the most common constructions” – I won’t pretend to know anything about the Hadamard conjecture, if that’s a real thing, so I have no idea what it’s on about here.

edit: I didn’t do any prompt engineering, just a straight copy-paste.

sinedpick@awful.systems · 5 months ago

I tried using Claude 3.5 Sonnet and … it’s actually not bad. Can someone please come up with a simple logic puzzle that it fails abysmally on so I can feel better? It passed the “nonsense river challenge” and the “how many sisters does the brother have” tests, both of which fooled GPT-4.