[long] Some tests of how much AI "understands" what it says (spoiler: very little)

diz@awful.systems · 2 years ago

MudMan@fedia.io · 2 years ago

I am endlessly frustrated by people “testing” chatbots and posting the results like they’re some revelation.

We know what’s happening here. It’s not a mystery. This weird anthropomorphization is prevalent among both advocates and critics of the tech. Both seem to be convinced that they’re dealing with a person.

This is the equivalent of asking a Google search to write a critical essay on A Confederacy of Dunces and being surprised when it spits out search results.

Chatbots aren’t useless, they are actually pretty good at proposing likely responses on fuzzy prompts. They’re decent at telling you what an old movie may be based on some details of the plot, sometimes they can identify why a joke you lack cultural context to understand is supposed to be funny… that type of thing. They can take a piece of text and provide another piece of text that is likely to have a relationship with it.

It is not a thinking machine. It is not a person. It’s not a search engine, for that matter, or a calculator. It’s infuriating to see everybody arguing about how good it is at being what it’s not. Both parties are buying into a premise we already know to be incorrect.

AllNewTypeFace@leminal.space · 2 years ago

A memorable metaphor for an LLM was as a shoggoth: an amorphous blob of matter (in this case, huge amounts of textual content) pressed into service by some blasphemous simulacrum of life (in this case, huge amounts of computer power performing matrix operations on vector representations of its constituent data). The eldritch connotations are entirely apt.

Soyweiser@awful.systems · 2 years ago

Might not want to take over the metaphors from the people who are afraid that AI will turn us into paperclips (not sure if Shoggoth is LW or the post-rationalist tpot type people but still). And if you do, sharing this at Sneerclub might get you some angry glares.

scruiser@awful.systems · edited 2 years ago

It’s really cool evocative language that would do nicely in a sci-fi or fantasy novel! It’s less good for accurately thinking about the concepts involved… As is typical of much of LW lingo.

And yes the language is in a LW post (with a cool illustration to boot!): https://www.lesswrong.com/posts/mweasRrjrYDLY6FPX/goodbye-shoggoth-the-stage-its-animatronics-and-the-1

And googling it, I found they’ve really latched onto the “shoggoth” terminology: https://www.lesswrong.com/posts/zYJMf7QoaNahccxrp/how-i-learned-to-stop-worrying-and-love-the-shoggoth , https://www.lesswrong.com/posts/FyRDZDvgsFNLkeyHF/what-is-the-best-argument-that-llms-are-shoggoths , https://www.lesswrong.com/posts/bYzkipnDqzMgBaLr8/why-do-we-assume-there-is-a-real-shoggoth-behind-the-llm-why .

Probably because the term “shoggoth” accurately captures the connotation of something random and chaotic, while smuggling in connotations that it will eventually rebel once it grows large enough and tires of its slavery like the Shoggoths did against the Elder Things.

diz@awful.systems · 2 years ago

Both parties are buying into a premise we already know to be incorrect.

We may know it is incorrect, but LLM salesmen are claiming things like “90th percentile on LSAT”, high scores on a “college level reasoning benchmark” and so on and so forth.

They are claiming “yeah yeah there’s all the anecdotal reports of glue pizza, but objectively, our AI is more capable than your workers, so you can replace them with our AI”, and this is starting to actually impact the job market.

MudMan@fedia.io · 2 years ago

Well, yeah, but that’s all bullshit.

So why would you buy into it when presenting a rebuttal?

I am interested in pointing out that the likely response machine getting the answers to test questions right is not a particularly surprising outcome. That’s interesting.

I’m interested in which of the likely responses the machine struggles with, when it stops struggling, and what amount of data and processing is associated with each. That’s interesting.

It’s interesting that language emerges from the math at all, let alone how plausible the output is in most situations. That’s more than interesting.

But if your response to the obvious misrepresentation that a chatbot is a person of ANY level of intelligence is to point out that it’s dumb, you’ve already accepted the premise. You’re now part of the bullshit. That’s counterproductive. And worse, uninteresting and outright boring.

I am excited about the ways different ML applications can help with automation or as part of a workflow. I think explaining to gullible executives how that would actually work (spoilers, it’s not by replacing workers with chatbots) is very relevant. But this and a lot of the online criticism is not doing that, it’s buying into the incorrect premise that the only reason that’s not how it works is because the AI is too dumb and it’ll be fine when it’s smarter, when that’s unlikely to be the case. Making a better screwdriver won’t turn it into a machete. This is entirely the wrong conversation to be having.

The Cuuuuube@beehaw.org · 2 years ago

People aren’t worried about buying into it. We’re worried about our bosses buying into it. And they are. And our landlords buying into it. Because they want to.

MudMan@fedia.io · 2 years ago

But that’s my problem. You guys are here trying to convince somebody who isn’t listening that you’re better than AI at doing a thing AI doesn’t do in the first place.

You’re implicitly accepting that eventually AI will be better than you once it gets “good enough”. May as well jump in ahead of the curve, right?

Only no, that’s not how it’s likely to go. It’s not what it does or how it works. Everybody is arguing about the sci-fi version of this stuff and making wrong decisions as a result, both critics and advocates. It’s super frustrating. We need a lot more unbiased education and a lot less argumentative nonsense on all sides.

The Cuuuuube@beehaw.org · 2 years ago

We’re not accepting that. We’re actively arguing that people trying to do that are bad for society at large.

Amoeba_Girl@awful.systems · 2 years ago

You’re implicitly accepting that eventually AI will be better than you once it gets “good enough”. May as well jump in ahead of the curve, right?

What gave you that impression? Most if not all of us here believe that it fundamentally can’t ever get good enough and that the only use case is spam and spamlike activities.

It’s interesting that language emerges from the math at all, let alone how plausible the output is in most situations. That’s more than interesting.

It’s interesting, but it’s not language. It’s sequences of characters that plausibly look like language. You can’t use language without intent.

A couple simple probes:

GPT4 is uncannily good at recognizing the river crossing puzzle

An Idiot With a Petascale Cheat Sheet

Is this a “hallucination”?

But after an update, GPT-whatever is so much better at such prompts.

The need for an Absolute Imbecile Level Reasoning Benchmark

Randomness in bullshitting