Do I understand LLMs?

GardenVarietyAnxiety@lemmy.world · 2 months ago


AbouBenAdhem@lemmy.world · edit-2 1 month ago

There’s a part of our brain called the salience network that continually models and predicts our environment and directs our conscious attention to things it can’t predict. When we talk to each other, most of the formal content is predictable, and the salience network filters it out; the unpredictable part that’s left is the actual meaningful part.

LLMs basically recreate the salience network. They continually model and predict the content of a text stream the same way we do—except instead of modeling someone else’s words so they can recognize the unpredictable/meaningful part, they model their own words so they can keep predicting the next ones.

This raises an obvious issue: when our salience networks process the stream of words coming out of such an LLM, it’s all predictable, so our brains tell us there’s no actual message. When AI developers ran into this, they added a feature called “temperature” that basically injects randomness into the generated text—enough to make it unpredictable, but not obvious nonsense—so our salience networks will get fooled into thinking there’s meaningful content.
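For anyone curious what temperature does mechanically, here is a minimal sketch. It assumes the model hands back one raw score (a "logit") per candidate token; the function names and example scores are made up for illustration and aren't taken from any particular library. The scores are divided by the temperature before being turned into probabilities, so a low temperature sharpens the distribution toward the top token and a high temperature flattens it.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the max before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(tokens, logits, temperature, rng):
    """Pick one token at random, weighted by its tempered probability."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


if __name__ == "__main__":
    # Hypothetical candidates and scores, purely for illustration.
    tokens = ["cat", "dog", "teapot"]
    logits = [2.0, 1.0, 0.0]
    rng = random.Random()
    for temperature in (0.5, 1.0, 2.0):
        probs = softmax_with_temperature(logits, temperature)
        print(temperature, [round(p, 3) for p in probs],
              sample_token(tokens, logits, temperature, rng))
```

Running it shows the top token's share of the probability shrinking as the temperature goes up, which is where the extra unpredictability comes from.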

GardenVarietyAnxiety@lemmy.world · 2 months ago

This was a great read, thanks!

I have a new rabbit hole to explore 😝

vamp07@lemmy.world · 2 months ago

And our brains probably do something very similar. If LLMs are not intelligent, our brains aren’t either.

GardenVarietyAnxiety@lemmy.world · edit-2 2 months ago

I think you’re right, but it takes more than an LLM to be intelligent. The LLM is one piece of the pie, though.

will_a113@lemmy.ml · 2 months ago

The critical thing to remember about LLMs is that they are probabilistic in nature. They don’t know facts, they don’t reason, they don’t evaluate. All they do is take your input string, split that string into tokens that are about 3-4 characters long, and then consult their vast, vast set of pretrained weights (not a literal database, but a statistical summary of the training text) and say “I have this series of tokens. In the past when similar sets of tokens were given, what were the tokens that were most likely to follow them?” It will then construct the output string one token at a time (more sophisticated models can do multiple tokens at once so that words, phrases and sentences might hang together better) until the output is complete (the model emits a special end-of-sequence token) or your output limit is reached.
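The loop described above can be sketched with a toy model. To be clear about the assumptions: this is not how a real LLM works internally. It just counts which word followed which in a tiny corpus (a bigram table), whereas a real model uses a neural network over subword tokens. Splitting on whitespace, the `<end>` marker, and the function names are all made up for illustration. What it does share with the real thing is the shape of the loop: look at what came before, pick a likely next token, append it, and stop on an end marker or an output limit.

```python
import random
from collections import Counter, defaultdict

END = "<end>"  # hypothetical end-of-sequence marker


def train_bigram(corpus):
    """Count, for each token, which tokens followed it in the corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.split() + [END]  # toy tokenizer: whitespace
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(counts, prompt, max_tokens, rng):
    """Extend the prompt one token at a time until END or the limit."""
    output = prompt.split()
    if not output:
        return ""
    for _ in range(max_tokens):
        followers = counts.get(output[-1])
        if not followers:
            break  # never saw this token in training; nothing to predict
        choices = list(followers.keys())
        weights = list(followers.values())
        nxt = rng.choices(choices, weights=weights, k=1)[0]
        if nxt == END:
            break  # the model "decided" the output is complete
        output.append(nxt)
    return " ".join(output)


if __name__ == "__main__":
    corpus = ["the cat sat", "the cat ran", "the dog sat"]
    model = train_bigram(corpus)
    print(generate(model, "the", max_tokens=10, rng=random.Random()))
```

Because the next token is sampled rather than looked up, running it several times on the same prompt can give different sentences, which is the probabilistic part.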