It's great that we now have a paper to point to, but at no point did LLMs include any sort of reasoning capability. Their output is just empty words, so this doesn't surprise me too much.
I'm not going to blame anyone for being sick to death of the marketing hype and reacting negatively based on the saturation of unrealistic bullshit that's being pumped out, but I think it's important not to overcorrect because of that.
These models demonstrably do show the ability to follow deductive reasoning steps. It's imperfect, it's brittle, it's often inconsistent - all helpfully quantified by this paper - but it's also the comparatively early days of the field. The fact that language modelling does expand into logical problem solving strikes me as a fundamentally important factor in what this tech will and won't change once the VC dust settles, and them being not-great at it right now is quite different to not being capable at all.
My experience is that dismissals like "empty words" or "glorified autocorrect" are almost as far from the truth as the breathless commercial fluff about <insert important human position> now being totally redundant thanks to ChatGPT. Underestimating the genuine technical capabilities and what they might imply feels a bit like the skepticism around this newfangled internet thing at the time of the pets.com collapse - entirely justified based on some of the ridiculous claims and ridiculous behaviours we're seeing from companies in the space, but at risk of missing the very real fundamentals as a result.
An LLM's "ability to follow deductive reasoning steps" is purely performative - an LLM does not understand anything at a conceptual level and can only output what it determines to be of highest probability based on its training data, and does so on a token-by-token basis where it is not following a true logical path. There is no "thought process".
LLMs are not currently designed to be logical or even correct, they are fundamentally designed to be as linguistically coherent as possible. Anything else is secondary to that.
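For anyone who wants the "token-by-token, highest probability" picture above made concrete, this is roughly the decoding loop, stripped to its simplest (greedy) form. It uses Hugging Face's transformers library; "gpt2" is just an illustrative checkpoint, not one of the models the paper evaluates, and real deployments usually sample from the distribution rather than always taking the argmax.

    # Minimal sketch of greedy next-token decoding: the model only ever
    # scores "which token comes next", and the loop repeats that.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")        # illustrative checkpoint only
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    prompt = "All men are mortal. Socrates is a man. Therefore,"
    ids = tok(prompt, return_tensors="pt").input_ids

    with torch.no_grad():
        for _ in range(20):
            logits = model(ids).logits                 # scores over the whole vocabulary
            next_id = logits[0, -1].argmax()           # the single "highest probability" token
            ids = torch.cat([ids, next_id.view(1, 1)], dim=-1)

    print(tok.decode(ids[0]))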
To your point, Apple's recent paper on this is enlightening, for those that have not already read it. (PDF) These models are fundamentally limited.
Assuming it doesn't work by pure coincidence, we have to call its capability to often work step-by-step to get right answers or generate working code for novel situations something, and it's not like "reasoning" is a term defined so rigorously to obviously exclude what we see here. The term being used here doesn't inherently mean that there's a rich inner life or inner monologue going on in the model.
Interesting paper - the core focus seems to be an automated, mathematically provable approach to measuring models’ performance on logic tasks. That’s got some real potential value in that it’s a small step from measuring the objective to training for the objective. A nice feature of ML models is that if you can quantify how bad they are at something, you can often just turn that around and let them target that metric for improvement.
I don’t love how they conflate LLMs with transformers in general: it originally made me think they were being a bit misleading and had done the taxi experiments on a model trained more for mapping than language, but I think it’s more just the way they’re using the term. The actual results do seem like they’ll have some worthwhile general application.
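Circling back to the "turn the measurement into the objective" point above, here's a deliberately toy, self-contained sketch of what that move looks like - nothing from the paper's actual setup; the single fixed question, the four canned answers and the checker are all invented for illustration. An automated checker scores an answer, and the score is fed straight back as a REINFORCE-style reward:

    # Toy example: a "policy" over 4 candidate answers to one logic question.
    # The automated checker's score is the only training signal.
    import torch

    torch.manual_seed(0)
    logits = torch.zeros(4, requires_grad=True)        # the entire "model"
    optimizer = torch.optim.SGD([logits], lr=0.5)

    def check_logic(answer_idx: int) -> float:
        # Stand-in for an automated, provable checker: only answer 2 is valid.
        return 1.0 if answer_idx == 2 else 0.0

    for _ in range(200):
        probs = torch.softmax(logits, dim=0)
        dist = torch.distributions.Categorical(probs)
        answer = dist.sample()                         # the model "answers" the question
        reward = check_logic(answer.item())            # quantified correctness
        loss = -reward * dist.log_prob(answer)         # reinforce high-scoring answers
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    print(torch.softmax(logits, dim=0))                # mass piles onto the valid answer

Real pipelines are obviously far more involved than this, but the core move is the same: once "how badly does it do on logic tasks" is a number an automated checker can produce, that number can become a training signal rather than just a benchmark score.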