I don't disagree with the article's premise, at all, but I do find it entertaining that throughout the history of CS/AI/ML there's this pattern of
"this thing is hard for a computer to do and therefore is a sign of human-level intelligence" -> the tech improves -> the computer is able to do the thing -> "this thing is a poor heuristic for intelligence"
It's amusing, but I don't think it's actually surprising (nor is it a sign of moving goalposts): we don't have a rigorous definition of "intelligence", just human beings as exemplars, and so the best option we have for evaluating intelligence in non-human entities is to pick something that only humans can do and set that as the provisional bar. Language was seen as a fairly high bar when Turing proposed it back in the '50s, but LLMs are literally the first non-human entities ever to produce large volumes of coherent language, and they are self-evidently not "intelligent", so we should perhaps not be surprised that it was clearly not a high enough bar.
Also, as with other activities such as playing chess, we know for a fact that the software isn’t completing the task in the same way that humans are. They aren’t doing what we do better or more efficiently than we can, they are doing something else that produces a similar result.
I have a definition of intelligence. (edit: also see grumbel's comment)
Intelligence is prediction.
In the case of intelligent living processes ranging from single celled organisms to complex multicellular life, intelligence arises from the need to predict the future to survive and reproduce. More intelligent organisms build more elaborate models of the world using better developed senses in order to do so.
Humans model the world primarily with language, which allows us to share our models with each other, across both space and time! Without language, it is extraordinarily more difficult to communicate complex abstract thoughts. As a side effect of our high level of intelligence, our wetware is capable of modeling things outside of language, such as mathematics.
As much as I empathize with the author of the article and agree that ASD kids need better resources for communication, I don't think they're correct about the fundamental nature of intelligence vs. heuristics.
On the nature of LLMs, I see no intrinsic reason why "real time" learning is infeasible. The current generation of LLMs is just scratching the surface of the nature of intelligence. It's not going to be a singularity, but the technology will improve and our understanding will grow. This is the beginning, not the end.
In general, I think we need to stop equating intelligence with consciousness, agency, moral value, and living things. These are entirely different concepts, and, as a species, we are blinded by our egotistical identity of being smarter than the other animals.
I like to distinguish between "Intelligent" and "Educated," because there's definitely a gap between the two, and conflating them lets a lot of hidden biases flourish. There's also determination, empathy, and other factors at play, but I'm going to set those aside to explain my first sentence a bit more.
An educated person who is not intelligent can be quite fluent within the scope of their education, but can falter quickly once they are outside the bounds of it. The difference between an A+ doctor and a C doctor is that the A+ doctor likely won't need a full-time IT support professional standing by their side to do everything outside the scope of medical school. I've met a few people I would class as "Very educated idiots." I had one doctor tell me not to feed my kids frozen fruit because "God knows what chemicals they're using."
An intelligent person can thrive even without much education. I'd say that if there were a way to properly assess broad intelligence (again outside the scope of the other things), we would find intelligent people would be more likely than the average population to succeed in completely new situations.
And most importantly: Intelligence in one area does not necessarily imply intelligence in another, which is part of why assessing a broad scope of intelligence is a fool's errand IMO.
From your examples, the "highly educated idiots" simply have an underdeveloped model of the world from lack of curiosity or necessity.
I imagine you could be similarly specialized in mathematics or programming, while being an idiot in regular things. I think such a person is still intelligent.
Another interesting facet of human intelligence is communication skills. How well do you listen? What do you pay attention to? Can you explain your conclusions and thought process to others?
There are plenty of highly intelligent people who simply do not listen with their full attention when others are talking. Maybe this is when empathy comes into play, as you noted.
Re: assessing intelligence. Assessing "human intelligence" is possibly a fool's errand because it's too complex, bound up in our environment, social structure, and cultures.
However, if "intelligence" (as I defined) is a low-level primitive that human intelligence builds upon, then that is something that can be tested in a more rigorous way—albeit with less of a socially relevant conclusion.
”the world” is a perfect model of the world: it can make predictions 100% accurately... but has difficulty communicating those predictions to us. is it intelligent?
”the world” is a perfect model of the world: it can make predictions 100% accurately... but has difficulty communicating those predictions to us.
We must have different definitions of model, because your statement does not make sense to me. Let me clarify my definition.
The universe is a complex process, which is to say that it is a system that changes over time. The universe itself can be viewed as a complex composition of subprocesses that come and go. The Earth is a process. Ecosystems are processes. Individual organisms are living processes. You and I and other mammals, birds, etc. are intelligent living processes (ILPs). ILPs are intelligent because they model their environment in order to predict the future, to increase the odds of surviving to reproduce.
A modeling process ("model") is internally organized to represent the modeled process—what it is modeling. A model's "time" is not required to be bound to the time of the modeled process. This is what makes a model a model, and what makes a model useful. The model can peek forwards (or backwards) in time.
A modeling process is usually simpler than the modeled process, but this is not strictly necessary. For example, we can model a real-world chess game between two players using human actors on a giant chess board. The actors debate amongst each other what move they think the players will make before they make it, and then carry it out, rolling back their positions later if they're mistaken. This is a terribly inefficient way to model a game of chess, but is still a viable model with some usable accuracy.
With the preceding definitions, it is illogical to say that a process models itself. Saying that "the universe models itself" is equivalent to asserting "that dog is a model of that dog".
If we take a perspective outside of the universe, it's certainly possible the universe is modeling something. We just don't know what it is. If the universe is a subprocess of some larger process, perhaps it is being used for some computation, a la planet Earth in The Hitchhiker's Guide to the Galaxy. In that case, the enclosing process may well be intelligent. Structurally speaking, this would be no different from animals being intelligent because their brains model their environment.
I would separate these two concepts - surely, some people are just unreasonably bad at, say, regular life but otherwise highly intelligent in programming, or, even more commonly, exceptionally good at their work but always bad at mathematics, having learned to believe that this is somehow inherent to them instead of a failure of their teachers. This is indeed just a bad internal model.
But there are also people who managed to land a job and/or have a degree but are complete idiots.
Even being particularly academically intelligent doesn't necessarily correspond to being knowledgeable in other fields. There's no shortage of scientists with really, really stupid takes on fields outside their area of expertise (Neil deGrasse Tyson is an example that comes to mind).
In general, I think humans use "intelligence" to refer to a bunch of related but disjointed concepts, since it wasn't invented as a technical term to be defined in a rigorous way. I think that's a big part of why we all keep arguing about what "counts" as intelligence wrt machine learning. But then again, I've got a linguistics background, so I may be biased towards attributing it to language.
You honestly don't even need to get as complex as these abstract concepts. It's difficult to come up with a sufficiently thorough definition of a word like "bird" or "chair". Language exists to facilitate communication, and humans just don't need to rigorously define everything to communicate sufficiently well in most circumstances. Heck, the ambiguity or vagueness that wouldn't be possible if you rigorously defined everything may even be desirable in a lot of situations when people communicate!
I wholeheartedly agree with this - like limbs evolving to navigate a species through space, intelligence navigates a species through time (or temporal possibility). Want to avoid an obvious death? Well, you first have to recognize which possible futures are obvious given the current input/output.
Language is like the API between external nodes - possibly even internal nodes - but without a reliable store of data behind it, even the most robust API is left trying to fill in the gaps dynamically, and it will eventually fall short.
Furthermore I hope our goal isn't to emulate humans, which so far have a pretty poor filter for what gets accepted into that data store as valid. We should be requiring some pretty aggressive filtering.
Language is like the API between external nodes - possibly even internal nodes
As a linguist and a software dev, I actually really like this analogy! I didn't focus on language and the brain specifically, but I did learn that when a particular portion of the brain (known as Broca's area) is damaged, people's ability to produce language is partially lost, but this doesn't have any impact on their other cognitive abilities (absent damage to other parts of the brain, ofc). As a result, I think we can confirm that the "internal nodes", as it were, at least don't rely strictly on language, since you would expect damage here to be much more impactful to other cognition if that were the case.
That is fascinating. I wonder if they still experience internal dialogue. AFAIK the research heavily suggests that higher order thinking is unlocked by language. E.g. "left of the blue wall" is difficult to comprehend without language; studies show that each concept - "left", "blue", & "wall" - can be understood, but connecting them into one concept doesn't seem possible without language. Though perhaps once you've understood it as a concept, language isn't required to access it any longer? I'd be interested in learning more about patients that experienced issues with Broca's area.
My understanding is that the research is much less settled, since it's not possible to ethically deprive someone of language from birth for scientific study, and our closest equivalents to study naturally tend to have other confounding mental impairments that make it difficult to isolate what is actually due to language deprivation and what is due to other factors. There's certainly plenty of theorizing about language unlocking higher-level cognition in our evolution, but afaik the jury's still out on which came first in that particular chicken-egg problem.
But I'll confess that my background is principally looking at this from the other end -- studying the structure of language itself rather than how it actually operates in the brain. Within that side of things there's a lot of arguing about whether language has an inherent structure dictated by some unique human language faculty, and the arguments about that within linguistics can get heated. I don't want to end up being an example of someone talking about something I don't know enough about, and as soon as we start talking about the human body and brain, I'm not much of an expert anymore lol.
Cognition is really weird. Some people (including myself) don't have an internal dialogue at all even with a functioning Broca's area, and (at least in my experience) don't have any trouble with higher order thinking. Having no language to effectively learn or communicate those concepts with others would make things hard, though, which is really interesting.
Language is like the API between external nodes - possibly even internal nodes - but without a reliable store of data behind it, even the most robust API is left trying to fill in the gaps dynamically, and it will eventually fall short.
Hey, that just prompted an interesting thought in me. Notice how language is very much removed from ideal information compression? So much redundancy in there that makes languages more complex to learn, makes sentences longer, just seems pointless? Well, it does start to make sense once you realize you can reconstruct what someone must've said (not just the gist of it, but the exact wording they must've used) even if you only heard part of their speech. That's obviously useful, but also not news.
Do you also know about that vague (or maybe not so vague if you speak to the right experts) notion that human abstract thought and human speech co-developed? That maybe abstract thought is much harder or even impossible without speech, even if it is just your inner voice?
Complete conjecture, but let's put those together: Speech is useful for abstract thought because it gives us a redundant (and thus robust) way of encoding abstract thought. By encoding your thought in language, even if you never speak those words, you give it a more robust encoding. Your brain can memorize that, and if later you remember that language encoding slightly wrong (as always happens) your brain can do some error correction, overwrite the faulty memory with an error-corrected one, and thus remember the thought. Contrast a high-density compressed format: In order to error-correct this thought later, you'd basically have to come up with it all anew, because what you remember will not be a slightly garbled but still readable sentence, but an equally plausible but wrong thought.
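A toy Python sketch of that trade-off (the sentence and the single bit flip are invented purely for illustration): a plain, redundant sentence survives a small corruption and stays readable, while the same sentence pushed through a dense compressor is usually lost entirely after the same corruption.

```python
import zlib

sentence = b"the cat sat on the mat because the mat was warm"

# Redundant encoding: flip one bit in the raw text; it stays readable and is easy to error-correct.
garbled = bytearray(sentence)
garbled[10] ^= 0x04
print(garbled.decode(errors="replace"))

# Dense encoding: compress first, then flip the same bit in the compressed stream.
packed = bytearray(zlib.compress(sentence))
packed[10] ^= 0x04
try:
    print(zlib.decompress(bytes(packed)))
except zlib.error as err:
    print("decompression failed:", err)  # the whole message is usually unrecoverable
```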
Nice theory, but I believe it falls apart, as quite a lot of people lack an inner monologue. We may not be as dependent on language as we think. Arguably, our visual system is the most complex.
history of CS/AI/ML
Oh, this extends way beyond computers.
"This thing is hard for a non-human to do and therefore is a sign of human-level intelligence" has been used to separate humans from animals, and even other humans as "subhumans" for millennia. Since this human chauvinism has strong roots in religion, it's not going away anytime soon. We are special; only we have souls; only we can make important decisions, vote, drive; only we deserve certain rights. For a given definition of "we."
And we will continue to make that distinction because we're control freaks. We don't want to be studied, predicted, or categorized. We don't want to be the same as everything else; we want to dominate all of it.
We may be communal creatures, but only because community enhances our own survival. It's really hard for us to empathize and sympathize; that's a secondary skill developed to better protect us from getting tricked.
We're just animals, like tigers, or other solitary creatures watching out for number 1. We're greedy by nature and will always be. We only move forward because the idea presented to us from someone else somehow benefits us.
Please see my other comment: https://tildes.net/~comp/194n/language_is_a_poor_heuristic_for_intelligence#comment-a34f
We are absolutely special. We just don’t have a clear boundary to draw. Sure, corvids can use and make tools, but we are very obviously on a completely different level from them.
Of course the other readings (voting, rights) are beyond this topic, and I'm not commenting on those.
…and as soon as that thing is achieved, the goalposts are moved, rather than admit a new member to the human intellect club.
I love how much drama and distress AI is causing the "humans are supposed to be special" crowd by mimicking human quirks through unthinking dirt that has studied massive quantities of inane human writing.
Agreed.
I wonder if it's just the fact that we are miles away from actually getting to AGI, or if it's actually impossible for us to achieve because we don't even understand where our own consciousness comes from?
It's worth making a distinction between sentience and sapience. A (sapient) computer could be capable of performing any intelligent tasks beyond a human level and still have no qualia that would make it sentient.
I'd argue that we do have a decent idea of where consciousness comes from. Since at least Julien Offray de La Mettrie we've observed the relationship between brain and mind.
If you get brain damage in a well understood part of the brain, there's a good chance a neurologist could guess what your conscious experience would be like, and by the generalizability of that we chip away at strong solipsism.
A hundred patients came in with the same damage to their brain and gave similar self-reports? Odds are "the mind is what the brain does", even if that becomes inaccessibly complicated if you try to dig deeper.
I don't think norb is necessarily saying that consciousness doesn't come from the brain, I read that instead as a reference to what you call the "inaccessibly complicated" nature of it. We have a very abstract and high level descriptive model that tells us a strong relationship between the brain and consciousness exists, and we can use it to make very broad predictions, but that only gets you so far. Anyone can gather patient reports about different types of brain damage and subjective experiences, and use those reports to predict that the next guy with a substantially similar kind of brain damage will have a substantially similar subjective experience, but that doesn't really get you anywhere near an understanding of the connection between those two things.
I am making the argument that consciousness doesn't come just from the brain. I think there are a lot of ways the brain acts as the CPU (central processing unit) for the body, doing a lot of work in conjunction with other parts of the body. Somewhere in that tangled mess we have what we define as "consciousness."
I guess I was also trying to make the point that even when consciousness begins and ends isn't something we can all agree on (not to make this into another argument/discussion, but you can see the ways in which people disagree about when life begins in discussions related to abortion). Are dogs or cats conscious? I would argue yes, but I'm sure others would disagree. Do we assign it to just those animals that we see as companions (dogs or cats) vs food (cows or chickens)?
I think those types of questions make it very hard for us to assign the idea of consciousness to a computer program. That is what I meant by "is it even possible to do so?"
[EDIT]: Just to add I have no formal training or knowledge in any of these spaces. I just find the ideas fascinating. Loving the conversation for sure!
It's worth making a distinction between sentience and sapience. A (sapient) computer could be capable of performing any intelligent tasks beyond a human level and still have no qualia that would make it sentient.
It's interesting to consider that we know that tons of animals are sentient but we don't know exactly to what degree other animals are sapient, whereas we can know exactly how sapient an AI is but understanding its sentience is impossible.
But I guess that just means they're a more pure expression of the philosophical question of proving the sentience of anyone else.
if it's actually impossible for us to achieve because we don't even understand where our own consciousness comes from?
Somewhere Godel smiles (or at least those who have interpreted his Incompleteness Theorem through a philosophical lens). One could argue (and many have) that we can't understand our own consciousness because we are inside our consciousness. We'd have to ascend (through evolution or something else) to superhuman to then be able to grok regular ole humanity.
I'm not sold on layering the Incompleteness Theorems on top of consciousness; it is quite a mental leap to relate consciousness to a rule system sufficiently complex to contain arithmetic, and then to conclude that consciousness is one of that system's unprovable facts.
With that said, I'm more inclined to believe that consciousness is "simply" an emergent behavior of certain sufficiently complex systems, and that there is not much to understand about it - similarly to how we can have laws regarding temperature even though temperature is not itself a thing; it is a statistical property of a system. In this vein I don't see a reason to posit a superhuman limit - we jumped over the emergent-behavior line, and from here on the only limit is complexity (and the halting problem!)
To the best of my understanding, Gödel's Incompleteness Theorem is strictly about the limitations of logical systems.
Interpreting Gödel's work as applicable to consciousness, particularly in this manner, seems to me a rather large and unfounded stretch. This would require that consciousness be defined in an axiomatic manner (I'm unclear how this could ever be shown), and a proof that understanding consciousness is unprovable (whatever that proof would look like). Defining any of these mathematically is likely a lost cause.
I think there's possibly some validity in discussing whether we can understand consciousness, but dragging Gödel into the argument seems pointless to me.
I think there's possibly some validity in discussing whether we can understand consciousness, but dragging Gödel into the argument seems pointless to me.
A simple Google search would have sufficed to show you that people have been "dragging" Godel into the "argument" for decades. Godel himself knew that his theorem, though strictly about math/logic, would be interpreted through other lenses:
"It was something to be expected that sooner or later my proof will be made useful for religion, since that is doubtless also justified in a certain sense."
Maybe in pop culture, but overall I kind of disagree... the idea that language is not indicative of intelligence or understanding is not new; the Chinese Room argument originated in 1980.
Surprisingly, on reading that article to confirm the exact year, I learned that the exploration of the concept of mechanically replicating the human mind predates computers. Leibniz - a VERY prominent mathematician who went toe to toe with Newton - was pondering this kind of thing as far back as the 1700s. Pretty fun fact, I didn't know that about him.
If you look at the history of AI, you'll see that repeated frequently, even back in the early days. It was supposed that computers would be able to easily analyze pictures to distinguish the contents of an image. However, this has only been a realized technology since after the turn of the millennium, as it isn't as easy as early computer scientists thought it would be. They hadn't realized how hard the thing they wanted to do was until they actually tried to do it. In that same way, as modern computer science advances, each hurdle crossed only expands our understanding of just how much we don't know. Sure, technology like ChatGPT would have been considered magic 100 years ago, but it still doesn't equate to anything close to what the human brain is capable of. It is a steadily advancing field though, and surely we will reach true AI some day.
We are not lacking in real, hard boundaries distinguishing us from AI — we have discovered nuclear power, computers, we wrote AIs themselves, etc. But that would hardly be an interesting limit, so we are looking for the minimal limit for human-level intelligence, and we will obviously shoot a bit below from time to time.
"this thing is hard for a computer to do and therefore is a sign of human-level intelligence" -> the tech improves -> the computer is able to do the thing -> "this thing is a poor heuristic for intelligence"
This phenomenon is named the "Curse of AI" or something like it; I think I read about it at some point, even before GPT.
I think it's important to understand the specific failure modes and flaws of GPT as it becomes more prominent in the world, but I intensely dislike the seemingly political importance placed on "intelligence is a single thing that is possessed or not, GPT has exactly none of it, and anyone trying to say otherwise is some kind of swindler trying to con you" present throughout this and the referenced writing of Emily Bender and Timnit Gebru. It's not a useful way to understand AI or the discussions around it. Taken most charitably, it seems like an overreaction to the possibility that people might think any intelligence means consciousness and that means GPT warrants rights or some special level of respect.
something that would require computers that are as fully complex as, and functionally equivalent to, human beings. (Which about five minutes ago was precisely what the term ‘artificial intelligence’ meant, but since tech companies managed to dumb down and rebrand ‘AI’ to mean “anything utilizing a machine-learning algorithm”, the resulting terminology vacuum necessitated a new coinage, so now we have to call machine cognition of human-level complexity ‘AGI’, for ‘artificial general intelligence’.)
The field of AI has long been considered to include things even like theorem provers and chess bots. It's ahistorical to claim that it's an abuse of the term AI to associate GPT with it, especially as it's exceeded so many expectations and benchmarks set by people in the field. It's only in sci-fi stories that "AI" has been synonymous with at-least human-level intelligence.
Frankly, the constant goalpost moving of "but this is not AI" is IMO best curtailed by a short nod in the direction of the term AGI. If what these people expect of AI is basically a human benchmark, then AGI and AI are the same thing. So where does that leave us? Well, AI, in contrast to AGI, is specialized. It can do some things, quite possibly better than a human, but not others. A chess engine is an AI. A recommendation engine is an AI. The 2-layer densenet that I trained to recognize MNIST-digits is an AI. Warcraft 3's bot build order is an AI.
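For a sense of scale, here is a rough sketch of what such a 2-layer dense MNIST classifier might look like in PyTorch (the layer sizes are arbitrary assumptions, purely for illustration):

```python
import torch
from torch import nn

# A toy two-layer dense network for MNIST digits (28x28 grayscale images -> 10 classes).
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
)

fake_image = torch.rand(1, 1, 28, 28)      # stand-in for one MNIST image
print(model(fake_image).argmax(dim=1))     # predicted digit class (untrained, so arbitrary)
```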
Within that context, ChatGPT is of course an AI. The problem with LLMs is that humans communicate their intelligence using language, and LLMs are trained on large amounts of human language. They can imitate language well, they can even imitate intelligent conversation well. That does not per se make them intelligent. And one shouldn't put too much stock in the word "imitate" here, because sufficient imitation is completely indistinguishable from the real thing. In principle, I also don't see a fundamental limit on the degree of intelligence that can be reached using LLMs - I'd be more worried about the practical costs in terms of compute and data.
I think it's best to view intelligence as a multidimensional thing: ChatGPT has the short-term memory of a 3-day old, the capacity to learn (during conversation, not via backprop) of a 3 year old, the language capacity of a 20 year old, and knowledge of facts and general knowledge of a 500 year old, random memory lapses included. I think that's a reasonable first-degree approximation, but it surely isn't foolproof. The combination can fool humans quite well, as the 20 year old sounds confident, irrespective of whether the 3 year old or the 500 year old is determining the content. But I certainly occasionally see intelligence way in excess of what other entities commonly thought to be "intelligent" produce. The problem isn't that LLMs are not intelligent, it's that the way they communicate makes us expect more than is actually there.
A point the author spends a good portion of this article proving is that the creators of these models themselves have put a lot of time and effort into deliberately blurring the line between AI and AGI, and trying to market these products as AGI, or at least consciously trying to evoke the aesthetics of it. She talks about a number of choices that these companies make to try to suggest to end users that the AI is more than it is. I suspect a lot of this is because of how much we used to hear about the Turing test. AI research used to be very obsessed with creating an AI that could appear and act human to the end user to pass the Turing test, and since we didn't have anything as advanced as LLMs yet, the way you did that was primarily through presentation and smoke and mirrors and language, which is how we've gotten where we've gotten.
Regardless, the effect is that most layman end users come out of an interaction with ChatGPT et al thinking that this is something qualitatively similar to AGI that just has to be iterated on to get the Star Trek computer. The Snapchat AI was a particularly good test of this for me because I got to watch dozens of friends have their first experience with an LLM and come away from it with their own impressions. And I don't know anyone who came away from that experience thinking correctly that the way it works is that it can't do anything but mimic language patterns, and a side effect of mimicking language patterns is that sometimes the language it's mimicking happens to have correct information in it. Instead I saw a lot of people come away thinking they can ask an LLM a question and get a correct answer. On the other hand, I'm in law school, and my law school friends by and large just decided it was broken and useless and not-quite-there-yet because basically any question you ask any of the existing LLMs about anything related to the law causes hallucinations. And yet even they were thinking like "wow, well once they add information about the law to the databases and teach it about the law this will be really cool," not understanding that teaching an LLM information or adding information to a database is fundamentally incompatible with how LLMs work.
And ultimately this feels like it's just a stupid semantic argument about whether AI is really AI or not really AI and a different word means what some people mean when they say AI bla bla bla bla bla. But I don't think that's really the crux of the argument the author is making, which is important in a world where people bullish on AI regularly write op-eds or get hundreds of thousands of Twitter likes discussing the integration of LLMs into education. Laypeople en masse have a fundamental misunderstanding of how LLMs work that has been deliberately promulgated primarily by people that are financially invested in the proliferation of LLMs, and that fundamental misunderstanding is driving public opinion on policy related to them. That seems bad!
Instead I saw a lot of people come away thinking they can ask an LLM a question and get a correct answer.
One of my fun tests for an LLM is to ask them about someone you know personally but who isn't famous enough to have been in the training data scraped from the internet. The LLM will hallucinate an entire-ass background for this person you know dearly; a background that is clearly, unequivocally false.
I did this once on my best friend and the LLM reported that he was "an American entrepreneur and investor. He is the founder and CEO of Carstensen Ventures, a venture capital firm that invests in early-stage technology companies." "Carstensen Ventures" doesn't even exist, at least according to a cursory Google search.
One of my fun tests for an LLM is to ask them about someone you know personally but who isn't famous enough to have been in the training data scraped from the internet. The LLM will hallucinate an entire-ass background for this person you know dearly; a background that is clearly, unequivocally false.
Well of course it does, an LLM is a next word prediction intelligence, not a general observational one.
This is kind of like asking a dog to solve a math problem and concluding a dog isn't intelligent when it barks at you instead.
The beauty of an LLM is that you can give it information, like a background on your friend, and it can use that information - through predicting the next word - to draw real conclusions and predictions from it.
So I can give an LLM a bit of information and ask it to operate on that information pretty generally, and most often it'll do a good job of it. The truly amazing thing is that it can summarize a paragraph for me, despite being nothing but a next word prediction system.
It would be interesting to, say, train an LLM and try to remove all of its knowledge about how the world works, distilling its knowledge down to only the text it is given. This would result in the model refusing to give information which isn't drawn from its context and would dramatically reduce false statements, but I imagine it would be a pretty hard task, and the result would perform worse than the "knows all the information on the internet" model we have now.
not understanding that teaching an LLM information or adding information to a database is fundamentally incompatible with how LLMs work.
You're right in the sense that this isn't how the models are trained. They are next token predictors. However, getting LLMs to query information from databases is something AI researchers are considering.
More generally, once LLMs are capable of using tools such as writing a database query or writing, compiling, and running code for their own purposes, they'll be very powerful. This will be the synthesis of good old fashioned symbolic AI with the current trends of connectionist ML.
Tool-usage training can be accomplished in a similar manner to how Reinforcement Learning with Human Feedback (RLHF) is being used to fine-tune the models now. The model can be rewarded with RL for writing correct SQL queries or Python code.
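As a minimal sketch of the kind of reward signal described above (assuming an SQLite database and a known expected result; the function name and reward values are invented for illustration):

```python
import sqlite3

def sql_reward(generated_sql: str, db_path: str, expected_rows: list) -> float:
    """Toy RL reward: full reward if the model's SQL runs and returns the expected rows,
    a small reward if it merely executes, and nothing if it is invalid."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(generated_sql).fetchall()
    except sqlite3.Error:
        return 0.0                                   # query didn't even run
    finally:
        conn.close()
    return 1.0 if rows == expected_rows else 0.1     # partial credit for executable-but-wrong SQL
```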
I think this is the bombshell next step that a lot of people aren't imagining. People are beginning to wrap their heads around the limitations of ChatGPT and the like, and assuming that's the extent of what's possible with the technology. Already I'm seeing people shrug it off but they're throwing out the baby with the bathwater.
Soon we're going to see these systems querying external sources for information, cross-referencing sources, fact-checking, then introspecting with themselves to validate that their conclusions are free of bias and contain accurate citations. Maybe they'll feed their responses through other systems to validate other things like correct syntax/behavior of code, etc. All before a single character is returned to the end user.
Current LLMs have huge flaws but they're just the primordial form of something yet to be realized.
Introspecting with oneself is not something LLMs have the capacity to do by their nature. You can very easily imagine apolz's point that they might be able to construct SQL queries. Maybe you have a pipeline of LLM inputs and outputs where one run of the LLM generates an SQL query based on the user's query, then the program polls a database using that query, then the LLM is again asked to convert that into human-readable content with something like "answer this question assuming the following to be true" (however I definitely have questions about the utility of this because you still end up in the same position of unverifiability you're in right now, though that's not a conversation I really care about having). This is so easy to imagine because converting text from one form to another is one of the things LLMs are actually pretty good at, and you can just string some dumb code or more traditional AI in the middle between the two text transform steps to make the more mechanical and information-oriented parts of it work and you're good.
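A bare-bones sketch of that sort of pipeline, assuming `llm` is any callable that maps a prompt string to a completion string (the prompt wording and function name here are invented):

```python
import sqlite3

def answer_from_database(question: str, db_path: str, llm) -> str:
    # Pass 1: the LLM drafts a SQL query from the user's question.
    sql = llm(f"Write a single SQLite query that answers: {question}\nReturn only the SQL.")
    # Ordinary code (not the LLM) actually polls the database.
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(sql).fetchall()
    finally:
        conn.close()
    # Pass 2: the LLM turns the rows back into a human-readable answer.
    return llm(f"Answer the question '{question}' assuming the following rows are true: {rows}")
```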
But LLMs cannot query external sources for information, nor cross-reference sources, nor fact-check, nor introspect, because all of these are tasks that require some concept of information, and information is fundamentally incompatible with how LLMs work under the hood. I respectfully think you're not grasping the point of the article which is that an LLM does not and cannot possess information, it mimics language, and it turns out this gets you a shockingly close simulacrum of possessing information, which is good enough for a lot of uses! But that doesn't get you any closer to knowing how to create an AI that does the things you're describing because to do those things you'd need to also invent a completely different type of AI that works via a completely different method. There is no iterative development process that can be performed on an LLM that gets you a machine that knows things.
Maybe if AGI is possible then LLMs will be part of the pipeline for how they process input into and generate output from whatever currently future-tech AI actually does the "knowing" and the "thinking", and that currently future-tech AI could do the things you're describing, but that's not how LLMs work or are able to work, and our (current?) inability to create an AI that could do those things is the entire reason why we instead settled for making an AI that subjectively feels like it's doing those things when it actually isn't.
What does it actually mean to "know something" as a human?
Do you need to be able to recall it with 100% accuracy? 99% accuracy? How good is human-level "knowledge" on a probability scale?
Do you need to be able to transform it and apply it to new and different contexts?
Do you know the capital of Kyrgyzstan? An LLM does, and it can write you a story about a person living there, pulling the relevant knowledge from its weights. Perhaps if I do a million generations, I will get a wrong or incoherent answer.
What an LLM is incapable of is "feeling" the truthiness of any given statement. Where humans will (infrequently) say "I don't know" or refuse to make up bullshit, an LLM will happily make things up, because there's no internal feedback loop or self-assessment. This is different from not having any knowledge at all.
Yeah I'm gonna bow out after this one before this truly just gets stuck in the mud as a definition debate, but:
Yes, an LLM can often give an answer to a question correctly. Is being able to produce text that encodes some information the same thing as knowledge? I mean, maybe! That's not really relevant though. What is relevant is that an LLM does not conceive of that as information, per se, it conceives of it purely in terms of language. What I'm getting at is that you cannot just give an LLM new information, you can only give it new language, and though that language may encode useful information, the LLM is only ever capable of using that information as language.
I brought this up because it means you can't just teach an LLM to start introspecting, or fact-checking, because those are new processes, and you cannot teach an LLM a new process, you can only give it new language. You can give it language that tends to indicate introspection, but that doesn't actually result in anything resembling an introspective process or a fact-checking process occurring under the hood, because there's no code behind that. What it results in is exactly what LLMs do when you ask them to introspect: they say "Sorry! On second thought I got that wrong, didn't I?", but no introspective process has occurred. In fact you can easily get them to say sorry even if they were originally right! Saying sorry is the simple result of mimicking the language that people tend to have after introspecting.
You can teach an LLM how to introspect, or how to fact-check, by adding language associated with these tasks to its training data, and this will result in it being able to explain to you how to introspect and fact-check, and it will result in it confidently speaking in the language of introspection and the language of fact-checking when prompted, but it will not prompt it to conduct introspection or fact-checking. The LLM does not even understand these processes as things that it has the ability to do, or things that anyone has the ability to do. This is relevant because I was responding to someone talking about introspection, fact-checking, etc as new processes that they believe you can teach LLMs how to do.
Whether a mimic of language which is sufficiently advanced that it often correctly responds to queries can or cannot be said to "know" something is not the question I was raising and is ultimately not very interesting to me because it's just a definition debate. What I was saying was that an LLM by definition does one single process (transforming text) very well and that single process also happens to give it the ability to mimic a lot of other processes (finding information, etc etc etc) but it cannot actually perform any other processes, nor can it be made to perform any other processes because it does not have a model for anything except for text and the transformation thereof.
You can give it language that tends to indicate introspection, but that doesn't actually result in anything resembling an introspective process or a fact-checking process occurring under the hood, because there's no code behind that.
I'm going to have to disagree on this one here. To be clear, I don't think current LLMs are how we'll get there, as the compute and data requirements are kind of insane, but I think it is fundamentally plausible to train a transformer LLM to actually introspect. After all, it's just another statistical pattern that underlies the data generation process. It is, thus, contained in the data. Rarely, because people on the internet rarely introspect noticeably, which is driving the difficulty here.
Let's imagine the following: You have a dataset of human conversations, where one party is doing introspection, and that introspection is documented in the data. My claim here is that it is fundamentally impossible for the LLM to achieve high validation scores on this dataset without actually doing the introspection. Whether one wants to sink into a semantic argument about "it's only introspection if it works by the same mechanisms as human introspection" is your choice, but I won't follow there because from an external perspective it's indistinguishable.

The only remaining question is whether a transformer architecture is in principle capable of this. This is a bit of a tricky one because you're kind of sandwiched between a runtime that's linear in the model size (for some definition of model size), thus prohibiting Turing completeness, on one side; and the universal approximation theorem on the other side. Add enough stuff and you can do anything, but you have to actually add the stuff. Adding more computation time to the same neural algorithm means adding more neurons that you have to now also train. But if we do the tried and true method of saying "well, after 20 layers of introspection surely we'll have our result at the latest" and calling it quits there, we'll get our answer in 99% of cases. More if you're willing to add more layers. That's hardly a fundamental reason why it can not be done. And maybe a smart cookie will come up with some clever way of doing a fixed point iteration or something on the output of a network, adaptively adding more computation as needed. That'd sort that out.
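To make the fixed-point idea at the end of that paragraph concrete, here is a toy sketch (the block, tolerance, and step cap are all invented): re-apply the same layer until its output stops changing, instead of committing to a fixed depth.

```python
import torch
from torch import nn

block = nn.Sequential(nn.Linear(64, 64), nn.Tanh())   # one reusable "refinement step"

def adaptive_forward(x: torch.Tensor, tol: float = 1e-4, max_steps: int = 50) -> torch.Tensor:
    """Iterate the block until a fixed point (the output barely changes) or a step cap is hit."""
    for _ in range(max_steps):
        nxt = block(x)
        if (nxt - x).abs().max() < tol:   # converged: no more computation needed
            return nxt
        x = nxt
    return x                              # give up after max_steps, like a fixed-depth net would

print(adaptive_forward(torch.rand(1, 64)).shape)
```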
Now, ok, why can't chatGPT do introspection then? Two possible explanations: (1) the phenomenon is too rare in our data to actually induce the at least modestly complex neural pathways it requires. (2) the expressiveness of our LLMs is too small to capture these complex pathways. The solutions to those two are -conceptually- very simple: moar data and moar neurons respectively.
Now, granted I'm firmly in the camp that thinks it infeasible to gather so much data and assemble so many neurons as to actually pull this off using transformer LLMs. More efficient architectures are needed. Ideally those architectures betray the internal mechanisms used by the model, thus satisfying the mechanistic crowd that needs the AI to solve the task the same way a human would.
I enjoyed this essay about large language models (LLMs) like ChatGPT and how we as humans have a hard time realizing that what they produce is basically a really advanced auto-complete function and not a sign of actual intelligence. There are some comparisons made to humans with "neurodivergence" that I can't really speak to, but I thought it's an interesting observation comparing how we treat humans that cannot speak vs. computers that can.
A quote from the essay:
"As a society, we’re going to have to radically rethink when and how and even if it makes sense to trust any information that either originates from, or is mediated by, any kind of machine-learning algorithm — which, if you think about it, currently encompasses nearly All The Things."
Part of my job is trying to explain some of this technology to non-technical people, and this is really hard to get across. The LLMs will sound confident because they've been programmed to sound that way. There are probably a number of reasons their designers have chosen to do so, but in my opinion only one that matters: money.
These companies are trying to sell us on the sci-fi ideal of AI without actually being able to give us that. In other words, a human-like intelligence that can think and reason and lie (and know that it's lying!). They want us to put our trust (or better yet, our money) into these systems.
I truly believe that the next phase of this experiment is going to be people losing trust in anything they read online. They won't be able to know if what they're reading, seeing, or hearing was written by a real human (with all the baggage that comes with that, but baggage we at least have spent the last few hundred thousand years learning to live with) or if it was something spit out by an algorithm that doesn't really understand why it says what it says. The intent behind the words becomes even more opaque than it would be if we assumed a person was there writing it.
A final quote from the article that really hit home for me:
Here’s the paradox: even while the language fluency fools us into imagining these chatbots are ‘persons’, we simultaneously place far more trust in their accuracy than we would with any real human. That’s because the fact that we still know it’s a computer activates another, more modern rule of thumb: that computer-generated information is accurate and trustworthy.
Interesting read, and most of it agreeable. An LLM is a language model, and people should treat it as such. However, properly applied, it might produce useful information (eg summarizing, translating, etc). Useful, but not necessarily errorless.
I do have some thoughts about the article, though. One of them is the fact that the author conflates LLMs with AI in general.
"As a society, we’re going to have to radically rethink when and how and even if it makes sense to trust any information that either originates from, or is mediated by, any kind of machine-learning algorithm — which, if you think about it, currently encompasses nearly All The Things."
There is a difference between trusting information from a general-purpose trained LLM and trusting an AI specifically designed and trained for a narrow task (e.g. pattern recognition in complex data, such as recognizing tumors in a scan). Those are things that are difficult to do accurately with traditional data analysis, and where AI can outperform traditional software and even experienced humans.
Another point, not specifically aimed at the author, but at the discussion of AI 'consciousness' in general. People might overestimate the 'understanding' of an AI, but I would say we also overestimate our own 'understanding' and 'consciousness'. Whatever these concepts are, they are (complex) processes in our brain, which can be reduced to basic physical processes. Processes that are not that different from those in species where we don't assume 'understanding' and 'consciousness'. So I see it merely as an emergent property, not something inherent, not something you can measure or detect. In fact, you can only be sure that you yourself are conscious and can understand something. Everybody else could be an advanced LLM. But if you cannot know the difference, is there a difference?
I think it is difficult to say whether or not an AI might have a consciousness (in the future, obviously we're not there yet) if we aren't able to clearly define what a consciousness is.
I just made a comment similar to this to another reply here. I think there is some real issue with a human created AGI when we can't explain or understand where our own consciousness comes from.
I was interested in foundation models about 10 years before transformers really took off and was involved in various efforts from "semantic technology" similar to Cyc to vector embeddings for patent search, CNNs, RNNs, LSTMs, etc.
I came to believe that the "language instinct" was a peripheral of an animal and you couldn't get language to work without modelling animal intelligence.
ChatGPT performs a lot better on language swirling in on itself than I would have thought possible. It will, at least for a while, revive the discredited philosophy of structuralism, which sees language as a model for everything else in the social sciences and human experience. (After all, the structuralists all became post-structuralists.) Linguistics becoming paradigmatic by adopting the ideas of Noam Chomsky really killed it off.
Oddly, transformational grammars are "paradigmatic" in that they quite literally are a paradigm: you can formulate problems in that framework, solve them, write papers, etc. However, from an engineering viewpoint it doesn't tell you how to implement language technology, and from the viewpoint of the structuralists it is just too hard and not satisfying enough. (Maybe Alain Badiou could have pulled off some abuse of transformational grammar, but the other French theorists wouldn't work that hard.)
Now that ChatGPT proves that "language is all you need", or at least fakes it very well, structuralism will return, at least for a while. The thing is, the real competence of ChatGPT is seduction: you want to believe in it, and particularly you want to believe the social gestures. When you start making excuses for it, you are doomed. The ignorant and indolent are so sure that ChatGPT can write a winning pitch deck for them that they'll wind up working much, much harder than they really want to, endlessly pushing around a bubble under the rug that they never get rid of, because ChatGPT has no animal intelligence, no shame, etc.
I like to distinguish between "Intelligent" and "Educated," because there's definitely a gap between the two, and conflating them lets a lot of hidden biases flourish. There's also determination, empathy, and other factors at play, but I'm going to set those aside to explain my first sentence a bit more.
An educated person who is not intelligent can be quite fluent within the scope of their education, but can fall apart quickly once they step outside its bounds. The difference between an A+ doctor and a C doctor is that the A+ doctor likely won't need a full-time IT support professional standing by their side to do everything outside the scope of medical school. I've met a few people I would class as "very educated idiots." I had one doctor tell me not to feed my kids frozen fruit because "God knows what chemicals they're using."
An intelligent person can thrive even without much education. I'd say that if there were a way to properly assess broad intelligence (again, setting aside the other factors), we would find that intelligent people are more likely than the average population to succeed in completely new situations.
And most importantly: Intelligence in one area does not necessarily imply intelligence in another, which is part of why assessing a broad scope of intelligence is a fool's errand IMO.
From your examples, the "highly educated idiots" simply have an underdeveloped model of the world from lack of curiosity or necessity.
I imagine you could be similarly specialized in mathematics or programming, while being an idiot in regular things. I think such a person is still intelligent.
Another interesting facet of human intelligence is communication skills. How well do you listen? What do you pay attention to? Can you explain your conclusions and thought process to others?
There are plenty of highly intelligent people who simply do not listen with their full attention when others are talking. Maybe this is when empathy comes into play, as you noted.
Re: assessing intelligence. Assessing "human intelligence" is possibly a fool's errand because it's too complex, bound up in our environment, social structure, and cultures.
However, if "intelligence" (as I defined) is a low-level primitive that human intelligence builds upon, then that is something that can be tested in a more rigorous way—albeit with less of a socially relevant conclusion.
"The world" is a perfect model of the world: it can make predictions with 100% accuracy... but it has difficulty communicating those predictions to us.
Is it intelligent?
We must have different definitions of model, because your statement does not make sense to me. Let me clarify my definition.
The universe is a complex process, which is to say that it is a system that changes over time. The universe itself can be viewed as a complex composition of subprocesses that come and go. The Earth is a process. Ecosystems are processes. Individual organisms are living processes. You and I and other mammals, birds, etc. are intelligent living processes (ILPs). ILPs are intelligent because they model their environment in order to predict the future, to increase the odds of surviving to reproduce.
A modeling process ("model") is internally organized to represent the modeled process—what it is modeling. A model's "time" is not required to be bound to the time of the modeled process. This is what makes a model a model, and what makes a model useful. The model can peek forwards (or backwards) in time.
A modeling process is usually simpler than the modeled process, but this is not strictly necessary. For example, we can model a real-world chess game between two players using human actors on a giant chess board. The actors debate amongst each other what move they think the players will make before they make it, and then carry it out, rolling back their positions later if they're mistaken. This is a terribly inefficient way to model a game of chess, but is still a viable model with some usable accuracy.
With the preceding definitions, it is illogical to say that a process models itself. Saying that "the universe models itself" is equivalent to asserting "that dog is a model of that dog".
If we take a perspective outside of the universe, it's certainly possible the universe is modeling something. We just don't know what it is. If the universe is a subprocess of some larger process, perhaps it is being used for some computation, a la planet Earth in The Hitchhiker's Guide to the Galaxy. In that case, the enclosing process may well be intelligent. Structurally speaking, this would be no different from animals being intelligent because their brains model their environment.
I would separate these two concepts - surely, some people are just unreasonably bad at, say, regular life but otherwise highly intelligent at programming, or, even more commonly, are exceptionally good at their work but were always bad at mathematics and have learned to believe that this is somehow inherent to them, instead of being a failure of their teacher. This is indeed just a bad internal model.
But there are people who managed to land a job and/or have a degree but are complete idiots.
Even being particularly academically intelligent doesn't necessarily correspond to being knowledgeable in other fields. There's no shortage of scientists with really, really stupid takes on fields outside their area of expertise (Neil deGrasse Tyson is an example that comes to mind).
In general, I think humans use "intelligence" to refer to a bunch of related but distinct concepts, since it wasn't invented as a technical term to be defined in a rigorous way. I think that's a big part of why we all keep arguing about what "counts" as intelligence wrt machine learning. But then again, I've got a linguistics background, so I may be biased towards attributing it to language.
Getting off topic, but I've noted before that the concept of love has the exact same confusion.
You honestly don't even need to get as complex as these abstract concepts. It's difficult to come up with a sufficiently thorough definition of a word like "bird" or "chair". Language exists to facilitate communication, and humans just don't need to rigorously define everything to communicate sufficiently well in most circumstances. Heck, the ambiguity or vagueness that wouldn't be possible if you rigorously defined everything may even be desired in a lot of situations when people communicate!
An even more profound example of what you are saying has a name: Nobel disease.
Ooh thanks this is a good read, didn't know there was a term for it!
I wholeheartedly agree with this - like limbs evolving to navigate a species through space, intelligence navigates a species through time (or temporal possibility). Want to avoid an obvious death? Well, you first have to recognize which possible futures are obvious given the current input/output.
Language is like the API between external nodes - possibly even internal nodes - but without a reliable store of data, you could have the most robust API attempting to fill in the gaps dynamically and you'll still eventually fall short.
Furthermore I hope our goal isn't to emulate humans, which so far have a pretty poor filter for what gets accepted into that data store as valid. We should be requiring some pretty aggressive filtering.
As a linguist and a software dev, I actually really like this analogy! I didn't focus on language and the brain specifically, but I did learn that when a particular portion of the brain (known as Broca's area) is damaged, people's ability to produce language is partially lost, but this doesn't have any impact on their other cognitive abilities (absent damage to other parts of the brain, ofc). As a result, I think we can confirm that the "internal nodes", as it were, at least don't rely strictly on language, since you would expect damage here to be much more impactful to other cognition if that were the case.
That is fascinating. I wonder if they still experience internal dialogue. AFAIK the research heavily suggests that higher-order thinking is unlocked by language. E.g. "left of the blue wall" is difficult to comprehend without language; studies show that each concept - "left", "blue", and "wall" - can be understood, but connecting them into one concept doesn't seem possible without language. Though perhaps once you've understood it as a concept, language isn't required to access it any longer? I'd be interested in learning more about patients who experienced issues with Broca's area.
Here's a really interesting podcast that goes into greater detail: https://radiolab.org/podcast/91725-words
My understanding is that the research is much less settled, since it's not possible to ethically deprive someone of language from birth for scientific study, and our closest natural equivalents to study tend to have other confounding mental impairments that make it difficult to isolate what is actually due to language deprivation and what is due to other factors. There's certainly plenty of theorizing about language unlocking higher-level cognition in our evolution, but afaik the jury's still out on which came first in that particular chicken-and-egg problem.
But I'll confess that my background is principally looking at this from the other end -- studying the structure of language itself rather than how it actually operates in the brain. Within that side of things there's a lot of arguing about whether language has an inherent structure dictated by some unique human language faculty, and the arguments about that within linguistics can get heated. I don't want to end up being an example of someone talking about something I don't know enough about, and as soon as we start talking about the human body and brain, I'm not much of an expert anymore lol.
Cognition is really weird. Some people (including myself) don't have an internal dialogue at all even with a functioning Broca's area, and (at least in my experience) don't have any trouble with higher order thinking. Having no language to effectively learn or communicate those concepts with others would make things hard, though, which is really interesting.
Hey, that just prompted an interesting thought in me. Notice how language is very much removed from ideal information compression? So much redundancy in there that makes languages more complex to learn, makes sentences longer, just seems pointless? Well, it does start to make sense once you realize you can reconstruct what someone must've said (not just the gist of it, but the exact wording they must've used) even if you only heard part of their speech. That's obviously useful, but also not news.
Do you also know about that vague (or maybe not so vague if you speak to the right experts) notion that human abstract thought and human speech co-developed? That maybe abstract thought is much harder or even impossible without speech, even if it is just your inner voice?
Complete conjecture, but let's put those together: Speech is useful for abstract thought because it gives us a redundant (and thus robust) way of encoding abstract thought. By encoding your thought in language, even if you never speak those words, you give it a more robust encoding. Your brain can memorize that, and if later you remember that language encoding slightly wrong (as always happens) your brain can do some error correction, overwrite the faulty memory with an error-corrected one, and thus remember the thought. Contrast a high-density compressed format: In order to error-correct this thought later, you'd basically have to come up with it all anew, because what you remember will not be a slightly garbled but still readable sentence, but an equally plausible but wrong thought.
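To make the error-correction point concrete with a toy example (this is just an analogy about redundancy, nothing about how brains actually store anything): corrupt a plain English sentence and it stays readable; corrupt its compressed form and the whole thing is lost.

    import zlib

    sentence = b"Human language is full of redundancy, which makes it robust to noise."

    # Drop a handful of characters from the raw sentence: still mostly readable.
    noisy = bytes(b for i, b in enumerate(sentence) if i % 7 != 3)
    print(noisy.decode(errors="replace"))

    # Flip one byte in the zlib-compressed version: decompression (almost
    # certainly) fails outright, and the whole sentence is gone.
    packed = bytearray(zlib.compress(sentence))
    packed[len(packed) // 2] ^= 0xFF
    try:
        print(zlib.decompress(bytes(packed)).decode())
    except zlib.error as error:
        print("compressed copy is unrecoverable:", error)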
Nice theory, but I believe it falls apart, as quite a lot of people lack an inner monologue. We may not be as dependent on language as we think. Arguably, our visual system is the most complex.
Do you have any material on people who don't think with language?
Oh, this extends way beyond computers.
"This thing is hard for a non-human to do and therefore is a sign of human-level intelligence" has been used to separate humans from animals, and even other humans as "subhumans" for millennia. Since this human chauvinism has strong roots in religion, it's not going away anytime soon. We are special; only we have souls; only we can make important decisions, vote, drive; only we deserve certain rights. For a given definition of "we."
And we will continue to make that distinction because we're control freaks. We don't want to be studied, predicted, or categorized. We don't want to be same same with everything else, we want to dominate all of it.
We may be communal creatures, but only because community enhances our own survival. It's really hard for us to empathize and sympathize; that's a secondary skill developed to better protect us from getting tricked.
We're just animals, like tigers or other solitary creatures watching out for number one. We're greedy by nature and always will be. We only move forward when an idea presented to us by someone else somehow benefits us.
Please see my other comment: https://tildes.net/~comp/194n/language_is_a_poor_heuristic_for_intelligence#comment-a34f
We are absolutely special. We just don’t have a clear boundary to draw. Sure, corvids can use and make tools, but we are very obviously on a completely different level from them.
Of course other readings (vote, rights) are beyond this topic and I’m not commenting on that.
…and as soon as that thing is achieved, the goalposts are moved, rather than admitting a new member to the human-intellect club.
I love how much drama and distress AI is causing the "humans are supposed to be special" crowd by mimicking human quirks through unthinking dirt that has studied massive quantities of inane human writing.
Agreed.
I wonder if it's just the fact that we are miles away from actually getting to AGI, or if it's actually impossible for us to achieve because we don't even understand where our own consciousness comes from?
It's worth making a distinction between sentience and sapience. A (sapient) computer could be capable of performing any intelligent tasks beyond a human level and still have no qualia that would make it sentient.
I'd argue that we do have a decent idea of where consciousness comes from. Since at least Julien Offray de La Mettrie, we've observed the relationship between brain and mind.
If you get brain damage in a well-understood part of the brain, there's a good chance a neurologist could guess what your conscious experience would be like, and by the generalizability of that we chip away at the strong solipsism, @pienix.
A hundred patients came in with the same damage to their brain and gave similar self-reports? Odds are "the mind is what the brain does", even if that becomes inaccessibly complicated if you try to dig deeper.
I don't think norb is necessarily saying that consciousness doesn't come from the brain, I read that instead as a reference to what you call the "inaccessibly complicated" nature of it. We have a very abstract and high level descriptive model that tells us a strong relationship between the brain and consciousness exists, and we can use it to make very broad predictions, but that only gets you so far. Anyone can gather patient reports about different types of brain damage and subjective experiences, and use those reports to predict that the next guy with a substantially similar kind of brain damage will have a substantially similar subjective experience, but that doesn't really get you anywhere near an understanding of the connection between those two things.
I am making the argument that consciousness doesn't come just from the brain. I think there are a lot of ways the brain acts as the CPU (central processing unit) for the body, doing a lot of work in conjunction with other parts of the body. Somewhere in that tangled mess we have what we define as "consciousness."
I guess I was also trying to make the point that when consciousness begins and ends isn't something we can all agree on (not to make this into another argument/discussion, but you can see the ways in which people disagree on when life begins in discussions related to abortion). Are dogs or cats conscious? I would argue yes, but I'm sure others would disagree. Do we assign it just to the animals we see as companions (dogs or cats) vs. food (cows or chickens)?
I think those types of questions make it very hard for us to assign the idea of consciousness to a computer program. That is what I meant by "is it even possible to do so?"
[EDIT]: Just to add I have no formal training or knowledge in any of these spaces. I just find the ideas fascinating. Loving the conversation for sure!
It's interesting to consider that we know that tons of animals are sentient but we don't know exactly to what degree other animals are sapient, whereas we can know exactly how sapient an AI is but understanding its sentience is impossible.
But I guess that just means they're a more pure expression of the philosophical question of proving the sentience of anyone else.
Somewhere Gödel smiles (or at least those who have interpreted his Incompleteness Theorem through a philosophical lens do). One could argue (and many have) that we can't understand our own consciousness because we are inside our consciousness. We'd have to ascend (through evolution or something else) to the superhuman to then be able to grok regular ole humanity.
I'm not sold on layering the Incompleteness Theorems on top of consciousness; it is quite a mental leap to relate consciousness to a rule system sufficiently complex to contain arithmetic, and then to conclude from that that consciousness is an unprovable fact.
With that said, I'm more inclined to believe that consciousness is "simply" an emergent behavior of sufficiently complex systems and there is not much to understand about it, similarly to how we can have laws regarding temperature even though temperature is not itself a thing - it is a statistical property of a system. In this vein I don't see a reason to place a superhuman limit - we have jumped over the emergent-behavior line, and thus the limit is only complexity from here on (and the halting problem!).
To the best of my understanding, Gödel's Incompleteness Theorem is strictly about the limitations of logical systems.
Interpreting Gödel's work as applicable to consciousness, particularly in this manner, seems to me a rather large and unfounded stretch. It would require that consciousness be defined in an axiomatic manner (I'm unclear how this could ever be shown), and a proof that understanding consciousness is unprovable (whatever that proof would look like). Defining any of these mathematically is likely a lost cause.
I think there's possibly some validity in discussing whether we can understand consciousness, but dragging Gödel into the argument seems pointless to me.
A simple Google search would have sufficed to show you that people have been "dragging" Gödel into the "argument" for decades. Gödel himself knew that his theorem, though strictly about math/logic, would be interpreted through other lenses:
"It was something to be expected that sooner or later my proof will be made useful for religion, since that is doubtless also justified in a certain sense."
Torkel Franzén wrote an entire book (https://www.amazon.com/G%C3%B6dels-Theorem-Torkel-Franz%C3%A9n/dp/1568812388) on Gödel's Theorem being "used and abused" (author's own words) by other fields.
So pointless or not, it has happened and will likely continue to happen until the end of time.
Maybe in pop culture, but overall I kind of disagree... the idea that language is not indicative of intelligence or understanding is not new; the Chinese Room argument originated in 1980.
Surprisingly, on reading that article to confirm the exact year, I learned that the exploration of the concept of mechanically replicating the human mind predates computers. Leibniz - a VERY prominent mathematician who went toe to toe with Newton - was pondering this kind of thing as far back as the early 1700s. Pretty fun fact; I didn't know that about him.
If you look at the history of AI, you'll see that repeated frequently, even back in the early days. It was supposed that computers would be able to easily analyze pictures and distinguish the contents of an image. However, this has only been a realized technology since after the turn of the millennium, as it isn't as easy as early computer scientists thought it would be. They hadn't realized how hard the thing they wanted to do was until they actually tried to do it. In that same way, as modern computer science advances, each hurdle crossed only expands our understanding of just how much we don't know. Sure, technology like ChatGPT would have been considered magic 100 years ago, but it still doesn't equate to anything close to what the human brain is capable of. It is a steadily advancing field, though, and surely we will reach true AI some day.
We are not lacking in real, hard boundaries distinguishing us from AI — we discovered nuclear power, built computers, wrote the AIs themselves, etc. But those would hardly be interesting limits, so we are looking for the minimal bar for human-level intelligence, and we will obviously shoot a bit below it from time to time.
This phenomenon is named the "Curse of AI" or something like it; I think I read about it at some point, even before GPT.
I think it's important to understand the specific failure modes and flaws of GPT as it becomes more prominent in the world, but I intensely dislike the seemingly political importance placed on "intelligence is a single thing that is possessed or not, GPT has exactly none of it, and anyone trying to say otherwise is some kind of swindler trying to con you" present throughout this and the referenced writing of Emily Bender and Timnit Gebru. It's not a useful way to understand AI or the discussions around it. Taken most charitably, it seems like an overreaction to the possibility that people might think any intelligence means consciousness and that means GPT warrants rights or some special level of respect.
The field of AI has long been considered to include things even like theorem provers and chess bots. It's ahistorical to claim that it's an abuse of the term AI to associate GPT with it, especially as it's exceeded so many expectations and benchmarks set by people in the field. It's only in sci-fi stories that "AI" has been synonymous with at-least-human-level intelligence.
Frankly, the constant goalpost moving of "but this is not AI" is IMO best curtailed by a short nod in the direction of the term AGI. If what these people expect of AI is basically a human benchmark, then AGI and AI are the same thing. So where does that leave us? Well, AI, in contrast to AGI, is specialized. It can do some things, quite possibly better than a human, but not others. A chess engine is an AI. A recommendation engine is an AI. The 2-layer densenet that I trained to recognize MNIST-digits is an AI. Warcraft 3's bot build order is an AI.
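(For a sense of how unglamorous that last example is, here is roughly what such a 2-layer MNIST classifier looks like in PyTorch - a sketch, assuming torchvision is available to fetch the data; it should land somewhere around 97% accuracy, which is respectable and still obviously not general intelligence.)

    import torch
    from torch import nn
    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms

    # Two dense layers: narrow, unimpressive, and by the field's own history, "an AI".
    model = nn.Sequential(
        nn.Flatten(),
        nn.Linear(28 * 28, 128),
        nn.ReLU(),
        nn.Linear(128, 10),
    )

    train = datasets.MNIST("data", train=True, download=True,
                           transform=transforms.ToTensor())
    loader = DataLoader(train, batch_size=64, shuffle=True)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(3):
        for images, labels in loader:
            opt.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            opt.step()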
Within that context, ChatGPT is of course an AI. The problem with LLMs is that humans communicate their intelligence using language, and LLMs are trained on large amounts of human language. They can imitate language well, they can even imitate intelligent conversation well. That does not per se make them intelligent. And one shouldn't put too much stock in the word "imitate" here, because sufficient imitation is completely indistinguishable from the real thing. In principle, I also don't see a fundamental limit on the degree of intelligence that can be reached using LLMs - I'd be more worried about the practical costs in terms of compute and data.
I think it's best to view intelligence as a multidimensional thing: ChatGPT has the short-term memory of a 3-day old, the capacity to learn (during conversation, not via backprop) of a 3 year old, the language capacity of a 20 year old, and knowledge of facts and general knowledge of a 500 year old, random memory lapses included. I think that's a reasonable first-degree approximation, but it surely isn't foolproof. The combination can fool humans quite well, as the 20 year old sounds confident, irrespective of whether the 3 year old or the 500 year old is determining the content. But I certainly occasionally see intelligence way in excess of what other entities commonly thought to be "intelligent" produce. The problem isn't that LLMs are not intelligent, it's that the way they communicate makes us expect more than is actually there.
A point the author spends a good portion of this article proving is that the creators of these models themselves have put a lot of time and effort into deliberately blurring the line between AI and AGI, and trying to market these products as AGI, or at least consciously trying to evoke the aesthetics of it. She talks about a number of choices that these companies make to try to suggest to end users that the AI is more than it is. I suspect a lot of this is because of how much we used to hear about the Turing test. AI research used to be very obsessed with creating an AI that could appear and act human to the end user to pass the Turing test, and since we didn't have anything as advanced as LLMs yet, the way you did that was primarily through presentation and smoke and mirrors and language, which is how we've gotten where we've gotten.
Regardless, the effect is that most layman end users come out of an interaction with ChatGPT et al thinking that this is something qualitatively similar to AGI that just has to be iterated on to get the Star Trek computer. The Snapchat AI was a particularly good test of this for me because I got to watch dozens of friends have their first experience with an LLM and come away from it with their own impressions. And I don't know anyone who came away from that experience thinking correctly that the way it works is that it can't do anything but mimic language patterns, and a side effect of mimicking language patterns is that sometimes the language it's mimicking happens to have correct information in it. Instead I saw a lot of people come away thinking they can ask an LLM a question and get a correct answer. On the other hand, I'm in law school, and my law school friends by and large just decided it was broken and useless and not-quite-there-yet because basically any question you ask any of the existing LLMs about anything related to the law causes hallucinations. And yet even they were thinking like "wow, well once they add information about the law to the databases and teach it about the law this will be really cool," not understanding that teaching an LLM information or adding information to a database is fundamentally incompatible with how LLMs work.
And ultimately this feels like it's just a stupid semantic argument about whether AI is really AI or not really AI, and whether a different word better captures what some people mean when they say AI, bla bla bla bla bla. But I don't think that's really the crux of the argument the author is making, which is important in a world where people bullish on AI regularly write op-eds or get hundreds of thousands of Twitter likes discussing the integration of LLMs into education. Laypeople en masse have a fundamental misunderstanding of how LLMs work that has been deliberately promulgated primarily by people that are financially invested in the proliferation of LLMs, and that fundamental misunderstanding is driving public opinion on policy related to them. That seems bad!
One of my fun tests for an LLM is to ask them about someone you know personally but who isn't famous enough to have been in the training data scraped from the internet. The LLM will hallucinate an entire-ass background for this person you know dearly; a background that is clearly, unequivocally false.
I did this once on my best friend and the LLM reported that he was "an American entrepreneur and investor. He is the founder and CEO of Carstensen Ventures, a venture capital firm that invests in early-stage technology companies." "Carstensen Ventures" doesn't even exist, at least according to a cursory Google search.
Well, of course it does; an LLM is a next-word-prediction intelligence, not a general observational one.
This is kind of like asking a dog to solve a math problem and concluding a dog isn't intelligent when it barks at you instead.
The beauty of an LLM is that you can give it information, like a background on your friend, and it can use that information, purely by predicting the next word, to draw real conclusions and predictions from it.
So I can give an LLM a bit of information and ask it to operate on that information pretty generally, and most often it'll do a good job of it. The truly amazing thing is that this next-word prediction machine can summarize a paragraph for me, despite being nothing but a next-word prediction system.
It would be interesting to, say, train an LLM and try to remove all of its knowledge about how the world works, distilling its knowledge down to only the text it is given. This would result in the model refusing to give data that isn't drawn from its context and would dramatically reduce false answers, but I imagine it would be a pretty hard task, and the result would perform worse than the "knows all the information on the internet" model we have now.
You're right in the sense that this isn't how the models are trained. They are next token predictors. However, getting LLMs to query information from databases is something AI researchers are considering.
More generally, once LLMs are capable of using tools such as writing a database query or writing, compiling and running code for its own purposes they'll be very powerful. This will be the synthesis of good old fashioned symbolic AI with the current trends of connectionist ML.
Tool-usage training can be accomplished in a similar manner to how Reinforcement Learning from Human Feedback (RLHF) is being used to fine-tune the models now. The model can be rewarded with RL for writing correct SQL queries or Python code.
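A very rough sketch of what that reward signal could look like for the SQL case (the toy table, the partial-credit value, and the function itself are all made up for illustration - not anyone's actual training setup):

    import sqlite3

    def sql_reward(generated_sql: str, reference_rows: list) -> float:
        """Reward 1.0 if the model's query runs and returns the expected rows."""
        db = sqlite3.connect(":memory:")
        db.execute("CREATE TABLE users (id INTEGER, name TEXT, age INTEGER)")
        db.executemany("INSERT INTO users VALUES (?, ?, ?)",
                       [(1, "Ada", 36), (2, "Linus", 54), (3, "Grace", 85)])
        try:
            rows = db.execute(generated_sql).fetchall()
        except sqlite3.Error:
            return 0.0  # the query didn't even parse or run
        return 1.0 if rows == reference_rows else 0.1  # small credit for running at all

    # e.g. score a model completion for the question "which users are over 50?"
    print(sql_reward("SELECT name FROM users WHERE age > 50", [("Linus",), ("Grace",)]))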
I think this is the bombshell next step that a lot of people aren't imagining. People are beginning to wrap their heads around the limitations of ChatGPT and the like, and assuming that's the extent of what's possible with the technology. Already I'm seeing people shrug it off but they're throwing out the baby with the bathwater.
Soon we're going to see these systems querying external sources for information, cross-referencing sources, fact-checking, then introspecting with themselves to validate that their conclusions are free of bias and contain accurate citations. Maybe they'll feed their responses through other systems to validate other things like correct syntax/behavior of code, etc. All before a single character is returned to the end user.
Current LLMs have huge flaws but they're just the primordial form of something yet to be realized.
Introspecting with oneself is not something LLMs have the capacity to do by their nature. You can very easily imagine apolz's point that they might be able to construct SQL queries. Maybe you have a pipeline of LLM inputs and outputs where one run of the LLM generates an SQL query based on the user's query, then the program polls a database using that query, then the LLM is again asked to convert that into human-readable content with something like "answer this question assuming the following to be true" (however I definitely have questions about the utility of this because you still end up in the same position of unverifiability you're in right now, though that's not a conversation I really care about having). This is so easy to imagine because converting text from one form to another is one of the things LLMs are actually pretty good at, and you can just string some dumb code or more traditional AI in the middle between the two text transform steps to make the more mechanical and information-oriented parts of it work and you're good.
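Spelled out, that pipeline is only a few lines of glue - here `llm` is a stand-in for whatever model call you'd actually use, and the schema and database are hypothetical:

    import sqlite3

    def llm(prompt: str) -> str:
        """Placeholder for a call to whatever LLM is being used (hypothetical)."""
        raise NotImplementedError

    def answer(question: str, db: sqlite3.Connection, schema: str) -> str:
        # Pass 1: the LLM only transforms language -- the question text into SQL text.
        sql = llm(f"Schema:\n{schema}\nWrite one SQLite query that answers: {question}")
        # Ordinary code does the part that actually touches stored information.
        rows = db.execute(sql).fetchall()
        # Pass 2: language again -- the rows plus the question become a prose answer.
        return llm(f"Question: {question}\nQuery result: {rows}\n"
                   "Answer the question using only the result above.")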
But LLMs cannot query external sources for information, nor cross-reference sources, nor fact-check, nor introspect, because all of these are tasks that require some concept of information, and information is fundamentally incompatible with how LLMs work under the hood. I respectfully think you're not grasping the point of the article which is that an LLM does not and cannot possess information, it mimics language, and it turns out this gets you a shockingly close simulacrum of possessing information, which is good enough for a lot of uses! But that doesn't get you any closer to knowing how to create an AI that does the things you're describing because to do those things you'd need to also invent a completely different type of AI that works via a completely different method. There is no iterative development process that can be performed on an LLM that gets you a machine that knows things.
Maybe if AGI is possible then LLMs will be part of the pipeline for how they process input into and generate output from whatever currently future-tech AI actually does the "knowing" and the "thinking", and that currently future-tech AI could do the things you're describing, but that's not how LLMs work or are able to work, and our (current?) inability to create an AI that could do those things is the entire reason why we instead settled for making an AI that subjectively feels like it's doing those things when it actually isn't.
What does it actually mean to "know something" as a human?
Do you need to be able to recall it with 100% accuracy? 99% accuracy? How good is human-level "knowledge" on a probability scale?
Do you need to be able to transform it and apply it to new and different contexts?
Do you know the capital of Kyrgyzstan? An LLM does, and it can write you a story about a person living there, pulling the relevant knowledge from its weights. Perhaps if I do a million generations, I will get a wrong or incoherent answer.
What an LLM is incapable of is "feeling" the truthiness of any given statement. Where humans will (infrequently) say "I don't know" or refuse to make up bullshit, an LLM will go ahead and make it up, because there's no internal feedback loop or self-assessment. This is different from not having any knowledge at all.
Yeah I'm gonna bow out after this one before this truly just gets stuck in the mud as a definition debate, but:
Yes, an LLM can often give an answer to a question correctly. Is being able to produce text that encodes some information the same thing as knowledge? I mean, maybe! That's not really relevant though. What is relevant is that an LLM does not conceive of that as information, per se, it conceives of it purely in terms of language. What I'm getting at is that you cannot just give an LLM new information, you can only give it new language, and though that language may encode useful information, the LLM is only ever capable of using that information as language.
I brought this up because it means you can't just teach an LLM to start introspecting, or fact-checking, because those are new processes, and you cannot teach an LLM a new process, you can only give it new language. You can give it language that tends to indicate introspection, but that doesn't actually result in anything resembling an introspective process or a fact-checking process occurring under the hood, because there's no code behind that. What it results in is exactly what LLMs do when you ask them to introspect: they say "Sorry! On second thought I got that wrong, didn't I?", but no introspective process has occurred. In fact you can easily get them to say sorry even if they were originally right! Saying sorry is the simple result of mimicking the language that people tend to have after introspecting.
You can teach an LLM how to introspect, or how to fact-check, by adding language associated with these tasks to its training data, and this will result in it being able to explain to you how to introspect and fact-check, and it will result in it confidently speaking in the language of introspection and the language of fact-checking when prompted, but it will not prompt it to conduct introspection or fact-checking. The LLM does not even understand these processes as things that it has the ability to do, or things that anyone has the ability to do. This is relevant because I was responding to someone talking about introspection, fact-checking, etc as new processes that they believe you can teach LLMs how to do.
Whether a mimic of language which is sufficiently advanced that it often correctly responds to queries can or cannot be said to "know" something is not the question I was raising and is ultimately not very interesting to me because it's just a definition debate. What I was saying was that an LLM by definition does one single process (transforming text) very well and that single process also happens to give it the ability to mimic a lot of other processes (finding information, etc etc etc) but it cannot actually perform any other processes, nor can it be made to perform any other processes because it does not have a model for anything except for text and the transformation thereof.
I'm going to have to disagree on this one here. To be clear, I don't think current LLMs are how we'll get there, as the compute and data requirements are kind of insane, but I think it is fundamentally plausible to train a transformer LLM to actually introspect. After all, it's just another statistical pattern that underlies the data-generation process. It is, thus, contained in the data. Rarely, because people on the internet rarely introspect noticeably, which is what drives the difficulty here.
Let's imagine the following: you have a dataset of human conversations, where one party is doing introspection, and that introspection is documented in the data. My claim is that it is fundamentally impossible for the LLM to achieve high validation scores on this dataset without actually doing the introspection. Whether one wants to sink into a semantic argument of "it's only introspection if it works by the same mechanisms as human introspection" is your choice, but I won't follow there, because from an external perspective it's indistinguishable.

The only remaining question is whether a transformer architecture is in principle capable of this. That one is a bit tricky, because you're sandwiched between a runtime that's linear in the model size (for some definition of model size), which rules out Turing completeness, on one side, and the universal approximation theorem on the other. Add enough stuff and you can do anything, but you have to actually add the stuff. Adding more computation time to the same neural algorithm means adding more neurons that you now also have to train. But if we use the tried and true method of saying "well, after 20 layers of introspection surely we'll have our result at the latest" and calling it quits there, we'll get our answer in 99% of cases. More if you're willing to add more layers. That's hardly a fundamental reason why it cannot be done. And maybe a smart cookie will come up with some clever way of doing a fixed-point iteration or something on the output of a network, adaptively adding more computation as needed. That'd sort that out.
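In the spirit of that last idea, here's a toy "apply the same block until the output stops changing, up to a cap" layer - purely illustrative (the names are made up, and this is closer to deep-equilibrium-style models than to anything in production LLMs):

    import torch
    from torch import nn

    class FixedPointBlock(nn.Module):
        """Re-applies one shared layer until the hidden state settles, up to max_steps."""
        def __init__(self, dim: int, max_steps: int = 20, tol: float = 1e-4):
            super().__init__()
            self.layer = nn.Sequential(nn.Linear(dim, dim), nn.Tanh())
            self.max_steps, self.tol = max_steps, tol

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            h = x
            for _ in range(self.max_steps):      # adaptive depth, hard-capped
                h_next = self.layer(h) + x       # residual keeps it anchored to the input
                if (h_next - h).abs().max() < self.tol:
                    return h_next                # converged: stop spending compute
                h = h_next
            return h                             # hit the cap ("after 20 layers ... at the latest")

    print(FixedPointBlock(8)(torch.randn(2, 8)).shape)  # torch.Size([2, 8])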
Now, ok, why can't chatGPT do introspection then? Two possible explanations: (1) the phenomenon is too rare in our data to actually induce the at least modestly complex neural pathways it requires. (2) the expressiveness of our LLMs is too small to capture these complex pathways. The solutions to those two are -conceptually- very simple: moar data and moar neurons respectively.