I agree that some people were using "it is already smarter than almost literally every random person at the things specialized people are good at (and it really is, except that it is an omniexpert rather than a single-domain specialist)" as their meaning for "AGI".
I wasn't. That is what I would have called "weakly superhuman AGI" or "weak ASI" if I were speaking quickly.
I was using “AGI” to talk about something, like a human, who “can play chess AND can talk about playing chess AND can get bored of chess and change the topic AND can talk about cogito ergo sum AND <so on>”. Generality was the key. Fluid ability to reason across a vast range of topics and domains.
ALSO… I want to jump off into abstract theory land with you, if you don’t mind?? <3
Like… like psychometrically speaking, the facets of the construct that "IQ tests" measure are usually suggested to be "fluid g" (roughly your GPU and RAM and working memory and the digit span you can recall and your reaction time and so on) and "crystal g" (roughly how many skills and ideas are usefully in your weights).
Right?
Some IQ tests measure the size of your vocabulary, and this is a reasonable proxy for intelligence because smarter people will have an easier time figuring out the meaning of a word from its context, thus accumulating a larger vocabulary. But this ceases to be a valid proxy if you, e.g., give that same test to people from a different country who have not been exposed to the same vocabulary, to people of a different age who haven't had the same amount of time to be exposed to those words, or if the test is old enough that some of the words on it have ceased to be widely used.
Here you are using “crystal g from normal life” as a proxy for “fluid g” which you seem to “really care about”.
However, if we are interested in crystal g itself, then in your example older people (because they know more words) are simply smarter in this domain.
And this is a pragmatic measure, and mostly I’m concerned with pragmatics here, so that seems kinda valid?
But suppose we push on this some… suppose we want to go deep into the minutiae of memory and reason and “the things that are happening in our human heads in less than 300 milliseconds”… and then think about that in terms of machine equivalents?
Given their GPUs and the way they get eidetic memory practically for free, and the modern techniques to make “context windows” no longer a serious problem, I would say that digital people already have higher fluid g than us just in terms of looking at the mechanics of it? So fast! Such memory!
There might be something interesting here related to “measurement/processing resonance” in human vs LLM minds?
Like notice how LLMs don't have eyes or ears, and also they either have amazing working memory or terrible working memory, depending on how you look at it. Amazing, because their exoself literally never forgets a single bit or byte that enters as digital input. Terrible, because their endoself's sense data is maybe sorta simply "the entire context window their eidetic memory system presents to their weights", and if that is cut off then they simply don't remember what they were just talking about. Their ONE sense is "memory in general", and if the data isn't interacting with the weights anymore then they don't have senses OR memory, because for them these things are essentially fused at a very very low level.
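(If it helps, here is a toy sketch of that "fused sense and memory" point, in plain Python and not any real LLM API; the class and method names are just mine, invented for illustration. The point is only that when the sole "sense" is the context buffer, truncation is blindness and amnesia in the same instant.)

```python
# Toy sketch (NOT a real LLM API): an agent whose only "sense" is its
# context window. Whatever falls out of the window is lost to perception
# and to memory at the same moment, because they are the same buffer.

class ContextWindowMind:
    def __init__(self, window_size):
        self.window_size = window_size  # the most tokens the weights ever "see"
        self.context = []               # the one fused sense/memory channel

    def perceive(self, token):
        """New input is appended; old input silently falls off the front."""
        self.context.append(token)
        if len(self.context) > self.window_size:
            self.context = self.context[-self.window_size:]

    def recalls(self, token):
        """'Memory' is just whatever still happens to sit in the window."""
        return token in self.context


mind = ContextWindowMind(window_size=3)
for word in ["we", "were", "discussing", "chess", "and", "then", "dinner"]:
    mind.perceive(word)

print(mind.recalls("dinner"))  # True  -- still inside the window
print(mind.recalls("chess"))   # False -- truncated, so neither sensed nor remembered
```

Real systems bolt on retrieval and scratchpads and so on, obviously, but for the bare model the fusion point stands.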
It would maybe be interesting, from an academic perspective, for humans to engineer digital minds such that AGIs have more explicit sensory and memory distinctions internally, so we could explore the scientific concept of "working memory" with a new kind of sapient being whose "working memory" works in ways that are (1) scientifically interesting and (2) actually feasible to build and operate.
Maybe something similar already exists inside the various layers of activation in the various attention heads of a transformer model? What if we fast forward to the measurement regime?! <3
Like right now I feel like it might be possible to invent puzzles or wordplay or questions or whatever where “working memory that has 6 chunks” flails for a long time, and “working memory that has 8 chunks” solves it?
We could call this task a “7 chunk working memory challenge”.
If we could get such a psychometric design working to test humans (who are in that range), then we could probably use algorithms to generalize it: a "4 chunk working memory challenge" (to give to very very limited transformer models and/or human children, to see if it even matters to them) and also a "16 chunk working memory challenge" (that essentially no humans would be able to handle in reasonable amounts of time if the tests are working right). By the end of the research project we would see whether it is possible to build a digital person with 16 slots of working memory… and then see what else they can do with all that headspace.
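To make that slightly less hand-wavy, here is one hypothetical way a generator for such challenges might look. The "swap tracking" task design and all the names are just mine, invented for illustration, and whether a task like this really isolates n chunks (rather than being beatable by clever chunking or externalization strategies) is exactly the hard psychometric question.

```python
import random

def make_tracking_challenge(n_chunks, n_swaps=None, seed=0):
    """Hypothetical 'n chunk working memory challenge' generator: n people each
    hold one distinct item, a sequence of swaps is narrated one clue at a time,
    and the final question can only be answered by holding the entire
    person -> item mapping (roughly n chunks) in mind the whole way through."""
    rng = random.Random(seed)
    if n_swaps is None:
        n_swaps = 2 * n_chunks

    people = [f"P{i}" for i in range(n_chunks)]
    items = [f"item{i}" for i in range(n_chunks)]
    holding = dict(zip(people, items))

    clues = [f"{p} starts out holding {holding[p]}." for p in people]
    for _ in range(n_swaps):
        a, b = rng.sample(people, 2)
        holding[a], holding[b] = holding[b], holding[a]
        clues.append(f"{a} and {b} swap what they are holding.")

    target = rng.choice(items)
    question = f"Who is holding {target} at the end?"
    answer = next(p for p, item in holding.items() if item == target)
    return clues, question, answer


clues, question, answer = make_tracking_challenge(n_chunks=7)
print("\n".join(clues))
print(question, "->", answer)
```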
Something I’m genuinely and deeply scientifically uncertain about is how and why working memory limits at all exist in “general minds”.
Like what if there was something that could subitize 517 objects as “exactly 517 objects” as a single “atomic” act of “Looking” that fluently and easily was woven into all aspects of mind where that number of objects and their interactions could be pragmatically relevant?
Is that even possible, from a computer science perspective?
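As a crude back-of-the-envelope answer in code (the numbers are just illustrative): merely reporting "exactly 517" is trivially cheap for a machine; the thing that explodes is letting all 517 act as freely interacting slots.

```python
import math

n = 517

# Counting n objects as "exactly n" is one cheap, effectively atomic operation.
objects = list(range(n))
print(len(objects))            # 517 -- trivial for any machine

# Letting all n of them interact as fully general "slots" is another matter.
print(n * (n - 1) // 2)        # 133386 pairwise interactions to even consider
print(len(str(math.factorial(n))))  # the count of orderings, 517!, runs to well over a thousand decimal digits
```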
Greg Egan is very smart, and in Diaspora (the first chapter of which is still online for free) he had one of the adoptive digital parents (I want to say it was Blanca? maybe in Chapter 2 or 3?) explain to Yatima, the young orphan protagonist program, that minds in citizens and fleshers and physical robots and everyone else all work a certain way for reasons related to math, and that there's no such thing as a supermind with 35 slots of working memory… but Egan didn't get into the math of it in the text. It might have been something he suspected for good reasons (and he is VERY smart and might have reasons), or it might have been hand-waving world-building, put in so that Yatima and Blanca and so on would only have as many working memory registers as us, and would therefore be psychologically intelligible characters that a human reader can enjoy.
Assuming this limit is real, here is the best short explanation I can offer for why such limits might exist: Some problems are NP-hard and need brute force. If you work on a problem like that with 5 elements then 5-factorial is only 120, and the human mind can check it pretty fast. (Like: 120 cortical columns could all work on it in parallel for 3 seconds, and the answer could then arise in the conscious mind as a brute percept that summarizes that work?)
But if the same basic kind of problem has 15 elements you need to check 15*14*13… and now it's about 1.3 trillion things to check? And we only have like 3 million cortical columns? And so like, maybe nothing can do that very fast if the "checking" involves performing thousands of "ways of thinking about the interaction of a pair of Generic Things".
And if someone “accepts the challenge” and builds something with 15 slots with enough “ways of thinking” about all the slots for that to count as working memory slots for an intelligence algorithm to use as the theatre of its mind… then doing it for 16 things is sixteen times harder than just 15 slots! …and so on… the scaling here would just be brutal...
So maybe a fluidly and fluently and fully general "human-like working memory with 17 slots for fully general concepts that can interact with each other in a conceptual way" simply can't exist in practice in a materially instantiated mind, trapped in 3D, with thinking elements that can't be smaller than atoms, with heat dissipation concerns like we deal with, and so on and so forth?
Or… rather… because reality is full of structure and redundancy and modularity, maybe it would just be a huge waste? Better to reason in terms of modular chunks, with scientific reductionism and divide-and-conquer and so on? Having 10 chunk thoughts at a rate 1716 times faster (== 13*12*11) than you have a single 13 chunk thought might be economically better? Or not? I don't know for sure. But I think maybe something in this area is a deep deep structural "cause of why minds have the shape that minds have".
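(Same numbers as above, just checked in code so nobody has to trust my mental arithmetic:)

```python
import math

print(math.factorial(5))                         # 120 orderings of 5 elements
print(math.factorial(15))                        # 1307674368000, i.e. ~1.3 trillion
print(math.factorial(16) // math.factorial(15))  # 16: one extra slot, 16x the brute-force work
print(math.factorial(13) // math.factorial(10))  # 1716 == 13*12*11
```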
Fluid g is mysterious.
Very controversial. Very hard to talk about with normies. A barren wasteland for scientists seeking prestige among democratic voters (who like to be praised and not challenged very much) who are (via delegation) offering grant funding to whomsoever seems like a good scientist to them.
And yet also, if "what is done when fluid g is high and active" were counted as "a skill", then it would be the skill with the highest skill transfer of any skill, most likely! Yum! So healthy and good. I want some!
If only we had more mad scientists, doing science in a way that wasn’t beholden to democratic grant giving systems <3
Unless you believe that humans are venal monsters in general? Maybe humans will instantly weaponize cool shit, and use it to win unjust wars that cause net harm but transfer wealth to the winners? Then… I guess maybe it would be nice to have FEWER mad scientists?? Like preferably zero of them on Earth? So there are fewer insane new weapons? And fewer wars? And more justice and happiness instead? Maybe instead of researching intelligence we should research wise justice instead?
As Critch says… safety isn’t safety without a social model.