One thing I maybe should note: I don’t think Yudkowsky ever actually said “in the limit” per se; that was me glossing various things he said, and I’m suddenly worried about subtle games of telephone about whatever he meant.
Another thing I thought of while reading this (and maybe @johnswentworth’s Framing Practicum finally paying off) is that a better word than “limit” might be “equilibrium.”
i.e. this isn’t (necessarily) about “there is some f(x), where if you dial up X from 10 to 11 to 100 to 10,000, you expect f(x) to approach some limit”. A different way of looking at it is “what are the plausible stable equilibria that a mind could end up in, or the solar-system-system could end up in?”
A system reaching equilibrium involves multiple forces pushing on stuff and interacting with each other, until they settle into a shape where it’s hard to really move the outcome, at least until something new shocks the system.
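To make that contrast slightly more concrete (this is my own toy formalization, not anything Yudkowsky or anyone in this thread actually wrote): the limit framing asks where some f heads as you crank its input up, while the equilibrium framing asks which states of a dynamical system are fixed points that small shocks don’t dislodge.

```latex
% Toy formalization of the two framings (my gloss, purely illustrative):
\[
  \textbf{Limit framing:} \quad \lim_{x \to \infty} f(x) = L
\]
\[
  \textbf{Equilibrium framing:} \quad s_{t+1} = g(s_t), \qquad
  g(s^{*}) = s^{*}, \qquad
  \text{locally stable if } \lVert g(s^{*} + \epsilon) - s^{*} \rVert < \lVert \epsilon \rVert
  \text{ for small } \epsilon
\]
```

The A/B/C list below is asking the second kind of question: which configurations can all that pushing-and-shocking actually settle into?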
...
Some ~specific things you might care about the equilibrium of:
A. One particular AI mind – given some initial conditions, after somehow achieving a minimum threshold of relentless-creative-resourcefulness and the ability to modify itself and/or its environment, with whatever combo of goals/impulses it turns out to have.
The equilibrium includes “what will the mind end up doing with itself” and also “how will the outside world try to apply pressure to the mind, and how will the mind apply pressure back?”.
B. The human economy/geopolitical-system. Given that there are lots of groups trying to build AI, there’s a clear economic incentive to do so if you don’t believe in doom, and it’s going to get easier over time. (But also, there are reasons for various political factions to oppose this).
Does this eventually produce a mind, with the conditions to kick off the previous point?
C. The collection of AI minds that end up existing, once some of them hit the minimum relentless-creative-resourcefulness necessary to kick off A?
...
But translating back into limits:
Looking at your list of “which of these f(x)s are we talking about?”, the answer is “the humanity meta-system that includes all of B.”
“X” is “human labor + resource capital + time, etc”.
The “F” I’m most focused on is “the process of looking at the current set of AI systems, and asking ‘is there a way to improve how much profit/fame/power we can get out of this?’, and then creatively selecting a thing (such as from your list of things above), and then trying it.”
(It’s also useful to ask “what’s F?” re: a given transformer gradient descent architecture, given a set of training data and a process for generating more training data. But, that’s a narrower question, and most such systems will not be the “It” that would kill everyone if anyone builds it)
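If it helps to see the type of that outer-loop F laid out, here is a deliberately crude sketch (every name and number in it is invented for illustration; it is a toy of the loop’s shape, not a claim about how any lab or the economy actually decides anything):

```python
# Toy sketch of the outer-loop "F" described above: the civilization-level
# process of "look at the current systems, pick whatever change seems to buy
# the most profit/fame/power, try it" (not any particular training algorithm).
# Every name and number below is invented purely for illustration.

CANDIDATE_CHANGES = {
    # change -> (made-up expected payoff, made-up capability gain)
    "scale up compute":     (3.0, 0.8),
    "new architecture":     (2.0, 0.5),
    "more training data":   (1.5, 0.3),
    "ship a product as-is": (1.0, 0.0),
}

def outer_loop_step(capability: float) -> float:
    """One step of humanity-as-optimizer: pick the change with the highest
    expected payoff and apply its (toy) capability gain."""
    chosen = max(CANDIDATE_CHANGES, key=lambda c: CANDIDATE_CHANGES[c][0])
    _, gain = CANDIDATE_CHANGES[chosen]
    return capability + gain

if __name__ == "__main__":
    capability = 1.0
    for step in range(5):
        capability = outer_loop_step(capability)
        print(f"step {step}: capability ~ {capability:.1f}")
```

The “in the limit / at equilibrium” question is then roughly: what does repeated application of this loop settle into, given that labor, capital, and time keep flowing in and some actors push back?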
...
Having said that:
“f(x)”, where f is “all human ingenuity focused on building AGI, + all opposed political forces”, is a confusing type, yes.
I mentioned elsewhere, the “confusing type” is the problem. (or, “a” problem). We are inside a “Find the Correct Types” problem. The thing to do when you’re in a Find the Correct Types problem, is bust out your Handle Confusion and Find the Correct Types toolkit.
I am not a Type Theory Catgirl, but, some early steps I’d want to take are:
- map out everything I am confused by that seems relevant (see if some confusions dissolve when I look at them)
- map out everything important that seems relevant that I’m not confused by
- map out at least a few different ways of structuring the problem. (including, maybe this isn’t actually best thought of as a Type Theory problem)
And part of my response to “f(x) is confusing” is to articulate the stuff above, which hopefully narrows down the confusion slightly. But, I’d also say, before getting to the point of articulating the above, “a’ight, seems like the structure here is something like”
1. AI will probably eventually get built somewhere. It might FOOM. It might take over. And later, evolution might destroy everything we care about. (You might be uncertain about these, and might be confused about some sub-pieces, but I don’t think you-in-particular were confused about this bit)
2. There will be some processes that take in resources and turn them into more intelligence. [FLAG: confused about what this process is and what inputs it involves. But, call this confusing thing f(x)]
3. There are lots of different possible shapes of f(x), and I’m confused about that.
4. But, the reason I care about f(x) is so that I know: a) will a given AI system FOOM or take over? b) is it capable of stopping other things from FOOMing or taking over? and c) is it capable of preventing death-by-evolution, without causing worse side effects?
And #4 is what specifies which possible ways of resolving the confusing bits are most useful. It specifically implies we need to be talking about pretty high power levels. However you choose to wrap your brain around it, it somehow needs to eventually help you think about extremely high power levels.
So, like, yep “in the limit” is confusing and underspecified. But, it’s meant to be directing your attention to aspects of the confusingness that are more relevant.
Nod, makes sense.