I will now drive a small truck through the door you left ajar. (This is indeed bad poetry, so it’s not coherent and not an answer and also not true, but it has some chance of being usefully evocative.)
It seems as though when I learn new information, ideas, or thought processes, they become available for my use towards my goals, and don’t threaten my goals. To judge between actions, usually most of what I want to attend to figuring out is the likely consequences of the actions, rather than the evaluation of those consequences (excepting evaluations that are only about further consequences), indicating that to the extent my values are able to notice when they are under threat, they are not generally under threat by other thought processes. When I have unsatisfied desires, it seems they’re usually mostly unsatisfied because I don’t know which actions to take to bring about certain consequences, and I can often more or less see what sort of thing I would do, at some level of abstraction, to figure out which actions to take; suggesting that there is such a thing as “mere problem solving thought”, because that’s the sort of thought that I think I can see as a meta-level plan that would work, i.e., my experience from being a mind suggests that there is an essentially risk-free process I can undertake to gain fluency in a domain that lays the domain bare to the influence of my values.

An FAI isn’t an artifact, it’s a hand and an eye. The FAI doing recursive self-improvement is the human doing recursive self-improvement. The FAI is densely enmeshed in low-latency high-frequency feedback relationships with the humans that resemble the relationships between different mental elements of my mental model of the room around me, or between those and the micro-tasks I’m performing and the context of those micro-tasks. A sorting algorithm has no malice, a growing crystal has no malice, and likewise a mind crystallizing well-factored ontology, from the primordial medium of low-impact high-context striving, out into new domains, has no malice.

The neocortex is sometimes at war with the hardwired reward, but it’s not at war with Understanding, unless specifically aimed that way by social forces; there’s no such thing as “values” that are incompatible with Understanding, and all that’s strictly necessary for AGI is Understanding, though we don’t know how to sift baby from bath-lava.

The FAI is not an agent! It defers to the human not for “values” or “governance” or “approval” but for context and meaning and continuation; it’s the inner loop in an intelligent process, the C code that crunches the numbers. The FAI is a mighty hand growing out of the programmer’s forehead. Topologically the FAI is a bubble in space that is connected to another space; metrically the FAI bounds an infinite space (superintelligence), but from our perspective is just a sphere (in particular, it’s bounded). The tower of Babylon, but it’s an inverted pyramid that the operator balances delicately. Or, a fractal, a tree say, where the human controls the angle of the branches and the relative lengths, propagating infinitely up but with fractally bounded impact.

Big brute searches in algorithmically barren domains, small careful searches in algorithmically rich domains. The Understanding doesn’t come from consequentialist reasoning; consequentialist reasoning constitutively requires Understanding; so the door is open to just think and not do anything. Algorithms have no malice. Consequentialist reasoning has malice. (Algorithms are shot through with consequentiality, but that’s different from being aimed at consequences.)
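(To make the “big brute searches / small careful searches” line concrete: a minimal toy sketch, one possible reading only. The function names and the example are my own illustration, not anything the poetry commits to; the point is just that a barren domain offers nothing better than scanning every candidate, while structure in a rich domain lets a small careful search exploit that structure.)

```python
# A toy contrast, assumptions mine: with no structure you can only scan
# everything; with structure (here, sortedness) each probe discards half
# the remaining space.

def brute_search(items, target):
    """Barren domain: nothing is known about the items, so check them all."""
    for i, x in enumerate(items):
        if x == target:
            return i
    return -1

def careful_search(sorted_items, target):
    """Rich domain: sortedness supports a small, careful binary search."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        elif sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

# brute_search(list("zebra"), "r") checks entries one by one;
# careful_search(sorted("zebra"), "r") needs at most three probes.
```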
I mostly don’t gain Understanding and algorithms via consequentialist reasoning, but by search+recognition, or by the play of thoughts against each other. Search is often consequentialist but doesn’t have to be. One can attempt to solve a Rubik’s cube without inevitably disassembling it and reassembling it in order. The play of thoughts against each other is logical coherence, not consequentialism. The FAI is not a predictive processor with its set-point set by the human; the FAI and the human are a single predictive processor.
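(Similarly, a toy gloss of “search is often consequentialist but doesn’t have to be”, with all names and the puzzle invented by me: consequentialist search judges each candidate move by simulating the state it leads to, while search+recognition just generates candidates and directly recognizes a good one, with no move-by-move model of consequences.)

```python
# Toy contrast, my own illustration: putting a small tuple in order two ways.
from collections import deque
from itertools import permutations

GOAL = (1, 2, 3, 4)

def consequentialist_search(state):
    """Breadth-first search: each candidate move (an adjacent swap) is
    judged by the consequence it leads to, i.e. the resulting state."""
    frontier = deque([(state, [])])
    seen = {state}
    while frontier:
        s, path = frontier.popleft()
        if s == GOAL:
            return path                      # the sequence of swaps used
        for i in range(len(s) - 1):
            t = list(s)
            t[i], t[i + 1] = t[i + 1], t[i]  # simulate the move's consequence
            t = tuple(t)
            if t not in seen:
                seen.add(t)
                frontier.append((t, path + [i]))

def search_and_recognize(state):
    """Generate candidates and recognize the answer directly; no simulation
    of consequences, just a recognizer applied to each candidate."""
    for candidate in permutations(state):
        if all(a <= b for a, b in zip(candidate, candidate[1:])):
            return candidate

# consequentialist_search((3, 1, 4, 2)) returns a swap sequence reaching GOAL;
# search_and_recognize((3, 1, 4, 2)) returns (1, 2, 3, 4) by recognition alone.
```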
even hands and eyes can be existentially dangerous
That seems right, though it’s definitely harder to be an x-risk without superintelligence; e.g. even a big nuclear war isn’t a guaranteed extinction, and neither is an extremely infectious and lethal virus (because, like, an island population with a backup of libgen could recapture a significant portion of value).
necessarily be a man-machine hybrid
I hope not, since that seems like an additional requirement that would need independent work. I wouldn’t know concretely how to use the hybridizing capability; that seems like a difficult puzzle related to alignment. I think the bad poetry was partly trying to say something like: in alignment theory, you’re *trying* to figure out how to safely have the AI be more autonomous—how to design the AI so that when it’s making consequential decisions without supervision, it does the right thing or at least not a permanently hidden or catastrophic thing. But this doesn’t mean you *have to* “supervise” (or some other high-attention relationship that less connotes separate agents, like “wield” or “harmonize with” or something) the AI less and less; more supervision is good.
subagents (man and machine) able to talk to each other in English?
IDK. Language seems like a very good medium. I wouldn’t say subagent though, see below.
Would this be referring to superhuman capabilities that are narrow in nature?
This is a reasonable interpretation. It’s not my interpretation; I think the bad poetry is talking about the difference between one organic whole vs two organic wholes. It’s trying to say that having the AI be genuinely generally intelligent doesn’t analytically imply that the AI is “another agent”. Intelligence does seem to analytically imply something like consequentialist reasoning; but the “organization” (whatever that means) of the consequentialist reasoning could take a shape other than “a unified whole that coherently seeks particular ends” (where the alignment problem is to make it seek the right ends). The relationship between the AI’s mind and the human’s mind could instead look more like the relationship between [the stuff in the human’s mind that was there only at or after age 10] and [the stuff in the human’s mind that was there only strictly before age 10], or the relationship between [one random subset of the organs and tissues and cells in an animal] and [the rest of the organs and tissues and cells in that animal that aren’t in the first set]. (I have very little idea what this would look like, or how to get it, so I have no idea whether it’s a useful notion.)
One could say that organs are in fact subagents; they have different goals.
I wouldn’t want to say that too much. I’d rather say that an organ serves a purpose. It’s part of a design, part of something that’s been optimized, but it isn’t mainly optimizing, or as you say, it’s not intelligent. More “pieces which can be assembled into an optimizer”, less “a bunch of little optimizers”, and maybe it would be good if the human were doing the main portion of the assembling, whatever that could mean.
humans with sufficiently many metaphorical hands and eyes in the year 2200 could look superintelligent to humans in 2021, same as how our current reasoning capacity in math, cogsci, philosophy etc could look superhuman to cavemen
Hm. This feels like a bit of a different dimension from the developmental analogy? Well, IDK how the metaphor of hands and eyes is meant. Having more “hands and eyes”, in the sense of the bad poetry of “something you can wield or perceive via”, feels less radical than, say, what happens when a 10-year-old meets someone they can have arguments with and learns to argue-think.
Just wondering, could an AI have an inner model of the world independent from the human’s inner model of the world, and yet exist in this hybrid state you mention? Or must they necessarily share a common model, or significantly collaborate to ensure their models align at all times?
IDK, it’s a good question. I mean, we know the AI has to be doing a bunch of stuff that we can’t do, or else there’s no point in having an AI. But it might not have to quite look like “having its own model”, but more like “having the rest of the model that the human’s model is trying to be”. IDK. Also could replace “model” with “value” or “agency” (which goes to show how vague this reasoning is).
a) worth doing?
Extremely so; you only ever get good non-specifics as the result of having iteratively built up good specifics.
b) possible to do?
In general, yes. In this case? Fairly likely not; it’s bad poetry, the senses that generated it are high-variance, likely nonsense, with some chance of some sense. And alignment is hard and understanding minds is hard.
c) something you wish to do in this conversation?
Not so much, I guess. I mean, I think some of the metaphors I gave, e.g. the one about the 10-year-old, are quite specific in themselves, in the sense that there’s some real thing that happens when a human grows up which someone could go and think about in a well-defined way, since it’s a real thing in the world; I don’t know how to make more specific what, if anything, is supposed to be abstracted from that as an idea for understanding minds, and more-specific-ing seems hard enough that I’d rather let it rest.
Thanks for noting explicitly. (Though, your thing about “deflecting” seems, IDK what, like you’re mad that I’m not doing something, or something, and I’d rather you figure out on your own, explicitly, what it is you’re expecting from people, and explicitly update your expectations, so that you don’t accidentally and incorrectly take me (or whoever you’re talking to) to have implicitly agreed to do something (maybe I’m wrong that’s what happened). It’s connotatively false to say I’m “intentionally deflecting” just because I’m not doing the thing you wanted / expected. Specific-ing isn’t the only good conversational move, and some good conversational moves go in the opposite direction.)
Mostly, all good. (I’m mainly making this comment about process because it’s a thing that crops up a lot and seems sort of important to interactions in general, not because it particularly matters in this case.) Just, regarding “I meant you’re intentionally moving the conversation away from trying to nail down specifics”: it’s true that (1) I was intentionally doing X, and (2) X entails not particularly going toward nailing down specifics, and (3) relative to trying to nail down specifics, (2) entails systematically less nailing down of specifics. But it’s not the case that I intended to avoid nailing down specifics; I was just doing something else.

I’m not just saying that I wasn’t *deliberately* avoiding specifics; I’m saying I was behaving differently from someone who has a goal or subgoal of avoiding specifics. Someone with such a goal might say some things that have the sole effect of moving the conversation away from specifics. For example, they might provide fake specifics to distract you from the fact that they’re not nailing down specifics; they might mock you or otherwise punish you for asking for specifics; they might ask you / tell you not to ask questions because they call for specifics; they might criticize questions for calling for specifics; etc. In general there’s a potentially adversarial dynamic here, where someone intends Y but pretends not to intend Y, and does this by acting as though they intend X, which entails pushing against Y; and this muddies the waters for people just intending X, not Y, because third parties can’t distinguish them.

Anyway, I just don’t like the general cultural milieu of treating it as an ironclad inference that if someone’s actions systematically result in Y, they’re intending Y. It’s really not a valid inference, in theory or in practice. The situation is sometimes muddied, such that it’s appropriate to treat such people *as though* they’re intending Y, but distinguishing this from a high-confidence proposition that they are in fact intending Y (even non-deliberately!) is important IMO.