even hands and eyes can be existentially dangerous
That seems right, though it’s definitely harder to be an x-risk without superintelligence; e.g. even a big nuclear war isn’t a guaranteed extinction, nor is an extremely infectious and lethal virus (because, like, an island population with a backup of libgen could recapture a significant portion of value).
necessarily be a man-machine hybrid
I hope not, since that seems like an additional requirement that would need independent work. I wouldn’t know concretely how to use the hybridizing capability; that seems like a difficult puzzle related to alignment. I think the bad poetry was partly trying to say something like: in alignment theory, you’re *trying* to figure out how to safely have the AI be more autonomous—how to design the AI so that when it’s making consequential decisions without supervision, it does the right thing, or at least not a permanently hidden or catastrophic thing. But this doesn’t mean you *have to* “supervise” (or use some other word for a high-attention relationship that connotes separate agents less, like “wield” or “harmonize with” or something) the AI less and less; more supervision is good.
subagents (man and machine) able to talk to each other in English?
IDK. Language seems like a very good medium. I wouldn’t say subagent though, see below.
Would this be referring to superhuman capabilities that are narrow in nature?
This is a reasonable interpretation. It’s not my interpretation; I think the bad poetry is talking about the difference between one organic whole vs. two organic wholes. It’s trying to say that having the AI be genuinely generally intelligent doesn’t analytically imply that the AI is “another agent”. Intelligence does seem to analytically imply something like consequentialist reasoning; but the “organization” (whatever that means) of the consequentialist reasoning could take a shape other than “a unified whole that coherently seeks particular ends” (where the alignment problem is to make it seek the right ends). The relationship between the AI’s mind and the human’s mind could instead look more like the relationship between [the stuff in the human’s mind that was there only at or after age 10] and [the stuff in the human’s mind that was there only strictly before age 10], or the relationship between [one random subset of the organs and tissues and cells in an animal] and [the rest of the organs and tissues and cells in that animal that aren’t in the first set]. (I have very little idea what this would look like, or how to get it, so I have no idea whether it’s a useful notion.)
One could say that organs are in fact subagents; they have different goals.
I wouldn’t want to say that too much. I’d rather say that an organ serves a purpose. It’s part of a design, part of something that’s been optimized, but it isn’t mainly optimizing, or as you say, it’s not intelligent. More “pieces which can be assembled into an optimizer”, less “a bunch of little optimizers”, and maybe it would be good if the human were doing the main portion of the assembling, whatever that could mean.
humans with sufficiently many metaphorical hands and eyes in the year 2200 could look superintelligent to humans in 2021, same as how our current reasoning capacity in math, cogsci, philosophy etc could look superhuman to cavemen
Hm. This feels like a bit of a different dimension from the developmental analogy? Well, IDK how the metaphor of hands and eyes is meant. Having more “hands and eyes”, in the sense of the bad poetry of “something you can wield or perceive via”, feels less radical than, say, what happens when a 10-year-old meets someone they can have arguments with and learns to argue-think.
Just wondering, could an AI have an inner model of the world independent from the human’s inner model of the world, and yet exist in this hybrid state you mention? Or must they necessarily share a common model, or significantly collaborate and ensure their models align at all times?
IDK, it’s a good question. I mean, we know the AI has to be doing a bunch of stuff that we can’t do, or else there’s no point in having an AI. But it might not have to quite look like “having its own model”, but more like “having the rest of the model that the human’s model is trying to be”. IDK. Also could replace “model” with “value” or “agency” (which goes to show how vague this reasoning is).
a) worth doing?
Extremely so; you only ever get good non-specifics as the result of having iteratively built up good specifics.
b) possible to do?
In general, yes. In this case? Fairly likely not; it’s bad poetry, and the senses that generated it are high variance: likely nonsense, some chance of some sense. And alignment is hard and understanding minds is hard.
c) something you wish to do in this conversation?
Not so much, I guess. I mean, I think some of the metaphors I gave, e.g. the one about the 10-year-old, are quite specific in themselves, in the sense that there’s some real thing that happens when a human grows up which someone could go and think about in a well-defined way, since it’s a real thing in the world; I don’t know how to make more specific what, if anything, is supposed to be abstracted from that as an idea for understanding minds, and more-specific-ing seems hard enough that I’d rather let it rest.
Thanks for noting explicitly. (Though, your thing about “deflecting” seems, IDK what, like you’re mad that I’m not doing something, or something, and I’d rather you figure out on your own, explicitly, what it is you’re expecting from people, and explicitly update your expectations, so that you don’t accidentally incorrectly take me (or whoever you’re talking to) to have implicitly agreed to do something (maybe I’m wrong that that’s what happened). It’s connotatively false to say I’m “intentionally deflecting” just because I’m not doing the thing you wanted / expected. Specific-ing isn’t the only good conversational move, and some good conversational moves go in the opposite direction.)
Mostly, all good. (I’m mainly making this comment about process because it’s a thing that crops up a lot and seems sort of important to interactions in general, not because it particularly matters in this case.) Just, regarding “I meant you’re intentionally moving the conversation away from trying to nail down specifics”: it’s true that (1) I was intentionally doing X, and (2) X entails not particularly going toward nailing down specifics, and (3) relative to trying to nail down specifics, (2) entails systematically less nailing down of specifics. But it’s not the case that I intended to avoid nailing down specifics; I was just doing something else. I’m not just saying that I wasn’t *deliberately* avoiding specifics; I’m saying I was behaving differently from someone who has a goal or subgoal of avoiding specifics. Someone with such a goal might say some things that have the sole effect of moving the conversation away from specifics. For example, they might provide fake specifics to distract you from the fact that they’re not nailing down specifics; they might mock you or otherwise punish you for asking for specifics; they might ask or tell you not to ask questions because those questions call for specifics; they might criticize questions for calling for specifics; etc. In general there’s a potentially adversarial dynamic here, where someone intends Y but pretends not to intend Y, and does this by acting as though they intend X, which entails pushing against Y; and this muddies the waters for people just intending X, not Y, because third parties can’t distinguish them. Anyway, I just don’t like the general cultural milieu of treating it as an ironclad inference that if someone’s actions systematically result in Y, they’re intending Y. It’s really not a valid inference in theory or practice. The situation is sometimes muddied, such that it’s appropriate to treat such people *as though* they’re intending Y, but distinguishing this from a high-confidence proposition that they are in fact intending Y (even non-deliberately!) is important IMO.