Are people fundamentally good? Are they practically good? If you make one person God-emperor of the lightcone, is the result something we’d like?
I just want to make a couple remarks.
Conjecture: Generally, on balance, over longer time scales good shards express themselves more than bad ones. Or rather, what we call good ones tend to be ones whose effects accumulate more.
Example: Nearly all people have a shard, quite deeply stuck through the core of their mind, which points at communing with others.
Communing means: speaking with; standing shoulder to shoulder with, looking at the same thing; understanding and being understood; lifting the same object that one alone couldn’t lift.
The other has to be truly external and truly a peer. Being a truly external true peer means they have unboundedness, infinite creativity, self- and pair-reflectivity and hence diagonalizability / anti-inductiveness. They must also have a measure of authority over their future. So this shard (albeit subtly and perhaps defeasibly) points at non-perfect subjugation of all others, and democracy. (Would an immortalized Genghis Khan, having conquered everything, after 1000 years, continue to wish to see in the world only always-fallow other minds? I’m unsure. What would really happen in that scenario?)
An aspect of communing is, to an extent, melting into an interpersonal alloy. Thought patterns are quasi-copied back and forth, leaving their imprints on each other and each other leaving their imprints on the thought patterns; stances are suggested back and forth; interoperability develops; multi-person skills develop; eyes are shared. By strong default this cannot be stopped from being transitive. Thus elements, including multi-person elements, spread, binding everyone into everyone, in the long run.
God—the future or ideal collectivity of humane minds—is the extrapolation of primordial searching for shared intentionality. That primordial searching almost universally always continues, at least to some extent, to exert itself. The Ner Tamid is saying: God (or specifically, the Shekhinah) is the one direction that we move in; God will be omnipotent, and is of course omnibenevolent. To say things more concretely (though less accurately), people in some sense want to get along, and all else equal they keep searching and learning more how to get along and keep implicitly rewriting themselves to get along more; of course this process could be corrupted / disrupted / prevented, but it’s the default.
Example: On the longest time scales, love increases, hatred decreases.
Given more information about someone, your capacity for having {commune, love, compassion, kindness, cooperation} for/with them increases more than your capacity for {hatred, adversariality} towards them increases.
You can perfectly well hate someone who you don’t know much about.
How much more can you hate someone by knowing more about them? Certainly you can learn things which make you hate them more. But if you kept learning even more, would you still be able to hate them?
You can be quite adversarial towards someone without knowing them. Everyone can meet in combat in the arena of convergent instrumental subgoals.
The exception to this conjecture:
It is possible to become extremely more adversarial towards someone by knowing much more about them—so you can pessimize against their values.
However, there is a strange sort of silver crack: Because love and cooperation are compatible with unbounded creativity, love and cooperation are unbounded. Therefore, to “keep up” with your unbounded love, adversariality would need access to the unbounded expression of your values, in order to pessimize against them. But this seems to imply that the adversary has to continually give you more and more love, in order to access your values at each stage. Not sure what to make of this. (The outcome would still be very bad, but it’s strange.)
It’s harder to feel compassion towards someone you don’t know much about; towards someone you do know much about, compassion is the easiest thing in the world to feel, if you try for a moment.
(This is just another example of how Buddhists are bad: faceless compassion is annihilation, not compassion. Yeah I know you like annihilation, but it’s bad.)
Kindness and cooperation require information, and (for humans) can increase without bound with more information.
Ender: It’s impossible or almost impossible to understand someone without loving them.
This assumes that the initially-non-eudaimonic god-king(s) would choose to remain psychologically human for a vast amount of time, and keep the rest of humanity around for all that time. Instead of:
Self-modify into something that’s basically an eldritch abomination from a human perspective, either deliberately or as part of a self-modification process gone wrong.
Make some minimal self-modifications to avoid value drift, precisely not to let the sort of stuff you’re talking about happen.
Stick to behavioral patterns that would lead to never changing their mind/never value-drifting, either as an “accidental” emergent property of their behavior (the way normal humans can surround themselves in informational bubbles that only reinforce their pre-existing beliefs; the way normal human dictators end up surrounded by yes-men; but elevated to transcendence, and so robust enough to last for eons) or as an implicit preference they never tell their aligned ASI to satisfy, but which it infers and carefully ensures the satisfaction of.
Impose some totalitarian regime on the rest of humanity and forget about it, spending the rest of their time interacting only with each other/with tailor-built non-human constructs, and/or playing immersive simulation games.
Immediately disassemble the rest of humanity for raw resources, like any good misaligned agent would, and never think about it again. Edit out their social instincts or satisfy them by interacting with each other/with constructs.
Acausally sell this universe to some random paperclip-maximizer in exchange for being incarnated in some reality without entropic decay, where they wouldn’t have cosmic resources, but would be able to exist literally eternally in lavish comfort (or at least dramatically longer than this universe’s lifespan, basically trading parallel computing for sequential computing).
Et cetera.
Overall, I think all hopeful scenarios about “even a not-very-good person elevated to godhood would converge to goodness over time!” fail to feel the Singularity. It’s not going to be basically business as usual for any prolonged length of time; things are going to get arbitrarily weird essentially immediately.
All of these hopeful purported psychosocial processes that modify humans to be good hinge on tons of assumptions about what the world looks like. They’re brittle. And it seems incredibly unlikely that any of these assumptions – let alone all of them – would still be intact even a month past the event horizon, let alone thousands of years.
Yes, that’s a background assumption of the conjecture; I think making that assumption and exploring the consequences is helpful.
Self-modify into something that’s basically an eldritch abomination from a human perspective, either deliberately or as part of a self-modification process gone wrong.
Right, totally, then all bets are off. The scenario is underspecified. My default imagination of “aligned” AGI is corrigible AGI. (In fact, I’m not even totally sure that it makes much sense to talk of aligned AGI that’s not corrigible.) Part of corrigibility would be that if:
the human asks you to do X,
and X would have irreversible consequences,
and the human is not aware of / doesn’t understand those consequences,
and the consequences would make the human unable to notice or correct the change,
and the human, if aware, would have really wanted to not do X or at least think about it a bunch more before doing it,
then you DEFINITELY don’t just go ahead and do X lol!
In other words, a corrigible AGI is supposed to use its intelligence to possibilize self-alignment for the human.
Make some minimal self-modifications to avoid value drift, precisely not to let the sort of stuff you’re talking about happen.
I think this notion of values and hence value drift is probably mistaken about humans. Human values are meta and open—part of the core argument of my OP (the bullet point about communing).
Stick to behavioral patterns that would lead to never changing their mind/never value-drifting, either as an “accidental” emergent property of their behavior
So first they carefully construct an escape-proof cage for all the other humans, and then they become a perma-zombie? Not implausible, like they could for some reason specifically ask the AGI to do this, but IDK why they would.
or as an implicit preference they never tell their aligned ASI to satisfy, but which it infers and carefully ensures the satisfaction of.
Doesn’t sound very corrigible? Not sure.
Immediately disassemble the rest of humanity for raw resources, like any good misaligned agent would, and never think about it again. Edit out their social instincts or satisfy them by interacting with each other/with constructs.
Right, certainly they could. Who actually would? (Not rhetorical.)
Overall, I think all hopeful scenarios about “even a not-very-good person elevated to godhood would converge to goodness over time!” fail to feel the Singularity. It’s not going to be basically business as usual for any prolonged length of time; things are going to get arbitrarily weird essentially immediately.
I think you’re failing to feel the Singularity, and instead you’re extrapolating to like “what would a really really bad serial killer / dictator do if they were being an extra bad serial killer / dictator times 1000”. Or IDK, I don’t know what you think. What do you think would actually happen if a random person were put in the corrigible AGI control seat?
Things can get weird, but for the person to cut out a bunch of their core humanity, kinda seems like either the AGI isn’t really corrigible or isn’t really AGI (such that the emperor-AGI system is being dumb by its own lights), or else the person really wanted to do that. Why do you think people want to do that? Do you want to do that? I don’t.
If they don’t cut out a bunch of their core humanity, then my question and conjecture are live.
Human values are meta and open—part of the core argument of my OP (the bullet point about communing).
Unless the human, on reflection, doesn’t want some specific subset of their current values to be open to change / has meta-level preferences to freeze some object-level values. Which I think is common. (Source: I have meta-preferences to freeze some of my object-level values at “eudaimonia”, and I take specific deliberate actions to avoid or refuse value-drift on that.)
Not implausible, like they could for some reason specifically ask the AGI to do this, but IDK why they would.
Callousness. “We probably need to do something about the rest of humanity, probably shouldn’t just wipe them all out, lemme draft some legislation, alright looks good, rubber-stamp it and let’s move on”. Tons of bureaucracies and people in power seem to act this way today, including decisions that impact the fates of millions.
Right, certainly they could. Who actually would? (Not rhetorical.)
I don’t know that Genghis Khan or Stalin wouldn’t have. Some clinical psychopaths or philosophical extremists (e.g., the human successionists) certainly would.
What do you think would actually happen if a random person were put in the corrigible AGI control seat?
Mm...
First, I think “corrigibility to a human” is underdefined. A human is not, themselves, a coherent agent with a specific value/goal-slot to which an AI can be corrigible.
Like, is it corrigible to a human’s momentary impulses? Or to the command the human would give if they thought for five minutes? For five days? Or perhaps to the command they’d give if the AI taught them more wisdom? But then which procedure should the AI choose for teaching them more wisdom? The outcome is likely path-dependent on that: on the choice between curriculum A and curriculum B. And if so, what procedure should the AI use to decide what curriculum to use? Or should the AI perhaps basically ignore the human in front of them, and simply interpret them as a rough pointer to CEV? Well, that assumes the conclusion, and isn’t really “corrigibility” at all, is it?
The underlying issue here is that “a human’s values” are themselves underdefined. They’re derived in a continual, path-dependent fashion, by an unstable process with lots of recursions and meta-level interference. There’s no unique ground-true set of values which the AI should take care not to step onto. This leaves three possibilities:
The AI acts as a tool that does what the human knowingly instructs it to do, with the wisdom by-default outsourced to the human.
But then it is possible to use it unwisely. For example, if the human operator is smart enough to foresee issues with self-modification, they could ask the AI to watch out for that. They could also ask it to watch out for that whole general class of unwise-on-the-part-of-the-human decisions. But they can also fail to do so, or unwisely ignore a warning in a fit of emotion, or have some beliefs about how decisions Ought to be Done that they’re unwilling to even discuss with the AI.
The AI never does anything, because it knows that any of its actions can step onto one of the innumerable potential endpoints of a human’s self-reflection process.
But then it is useless.
The AI isn’t corrigible at all, it just optimizes for some fixed utility function, if perhaps with an indirect pointer to it (“this human’s happiness”, “humanity’s CEV”, etc.).
(1) is the only possibility worth examining here, I think.
And what I expect to happen if an untrained, philosophically median human is put in control of a tool ASI, is some sort of catastrophe. They would have various cached thoughts about how the story ought to go, what the greater good is, who the villains are, how the society ought to be set up. These thoughts would be endorsed at the meta-level, and not open to debate. The human would not want to ask the ASI to examine those; if the ASI attempts to challenge them as part of some other request, the human would tell it to shut up.[1]
In addition, the median human is not, really, a responsible person. If put in control of an ASI, they would not suddenly become appropriately responsible. It wouldn’t by default occur to them to ask the ASI for making them more responsible, either, because that’s itself a very responsible thing to do. The way it would actually go, they are going to be impulsive, emotional, callous, rash, unwise, cognitively lazy.
Some sort of stupid and callous outcome is likely to result. Maybe not specifically “self-modifying into a monster/zombie and trapping humanity in a dystopian prison”, but something in that reference class of outcomes.
Not to mention if the human has some extant prejudices: racism or any other manner of “assigning different moral worth to different sapient beings based on arbitrary features”. The stupid-callous-impulsive process would spit out some not-very-pleasant fate for the undesirables, and this would be reflectively endorsed on some level, so a genuine tool-like corrigible ASI[2] wouldn’t say a word of protest.
Maybe I am being overly cynical about this, that’s definitely possible. Still, that’s my current model.
Source: I would not ask the ASI to search for arguments against eudaimonia-maximization, or ask it to check if there’s something else that “I” “should” be pursuing instead, because I do not want to be argued out of that even if there’s some coherent, true, and compelling sense in which it is not what “I” “actually” “want”. If the ASI asks whether it should run that check as part of some other request, I would tell it to shut up.
(Note that it’s different from examining whether my idea of eudaimonia/human flourishing/the-thing-I-mean-when-I-say-human-flourishing is correct/good, or whether my fundamental assumptions about how the world works are correct, etc.)
As opposed to a supposedly corrigible but secretly eudaimonic ASI which, in one’s imagination, always happens to gently question the human’s decisions when the human orders it to do something bad, and then happens to pick the specific avenues of questioning that make the human “realize” they wanted good things all along.
The AGI helps out with increasing the human’s ability to follow through on attempts at internal organization (e.g. thinking, problem solving, reflecting, coherentifying) that normally the human would try a bit and then give up on.
Not saying this is some sort of grand solution to corrigibility, but it’s obviously better than the nonsense you listed. If a human were going to try to help me out, I’d want this, for example, more than the things you listed, and it doesn’t seem especially incompatible with corrigible behavior.
First, I think “corrigibility to a human” is underdefined. A human is not, themselves, a coherent agent with a specific value/goal-slot to which an AI can be corrigible.
I mean, yes, but you wrote a lot of stuff after this that seems weird / missing the point, to me. A “corrigible AGI” should do at least as well as—really, much better than—you would do, if you had a huge team of researchers under you and your full time, 100,000x speed job is to do a really good job at “being corrigible, whatever that means” to the human in the driver’s seat. (In the hypothetical you’re on board with this for some reason.)
(Source: I have meta-preferences to freeze some of my object-level values at “eudaimonia”, and I take specific deliberate actions to avoid or refuse value-drift on that.)
I would guess fairly strongly that you’re mistaken or confused about this, in a way that an AGI would understand and be able to explain to you. (An example of how that would be the case: the version of “eudaimonia” that would not horrify you, if you understood it very well, has to involve meta+open consciousness (of a rather human flavor).)
Source: I have meta-preferences to freeze some of my object-level values at “eudaimonia”, and I take specific deliberate actions to avoid or refuse value-drift on that.
I’m curious to hear more about those specific deliberate actions.
Some sort of stupid and callous outcome is likely to result. Maybe not specifically “self-modifying into a monster/zombie and trapping humanity in a dystopian prison”, but something in that reference class of outcomes.
Your and my beliefs/questions don’t feel like they’re even much coming into contact with each other… Like, you (and also other people) just keep repeating “something bad could happen”. And I’m like “yeah obviously something extremely bad could happen; maybe it’s even likely, IDK; and more likely, something very bad at the beginning of the reign would happen (Genghis spends his first 200 years doing more killing and raping); but what I’m ASKING is, what happens then?”.
If you’re saying
There is a VERY HIGH CHANCE that the emperor would PERMANENTLY put us into a near-zero value state or a negative-value state.
then, ok, you can say that, but I want to understand why; and I have some reasons (as presented) for thinking otherwise.
Your hypothesis is about the dynamics within human minds embedded in something like contemporary societies with lots of other diverse humans whom the rulers are forced to model for one reason or another.
My point is that evil, rash, or unwise decisions at the very start of the process are likely, and that those decisions are likely to irrevocably break the conditions in which the dynamics you hypothesize are possible. Make the minds in charge no longer human in the relevant sense, or remove the need to interact with/model other humans, etc.
In my view, it doesn’t strongly bear on the final outcome-distribution whether the “humans tend to become nicer to other humans over time” hypothesis is correct, because “the god-kings remain humans hanging around all the other humans in a close-knit society for millennia” is itself a very rare class of outcomes.
Your hypothesis is about the dynamics within human minds embedded in something like contemporary societies with lots of other diverse humans whom the rulers are forced to model for one reason or another.
Absolutely not, no. Humans want to be around (some) other people, so the emperor will choose to be so. Humans want to be [many core aspects of humanness, not necessarily per se, but individually], so the emperor will choose to be so. Yes, the emperor could want these insufficiently for my argument to apply, as I’ve said earlier. But I’m not immediately recalling anyone (you or others) making any argument that, with high or even substantial probability, the emperor would not want these things sufficiently for my question, about the long-run of these things, to be relevant.
Yes: some other people. The ideologically and morally aligned people, usually. Social/informational bubbles that screen away the rest of humanity, from which they only venture out if forced to (due to the need to earn money/control the populace, etc.). This problem seems to get worse as the ability to insulate yourself from others improves, as could be observed with modern internet-based informational bubbles or the surrounded-by-yes-men problem of dictators.
ASI would make this problem transcendental: there would truly be no need to ever bother with the people outside your bubble again, they could be wiped out or their management outsourced to AIs.
Past this point, you’re likely never returning to bothering about them. Why would you, if you can instead generate entire worlds of the kinds of people/entities/experiences you prefer? It seems incredibly unlikely that human social instincts can only be satisfied – or even can be best satisfied – by other humans.
It seems incredibly unlikely that human social instincts can only be satisfied – or even can be best satisfied – by other humans.
You’re 100% not understanding my argument, which is sorta fair because I didn’t lay it out clearly, but I think you should be doing better anyway.
Here’s a sketch:
Humans want to be human-ish and be around human-ish entities.
So the emperor will be human-ish and be around human-ish entities for a long time. (Ok, to be clear, I mean a lot of developmental / experiential time—the thing that’s relevant for thinking about how the emperor’s way of being trends over time.)
When being human-ish and around human-ish entities, core human shards continue to work.
When core human shards continue to work, MAYBE this implies EVENTUALLY adopting beneficence (or something else like cosmopolitanism), and hence good outcomes.
Since the emperor will be human-ish and be around human-ish entities for a long time, IF 4 obtains, then good outcomes.
And then I give two IDEAS about 4 (communing->[universalist democracy], and [information increases]->understanding->caring).
I don’t know what’s making you think I don’t understand your argument. Also, I’ve never publicly stated that I’m opting into Crocker’s Rules, so while I happen not to particularly mind the rudeness, your general policy on that seems out of line here.
When being human-ish and around human-ish entities, core human shards continue to work
My argument is that the process you’re hypothesizing would be sensitive to the exact way of being human-ish, the exact classes of human-ish entities around, and the exact circumstances in which the emperor has to be around them.
As a plain and down-to-earth example, if a racist surrounds themselves with a hand-picked group of racist friends, do you expect them to eventually develop universal empathy, solely through interacting with said racist friends? Addressing your specific ideas: nobody in that group would ever need to commune with non-racists, nor have to bother learning more about non-racists. And empirically, such groups don’t seem to undergo spontaneous deradicalizations.
As a plain and down-to-earth example, if a racist surrounds themselves with a hand-picked group of racist friends, do you expect them to eventually develop universal empathy, solely through interacting with said racist friends? Addressing your specific ideas: nobody in that group would ever need to commune with non-racists, nor have to bother learning more about non-racists. And empirically, such groups don’t seem to undergo spontaneous deradicalizations.
So what do you think happens when they are hanging out together, and they are in charge, and it has been 1,000 years or 1,000,000 years?
They keep each other radicalized forever as part of some transcendental social dynamic.
They become increasingly non-human as time goes on, small incremental modifications and personality changes building on each other, until they’re no longer human in the senses necessary for your hypothesis to apply.
I assume your counter-model involves them getting bored of each other and seeking diversity/new friends, or generating new worlds to explore/communicate with, with the generating processes not constrained to only generate racists, leading to the extremists interacting with non-extremists and eventually incrementally adopting non-extremist perspectives?
If yes, this doesn’t seem like the overdetermined way for things to go:
The generating processes would likely be skewed towards only generating things the extremists would find palatable, meaning more people sharing their perspectives/not seriously challenging whatever deeply seated prejudices they have. They’re there to have a good time, not have existential/moral crises.
They may make any number of modifications to themselves to make them no longer human-y in the relevant sense. Including by simply letting human-standard self-modification algorithms run for 10^3-10^6 years, becoming superhumanly radicalized.
They may address the “getting bored” part instead, periodically wiping their memories (including by standard human forgetting) or increasing each other’s capacity to generate diverse interactions.
Ok so they only generate racists and racially pure people. And they do their thing. But like, there’s no other races around, so the racism part sorta falls by the wayside. They’re still racially pure of course, but it’s usually hard to tell that they’re racist; sometimes they sit around and make jokes to feel superior over lesser races, but this is pretty hollow since they’re not really engaged in any type of race relations. Their world isn’t especially about all that, anymore. Now it’s about… what? I don’t know what to imagine here, but the only things I do know how to imagine involve unbounded structure (e.g. math, art, self-reflection, self-reprogramming). So, they’re doing that stuff. For a very long time. And the race thing just is not a part of their world anymore. Or is it? I don’t even know what to imagine there. Instead of having tastes about ethnicity, they develop tastes about questions in math, or literature. In other words, [the differences between people and groups that they care about] migrate from race to features of people that are involved in unbounded stuff. If the AGI has been keeping the racially impure in an enclosure all this time, at some point the racists might have a glance back, and say, wait, all the interesting stuff about people is also interesting about these people. Why not have them join us as well.
Past this point, you’re likely never returning to bothering about them. Why would you, if you can instead generate entire worlds of the kinds of people/entities/experiences you prefer? It seems incredibly unlikely that human social instincts can only be satisfied – or even can be best satisfied – by other humans.
For the same reason that most people (if given the power to do so) wouldn’t just replace their loved ones with their altered versions that are better along whatever dimensions the person judged them as deficient/imperfect.
I don’t know that Genghis Khan or Stalin wouldn’t have. Some clinical psychopaths or philosophical extremists (e. g., the human successionists) certainly would.
Yeah I mean this is perfectly plausible, it’s just that even these cases are not obvious to me.
Given more information about someone, your capacity for having {commune, love, compassion, kindness, cooperation} for/with them increases more than your capacity for {hatred, adversariality} towards them increases.
If this were true, I’d expect much lower divorce rates. After all, who do you have the most information about other than your wife/husband? And many of these divorces are un-amicable, though I wasn’t quickly able to get particular numbers. [EDIT:] Though in either case, this indeed indicates a much-decreased level of love over long periods of time & greater mutual knowledge. See also the decrease in all objective measures of quality of life after divorce for both parties after long marriages.
(I wrote my quick take quickly and therefore very elliptically, and therefore it would require extra charity / work on the reader’s part (like, more time spent asking “huh? this makes no sense? ok what could he have meant, which would make this statement true?”).)
It’s an interesting point, but I’m talking about time scales of, say, thousands of years or millions of years. So it’s certainly not a claim that could be verified empirically by looking at any individual humans because there aren’t yet any millenarians or megaannumarians. Possibly you could look at groups that have had a group consciousness for thousands of years, and see if pairs of them get friendlier to each other over time, though it’s not really comparable (idk if there are really groups like that in continual contact and with enough stable collectivity; like, maybe the Jews and the Indians or something).
So it’s certainly not a claim that could be verified empirically by looking at any individual humans because there aren’t yet any millenarians or megaannumarians.
If it’s not a conclusion which could be disproven empirically, then I don’t know how you came to it.
(I wrote my quick take quickly and therefore very elliptically, and therefore it would require extra charity / work on the reader’s part (like, more time spent asking “huh? this makes no sense? ok what could he have meant, which would make this statement true?”).)
I mean, I did ask myself about counter-arguments you could have with my objection, and came to basically your response. That is, something approximating “well they just don’t have enough information, and if they had way way more information then they’d love each other again” which I don’t find satisfying.
Namely because I expect people in such situations get stuck in a negative-reinforcement cycle, where the things which used to be fun which the other did lose their novelty over time as they get repetitive, which leads to the predicted reward of those interactions overshooting the actual reward, which in a TD learning sense is just as good (bad) as a negative reinforcement event. I don’t see why this would be fixed with more knowledge, and it indeed does seem likely to be exacerbated with more knowledge as more things the other does become less novel & more boring, and worse, fundamental implications of their nature as a person, rather than unfortunate accidents they can change easily.
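The TD-learning point above can be sketched concretely. The following is my own toy illustration (the numbers and the `td_update` helper are invented for the example, not anything from the thread): when the learned prediction of an interaction’s reward has risen above the actual, habituated reward, every interaction produces a negative TD error, which functions like a punishment signal even though nothing aversive happened.

```python
def td_update(predicted, actual, lr=0.1):
    """One temporal-difference step; returns (new_prediction, td_error)."""
    td_error = actual - predicted
    return predicted + lr * td_error, td_error

predicted = 0.9   # expectation inflated by early novelty
actual = 0.5      # the same interactions, now repetitive and habituated
for _ in range(5):
    predicted, err = td_update(predicted, actual)
    # prediction stays above the actual reward for a long while,
    # so every step yields a negative TD error -- reinforcement-wise
    # equivalent to a mild punishment on each interaction
    assert err < 0
```

The prediction only decays slowly toward the actual reward, so the negative-error phase (the hypothesized “negative-reinforcement cycle”) persists across many interactions.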
I also think intuitions in this area are likely misleading. It is definitely the case now that marginally more understanding of each other would help with coordination problems, since people love making up silly reasons to hate each other. I do also think this is anchoring too much on our current bandwidth limitations, and generalizing too far. Better coordination does not always imply more love.
Namely because I expect people in such situations to get stuck in a negative-reinforcement cycle, where the things the other does which used to be fun lose their novelty as they become repetitive, which leads to the predicted reward of those interactions overshooting the actual reward, which in a TD-learning sense is just as good (bad) as a negative reinforcement event. I don’t see why this would be fixed by more knowledge; indeed it seems likely to be exacerbated by more knowledge, as more of the things the other does become less novel & more boring, and, worse, are revealed as fundamental implications of their nature as a person rather than unfortunate accidents they could easily change.
This does not sound like the sort of problem you’d just let yourself wallow in for 1000 years.
And again, with regards to what is fixed by more information, I’m saying that capacity for love increases more.
more things the other does become less novel & more boring
After 1000 years, both people would have gotten bored with themselves, and learned to do infinite play!
That is, something approximating “well they just don’t have enough information, and if they had way way more information then they’d love each other again” which I don’t find satisfying.
Maybe there’s a more basic reading comprehension fail: I said capacity to love increases more with more information, not that you magically start loving each other.
Not sure if we are talking about the same thing, but I think that there are many people who just “play it safe”, and in a civilized society that generally means following the rules and avoiding unnecessary conflicts. The same people can behave differently if you give them power (even on a small scale, e.g. when they have children).
But I think there are also people who try to do good even when the incentives point the other way round. And also people who can’t resist hurting others even when that predictably gets them punished.
Given more information about someone, your capacity for having {commune, love, compassion, kindness, cooperation} for/with them increases more than your capacity for {hatred, adversariality} towards them increases.
Knowing more about people allows you to have a better model of them. So if you started with the assumption e.g. that people who don’t seem sufficiently similar to you are bad, then knowing them better will improve your attitude towards them. On the other hand, if you started from some kind of Pollyanna perspective, knowing people better can make you disappointed and bitter. Finally, if you are a psychopath, knowing people better just gives you more efficient ways to exploit them.
Right. Presumably, maybe. But I am interested in considering quite extreme versions of the claim. Maybe there’s only 10,000 people who would, as emperor, make a world that is, after 1,000,000 years, net negative according to us. Maybe there’s literally 0? I’m not even sure that there aren’t literally 0, though quite plausibly someone else could know this confidently. (For example, someone could hypothetically have solid information suggesting that someone could remain truly delusionally and disorganizedly psychotic and violent to such an extent that they never get bored and never grow, while still being functional enough to give directions to an AI that specify world domination for 1,000,000 years.)
Sounds to me like wishful thinking. You basically assume that in 1,000,000 years people will get bored of doing the wrong thing, and start doing the right thing. My perspective is that “good” is a narrow target in the possibility space, and if someone already keeps missing it now, if we expand their possibility space by making them a God-emperor, the chance of converging to that narrow target only decreases.
Basically, for your model to work, kindness would need to be the only attractor in the space of human (actually, post-human) psychology.
A simple example of how things could go wrong is for Genghis Khan to set up an AI to keep everyone else in horrible conditions forever, and then (on purpose, or accidentally) wirehead himself.
Another example is the God-emperor editing their own brain to remove all empathy, e.g. because they consider it a weakness at the moment. Once all empathy is uninstalled, there is no incentive to reinstall it.
EDIT: I see that Thane Ruthenis already made this argument, and didn’t convince you.
No, I ask the question, and then I present a couple hypothesis-pieces. (Your stance here seems fairly though not terribly anti-thought AFAICT, so FYI I may stop engaging without further warning.)
My perspective is that “good” is a narrow target in the possibility space, and if someone already keeps missing it now, if we expand their possibility space by making them a God-emperor, the chance of converging to that narrow target only decreases.
I’m seriously questioning whether it’s a narrow target for humans.
Basically, for your model to work, kindness would need to be the only attractor in the space of human (actually, post-human) psychology.
Well, if we assume that humans are fundamentally good / inevitably converging to kindness if given enough time… then, yeah, giving someone God-emperor powers is probably going to be good in the long term. (If they don’t accidentally make an irreparable mistake.)
On the time scale of current human lifespan, I guess I could point out that some old people are unkind, or that some criminals keep re-offending a lot, so it doesn’t seem like time automatically translates to more kindness.
But an obvious objection is “well, maybe they need 200 years of time, or 1000”, and I can’t provide empirical evidence against that. So I am not sure how to settle this question.
On average, people get less criminal as they get older, so that would point towards human kindness increasing in time. On the other hand, they also get less idealistic, on average, so maybe a simpler explanation is that as people get older, they get less active in general. (Also, some reduction in crime is caused by the criminals getting killed as a result of their lifestyle.)
There is probably a significant impact of hormone levels, which means that we need to make an assumption about how the God-emperor would regulate their own hormones. For example, if he decides to keep a 25-year-old human male body, maybe his propensity to violence will match the body?
tl;dr—what kinds of arguments should even be used in this debate?
what kinds of arguments should even be used in this debate?
Ok, now we have a reasonable question. I don’t know, but I provided two argument-sketches that I think are of a potentially relevant type. At an abstract level, the answer would be “mathematico-conceptual reasoning”, just like in all previous instances where there’s a thing that has never happened before, and yet we reason somewhat successfully about it—of which there are plenty examples, if you think about it for a minute.
On average, people get less criminal as they get older, so that would point towards human kindness increasing in time. On the other hand, they also get less idealistic, on average, so maybe a simpler explanation is that as people get older, they get less active in general.
When I read Tsvi’s OP, I was imagining something like a (trans-/post- but not too post-)human civilization where everybody by default has an unbounded lifespan and healthspan, possibly somewhat boosted intelligence and need for cognition / open intellectual curiosity. (In which case, “people tend to X as they get older”, where X is something mostly due to things related to default human aging, doesn’t apply.)
Now start it as a modern-ish democracy or a cluster of (mostly) democracies, run for 1e4 to 1e6 years, and see what happens.
I basically don’t buy the conjecture of humans being super-cooperative in the long run, or hatred decreasing and love increasing.
To the extent that something like this is true, I expect it to be a weird industrial to information age relic that utterly shatters if AGI/ASI is developed, and this remains true even if the AGI is aligned to a human.
Are people fundamentally good? Are they practically good? If you make one person God-emperor of the lightcone, is the result something we’d like?
I just want to make a couple remarks.
Conjecture: Generally, on balance, over longer time scales good shards express themselves more than bad ones. Or rather, what we call good ones tend to be ones whose effects accumulate more.
Example: Nearly all people have a shard, quite deeply stuck through the core of their mind, which points at communing with others.
Communing means: speaking with; standing shoulder to shoulder with, looking at the same thing; understanding and being understood; lifting the same object that one alone couldn’t lift.
The other has to be truly external and truly a peer. Being a truly external true peer means they have unboundedness, infinite creativity, self- and pair-reflectivity and hence diagonalizability / anti-inductiveness. They must also have a measure of authority over their future. So this shard (albeit subtly and perhaps defeasibly) points at non-perfect subjugation of all others, and democracy. (Would an immortalized Genghis Khan, having conquered everything, after 1000 years, continue to wish to see in the world only always-fallow other minds? I’m unsure. What would really happen in that scenario?)
An aspect of communing is, to an extent, melting into an interpersonal alloy. Thought patterns are quasi-copied back and forth, leaving their imprints on each other and each other leaving their imprints on the thought patterns; stances are suggested back and forth; interoperability develops; multi-person skills develop; eyes are shared. By strong default this cannot be stopped from being transitive. Thus elements, including multi-person elements, spread, binding everyone into everyone, in the long run.
God—the future or ideal collectivity of humane minds—is the extrapolation of primordial searching for shared intentionality. That primordial searching almost universally always continues, at least to some extent, to exert itself. The Ner Tamid is saying: God (or specifically, the Shekhinah) is the one direction that we move in; God will be omnipotent, and is of course omnibenevolent. To say things more concretely (though less accurately), people in some sense want to get along, and all else equal they keep searching and learning more how to get along and keep implicitly rewriting themselves to get along more; of course this process could be corrupted / disrupted / prevented, but it’s the default.
Example: On the longest time scales, love increases, hatred decreases.
Given more information about someone, your capacity for having {commune, love, compassion, kindness, cooperation} for/with them increases more than your capacity for {hatred, adversariality} towards them increases.
You can perfectly well hate someone who you don’t know much about.
How much more can you hate someone by knowing more about them? Certainly you can learn things which make you hate them more. But if you kept learning even more, would you still be able to hate them?
You can be quite adversarial towards someone without knowing them. Everyone can meet in combat in the arena of convergent instrumental subgoals.
The exception to this conjecture:
It is possible to become extremely more adversarial towards someone by knowing much more about them—so you can pessimize against their values.
However, there is a strange sort of silver crack: Because love and cooperation are compatible with unbounded creativity, love and cooperation are unbounded. Therefore, to “keep up” with your unbounded love, adversariality would need access to the unbounded expression of your values, in order to pessimize against them. But this seems to imply that the adversary has to continually give you more and more love, in order to access your values at each stage. Not sure what to make of this. (The outcome would still be very bad, but it’s strange.)
It’s harder to feel compassion towards someone you don’t know much about; towards someone you do know much about, compassion is the easiest thing in the world to feel, if you try for a moment.
(This is just another example of how Buddhists are bad: faceless compassion is annihilation, not compassion. Yeah I know you like annihilation, but it’s bad.)
Kindness and cooperation requires information, and (for humans) can increase without bound with more information.
Ender: It’s impossible or almost impossible to understand someone without loving them.
Thanks to @JuliaHP for a related conversation.
This assumes that the initially-non-eudaimonic god-king(s) would choose to remain psychologically human for a vast amount of time, and keep the rest of humanity around for all that time. Instead of:
Self-modify into something that’s basically an eldritch abomination from a human perspective, either deliberately or as part of a self-modification process gone wrong.
Make some minimal self-modifications to avoid value drift, precisely not to let the sort of stuff you’re talking about happen.
Stick to behavioral patterns that would lead to never changing their mind/never value-drifting, either as an “accidental” emergent property of their behavior (the way normal humans can surround themselves in informational bubbles that only reinforce their pre-existing beliefs; the way normal human dictators end up surrounded by yes-men; but elevated to transcendence, and so robust enough to last for eons) or as an implicit preference they never tell their aligned ASI to satisfy, but which it infers and carefully ensures the satisfaction of.
Impose some totalitarian regime on the rest of humanity and forget about it, spending the rest of their time interacting only with each other/with tailor-built non-human constructs, and/or playing immersive simulation games.
Immediately disassemble the rest of humanity for raw resources, like any good misaligned agent would, and never think about it again. Edit out their social instincts or satisfy them by interacting with each other/with constructs.
Acausally sell this universe to some random paperclip-maximizer in exchange for being incarnated in some reality without entropic decay, where they wouldn’t have cosmic resources, but would be able to exist literally eternally in lavish comfort (or at least dramatically longer than this universe’s lifespan, basically trading parallel computing for sequential computing).
Et cetera.
Overall, I think all hopeful scenarios about “even a not-very-good person elevated to godhood would converge to goodness over time!” fail to feel the Singularity. It’s not going to be basically business as usual for any prolonged length of time; things are going to get arbitrarily weird essentially immediately.
All of these hopeful purported psychosocial processes that modify humans to be good hinge on tons of assumptions about what the world looks like. They’re brittle. And it seems incredibly unlikely that any of these assumptions – let alone all of them – would still be intact even a month past the event horizon, let alone thousands of years.
Yes, that’s a background assumption of the conjecture; I think making that assumption and exploring the consequences is helpful.
Right, totally, then all bets are off. The scenario is underspecified. My default imagination of “aligned” AGI is corrigible AGI. (In fact, I’m not even totally sure that it makes much sense to talk of aligned AGI that’s not corrigible.) Part of corrigibility would be that if:
the human asks you to do X,
and X would have irreversible consequences,
and the human is not aware of / doesn’t understand those consequences,
and the consequences would make the human unable to notice or correct the change,
and the human, if aware, would have really wanted to not do X or at least think about it a bunch more before doing it,
then you DEFINITELY don’t just go ahead and do X lol!
In other words, a corrigible AGI is supposed to use its intelligence to possibilize self-alignment for the human.
I think this notion of values and hence value drift is probably mistaken of humans. Human values are meta and open—part of the core argument of my OP (the bullet point about communing).
So first they carefully construct an escape-proof cage for all the other humans, and then they become a perma-zombie? Not implausible, like they could for some reason specifically ask the AGI to do this, but IDK why they would.
Doesn’t sound very corrigible? Not sure.
Right, certainly they could. Who actually would? (Not rhetorical.)
I think you’re failing to feel the Singularity, and instead you’re extrapolating to like “what would a really really bad serial killer / dictator do if they were being an extra bad serial killer / dictator times 1000”. Or IDK, I don’t know what you think; what do you think would actually happen if a random person were put in the corrigible AGI control seat?
Things can get weird, but for the person to cut out a bunch of their core humanity, kinda seems like either the AGI isn’t really corrigible or isn’t really AGI (such that the emperor-AGI system is being dumb by its own lights), or else the person really wanted to do that. Why do you think people want to do that? Do you want to do that? I don’t.
If they don’t cut out a bunch of their core humanity, then my question and conjecture are live.
Unless the human, on reflection, doesn’t want some specific subset of their current values to be open to change / has meta-level preferences to freeze some object-level values. Which I think is common. (Source: I have meta-preferences to freeze some of my object-level values at “eudaimonia”, and I take specific deliberate actions to avoid or refuse value-drift on that.)
Callousness. “We probably need to do something about the rest of humanity, probably shouldn’t just wipe them all out, lemme draft some legislation, alright looks good, rubber-stamp it and let’s move on”. Tons of bureaucracies and people in power seem to act this way today, including decisions that impact the fates of millions.
I don’t know that Genghis Khan or Stalin wouldn’t have. Some clinical psychopaths or philosophical extremists (e. g., the human successionists) certainly would.
Mm...
First, I think “corrigibility to a human” is underdefined. A human is not, themselves, a coherent agent with a specific value/goal-slot to which an AI can be corrigible.
Like, is it corrigible to a human’s momentary impulses? Or to the command the human would give if they thought for five minutes? For five days? Or perhaps to the command they’d give if the AI taught them more wisdom? But then which procedure should the AI choose for teaching them more wisdom? The outcome is likely path-dependent on that: on the choice between curriculum A and curriculum B. And if so, what procedure should the AI use to decide what curriculum to use? Or should the AI perhaps basically ignore the human in front of them, and simply interpret them as a rough pointer to CEV? Well, that assumes the conclusion, and isn’t really “corrigibility” at all, is it?
The underlying issue here is that “a human’s values” are themselves underdefined. They’re derived in a continual, path-dependent fashion, by an unstable process with lots of recursions and meta-level interference. There’s no unique ground-true set of values which the AI should take care not to step onto. This leaves three possibilities:
The AI acts as a tool that does what the human knowingly instructs it to do, with the wisdom by-default outsourced to the human.
But then it is possible to use it unwisely. For example, if the human operator is smart enough to foresee issues with self-modification, they could ask the AI to watch out for that. They could also ask it to watch out for that whole general class of unwise-on-the-part-of-the-human decisions. But they can also fail to do so, or unwisely ignore a warning in a fit of emotion, or have some beliefs about how decisions Ought to be Done that they’re unwilling to even discuss with the AI.
The AI never does anything, because it knows that any of its actions can step onto one of the innumerable potential endpoints of a human’s self-reflection process.
But then it is useless.
The AI isn’t corrigible at all, it just optimizes for some fixed utility function, if perhaps with an indirect pointer to it (“this human’s happiness”, “humanity’s CEV”, etc.).
(1) is the only possibility worth examining here, I think.
And what I expect to happen if an untrained, philosophically median human is put in control of a tool ASI, is some sort of catastrophe. They would have various cached thoughts about how the story ought to go, what the greater good is, who the villains are, how the society ought to be set up. These thoughts would be endorsed at the meta-level, and not open to debate. The human would not want to ask the ASI to examine those; if the ASI attempts to challenge them as part of some other request, the human would tell it to shut up.[1]
In addition, the median human is not, really, a responsible person. If put in control of an ASI, they would not suddenly become appropriately responsible. It wouldn’t by default occur to them to ask the ASI to make them more responsible, either, because that’s itself a very responsible thing to do. The way it would actually go, they are going to be impulsive, emotional, callous, rash, unwise, cognitively lazy.
Some sort of stupid and callous outcome is likely to result. Maybe not specifically “self-modifying into a monster/zombie and trapping humanity in a dystopian prison”, but something in that reference class of outcomes.
Not to mention if the human has some extant prejudices: racism or any other manner of “assigning different moral worth to different sapient beings based on arbitrary features”. The stupid-callous-impulsive process would spit out some not-very-pleasant fate for the undesirables, and this would be reflectively endorsed on some level, so a genuine tool-like corrigible ASI[2] wouldn’t say a word of protest.
Maybe I am being overly cynical about this, that’s definitely possible. Still, that’s my current model.
Source: I would not ask the ASI to search for arguments against eudaimonia-maximization, or ask it to check if there’s something else that “I” “should” be pursuing instead, because I do not want to be argued out of that even if there’s some coherent, true, and compelling sense in which it is not what “I” “actually” “want”. If the ASI asks whether it should run that check as part of some other request, I would tell it to shut up.
(Note that it’s different from examining whether my idea of eudaimonia/human flourishing/the-thing-I-mean-when-I-say-human-flourishing is correct/good, or whether my fundamental assumptions about how the world works are correct, etc.)
As opposed to a supposedly corrigible but secretly eudaimonic ASI which, in one’s imagination, always happens to gently question the human’s decisions when the human orders it to do something bad, and then happens to pick the specific avenues of questioning that make the human “realize” they wanted good things all along.
How about for example:
Not saying this is some sort of grand solution to corrigibility, but it’s obviously better than the nonsense you listed. If a human were going to try to help me out, I’d want this, for example, more than the things you listed, and it doesn’t seem especially incompatible with corrigible behavior.
I mean, yes, but you wrote a lot of stuff after this that seems weird / missing the point, to me. A “corrigible AGI” should do at least as well as—really, much better than—you would do, if you had a huge team of researchers under you and your full time, 100,000x speed job is to do a really good job at “being corrigible, whatever that means” to the human in the driver’s seat. (In the hypothetical you’re on board with this for some reason.)
I would guess fairly strongly that you’re mistaken or confused about this, in a way that an AGI would understand and be able to explain to you. (An example of how that would be the case: the version of “eudaimonia” that would not horrify you, if you understood it very well, has to involve meta+open consciousness (of a rather human flavor).)
I’m curious to hear more about those specific deliberate actions.
Your and my beliefs/questions don’t feel like they’re even much coming into contact with each other… Like, you (and also other people) just keep repeating “something bad could happen”. And I’m like “yeah obviously something extremely bad could happen; maybe it’s even likely, IDK; and more likely, something very bad at the beginning of the reign would happen (Genghis spends his first 200 years doing more killing and raping); but what I’m ASKING is, what happens then?”.
If you’re saying
then, ok, you can say that, but I want to understand why; and I have some reasons (as presented) for thinking otherwise.
Your hypothesis is about the dynamics within human minds embedded in something like contemporary societies with lots of other diverse humans whom the rulers are forced to model for one reason or another.
My point is that evil, rash, or unwise decisions at the very start of the process are likely, and that those decisions are likely to irrevocably break the conditions in which the dynamics you hypothesize are possible. Make the minds in charge no longer human in the relevant sense, or remove the need to interact with/model other humans, etc.
In my view, it doesn’t strongly bear on the final outcome-distribution whether the “humans tend to become nicer to other humans over time” hypothesis is correct, because “the god-kings remain humans hanging around all the other humans in a close-knit society for millennia” is itself a very rare class of outcomes.
Absolutely not, no. Humans want to be around (some) other people, so the emperor will choose to be so. Humans want to be [many core aspects of humanness, not necessarily per se, but individually], so the emperor will choose to be so. Yes, the emperor could want these insufficiently for my argument to apply, as I’ve said earlier. But I’m not immediately recalling anyone (you or others) making any argument that, with high or even substantial probability, the emperor would not want these things sufficiently for my question, about the long-run of these things, to be relevant.
Yes: some other people. The ideologically and morally aligned people, usually. Social/informational bubbles that screen away the rest of humanity, from which they only venture out if forced to (due to the need to earn money/control the populace, etc.). This problem seems to get worse as the ability to insulate yourself from others improves, as could be observed with modern internet-based informational bubbles or the surrounded-by-yes-men problem of dictators.
ASI would make this problem transcendental: there would truly be no need to ever bother with the people outside your bubble again, they could be wiped out or their management outsourced to AIs.
Past this point, you’re likely never returning to bothering about them. Why would you, if you can instead generate entire worlds of the kinds of people/entities/experiences you prefer? It seems incredibly unlikely that human social instincts can only be satisfied – or even can be best satisfied – by other humans.
You’re 100% not understanding my argument, which is sorta fair because I didn’t lay it out clearly, but I think you should be doing better anyway.
Here’s a sketch:
Humans want to be human-ish and be around human-ish entities.
So the emperor will be human-ish and be around human-ish entities for a long time. (Ok, to be clear, I mean a lot of developmental / experiential time—the thing that’s relevant for thinking about how the emperor’s way of being trends over time.)
When being human-ish and around human-ish entities, core human shards continue to work.
When core human shards continue to work, MAYBE this implies EVENTUALLY adopting beneficence (or something else like cosmopolitanism), and hence good outcomes.
Since the emperor will be human-ish and be around human-ish entities for a long time, IF 4 obtains, then good outcomes.
And then I give two IDEAS about 4 (communing->[universalist democracy], and [information increases]->understanding->caring).
I don’t know what’s making you think I don’t understand your argument. Also, I’ve never publicly stated that I’m opting into Crocker’s Rules, so while I happen not to particularly mind the rudeness, your general policy on that seems out of line here.
My argument is that the process you’re hypothesizing would be sensitive to the exact way of being human-ish, the exact classes of human-ish entities around, and the exact circumstances in which the emperor has to be around them.
As a plain and down-to-earth example, if a racist surrounds themselves with a hand-picked group of racist friends, do you expect them to eventually develop universal empathy, solely through interacting with said racist friends? Addressing your specific ideas: nobody in that group would ever need to commune with non-racists, nor have to bother learning more about non-racists. And empirically, such groups don’t seem to undergo spontaneous deradicalizations.
I expect they’d get bored with that.
So what do you think happens when they are hanging out together, and they are in charge, and it has been 1,000 years or 1,000,000 years?
One or both of:
They keep each other radicalized forever as part of some transcendental social dynamic.
They become increasingly non-human as time goes on, small incremental modifications and personality changes building on each other, until they’re no longer human in the senses necessary for your hypothesis to apply.
I assume your counter-model involves them getting bored of each other and seeking diversity/new friends, or generating new worlds to explore/communicate with, with the generating processes not constrained to only generate racists, leading to the extremists interacting with non-extremists and eventually incrementally adopting non-extremist perspectives?
If yes, this doesn’t seem like the overdetermined way for things to go:
The generating processes would likely be skewed towards only generating things the extremists would find palatable, meaning more people sharing their perspectives/not seriously challenging whatever deeply seated prejudices they have. They’re there to have a good time, not have existential/moral crises.
They may make any number of modifications to themselves to make them no longer human-y in the relevant sense. Including by simply letting human-standard self-modification algorithms run for 10^3-10^6 years, becoming superhumanly radicalized.
They may address the “getting bored” part instead, periodically wiping their memories (including by standard human forgetting) or increasing each other’s capacity to generate diverse interactions.
Ok so they only generate racists and racially pure people. And they do their thing. But like, there are no other races around, so the racism part sorta falls by the wayside. They’re still racially pure of course, but it’s usually hard to tell that they’re racist; sometimes they sit around and make jokes to feel superior over lesser races, but this is pretty hollow since they’re not really engaged in any type of race relations. Their world isn’t especially about all that, anymore. Now it’s about… what? I don’t know what to imagine here, but the only things I do know how to imagine involve unbounded structure (e.g. math, art, self-reflection, self-reprogramming). So, they’re doing that stuff. For a very long time. And the race thing just is not a part of their world anymore. Or is it? I don’t even know what to imagine there. Instead of having tastes about ethnicity, they develop tastes about questions in math, or literature. In other words, [the differences between people and groups that they care about] migrate from race to features of people that are involved in unbounded stuff. If the AGI has been keeping the racially impure in an enclosure all this time, at some point the racists might have a glance back, and say, wait, all the interesting stuff about people is also interesting about these people. Why not have them join us as well.
For the same reason that most people (if given the power to do so) wouldn’t just replace their loved ones with altered versions of them that are better along whatever dimensions the person judged them deficient/imperfect.
Yeah I mean this is perfectly plausible, it’s just that even these cases are not obvious to me.
If this were true, I’d expect much lower divorce rates. After all, who do you have more information about than your wife or husband? And many of these divorces are un-amicable, though I wasn’t quickly able to find particular numbers. [EDIT:] Though in either case, this indeed indicates a significantly decreasing level of love over long periods of time & greater mutual knowledge. See also the decrease in all objective measures of quality of life after divorce for both parties after long marriages.
(I wrote my quick take quickly and therefore very elliptically, and therefore it would require extra charity / work on the reader’s part (like, more time spent asking “huh? this makes no sense? ok what could he have meant, which would make this statement true?”).)
It’s an interesting point, but I’m talking about time scales of, say, thousands of years or millions of years. So it’s certainly not a claim that could be verified empirically by looking at any individual humans because there aren’t yet any millenarians or megaannumarians. Possibly you could look at groups that have had a group consciousness for thousands of years, and see if pairs of them get friendlier to each other over time, though it’s not really comparable (idk if there are really groups like that in continual contact and with enough stable collectivity; like, maybe the Jews and the Indians or something).
If it’s not a conclusion which could be disproven empirically, then I don’t know how you came to it.
I mean, I did ask myself what counter-arguments you could make to my objection, and came to basically your response. That is, something approximating “well they just don’t have enough information, and if they had way way more information then they’d love each other again”, which I don’t find satisfying.
Namely because I expect people in such situations to get stuck in a negative-reinforcement cycle: the things the other does which used to be fun lose their novelty as they get repetitive, so the predicted reward of those interactions overshoots the actual reward, which in a TD-learning sense is just as good (bad) as a negative reinforcement event. I don’t see why this would be fixed with more knowledge; indeed it seems likely to be exacerbated with more knowledge, as more of the things the other does become less novel & more boring, and, worse, fundamental implications of their nature as a person rather than unfortunate accidents they can change easily.
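(A toy sketch of the dynamic being claimed here, under assumed illustrative parameters — the decay rate and learning rate are made up, not from anything above. The point is just that when a repeated interaction’s actual reward decays with novelty, the learned prediction lags behind it, so the TD error goes negative even while the raw reward stays positive.)

```python
def simulate(novelty_decay=0.9, lr=0.5, steps=20):
    """Toy TD-learning loop: reward decays with repetition, prediction lags."""
    V = 0.0          # learned prediction of the interaction's reward
    r = 1.0          # actual reward, decaying as the interaction loses novelty
    deltas = []
    for _ in range(steps):
        delta = r - V          # TD error: actual minus predicted reward
        deltas.append(delta)
        V += lr * delta        # nudge prediction toward the actual reward
        r *= novelty_decay     # repetition makes the interaction less rewarding
    return deltas

deltas = simulate()
# Early deltas are positive (pleasant surprise); once r decays faster than V
# adapts, delta goes negative -- functionally a negative reinforcement event
# even though the interaction's raw reward r never drops below zero.
```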
I also think intuitions in this area are likely misleading. It is definitely the case now that marginally more understanding of each other would help with coordination problems, since people love making up silly reasons to hate each other. I do also think this is anchoring too much on our current bandwidth limitations, and generalizing too far. Better coordination does not always imply more love.
This does not sound like the sort of problem you’d just let yourself wallow in for 1000 years.
And again, with regards to what is fixed by more information, I’m saying that capacity for love increases more.
After 1000 years, both people would have gotten bored with themselves, and learned to do infinite play!
Oh my god. Do you think when I said this, I meant “has no evidentiary entanglement with sense observations we can make”?
Maybe there’s a more basic reading comprehension fail: I said capacity to love increases more with more information, not that you magically start loving each other.
Maybe some people are, and some people are not?
Not sure if we are talking about the same thing, but I think that there are many people who just “play it safe”, and in a civilized society that generally means following the rules and avoiding unnecessary conflicts. The same people can behave differently if you give them power (even on a small scale, e.g. when they have children).
But I think there are also people who try to do good even when the incentives point the other way round. And also people who can’t resist hurting others even when that predictably gets them punished.
Knowing more about people allows you to have a better model of them. So if you started with the assumption e.g. that people who don’t seem sufficiently similar to you are bad, then knowing them better will improve your attitude towards them. On the other hand, if you started from some kind of Pollyanna perspective, knowing people better can make you disappointed and bitter. Finally, if you are a psychopath, knowing people better just gives you more efficient ways to exploit them.
Right. Presumably, maybe. But I am interested in considering quite extreme versions of the claim. Maybe there’s only 10,000 people who would, as emperor, make a world that is, after 1,000,000 years, net negative according to us. Maybe there’s literally 0? I’m not even sure that there aren’t literally 0, though quite plausibly someone else could know this confidently. (For example, someone could hypothetically have solid information suggesting that someone could remain truly delusionally and disorganizedly psychotic and violent to such an extent that they never get bored and never grow, while still being functional enough to give directions to an AI that specify world domination for 1,000,000 years.)
Sounds to me like wishful thinking. You basically assume that in 1,000,000 years people will get bored of doing the wrong thing, and start doing the right thing. My perspective is that “good” is a narrow target in the possibility space, and if someone already keeps missing it now, if we expand their possibility space by making them a God-emperor, the chance of converging to that narrow target only decreases.
Basically, for your model to work, kindness would need to be the only attractor in the space of human (actually, post-human) psychology.
A simple example of how things could go wrong is for Genghis Khan to set up an AI to keep everyone else in horrible conditions forever, and then (on purpose, or accidentally) wirehead himself.
Another example is the God-emperor editing their own brain to remove all empathy, e.g. because they consider it a weakness at the moment. Once all empathy is uninstalled, there is no incentive to reinstall it.
EDIT: I see that Thane Ruthenis already made this argument, and didn’t convince you.
No, I ask the question, and then I present a couple hypothesis-pieces. (Your stance here seems fairly though not terribly anti-thought AFAICT, so FYI I may stop engaging without further warning.)
I’m seriously questioning whether it’s a narrow target for humans.
Curious to hear other attractors, but your proposals aren’t really attractors. See my response here: https://www.lesswrong.com/posts/Ht4JZtxngKwuQ7cDC/tsvibt-s-shortform?commentId=jfAoxAaFxWoDy3yso
Ah I see you saw Ruthenis’s comment and edited your comment to say so, so I edited my response to your comment to say that I saw that you saw.
Well, if we assume that humans are fundamentally good / inevitably converging to kindness if given enough time… then, yeah, giving someone God-emperor powers is probably going to be good in long term. (If they don’t accidentally make an irreparable mistake.)
I just strongly disagree with this assumption.
It’s not an assumption, it’s the question I’m asking and discussing.
Ah, then I believe the answer is “no”.
On the time scale of current human lifespan, I guess I could point out that some old people are unkind, or that some criminals keep re-offending a lot, so it doesn’t seem like time automatically translates to more kindness.
But an obvious objection is “well, maybe they need 200 years of time, or 1000”, and I can’t provide empirical evidence against that. So I am not sure how to settle this question.
On average, people get less criminal as they get older, so that would point towards human kindness increasing in time. On the other hand, they also get less idealistic, on average, so maybe a simpler explanation is that as people get older, they get less active in general. (Also, some reduction in crime is caused by the criminals getting killed as a result of their lifestyle.)
There is probably a significant impact of hormone levels, which means that we need to make an assumption about how the God-emperor would regulate their own hormones. For example, if he decides to keep a 25-year-old human male body, maybe his propensity to violence will match the body?
tl;dr—what kinds of arguments should even be used in this debate?
Ok, now we have a reasonable question. I don’t know, but I provided two argument-sketches that I think are of a potentially relevant type. At an abstract level, the answer would be “mathematico-conceptual reasoning”, just like in all previous instances where there’s a thing that has never happened before, and yet we reason somewhat successfully about it—of which there are plenty of examples, if you think about it for a minute.
When I read Tsvi’s OP, I was imagining something like a (trans-/post- but not too post-)human civilization where everybody by default has an unbounded lifespan and healthspan, possibly somewhat boosted intelligence and need for cognition / open intellectual curiosity. (In which case, “people tend to X as they get older”, where X is something mostly due to things related to default human aging, doesn’t apply.)
Now start it as a modern-ish democracy or a cluster of (mostly) democracies, run for 1e4 to 1e6 years, and see what happens.
I basically don’t buy the conjecture of humans being super-cooperative in the long run, or hatred decreasing and love increasing.
To the extent that something like this is true, I expect it to be a weird industrial to information age relic that utterly shatters if AGI/ASI is developed, and this remains true even if the AGI is aligned to a human.
So just don’t make an AGI, instead do human intelligence amplification.
People love the idea (as opposed to the reality) of other people quite often, and knowing the other better can allow for plenty of hate.
Seems true. I don’t think this makes much contact with any of my claims. Maybe you’re trying to address:
To clarify the question (which I didn’t do a good job of in the OP), the question is more about 1000 years or 1,000,000 years than 1 or 10 years.