I’m Tamsin Leake, co-founder and head of research at Orthogonal, doing agent foundations.
Like all the other uncomputable or intractable logic in the post, the AI is to make increasingly informed guesses about them using something like logical induction, where one can estimate the likelihood of a logical statement without having to determine its truth value for sure.
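as a toy illustration of that "increasingly informed guesses" flavor (this is just a sketch i'm improvising here, not Garrabrant-style logical induction; every name in it is made up for the example): a credence in a universally-quantified statement that gets refined as more compute is spent, without the statement ever being decided for sure.

```python
# toy sketch: refine a credence in "Goldbach's conjecture holds" by checking
# more and more cases; we never decide the statement, we just get a more
# informed guess as the compute budget grows.

def is_prime(k: int) -> bool:
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True

def goldbach_holds_for(n: int) -> bool:
    # one instance of the statement: "this even n is a sum of two primes"
    return any(is_prime(p) and is_prime(n - p) for p in range(2, n // 2 + 1))

def credence_after(budget: int) -> float:
    """Laplace-style credence in the universal statement after checking the
    first `budget` even numbers; drops to 0 on any counterexample."""
    for n in range(4, 4 + 2 * budget, 2):
        if not goldbach_holds_for(n):
            return 0.0
    return budget / (budget + 1)

for budget in (10, 100, 1000):
    print(budget, credence_after(budget))  # credence climbs toward 1 but never reaches it
```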
this is a specific focus on what part of utopia i’d like to live in, and i don’t really have an interest in creating descendants at the moment. i’ve written more about rules for creating new persons in ∀V, but the short version is “no kids allowed (they’re too hard for me to figure out), only copies of people”. though in a setting where aligned Elua has more agency, maybe it could figure out how to make kids viable.
who really want to start a family in a way that can’t be satisfied by an alternative, yes. such as: creating a merged version of their minds and having it emit preferences in advance and then consentingly modify itself until it’s reasonably childlike, having a non-moral-patient fake mind inhabit the body until a certain age before being replaced with that merged mind, or any other kind of weird scheme i haven’t thought of. there are many possibilities, in virtualia and with Elua to help.
there are probly many challenges one can face if they want that. i’m just fine getting my challenges from little things like video games, at least for a while. maybe i’d get back into the challenge of designing my own video games, too; i enjoyed that one.
that’s just part of my stylistic choices in blogging; to make my formal writing more representative of my casual chatting. see eg this or this [EDIT: or this]
after thinking about it and asking vanessa, it seems that you’re correct; thanks for noticing. the mistake comes from the fact that i express things in terms of utility functions and vanessa expresses things in terms of loss functions, and they are reversed. the post should be fixed now.
note that in the g(G|U) definition, i believe it is also ≥, because -log flips the function.
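as a minimal illustration of the flip, assuming the definition routes a utility-valued quantity through $-\log$ (which is how i understand the post's setup): for $u_1, u_2 \in (0, 1]$,

$$u_1 \le u_2 \iff -\log u_1 \ge -\log u_2,$$

since $-\log$ is monotonically decreasing, so any $\le$ bound stated on utilities becomes a $\ge$ bound once it's passed through $-\log$ to become a loss.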
would it be fair to say that an agent-instant is a program? and then we can say that a “continuous agent” can be a sequence of programs where each program tends to value the future containing something roughly similar to itself, in the same way that me valuing my survival means that i want the future to contain things pretty similar to me.
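as a quick formalization sketch in code (just my own toy framing, not something from the post; all names are made up):

```python
# toy sketch: an agent-instant as a program that, given an observation, emits
# an action plus the successor program it endorses (typically one very similar
# to itself); the "continuous agent" is the resulting sequence of programs.

from dataclasses import dataclass
from typing import Callable, Tuple, List

Observation = str
Action = str

@dataclass
class AgentInstant:
    step: Callable[[Observation], Tuple[Action, "AgentInstant"]]

def unroll(agent: AgentInstant, observations: List[Observation]) -> List[Action]:
    """the 'continuous agent': the trajectory traced out by successive instants."""
    actions = []
    for obs in observations:
        action, agent = agent.step(obs)
        actions.append(action)
    return actions

def make_persister() -> AgentInstant:
    # an instant that always endorses an (essentially identical) successor,
    # analogous to valuing a future that contains something very similar to itself
    def step(obs: Observation) -> Tuple[Action, AgentInstant]:
        return f"respond to {obs}", make_persister()
    return AgentInstant(step)

print(unroll(make_persister(), ["a", "b"]))
```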
(i might follow up on the second part of that comment when i get around to reading that Simulators post)
this is interesting, i didn’t know about this property of known FHE schemes. if it is the case that being able to run a HEC necessarily also entails the ability to encrypt into it, then the solution you propose is indeed fine.
as for physical (as opposed to cryptographic) event horizons, we’d want superintelligence to send copies of itself past those anyways.
i think the format could simply be to send into the HEC a transformation that takes the entire world computation and replaces it with a runtime containing the superintelligence at the top level, giving it access to the simulated world such that it can examine it as much as it wants and decide whether to keep it running and/or what to modify.
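purely as an illustration of that format (the homomorphic-encryption interface below is a made-up stand-in for this sketch, not any real FHE library's API):

```python
# toy sketch: sending a transformation "into" the HEC which wraps the whole
# world computation in a runtime with the superintelligence at the top level.
# HomomorphicContext and its method are invented names for illustration only.

from dataclasses import dataclass
from typing import Callable, Dict, Any

WorldState = Dict[str, Any]  # stand-in for the encrypted world computation's state

@dataclass
class HomomorphicContext:
    """pretend-opaque ciphertext: we can evaluate functions over it
    homomorphically, but never read the plaintext ourselves."""
    ciphertext: WorldState

    def apply_encrypted(self, transform: Callable[[WorldState], WorldState]) -> None:
        # homomorphic evaluation: the transform runs "inside" the encryption
        self.ciphertext = transform(self.ciphertext)

def wrap_with_overseer(world: WorldState) -> WorldState:
    # the transformation described above: keep the world computation, but put a
    # top-level runtime above it that can inspect it and decide whether/how it
    # keeps running or gets modified
    return {"overseer": "superintelligence-runtime", "world": world, "paused": True}

hec = HomomorphicContext(ciphertext={"inhabitants": "..."})
hec.apply_encrypted(wrap_with_overseer)
```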
while what you say is true for the gravitational event horizon of black holes, it doesn’t apply to the cryptographic event horizon of HECs or the expansion-of-the-universe event horizon of our light cone. so, yes, some cases of event horizons may not be livable, but others might still matter — including potentially yet unknown unknown ones.
i’m generally very anti-genocide as well, and i expect the situations where it is the least bad way to implement my values to be rare. nonetheless, there are some situations where it feels like every alternative is worse. for example, imagine an individual (or population of individuals) who strongly desires to be strongly tortured, such that both letting them be strongly tortured and letting them go without torture would be highly unethical — both would constitute a form of suffering above a threshold we’d be okay with — and of course, also imagine that that person strongly disvalues being modified to want other things, etc. in this situation, it seems like they simply cannot be instantiated in an ethical manner.
suffering is inefficient and would waste energy
that is true and it is why i expect such situations to be relatively rare, but they’re not impossible. there are numerous historical instances of human societies running huge amounts of suffering even when it’s not efficient, because there are many nash equilibria in local maxima; and it only takes inventing a superintelligent singleton to crystallize a set of values forever, even if they include suffering.
they may not take others’ soul-structure
there’s an issue here: what does “other” mean? can i sign up to be tortured for 1000 years without the ability to opt back out, or to modify my future self such that i’d be unable to conceive of or desire to opt out? i don’t think so, because i think that’s an unreasonable amount of control for me to have over my future selves. for shorter spans of time, it’s reasonabler — notably because my timeselves have enough mutual respect to implement each other’s values, to an extent. but a society’s consensus shouldn’t get to decide for all of its individuals (like the baby eaters’ children in https://www.lesswrong.com/posts/HawFh7RvDM4RyoJ2d/three-worlds-collide-0-8), and i don’t think an instant-individual should get to decide arbitrarily much for arbitrarily many of its future selves. there exists a threshold of suffering at which we ought to step in and stop it.
in a sense, your perspective seems to be denying the possibility of S-risks — situations that are defined to be worse than death. you seem to think that no such situation can occur such that death would be preferable, that continued life is always preferable even if it’s full of suffering. i’m not quite sure you think this, but it seems to be what is entailed by the perspective you present.
any attempt to reduce suffering by ending a life when that life would have continued to try to survive
any? i don’t think so at all! again, i would strongly hope that if a future me is stuck constantly trying to pursue torture, a safe AI would come and terminate that future me rather than let me experience suffering forever just because my mind is stuck in a bad loop or something like that.
but you have no right to impose your hedonic utility function on another agent. claim: preference utilitarianism iterated through coprotection/mutual-aid-and-defence games is how we got morality in the first place.
to be clear, the reason i say “suffering” and not “pain” is to use a relatively high-level/abstracted notion of “things that are bad”. given that my utilitarian preferences are probly not lexicographic, even though my valuing of self-determination is very high, there could be situations where the suffering is bad enough that my wish to terminate suffering overrides my wish to ensure self-determination. ultimately, i’ll probly bite the bullet that i intend to do good, not just do good where i “have the right” to do that — and it happens that my doing good is trying to give as much self-determination to as many moral patients as i can (https://carado.moe/∀V.html), but sometimes that’s just not ethically viable.
claim: preference utilitarianism iterated through coprotection/mutual-aid-and-defence games is how we got morality in the first place.
hm. i’m not sure about this, but even if it were the case, i don’t think it would make much of a difference — i want what i want, not what is the historical thing that has caused me to want what i want. but at least in the liberal west it seems that some form of preference utilitarianism is a fairly strong foundation, sure.
in comparison a civilization living as an HEC is, worst case, relatively trivial negentropy waste.
again, this seems to be a crux here. i can think of many shapes of societies whose existence would be horrible, way worse than just “negentropy waste”. just like there can be good worlds where we spend energy on nice things, there can be bad worlds where we spend energy on suffering.
sorry if i appear to repeat myself a bunch in this response; i want to try responding to many of the points you bring up so that we can better locate the core of our disagreement. i want to clarify that i’m not super solid on my ethical beliefs — i’m defending them not just because they’re what i believe to be right, but also because i want to see if they hold up and/or if there are better alternatives. it’s just that “let horrible hellworlds run” / “hellworlds just wouldn’t happen” (the strawman of what your position looks like to me) does not appear to me to be a better alternative.
i tend to be a fan of “cosmic libertarianism” — see my attempt at something like that. it’s just that, as i explain in an answer i’ve given to another comment, there’s a big difference between trading a lot of suffering for self-determination, and trading arbitrarily much suffering for self-determination. i’m not willing to do the latter — there do seem to be potential amounts of suffering that are so bad that overriding self-determination is worth it.
while i hold this even for individuals, holding this for societies is way easier: a society that unconsentingly oppresses some of its people seems like a clear case for overriding the “society’s overall self-determination” for the sake of individual rights. this can be extended to override an individual’s self-determination over themself, for example by saying that they can’t commit their future selves to undergoing arbitrarily much suffering for arbitrarily long.
to you. i understand your comment as “this kind of thing wouldn’t really be a question with black holes” and i’m saying “maybe, sure, but there are other event horizons to which it might apply too”
this seems like a very weird model to me. can you clarify what you mean by “suffering”? whether or not you call it “suffering”, there is way worse stuff than a star. for example, a star’s worth of energy spent running variations of the holocaust is way worse than a star just doing fusion. the holocaust has a lot of suffering; a simple star probly barely has any random moral patients arising and experiencing anything.
here are some examples from me: “suffering” contains things like undergoing depression or torture, “nice things” contains things like “enjoying hugging a friend” or “enjoying having an insight”. both “consume energy” that could’ve not been spent — but isn’t the whole point that we need to defeat moloch in order to have enough slack to have nice things, and also to be sure that we don’t spend our slack printing suffering?
i could see myself biting the bullet that we should probly extinguish black holes whose contents we can’t otherwise ensure the ethicality of. not based on pain/pleasure alone, but based on whatever it is that my general high-level notions of “suffering” and “self-determination” and whatever else actually mean.
why suffer a huge utility hit to preserve a blackbox, which at its best is still much worse than your best, and at its worst is possibly truly astronomically dreadful?
the reason i disagree with this is “killing people is bad” — i.e. i care more about satisfying the values of currently existing moral patients than satisfying the values of potential alternate moral patients; and those values can include “continuing to exist”. so if possible, even up to some reasonable compute waste factor, i’d want moral patients currently existing past event horizons to have their values satisfied.
as for the blackbox and universal abhorrence thing, i think that that smuggles in the assumptions “civilizations will tend to have roughly similar values” and “a civilization’s fate (such as being in an HEC without decryption keys) can be taken as representative of most of its inhabitants’ wills, let alone all”. that latter assumption, especially, is evidenced against by the current expected fate of our own civilization (getting clipped).
oops, i missed that flag. sorry.
well, i worry about the ethics of the situation where those third parties don’t unanimously agree and you end up suffering. note that your past self, while it is a very close third party, is a third party among others.
i feel like i still wanna stick to my “sorry, you can’t go to sufficiently bad hell” limitation.
(also, surely whatever “please take me out of there if X” command you’d trust third parties with, you could simply trust Elua with, no?)
depends on the amount of mental suffering. there could be an amount of mental suffering where the awake phases of that moral patient would be ethically unviable.
this doesn’t necessarily prevent their sleeping phases from existing; even if the dreams are formed by desires that would arise from the days of suffering, the AI could simply induce in them synthetic desires that are statistically likely to match what they would’ve gotten from suffering, even without going through it. if they also value genuineness strongly enough, however, then their sleeping phase as it is now might be ethically unviable as well, and might have to go unsatisfied.
I don’t know that we currently have the technology to simulate researchers, let alone simulate them really fast. The point of my plan is to first let AGI overtake the universe, and then let it simulate those researchers as fast as it can.
In addition, getting to boot an AGI can give us advantages such as being able to extract the researchers from a simulation of earth (thanks to having an immense amount of compute available), or some other bootstrap scenarios I’ve thought of, such as locating earth within the universal distribution, getting in touch with me-in-this-simulation, giving me access to huge amounts of (dumb) compute I might need to figure out brain extraction, and then extracting the brains.
The point of this is not to get 100× as much time by running researchers at 100× speed; it’s to get a bazillion× as much time by running the researchers after superintelligence takes over, without any time constraints, without any (or with very little) compute constraints (and thus potentially having access to oracles), without competing with anything, and with the certainty that the superintelligence will implement whatever we do decide on.