I’m Tamsin Leake, co-founder and head of research at Orthogonal, doing agent foundations.
PreDCA: vanessa kosoy’s alignment protocol
Like all the other uncomputable or intractable logic in the post, the AI is to make increasingly informed guesses about them using something like logical induction, where one can estimate the likelihood of a logical statement without having to determine its truth value for sure.
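As a toy illustration of that flavor of guessing (a hypothetical sketch, NOT the actual logical induction construction of Garrabrant et al.): keep a pool of cheap heuristic "traders" that bet probabilities on statements, reward the ones that bet well on statements that eventually get settled, and read off the weighted average as the current credence in still-unsettled statements.

```python
# toy flavor of logical induction (not the real construction): heuristic
# "traders" bet probabilities on logical statements; traders that bet well
# on settled statements gain weight, and the weighted average of bets is
# the current credence in an unsettled statement.

def credence(traders, weights, statement):
    """weighted average of every trader's bet on the statement."""
    return sum(w * t(statement) for t, w in zip(traders, weights)) / sum(weights)

def settle(traders, weights, statement, truth):
    """once a statement is proven or refuted, reward accurate traders."""
    return [w * (t(statement) if truth else 1 - t(statement))
            for t, w in zip(traders, weights)]

# two hypothetical traders: a systematic optimist and a systematic pessimist
traders = [lambda s: 0.9, lambda s: 0.1]
weights = [1.0, 1.0]

print(credence(traders, weights, "P"))   # 0.5 before anything settles

weights = settle(traders, weights, "Q", truth=True)  # "Q" turns out true
print(credence(traders, weights, "P"))   # ~0.82: the optimist gained weight
```

The real construction runs a market over all polynomial-time traders and gets much stronger guarantees; this only shows how credences in logical statements can move without anything ever being proven.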
everything is okay
this is a specific preference about what part of utopia i’d like to live in, and i don’t really have an interest in creating descendants at the moment. i’ve written more about rules for creating new persons in ∀V, but the short version is “no kids allowed (they’re too hard for me to figure out), only copies of people”. though in a setting where aligned Elua has more agency, maybe it could figure out how to make kids viable.
who really want to start a family in a way that can’t be satisfied by an alternative, yes. such as: creating a merged version of their minds, having it emit preferences in advance and then consentingly modify itself until it’s reasonably childlike; having a non-moral-patient fake mind be in the body until a certain age before being replaced with that merged mind; or any other kind of weird scheme i haven’t thought of. there are many possibilities, in virtualia and with Elua to help.
there are probly many challenges one can face if they want that. i’m just fine getting my challenges from little things like video games, at least for a while. maybe i’d get back into the challenge of designing my own video games, too; i enjoyed that one.
that’s just part of my stylistic choices in blogging: making my formal writing more representative of my casual chatting. see eg this or this [EDIT: or this]
after thinking about it and asking vanessa, it seems that you’re correct; thanks for noticing. the mistake comes from the fact that i express things in terms of utility functions and vanessa expresses things in terms of loss functions, and they are reversed. the post should be fixed now.
note that in the g(G|U) definition, i believe it is also ≥, because −log flips the function.
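the flip is just monotonicity: −log is strictly decreasing on the positive reals, so it reverses inequalities. schematically (not vanessa’s exact definitions, just the mechanism):

$$
0 < a \le b \;\implies\; -\log a \ge -\log b
$$

so wherever terms get wrapped in −log to turn utilities into losses, a ≤ in the utility-flavored definition becomes a ≥ in the loss-flavored one, which is exactly the utility/loss reversal.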
program searches
ethics and anthropics of homomorphically encrypted computations
would it be fair to say that an agent-instant is a program? and then we can say that a “continuous agent” can be a sequence of programs where each program tends to value the future containing something roughly similar to itself, in the same way that me valuing my survival means that i want the future to contain things pretty similar to me.
(i might follow up on the second part of that comment when i get around to reading that Simulators post)
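a toy rendering of that first part (everything here is hypothetical; “similarity” is just jaccard overlap between trait sets, a stand-in for whatever the real measure over programs would be):

```python
# toy model: an agent-instant as a bag of traits, and a "continuous agent"
# as a sequence of instants where each one is similar enough to its successor
# (each instant values a future containing something roughly like itself).

def similarity(a, b):
    """jaccard overlap between two agent-instants' trait sets."""
    return len(a & b) / len(a | b)

def is_continuous(instants, threshold=0.5):
    """true iff every instant is followed by something similar enough to it."""
    return all(similarity(x, y) >= threshold
               for x, y in zip(instants, instants[1:]))

me = [{"likes_games", "lowercase", "alignment"},
      {"likes_games", "lowercase", "alignment", "gardening"},
      {"likes_games", "lowercase", "gardening"}]
print(is_continuous(me))  # True: gradual drift, every step stays similar
```

note that the sequence as a whole can drift arbitrarily far even though each step preserves similarity, which matches how valuing one’s survival is a local, step-by-step constraint.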
this is interesting, i didn’t know about this property of known FHE schemes. if it is the case that being able to run an HEC necessarily also entails the ability to encrypt into it, then the solution you propose is indeed fine.
as for physical (as opposed to cryptographic) event horizons, we’d want superintelligence to send copies of itself past those anyways.
i think the format could simply be to send into the HEC a transformation that takes the entire world computation and replaces it with a runtime containing the superintelligence at the top level, giving it access to the simulated world such that it can examine it as much as it wants and decide whether to keep it running and/or what to modify.
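a minimal sketch of that injection format, with the HEC modeled as an opaque box (all names hypothetical; in a real HEC the transformation would be evaluated under encryption rather than in the clear):

```python
# toy model of "send a transformation into the HEC": the box holds the world's
# step function and state as opaque ("encrypted") internals, and the only
# operation it exposes is replacing the whole world computation.
class OpaqueWorld:
    def __init__(self, step, state):
        self._step = step    # the entire world computation
        self._state = state  # never readable from outside

    def inject(self, transform):
        # replace the world computation with a new runtime
        self._step, self._state = transform(self._step, self._state)

    def tick(self):
        self._state = self._step(self._state)

def wrap_with_overseer(step, state):
    """hypothetical transformation: install an overseer at the top level,
    which examines the simulated world and decides whether to keep running it."""
    def new_step(s):
        world_state, halted = s
        if halted:                 # overseer already decided to pause the world
            return s
        next_world = step(world_state)
        # stand-in policy for "decide when to keep it running"
        return (next_world, next_world < 0)
    return new_step, (state, False)

# hypothetical world: a counter that steps downward
world = OpaqueWorld(lambda n: n - 1, 2)
world.inject(wrap_with_overseer)
for _ in range(5):
    world.tick()   # the overseer pauses the world once the counter goes negative
```

the point is just the interface: the only thing ever sent in is a transformation of the whole world computation, and the overseer it installs decides from the inside whether the world keeps running.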
while what you say is true for the gravitational event horizon of black holes, it doesn’t apply to the cryptographic event horizon of HECs or the expansion-of-the-universe event horizon of our light cone. so, yes, some cases of event horizons may not be livable, while others might still matter — including potentially yet unknown unknown ones.
i’m generally very anti-genocide as well, and i expect the situations where it is the least bad way to implement my values to be rare. nonetheless, there are some situations where it feels like every alternative is worse. for example, imagine an individual (or population of individuals) who strongly desires to be strongly tortured, such that both letting them be tortured and letting them go without torture would be highly unethical (both would constitute a form of suffering above a threshold we’d be okay with), and of course, also imagine that that person strongly disvalues being modified to want other things, etc. in this situation, it seems like they simply cannot be instantiated in an ethical manner.
suffering is inefficient and would waste energy
that is true, and it is why i expect such situations to be relatively rare; but they’re not impossible. there are numerous historical instances of human societies running huge amounts of suffering even when it’s not efficient, because there are many nash equilibria stuck in local maxima; and it only takes inventing a superintelligent singleton to crystallize a set of values forever, even if they include suffering.
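the “nash equilibria in local maxima” point can be made concrete with a prisoner’s-dilemma-shaped payoff matrix (hypothetical numbers): mutual defection is the only nash equilibrium even though both players prefer the mutual-cooperation cell.

```python
# payoff[row][col] = (row player's payoff, column player's payoff)
# a prisoner's-dilemma-shaped game: (defect, defect) is the only nash
# equilibrium even though both players do better at (cooperate, cooperate).
C, D = 0, 1
payoff = [[(3, 3), (0, 5)],
          [(5, 0), (1, 1)]]

def is_nash(r, c):
    """no player gains by unilaterally deviating from (r, c)."""
    row_ok = all(payoff[r2][c][0] <= payoff[r][c][0] for r2 in (C, D))
    col_ok = all(payoff[r][c2][1] <= payoff[r][c][1] for c2 in (C, D))
    return row_ok and col_ok

nash = [(r, c) for r in (C, D) for c in (C, D) if is_nash(r, c)]
print(nash)  # [(1, 1)] — only mutual defection, the worse-for-everyone cell
```

once such an equilibrium is reached, no individual deviation improves anything, which is exactly how a society can stably run suffering it would collectively prefer not to.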
they may not take others’ soul-structure
there’s an issue here: what does “other” mean? can i sign up to be tortured for 1000 years without the ability to opt back out, or to modify my future self such that i’d be unable to conceive of or desire opting out? i don’t think so, because i think that’s an unreasonable amount of control for me to have over my future selves. for shorter spans of time it’s more reasonable, notably because my timeselves have enough mutual respect to respect and implement each other’s values, to an extent. but a society’s consensus shouldn’t get to decide for all of its individuals (like the baby eaters’ children in https://www.lesswrong.com/posts/HawFh7RvDM4RyoJ2d/three-worlds-collide-0-8), and i don’t think an instant-individual should get to decide arbitrarily much for arbitrarily many of its future selves. there exists a threshold of suffering at which we ought to step in and stop it.
in a sense, your perspective seems to deny the possibility of S-risks: situations that are defined to be worse than death. you seem to think that no situation can occur in which death would be preferable, that continued life is always preferable even if it’s full of suffering. i’m not quite sure you actually believe this, but it seems to be entailed by the perspective you present.
any attempt to reduce suffering by ending a life when that life would have continued to try to survive
any? i don’t think so at all! again, i would strongly hope that if a future me is stuck constantly pursuing torture, a safe AI would come and terminate that future me rather than let me experience suffering forever just because my mind is stuck in a bad loop or something like that.
but you have no right to impose your hedonic utility function on another agent. claim: preference utilitarianism iterated through coprotection/mutual-aid-and-defence games is how we got morality in the first place.
to be clear, the reason i say “suffering” and not “pain” is to use a relatively high-level/abstracted notion of “things that are bad”. given that my utilitarian preferences are probly not lexicographic, even though my valuing of self-determination is very high, there could be situations where the suffering is bad enough that my wish to terminate suffering overrides my wish to ensure self-determination. ultimately, i’ll probly bite the bullet that i intend to do good, not just do good where i “have the right” to do so. it happens that my way of doing good is trying to give as much self-determination to as many moral patients as i can (https://carado.moe/∀V.html), but sometimes that’s just not ethically viable.
claim: preference utilitarianism iterated through coprotection/mutual-aid-and-defence games is how we got morality in the first place.
hm. i’m not sure about this, but even if it were the case, i don’t think it would make much of a difference: i want what i want, not whatever historical process caused me to want it. but at least in the liberal west, some form of preference utilitarianism does seem to be a fairly strong foundation, sure.
in comparison a civilization living as an HEC is, worst case, relatively trivial negentropy waste.
again, this seems to be a crux here. i can think of many shapes of societies whose existence would be horrible, way worse than just “negentropy waste”. just as there are good worlds where we spend energy on nice things, there can be bad worlds where we spend energy on suffering.
sorry if i appear to repeat myself a bunch in this response; i want to try responding to many of the points you bring up so that we can better locate the core of our disagreement. i want to clarify that i’m not super solid on my ethical beliefs — i’m defending them not just because they’re what i believe to be right, but also because i want to see if they hold up and/or if there are better alternatives. it’s just that “let horrible hellworlds run” / “hellworlds just wouldn’t happen” (the strawman of what your position looks like to me) does not appear to me to be that.
I don’t know that we currently have the technology to simulate researchers, let alone simulate them really fast. The point of my plan is to first let the AGI overtake the universe, and then let it simulate those researchers as fast as it can.
In addition, getting to boot an AGI can give us advantages such as being able to extract the researchers from a simulation of earth (thanks to having an immense amount of compute available), or other bootstrap scenarios I’ve thought of, such as locating earth within the universal distribution, getting in touch with me-in-this-simulation, giving me access to huge amounts of (dumb) compute I might need to figure out brain extraction, and then extracting brains.
The point of this is not to get 100× as much time by running researchers at 100× speed; it’s to get a bazillion× as much time by running the researchers after superintelligence takes over, without any time constraints, without any (or with very little) compute constraints (and thus potentially having access to oracles), without competing with anything, and with the certainty that the superintelligence will implement whatever we do decide on.