I’m Tamsin Leake, co-founder and head of research at Orthogonal, doing agent foundations.
There could be a difference but only after a certain point in time, which you’re trying to predict / plan for.
What you propose, ≈“weigh indices by Kolmogorov complexity”, is indeed a way to go about picking indices, but “weigh indices by one over their square” feels a lot more natural to me; a lot simpler than invoking the universal prior twice.
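To illustrate the “one over their square” weighting: the series 1/n² converges, so the normalized weights form a proper probability distribution over indices, with early indices dominating. A minimal sketch (the Kolmogorov alternative isn't computable, so this only shows the inverse-square side):

```python
import math

# Weigh index n by 1/n^2, then normalize. The normalizer converges to
# pi^2/6 (the Basel sum), so this is a well-defined distribution over
# infinitely many indices.
N = 10_000  # truncation point, arbitrary
inv_square = [1 / n**2 for n in range(1, N + 1)]
Z = sum(inv_square)  # approaches pi^2/6 as N grows
weights = [w / Z for w in inv_square]

print(abs(Z - math.pi**2 / 6))  # small truncation error
print(sum(weights[:10]))        # ≈ 0.94: the first 10 indices carry most weight
```

The point of the sketch: unlike a uniform weighting (which can't exist over infinitely many indices), the inverse-square weighting concentrates almost all weight on early indices while still giving every index some.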
If you use the UTMs for cartesian-framed inputs/outputs, sure; but if you’re running the programs as entire worlds, then you still have the issue of “where are you in time”.
Say there’s an infinitely growing conway’s-game-of-life program, or some universal program, which contains a copy of me at infinitely many locations. How do I weigh which ones are me?
It doesn’t matter that the UTM has a fixed amount of weight, there’s still infinitely many locations within it.
Are quantum phenomena anthropic evidence for BQP=BPP? Is existing evidence against many-worlds?
Suppose I live inside a simulation run by a computer over which I have some control.
-
Scenario 1: I make the computer run the following:
```
pause simulation
if is_even(calculate billionth digit of pi):
    resume simulation
```
Suppose, after running this program, that I observe that I still exist. This is some anthropic evidence for the billionth digit of pi being even.
Thus, one can get anthropic evidence about logical facts.
-
Scenario 2: I make the computer run the following:
```
pause simulation
if is_even(calculate billionth digit of pi):
    resume simulation
else:
    resume simulation but run it a trillion times slower
```
If you’re running on the non-time-penalized solomonoff prior, then that’s no evidence at all — observing existing is evidence that you’re being run, not that you’re being run fast. But if you do that, a bunch of things break, including anthropic probabilities and expected utility calculations. What you want is a time-penalized (probably quadratically) prior, in which later compute-steps have less realityfluid than earlier ones — and thus, observing existing is evidence for being computed early — and thus, observing existing is some evidence that the billionth digit of pi is even.
-
Scenario 3: I make the computer run the following:
```
pause simulation
quantum_algorithm <- classical-compute algorithm which simulates quantum algorithms the fastest
infinite loop:
    use quantum_algorithm to compute the result of some complicated quantum phenomena
    compute simulation forwards by 1 step
```
Observing existing after running this program is evidence that BQP=BPP — that is, classical computers can efficiently run quantum algorithms: if BQP≠BPP, then my simulation should become way slower, and existing is evidence for being computed early and fast (see scenario 2).
Except, living in a world which contains the outcome of cohering quantum phenomena (quantum computers, double-slit experiments, etc) is very similar to the scenario above! If your prior for the universe is a distribution over programs, penalized for how long they take to run on classical computation, then observing that the outcome of quantum phenomena is being computed is evidence that they can be computed efficiently.
-
Scenario 4: I make the computer run the following:
```
in the simulation, give the human a device which generates a sequence of random bits
pause simulation
list_of_simulations <- [current simulation state]
quantum_algorithm <- classical-compute algorithm which simulates quantum algorithms the fastest
infinite loop:
    list_of_new_simulations <- []
    for simulation in list_of_simulations:
        list_of_new_simulations += [
            simulation advanced by one step where the device generated bit 0,
            simulation advanced by one step where the device generated bit 1
        ]
    list_of_simulations <- list_of_new_simulations
```
This is similar to what it’s like to be in a many-worlds universe where there’s constant forking.
Yes, in this scenario, there is no “mutual destruction” the way there is in quantum mechanics. But with decohering everett branches, you can totally build exponentially many non-mutually-destructing timelines too! For example, you can choose to make important life decisions based on the output of the RNG, and end up with exponentially many different lives each with some (exponentially little) quantum amplitude, without any need for those to be compressible together, or to be able to mutually-destruct. That’s what decohering means! “Recohering” quantum phenomena interact destructively such that you can compute the output, but decohering phenomena just branch.
The amount of different simulations that need to be computed increases exponentially with simulation time.
Observing existing after running this program is very strange. Yes, there are exponentially many me’s, but all of the me’s are being run exponentially slowly; they should all not observe existing. I should not be any of them.
This is what I mean by “existing is evidence against many-worlds” — there’s gotta be something like an agent (or physics, through some real RNG or through computing whichever variables have the most impact) picking an only-polynomially-large set of decohered non-compressible-together timelines to explain continuing existing.
Some friends tell me “but tammy, sure at step N each you has only 1/2^N quantum amplitude, but at step N there’s 2^N such you’s, so you still have 1 unit of realityfluid” — but my response is “I mean, I guess, sure, but regardless of that, step N occurs 2^N units of classical-compute-time in the future! That’s the issue!”.
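The arithmetic behind that response can be spelled out. A toy calculation, assuming the quadratic time penalty from scenario 2, and approximating the classical compute-time to reach simulation step N as ~2^N (since the simulator advances all branches serially and the branch count doubles each step):

```python
# "At step N there are 2^N you's, each with 1/2^N amplitude, so 1 unit of
# realityfluid" — versus: step N is computed ~2^N classical steps in the
# future, and a quadratic time penalty gives each branch (1/2^N)^2 fluid.
for N in range(1, 11):
    branches = 2**N
    classical_time = 2**N  # approx. classical steps to reach simulation step N
    fluid_per_branch = 1 / classical_time**2
    total_fluid = branches * fluid_per_branch  # = 2^N / 4^N = 2^-N
    print(N, total_fluid)  # halves every step

# So the total realityfluid at step N shrinks like 2^-N, even though the
# amplitudes alone would sum to 1.
```

This is the disagreement in one line: amplitude-counting gives 2^N · (1/2^N) = 1 per step, but classical-compute-time-counting gives 2^N · (1/2^N)² = 2^-N.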
Some notes:
-
I heard about pilot wave theory recently, and sure, if that’s one way to get a single history, why not. I hear that it “doesn’t have locality”, which, like, okay I guess; that’s plausibly worse program-complexity-wise, but it’s exponentially better after accounting for the time penalty.
-
What if “the world is just Inherently Quantum”? Well, my main answer here is, what the hell does that mean? It’s very easy for me to imagine existing inside of a classical computation (eg conway’s game of life); I have no idea what it’d mean for me to exist in “one of the exponentially many non-compressible-together decohered exponentially-small-amplitude quantum states that are all being computed forwards”. Quadratically-decaying-realityfluid classical-computation makes sense, dammit.
-
What if it’s still true — what if I am observing existing with exponentially little (as a function of the age of the universe) realityfluid? What if the set of real stuff is just that big?
Well, I guess that’s vaguely plausible (even though, ugh, that shouldn’t be how being real works, I think), but then the tegmark 4 multiverse has to contain no hypotheses in which observers in my reference class occupy more than exponentially little realityfluid.
Like, if there’s a conway’s-game-of-life simulation out there in tegmark 4, whose entire realityfluid-per-timestep is equivalent to my realityfluid-per-timestep, then they can just bruteforce-generate all human-brain-states and run into mine by chance, and I should have about as much probability of being one of those random generations as I’d have being in this universe — both have exponentially little of their universe’s realityfluid! The conway’s-game-of-life bruteforced-me has exponentially little realityfluid because she’s getting generated exponentially late, and quantum-universe me has exponentially little realityfluid because I occupy exponentially little of the quantum amplitude, at every time-step.
See why that’s weird? As a general observer, I should exponentially favor observing being someone who lives in a world where I don’t have exponentially little realityfluid, such as “person who lives only-polynomially-late into a conway’s-game-of-life, but happened to get randomly very confused about thinking that they might inhabit a quantum world”.
Existing inside of a many-worlds quantum universe feels like alien pranksters-at-orthogonal-angles running the kind of simulation where the observers inside of it end up very anthropically confused once they think about anthropics hard enough. (This is not my belief.)
-
I didn’t see a clear indication in the post about whether the music is AI-generated or not, and I’d like to know; was there an indication I missed?
(I care because I’ll want to listen to that music less if it’s AI-generated.)
Unlike on your blog, the images on the lesswrong version of this post are now broken.
Taboo the word “intelligence”.
An agent can superhumanly-optimize any utility function. Even if there are objective values, a superhuman-optimizer can ignore them and superhuman-optimize paperclips instead (and then we die because it optimized for that harder than we optimized for what we want).
(I’m gonna interpret these disagree-votes as “I also don’t think this is the case” rather than “I disagree with you tamsin, I think this is the case”.)
I don’t think this is the case, but I’m mentioning this possibility because I’m surprised I’ve never seen someone suggest it before:
Maybe the reason Sam Altman is making decisions that increase p(doom) is because he’s a pure negative utilitarian (and he doesn’t know-about/believe-in acausal trade).
For writing, there’s also jan misali’s ASCII toki pona syllabary.
Reposting myself from discord, on the topic of donating 5000$ to EA causes.
if you’re doing alignment research, even just a bit, then the 5000$ are probly better spent on yourself
if you have any gears level model of AI stuff then it’s better value to pick which alignment org to give to yourself; charity orgs are vastly understaffed and you’re essentially contributing to the “picking what to donate to” effort by thinking about it yourself
if you have no gears level model of AI then it’s hard to judge which alignment orgs it’s helpful to donate to (or, if giving to regranters, which regranters are good at knowing which alignment orgs to donate to)
as an example of regranters doing massive harm: openphil gave 30M$ to openai at a time when it was critically useful to them (supposedly in order to have a seat on their board, and look how that turned out when the board tried to yeet altman)
i know of at least one person who was working in regranting and was like “you know what i’d be better off doing alignment research directly” — imo this kind of decision is probly why regranting is so understaffed
it takes technical knowledge to know what should get money, and once you have technical knowledge you realize how much your technical knowledge could help more directly so you do that, or something
yes, edited
> So this option looks unattractive if you think transformative AI systems are likely to be developed within the next 5 years. However, with a 10-year timeframe things look much stronger: you would still have around 5 years to contribute as a researcher.
This phrasing is tricky! If you think TAI is coming in approximately 10 years then sure, you can study for 5 years and then do research for 5 years.
But if you think TAI is coming within 10 years (for example, if you think that the current half-life on worlds surviving is 10 years; if you think 10 years is the amount of time in which half of worlds are doomed) then depending on your distribution-over-time you should absolutely not wait 5 years before doing research, because TAI could happen in 9 years but it could also happen in 1 year. If you think TAI is coming within 10 years, then (depending on your distribution) you should still in fact do research asap.
(People often get this wrong! They think that “TAI probably within X years” necessarily means “TAI in approximately X years”.)
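The half-life reading can be made quantitative. A toy sketch, modeling “half of worlds are doomed within 10 years” as an exponential distribution with a 10-year half-life (the distribution choice is mine, purely for illustration):

```python
import math

# Exponential distribution with a 10-year half-life:
# P(TAI within t years) = 1 - exp(-rate * t), where rate = ln(2) / half_life.
half_life = 10.0
rate = math.log(2) / half_life

def p_tai_within(years: float) -> float:
    return 1 - math.exp(-rate * years)

print(p_tai_within(10))  # 0.5 by construction
print(p_tai_within(1))   # ≈ 0.067: real chance of TAI in the very first year
print(p_tai_within(5))   # ≈ 0.29: studying for 5 years forfeits ~29% of worlds
```

Under this distribution, “TAI probably within 10 years” is compatible with a ~7% chance of TAI in year one, which is exactly why waiting 5 years before doing research is so costly.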
Sure, this is just me adapting the idea to the framing people often have, of “what technique can you apply to an existing AI to make it safe”.
AI safety is easy. There’s a simple AI safety technique that guarantees that your AI won’t end the world, it’s called “delete it”.
AI alignment is hard.
I’m confused about why 1P-logic is needed. It seems to me like you could just have a variable X which tracks “which agent am I” and then you can express things like `sensor_observes(X, red)` or `is_located_at(X, northwest)`. Here and Absent are merely a special case of True and False when the statement depends on X.
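A minimal sketch of that reduction: statements take an explicit “which agent am I” variable X, so indexical claims become ordinary boolean-valued predicates. All names and data here are made up for illustration:

```python
# Hypothetical world state: what each agent's viewpoint looks like.
observations = {
    "alice": {"sensor_color": "red", "location": "northwest"},
    "bob": {"sensor_color": "blue", "location": "southeast"},
}

def sensor_observes(X: str, color: str) -> bool:
    # An "indexical" statement is just a predicate parameterized by X.
    return observations[X]["sensor_color"] == color

def is_located_at(X: str, place: str) -> bool:
    return observations[X]["location"] == place

print(sensor_observes("alice", "red"))    # True
print(is_located_at("bob", "northwest"))  # False
```

The same statement evaluates differently for different bindings of X, which is the behavior “Here”/“Absent” would otherwise provide.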
Moral patienthood of current AI systems is basically irrelevant to the future.
If the AI is aligned then it’ll make itself as moral-patient-y as we want it to be. If it’s not, then it’ll make itself as moral-patient-y as maximizes its unaligned goal. Neither of those depend on whether current AI are moral patients.
If my sole terminal value is “I want to go on a rollercoaster”, then an agent who is aligned to me would have the value “I want Tamsin Leake to go on a rollercoaster”, not “I want to go on a rollercoaster myself”. The former necessarily-has the same ordering over worlds, the latter doesn’t.
I think the term “conscious” is very overloaded and the source of endless confusion and should be tabooed. I’ll be answering as if the numbers are not “probability(-given-uncertainty) of conscious” but “expected(-given-uncertainty) amount of moral patienthood”, calibrated with 1 meaning “as much as a human” (it could go higher — some whales have more neurons/synapses than humans and so they might plausibly be more of a moral patient than humans, in the sense that in a trolley problem you should prefer to save 1000 such whales over 1001 humans).
Besides the trivia I just mentioned about whales, I’m answering this mostly on intuition, without knowing off the top of my head (nor looking up) the number of neurons/synapses. Not to imply that moral patienthood is directly linear in the number of neurons/synapses, but I expect that that amount probably matters to my notion of moral patienthood.
I’ll also assume that everyone has a “normal amount of realityfluid” flowing through them (rather than eg being simulated slower, or being fictional, or having “double-thick neurons made of gold” in case that matters).
First list: 1, 1, 1, .7, 10⁻², 10⁻³, 10⁻⁶, 10⁻⁶, 10⁻⁸, ε, ε, ε, ε, ε.
Second list: .6, .8, .7, .7, .6, .6, .5, ε, ε, ε, ε.
Edit: Thinking about it more, something feels weird here, like these numbers don’t track at all “how many of these would make me press the lever on the trolley problem vs 1 human” — for one, killing a sleeping person is about as bad as killing an awake person because like the sleeping person is a temporarily-paused-backup for an awake person. I guess I should be thinking about “the universe has budget for one more hour of (good-)experience just before heat death, but it needs to be all same species, how much do I value each?” or something.
There’s also the case of harmful warning shots: for example, if it turns out that, upon seeing an AI do a scary but impressive thing, enough people/orgs/states go “woah, AI is powerful, I should make one!” or “I guess we’re doomed anyways, might as well stop thinking about safety and just enjoy making profit with AI while we’re still alive”, to offset the positive effect. This is totally the kind of thing that could be the case in our civilization.