Tamsin Leake

Karma: 2,706

I’m Tamsin Leake, co-founder and head of research at Orthogonal, doing agent foundations.

Tamsin Leake 27 Apr 2024 17:25 UTC
47 points
23
on: carado’s Shortform
decision theory is no substitute for utility function

some people, upon learning about decision theories such as LDT and how they cooperate on problems such as the prisoner’s dilemma, end up believing the following:

my utility function is about what i want for just me; but i’m altruistic (/egalitarian/cosmopolitan/pro-fairness/etc) because decision theory says i should cooperate with other agents. decision-theoretic cooperation is the true name of altruism.

it’s possible that this is true for some people, but in general i expect that to be a mistaken analysis of their values.

decision theory cooperates with agents relative to how much power they have, and only when it’s instrumental.

in my opinion, real altruism (/egalitarianism/cosmopolitanism/fairness/etc) should be in the utility function which the decision theory is instrumental to. i actually intrinsically care about others; i don’t just care about others instrumentally because it helps me somehow.

some important ways in which my utility-function-altruism differs from decision-theoretic-cooperation include:
- i care about people weighted by moral patienthood; decision theory only cares about agents weighted by negotiation power. if an alien superintelligence is very powerful but isn’t a moral patient, then i will only cooperate with it instrumentally (for example because i care about the alien moral patients that it has been in contact with); if cooperating with it doesn’t help my utility function (which, again, includes altruism towards aliens) then i won’t cooperate with that alien superintelligence. as a corollary, i will take actions that cause nice things to happen to people even if they’re very impoverished (and thus don’t have much LDT negotiation power) and it doesn’t help any other aspect of my utility function than just the fact that i value that they’re okay.
- if i can switch to a better decision theory, or if fucking over some non-moral-patienty agents helps me somehow, then i’ll happily do that; i don’t have goal-content integrity about my decision theory. i do have goal-content integrity about my utility function: i don’t want to become someone who wants moral patients to unconsentingly-die or suffer, for example.
- there seems to be a sense in which some decision theories are better than others, because they’re ultimately instrumental to one’s utility function. utility functions, however, don’t have an objective measure for how good they are. hence, moral anti-realism is true: there isn’t a Single Correct Utility Function.
decision theory is instrumental; the utility function is where the actual intrinsic/axiomatic/terminal goals/values/preferences are stored. usually, i also interpret “morality” and “ethics” as “terminal values”, since most of the stuff that those seem to care about looks like terminal values to me. for example, i will want fairness between moral patients intrinsically, not just because my decision theory says that that’s instrumental to me somehow.
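
the contrast between the two weightings can be made concrete with a toy sketch. everything here is a hypothetical illustration: the `Agent` fields, the scores, and the two example agents are made-up placeholders, not a model of LDT or of anyone's actual values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    name: str
    moral_patienthood: float  # hypothetical score in [0, 1]
    negotiation_power: float  # hypothetical score in [0, 1]


def intrinsic_weight(agent: Agent) -> float:
    """Weight given by an altruistic utility function: moral patienthood only."""
    return agent.moral_patienthood


def instrumental_weight(agent: Agent) -> float:
    """Weight given by purely decision-theoretic cooperation: bargaining power only."""
    return agent.negotiation_power


def rank(agents, weight):
    """Return agent names sorted from most-weighted to least-weighted."""
    return [a.name for a in sorted(agents, key=weight, reverse=True)]


# Made-up example agents mirroring the two cases discussed above.
agents = [
    Agent("impoverished person", moral_patienthood=1.0, negotiation_power=0.01),
    Agent("non-patient superintelligence", moral_patienthood=0.0, negotiation_power=0.99),
]

care_ranking = rank(agents, intrinsic_weight)
bargain_ranking = rank(agents, instrumental_weight)
```

under these assumed scores the two rankings come out reversed, which is the point of the comparison: the same set of agents is prioritized differently depending on whether the weight lives in the utility function or falls out of negotiation.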

Tamsin Leake 3 Nov 2023 12:09 UTC
33 points
12
on: The other side of the tidal wave
i’ve written before about how aligned-AI utopia can very much conserve much of what we value now, including putting in effort to achieve things that are meaningful to ourselves or other real humans. on top of alleviating all (unconsented) suffering and (unconsented) scarcity and all the other “basics”, of course.

and without aligned-AI utopia we pretty much die-for-sure. there aren’t really attractors in-between those two.

Tamsin Leake 20 May 2023 22:53 UTC
LW: 32 AF: 7
0
AF
on: carado’s Shortform
an approximate illustration of QACI:

Tamsin Leake 11 Dec 2023 22:33 UTC
30 points
7
on: carado’s Shortform
A short comic I made to illustrate what I call “outside-view double-counting”.

(resized to not ruin how it shows on lesswrong, full-scale version here)

Tamsin Leake 18 May 2024 16:18 UTC
28 points
30
on: carado’s Shortform
I’m surprised at people who seem to be updating only now about OpenAI being very irresponsible, rather than updating when they created a giant public competitive market for chatbots (which contains plenty of labs that don’t care about alignment at all), thereby reducing how long everyone has to solve alignment. I still parse that move as devastating the commons in order to make a quick buck.

Tamsin Leake 16 Mar 2024 17:56 UTC
LW: 27 AF: 4
2
AF
on: carado’s Shortform
Reposting myself from discord, on the topic of donating 5000$ to EA causes.

if you’re doing alignment research, even just a bit, then the 5000$ are probly better spent on yourself

if you have any gears level model of AI stuff then it’s better value to pick which alignment org to give to yourself; charity orgs are vastly understaffed and you’re essentially contributing to the “picking what to donate to” effort by thinking about it yourself

if you have no gears level model of AI then it’s hard to judge which alignment orgs it’s helpful to donate to (or, if giving to regranters, which regranters are good at knowing which alignment orgs to donate to)

as an example of regranters doing massive harm: openphil gave 30M$ to openai at a time when it was critically useful to them (supposedly in order to have a seat on their board, and look how that turned out when the board tried to yeet altman)

i know of at least one person who was working in regranting and was like “you know what i’d be better off doing alignment research directly” — imo this kind of decision is probly why regranting is so understaffed

it takes technical knowledge to know what should get money, and once you have technical knowledge you realize how much your technical knowledge could help more directly so you do that, or something

Tamsin Leake 7 Apr 2023 11:54 UTC
26 points
21
on: Anthropic is further accelerating the Arms Race?
terrible news, sucks to see even more people speedrunning the extinction of everything of value.

reminder that if you destroy everything first you don’t gain anything. everything is just destroyed earlier.

Tamsin Leake 9 Mar 2024 16:34 UTC
22 points
23
on: carado’s Shortform
If my sole terminal value is “I want to go on a rollercoaster”, then an agent who is aligned to me would have the value “I want Tamsin Leake to go on a rollercoaster”, not “I want to go on a rollercoaster myself”. The former necessarily-has the same ordering over worlds, the latter doesn’t.
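
A minimal sketch of the "same ordering over worlds" claim, assuming a toy world model with only two facts (whether Tamsin rides and whether the agent rides). The function names and the two-boolean world encoding are hypothetical, chosen just for this illustration.

```python
from itertools import product

# Each toy world is a pair: (tamsin_rides, agent_rides).
worlds = list(product([False, True], repeat=2))


def tamsin_utility(world):
    """'I want to go on a rollercoaster', as held by Tamsin."""
    tamsin_rides, _ = world
    return 1 if tamsin_rides else 0


def aligned_agent_utility(world):
    """'I want Tamsin Leake to go on a rollercoaster', as held by the agent."""
    tamsin_rides, _ = world
    return 1 if tamsin_rides else 0


def copycat_agent_utility(world):
    """'I want to go on a rollercoaster myself', as held by the agent."""
    _, agent_rides = world
    return 1 if agent_rides else 0


def same_ordering(u, v, worlds):
    """True if u and v rank every pair of worlds identically (including ties)."""
    return all(
        (u(a) < u(b)) == (v(a) < v(b)) and (u(a) == u(b)) == (v(a) == v(b))
        for a in worlds
        for b in worlds
    )


aligned_matches = same_ordering(tamsin_utility, aligned_agent_utility, worlds)
copycat_matches = same_ordering(tamsin_utility, copycat_agent_utility, worlds)
```

In this toy model the aligned agent's ordering matches Tamsin's, while the copycat's ordering disagrees on worlds where only one of the two gets to ride.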

Tamsin Leake 14 Apr 2024 6:20 UTC
19 points
2
on: What convincing warning shot could help prevent extinction from AI?
There’s also the case of harmful warning shots: for example, if it turns out that, upon seeing an AI do a scary but impressive thing, enough people/orgs/states go “woah, AI is powerful, I should make one!” or “I guess we’re doomed anyways, might as well stop thinking about safety and just enjoy making profit with AI while we’re still alive”, to offset the positive effect. This is totally the kind of thing that could be the case in our civilization.

Tamsin Leake 3 Jun 2023 1:19 UTC
19 points
15
on: Upcoming AI regulations are likely to make for an unsafer world
even the very vague general notion that the government is regulating at all could maybe help make investment in AI more risky, which is a good thing.

the main risk i’m worried about is that it brings more attention to AI and causes more people to think of clever AI engineering tricks.

Tamsin Leake 2 May 2024 17:33 UTC
18 points
1
in reply to: Erik Jenner’s comment on: Please stop publishing ideas/insights/research about AI

I don’t buy the argument that safety researchers have unusually good ideas/research compared to capability researchers at top labs

I don’t think this particularly needs to be true for my point to hold; they only need to have reasonably good ideas/research, not unusually good, for it to be a positive thing that they publish less.

That said, if someone hasn’t thought at all about concepts like “differentially advancing safety” or “capabilities externalities,” then reading this post would probably be helpful, and I’d endorse thinking about those issues.

That’s a lot of what I intend to do with this post, yes. I think a lot of people do not think about the impact of publishing very much and just blurt-out/publish things as a default action, and I would like them to think about their actions more.

Tamsin Leake 13 Apr 2024 11:47 UTC
18 points
−4
on: carado’s Shortform
(cross-posted from my blog)

Is quantum phenomena anthropic evidence for BQP=BPP? Is existing evidence against many-worlds?

Suppose I live inside a simulation run by a computer over which I have some control.
- Scenario 1: I make the computer run the following:
```
pause simulation

if is_even(calculate billionth digit of pi):
	resume simulation
```
  Suppose, after running this program, that I observe that I still exist. This is some anthropic evidence for the billionth digit of pi being even.
  
  Thus, one can get anthropic evidence about logical facts.
- Scenario 2: I make the computer run the following:
```
  pause simulation
  
  if is_even(calculate billionth digit of pi):
  	resume simulation
  else:
  	resume simulation but run it a trillion times slower
```
  If you’re running on the non-time-penalized solomonoff prior, then that’s no evidence at all: observing existing is evidence that you’re being run, not that you’re being run fast. But if you do that, a bunch of things break, including anthropic probabilities and expected utility calculations. What you want is a time-penalized (probably quadratically) prior, in which later compute-steps have less realityfluid than earlier ones; thus, observing existing is evidence for being computed early, and thus some evidence that the billionth digit of pi is even.
- Scenario 3: I make the computer run the following:
```
  pause simulation

  quantum_algorithm <- classical-compute algorithm which simulates quantum algorithms the fastest

  infinite loop:
  	use quantum_algorithm to compute the result of some complicated quantum phenomena

  	compute simulation forwards by 1 step
```
  Observing existing after running this program is evidence that BQP=BPP — that is, classical computers can efficiently run quantum algorithms: if BQP≠BPP, then my simulation should become way slower, and existing is evidence for being computed early and fast (see scenario 2).
  
  Except, living in a world which contains the outcome of cohering quantum phenomena (quantum computers, double-slit experiments, etc) is very similar to the scenario above! If your prior for the universe is over programs, penalized for how long they take to run on classical computation, then observing that the outcome of quantum phenomena is being computed is evidence that they can be computed efficiently.
- Scenario 4: I make the computer run the following:
```
  in the simulation, give the human a device which generates a sequence of random bits
  pause simulation

  list_of_simulations <- [current simulation state]

  quantum_algorithm <- classical-compute algorithm which simulates quantum algorithms the fastest

  infinite loop:
  	list_of_new_simulations <- []
  	
  	for simulation in list_of_simulations:
  		list_of_new_simulations += 
  			[ simulation advanced by one step where the device generated bit 0,
  			  simulation advanced by one step where the device generated bit 1 ]

  	list_of_simulations <- list_of_new_simulations
```
  This is similar to what it’s like to be in a many-worlds universe where there’s constant forking.
  
  Yes, in this scenario, there is no “mutual destruction”, the way there is in quantum. But with decohering everett branches, you can totally build exponentially many non-mutually-destructing timelines too! For example, you can choose to make important life decisions based on the output of the RNG, and end up with exponentially many different lives each with some (exponentially little) quantum amplitude, without any need for those to be compressible together, or to be able to mutually-destruct. That’s what decohering means! “Recohering” quantum phenomena interact destructively such that you can compute the output, but decohering phenomena just branch.
  
  The number of different simulations that need to be computed increases exponentially with simulation time.
  
  Observing existing after running this program is very strange. Yes, there are exponentially many me’s, but all of the me’s are being run exponentially slowly; they should all not observe existing. I should not be any of them.
  
  This is what I mean by “existing is evidence against many-worlds”: there’s gotta be something like an agent (or physics, through some real RNG or through computing whichever variables have the most impact) picking an only-polynomially-large set of decohered non-compressible-together timelines to explain continuing existing.
  
  Some friends tell me “but tammy, sure at step N each you has only 1/2^N quantum amplitude, but at step N there’s 2^N such you’s, so you still have 1 unit of realityfluid” — but my response is “I mean, I guess, sure, but regardless of that, step N occurs 2^N units of classical-compute-time in the future! That’s the issue!”.
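
The compute-time accounting in scenario 4 can be sketched roughly as follows. This assumes each simulation step costs one unit of classical compute and that realityfluid falls off as 1/t² with compute time t (the "probably quadratically" penalty from scenario 2); both are simplifying assumptions, and the function names are hypothetical.

```python
def quadratic_weight(compute_time: float) -> float:
    """Assumed time penalty: realityfluid of something computed at time t is 1/t^2."""
    return 1.0 / compute_time ** 2


def branching_compute_time(n_steps: int) -> int:
    """Compute units spent before every simulation has reached step n.

    Iteration k of the loop produces 2**k child simulations, each costing
    one unit, so the total is 2 + 4 + ... + 2**n.
    """
    return sum(2 ** k for k in range(1, n_steps + 1))


def total_realityfluid_branching(n_steps: int) -> float:
    """Rough total over all 2**n copies, treating them as computed at the final time."""
    copies = 2 ** n_steps
    return copies * quadratic_weight(branching_compute_time(n_steps))


def total_realityfluid_single(n_steps: int) -> float:
    """Same quantity for a single-history simulation: step n is computed at time n."""
    return quadratic_weight(n_steps)


# Example with a small step count (valid for n_steps >= 1).
steps = 10
branching_total = total_realityfluid_branching(steps)
single_total = total_realityfluid_single(steps)
```

Under these assumptions the 2^N copies do not make up for the delay: the number of copies grows like 2^N while the per-copy weight shrinks like 2^(-2N), so the branching total falls off exponentially while the single-history total only falls off polynomially.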
Some notes:
- I heard about pilot wave theory recently, and sure, if that’s one way to get single history, why not. I hear that it “doesn’t have locality”, which like, okay I guess, that’s plausibly worse program-complexity wise, but it’s exponentially better after accounting for the time penalty.
- What if “the world is just Inherently Quantum”? Well, my main answer here is, what the hell does that mean? It’s very easy for me to imagine existing inside of a classical computation (eg conway’s game of life); I have no idea what it’d mean for me to exist in “one of the exponentially many non-compressible-together decohered exponentially-small-amplitude quantum states that are all being computed forwards”. Quadratically-decaying-realityfluid classical-computation makes sense, dammit.
- What if it’s still true — what if I am observing existing with exponentially little (as a function of the age of the universe) realityfluid? What if the set of real stuff is just that big?
  
  Well, I guess that’s vaguely plausible (even though, ugh, that shouldn’t be how being real works, I think), but then the tegmark 4 multiverse has to contain no hypotheses in which observers in my reference class occupy more than exponentially little realityfluid.
  
  Like, if there’s a conway’s-game-of-life simulation out there in tegmark 4, whose entire realityfluid-per-timestep is equivalent to my realityfluid-per-timestep, then they can just bruteforce-generate all human-brain-states and run into mine by chance, and I should have about as much probability of being one of those random generations as I’d have being in this universe — both have exponentially little of their universe’s realityfluid! The conway’s-game-of-life bruteforced-me has exponentially little realityfluid because she’s getting generated exponentially late, and quantum-universe me has exponentially little realityfluid because I occupy exponentially little of the quantum amplitude, at every time-step.
  
  See why that’s weird? As a general observer, I should exponentially favor observing being someone who lives in a world where I don’t have exponentially little realityfluid, such as “person who lives only-polynomially-late into a conway’s-game-of-life, but happened to get randomly very confused about thinking that they might inhabit a quantum world”.
Existing inside of a many-worlds quantum universe feels like alien pranksters-at-orthogonal-angles running the kind of simulation where the observers inside of it end up very anthropically confused once they think about anthropics hard enough. (This is not my belief.)
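
Going back to scenario 2, the time-penalized update that the rest of the argument builds on can be sketched numerically. This is a toy model under stated assumptions: the observer-moment sits a fixed number of subjective steps after the pause, its weight is t^(-exponent) in classical compute time t, and the cost of computing the digit of pi is ignored since it is the same in both branches. All names and numbers are hypothetical placeholders.

```python
def weight(compute_time: float, penalty_exponent: float) -> float:
    """Assumed realityfluid of an observer-moment computed at the given time.

    penalty_exponent = 0 is the non-time-penalized case, 2 the quadratic one.
    """
    return compute_time ** (-penalty_exponent)


def posterior_even(
    pause_time: float,
    subjective_steps: float,
    slowdown: float,
    penalty_exponent: float,
    prior_even: float = 0.5,
) -> float:
    """Posterior that the digit is even, given that I observe existing."""
    # If the digit is even, the simulation resumes at full speed.
    t_even = pause_time + subjective_steps
    # If it is odd, every subjective step costs `slowdown` compute units.
    t_odd = pause_time + slowdown * subjective_steps
    w_even = prior_even * weight(t_even, penalty_exponent)
    w_odd = (1.0 - prior_even) * weight(t_odd, penalty_exponent)
    return w_even / (w_even + w_odd)


# Placeholder numbers: paused at t = 1e6, observing 1e3 steps later,
# with the trillion-fold slowdown from the scenario.
flat = posterior_even(1e6, 1e3, 1e12, penalty_exponent=0)
quadratic = posterior_even(1e6, 1e3, 1e12, penalty_exponent=2)
```

With no time penalty the posterior stays at the prior, matching "that’s no evidence at all"; with the quadratic penalty it moves almost entirely toward the digit being even.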

Tamsin Leake 1 Apr 2024 11:56 UTC
18 points
1
in reply to: Yoav Ravid’s comment on: The Story of “I Have Been A Good Bing”
I didn’t see a clear indication in the post about whether the music is AI-generated or not, and I’d like to know; was there an indication I missed?

(I care because I’ll want to listen to that music less if it’s AI-generated.)

Tamsin Leake 27 Dec 2023 11:25 UTC
17 points
5
on: carado’s Shortform
I remember a character in Asimov’s books saying something to the effect of

It took me 10 years to realize I had those powers of telepathy, and 10 more years to realize that other people don’t have them.

and that quote has really stuck with me, and keeps striking me as true about many mindthings (object-level beliefs, ontologies, ways-to-use-one’s-brain, etc).

For so many complicated problems (including technical problems), “what is the correct answer?” is not-as-difficult to figure out as “okay, now that I have the correct answer: how the hell do other people’s wrong answers mismatch mine? what is the inferential gap even made of? what is even their model of the problem? what the heck is going on inside other people’s minds???”

Answers to technical questions, once you have them, tend to be simple and compress easily with the rest of your ontology. But not models of other people’s minds. People’s minds are actually extremely large things that you fundamentally can’t fully model and so you’re often doomed to confusion about them. You’re forced to fill in the details with projection, and that’s often wrong because there’s so much more diversity in human minds than we imagine.

The most complex software engineering projects in the world are absurdly tiny in complexity compared to a random human mind.
What links here?
- Does LessWrong make a difference when it comes to AI alignment? by PhilosophicalSoul (3 Jan 2024 12:21 UTC; 21 points)

Tamsin Leake 11 Dec 2023 22:36 UTC
17 points
7
on: carado’s Shortform
I’ve heard some describe my recent posts as “overconfident”.

I think I used to calibrate how confident I sound based on how much I expect the people reading/listening-to me to agree with what I’m saying, kinda out of “politeness” for their beliefs; and I think I also used to calibrate my confidence based on how much they match with the apparent consensus, to avoid seeming strange.

I think I’ve done a good job learning over time to instead report my actual inside-view, including how confident I feel about it.

There’s already an immense amount of outside-view double-counting going on in AI discourse, the least I can do is provide {the people who listen to me} with my inside-view beliefs, as opposed to just cycling other people’s opinions through me.

Hence, how confident I sound while claiming things that don’t match consensus. I actually am that confident in my inside-view. I strive to be honest by hedging what I say when I’m in doubt, but that means I also have to sound confident when I’m confident.

Tamsin Leake 23 Nov 2023 1:33 UTC
17 points
6
in reply to: Said Achmiz’s comment on: So you want to save the world? An account in paladinhood
you know what, fair enough. i’ve edited the post to be capitalized in the usual way.

Tamsin Leake 8 Oct 2023 11:44 UTC
16 points
9
on: The Gradient – The Artificiality of Alignment

the pathways to AI x-risk ultimately require a society where relying on — and trusting — algorithms for making consequential decisions is not only commonplace, but encouraged and incentivized

this is wrong, of course. the whole point of alignment, the thing that makes AI doom a particular type of risk, is that highly capable AI takes over the world on its own just fine. it does not need us to put it in charge of our institutions, it just takes over everything regardless of our trust or consent.

all it takes is one team, somewhere, to build the wrong piece of software, and a few days~months later all life on earth is dead forever. AI doom is not a function of adoption, it’s a function of the-probability-that-on-any-given-day-some-team-builds-the-thing-that-will-take-over-the-world-and-kill-everyone.

(this is why i think we lose control of the future in 0 to 5 years, rather than much later)
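
as a rough sketch of what "a function of the per-day probability" means: if some team builds the thing on any given day with probability p, and days are treated as independent with constant p (both simplifying assumptions), then the cumulative probability after n days is 1 − (1 − p)^n. the function name and the numbers below are arbitrary placeholders, not estimates.

```python
def probability_by_day(p_per_day: float, days: int) -> float:
    """Chance the event has happened at least once within `days` days,
    assuming independent days with a constant per-day probability."""
    return 1.0 - (1.0 - p_per_day) ** days


# Arbitrary placeholder value, chosen only to show the shape of the curve.
p = 0.001
after_one_year = probability_by_day(p, 365)
after_two_years = probability_by_day(p, 730)
```

note that adoption or trust does not appear anywhere in the formula; only the per-day probability and the number of days do.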

Tamsin Leake 12 Dec 2023 23:53 UTC
15 points
2
on: carado’s Shortform
I’m a big fan of Rob Bensinger’s “AI Views Snapshot” document idea. I recommend people fill out their own before anchoring on anyone else’s.

Here’s mine at the moment:
What links here?
- Tamsin Leake's comment on AI Views Snapshots by Rob Bensinger (13 Dec 2023 19:59 UTC; 5 points)

Tamsin Leake 26 Apr 2024 15:34 UTC
14 points
2
on: Take the wheel, Shoggoth! (Lesswrong is trying out changes to the frontpage algorithm)
I’m generally not a fan of increasing the amount of illegible selection effects.

On the privacy side, can lesswrong guarantee that, if I never click on Recommended, then recombee will never see an (even anonymized) trace of what I browse on lesswrong?

Tamsin Leake 17 Dec 2023 20:44 UTC
14 points
11
in reply to: Thane Ruthenis’s comment on: OpenAI, DeepMind, Anthropic, etc. should shut down.
Seems right. In addition, if there was some person out there waiting to make a new AI org, it’s not like they’re waiting for the major orgs to shut down to compete.

Shutting down the current orgs does not fully solve the problem, but it surely helps a lot.
