Morphism

Karma: 176

Morphism 2 May 2024 2:42 UTC
1 point
−1
in reply to: Wei Dai’s comment on: The formal goal is a pointer
I’m 60% confident that SBF and Mao Zedong (and just about everyone) would converge to nearly the same values (which we call “human values”) if they were rational enough and had good enough decision theory.
If I’m wrong, (1) is a huge problem and the only surefire way to solve it is to actually be the human whose values get extrapolated. Luckily the de-facto nominees for this position are alignment researchers, who pretty strongly self-select for having cosmopolitan altruistic values.
I think (2) is a very human problem. Due to very weird selection pressure, humans ended up really smart but also really irrational. I think most human evil is caused by a combination of overconfidence wrt our own values and lack of knowledge of things like the unilateralist’s curse. An AGI (at least, one that comes from something like RL rather than being conjured in a simulation or something else weird) will probably end up with a way higher rationality:intelligence ratio, and so it will be much less likely to destroy everything we value than an empowered human. (Also 60% confident. I would not want to stake the fate of the universe on this claim)
I agree that moral uncertainty is a very hard problem, but I don’t think we humans can do any better on it than an ASI. As long as we give it the right pointer, I think it will handle the rest much better than any human could. Decision theory is a bit different, since you have to put that into the utility function. Dealing with moral uncertainty is just part of expected utility maximization.
To solve (2), I think we should try to adapt something like the Hippocratic principle to work for QACI, without requiring direct reference to a human’s values and beliefs (the sidestepping of which is QACI’s big advantage over PreDCA). I wonder if Tammy has thought about this.

Morphism 2 May 2024 2:41 UTC
1 point
0
in reply to: Tamsin Leake’s comment on: carado’s Shortform
What about the following:

My utility function is pretty much just my own happiness (in a fun-theoretic rather than purely hedonistic sense). However, my decision theory is updateless with respect to which sentient being I ended up as, so once you factor that in, I’m a multiverse-wide realityfluid-weighted average utilitarian.

I’m not sure how correct this is, but it’s possible.

Morphism 1 May 2024 0:33 UTC
1 point
0
on: The formal goal is a pointer
Edit log:

2024-04-30 19:31 CST: Footnote formatting fix and minor grammar fix.

20:40 CST: “The problem is...” --> “Alignment is...”

22:17 CST: Title changed from “All we need is a pointer” to “The formal goal is a pointer”

Morphism 22 Apr 2024 16:03 UTC
7 points
−2
in reply to: Tamsin Leake’s comment on: carado’s Shortform
OpenAI is not evil. They are just defecting on an epistemic prisoner’s dilemma.

Morphism 6 Mar 2024 1:19 UTC
3 points
0
in reply to: the gears to ascension’s comment on: Even if we lose, we win
Maybe some kind of simulated long-reflection type thing like QACI where “doing philosophy” basically becomes “predicting how humans would do philosophy if given lots of time and resources”

Morphism 15 Jan 2024 15:41 UTC
6 points
0
in reply to: Tamsin Leake’s comment on: Even if we lose, we win
Yes, amount of utopiastuff across all worlds remains constant, or possibly even decreases! But I don’t think amount-of-utopiastuff is the thing I want to maximize. I’d love to live in a universe that’s 10% utopia and 90% paperclips! I much prefer that to a 90% chance of extinction and a 10% chance of full-utopia. It’s like insurance. Expected money goes down, but expected utility goes up.

Decision theory does not imply that we get to have nice things, but (I think) it does imply that we get to hedge our insane all-or-nothing gambles for nice things, and redistribute the nice things across more worlds.

Morphism 15 Jan 2024 15:23 UTC
3 points
−4
in reply to: Tamsin Leake’s comment on: Even if we lose, we win
I think this is only true if we are giving the AI a formal goal to explicitly maximize, rather than training the AI haphazardly and giving it a clusterfuck of shards. It seems plausible that our FAI would be formal-goal aligned, but it seems like UAI would be more like us unaligned humans—a clusterfuck of shards. Formal-goal AI needs the decision theory “programmed into” its formal goal, but clusterfuck-shard AI will come up with decision theory on its own after it ascends to superintelligence and makes itself coherent. It seems likely that such a UAI would end up implementing LDT, or at least something that allows for acausal trade across the Everett branches.

Morphism 15 Jan 2024 15:10 UTC
1 point
0
in reply to: Throwaway2367’s comment on: Even if we lose, we win
Fixed it! Thanks! It is very confusing that half the time people talk about loss functions and the other half of the time they talk about utility functions

Morphism 2 Jan 2024 7:06 UTC
1 point
on: Diagonalization Fixed Point Exercises
Solution to 8 implemented in python using zero self-reference, where you can replace f with code for any arbitrary function on string x (escaping characters as necessary):
f=”x+‘\\n’+x”
def ff(x):
return eval(f)
(lambda s : print(ff(‘f=’+chr(34)+f+chr(34)+chr(10)+‘def ff(x):’+chr(10)+chr(9)+‘return eval(f)‘+chr(10)+s+‘(’+chr(34)+s+chr(34)+‘)’)))(“(lambda s : print(ff(‘f=’+chr(34)+f+chr(34)+chr(10)+‘def ff(x):’+chr(10)+chr(9)+‘return eval(f)‘+chr(10)+s+‘(’+chr(34)+s+chr(34)+‘)’)))”)
edit: fixed spoiler tags