My government name is Mack Gallagher. I am a thrift store cashier living in Iowa. Crocker’s Rules. I am an underfunded “alignment” “researcher”. DM me if you’d like to fund my posts.
Lorec
What happens at that point depends a lot on the details of the lawbreaker’s creation. [ . . . ] The probability seems unlikely to me to be zero for the sorts of qualities which would make such an AI agent dangerous.
Have you read The Sun is big, but superintelligences will not spare Earth a little sunlight?
I’ll address each of your 4 critiques:
[ 1. ] In public policy making, you have a set of preferences, which you get from votes or surveys, and you formulate policy based on your best objective understanding of cause and effect. The preferences don’t have to be objective, because they are taken as given.
The point I’m making in the post is that regardless of whether you have to treat the preferences as objective, there is an objective fact of the matter about what someone’s preferences are, in the real world [ real, even if not physical ].

Well, I reject the presumption of guilt.
[ 2. ] [ Agreeing on such basic elements of our ontology/epistemology ] isn’t all that relevant to AI safety, because an AI only needs some potentially dangerous capabilities.
Whether or not an AI “only needs some potentially dangerous capabilities” for your local PR purposes, the global truth of the matter is that “randomly-rolled” superintelligences will have convergent instrumental desires that involve making use of the resources we are currently using [like the negentropy that would make Earth’s oceans a great sink for 3 x 10^27 joules]. They will not have desires that tightly converge with our terminal desires, the ones that make boiling the oceans without first evacuating all the humans a Bad Idea.
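[ As a rough sanity check on that figure, here’s a back-of-the-envelope sketch in Python; the ocean mass and heat constants are approximate round numbers of mine, nothing load-bearing: ]

```python
# Back-of-the-envelope check: energy needed to heat Earth's oceans
# from ~15 C to 100 C and then boil them off. All constants approximate.

ocean_mass_kg = 1.4e21      # approximate total mass of Earth's oceans
specific_heat = 4186        # J / (kg*K), liquid water
delta_T = 85                # K, from ~15 C up to 100 C
latent_heat_vap = 2.26e6    # J / kg, vaporization at 100 C

heating = ocean_mass_kg * specific_heat * delta_T   # ~5e26 J
boiling = ocean_mass_kg * latent_heat_vap           # ~3.2e27 J

print(f"heating: {heating:.1e} J, boiling: {boiling:.1e} J, total: {heating + boiling:.1e} J")
# -> total on the order of 3-4e27 J, consistent with the ~3e27 J figure above
```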
[ 3. ] You haven’t defined consciousness and you haven’t explained how [ we can know something that lives in a physical substrate that is unlike ours is conscious ].
My intent is not to say “I/we understand consciousness, therefore we can derive objectively sound-valid-and-therefore-true statements from theories with mentalistic atoms”. The arguments I actually give for why it’s true that we can derive objective abstract facts about the mental world begin at “So why am I saying this premise is false?” and end at ”. . . and agree that the results came out favoring one theory or another.” If we can derive objectively true abstract statements about the mental world the same way we can derive such statements about the physical world [e.g. “the force experienced by a moving charge in a magnetic field is orthogonal both to the direction of the field and to the direction of its motion”], this implies that we can come to understand consciousness well, whether or not we already do.
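[ For concreteness, the textbook statement I’m gesturing at in that bracket is the magnetic part of the Lorentz force law:

$$\vec{F} = q\,\vec{v} \times \vec{B}$$

A cross product is perpendicular to both of its factors, so $\vec{F}$ is orthogonal to both $\vec{v}$ and $\vec{B}$, which is exactly the orthogonality claim in the bracket. ]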
[ 4. ] there doesn’t need to be [ some degree of objective truth as to what is valuable ]. You don’t have to solve ethics to set policy.
My point, again, isn’t that there needs to be, for whatever local practical purpose. My point is that there is.
I think, in retrospect, the view that abstract statements about shared non-reductionist reality can be objectively sound-valid-and-therefore-true follows pretty naturally from combining the common-on-LessWrong view that logical or abstract physical theories can yield sound-valid-and-therefore-true abstract conclusions about Reality, with the view, also common on LessWrong, that we make a lot of decisions by modeling other people as copies of ourselves, instead of as entities primarily obeying reductionist physics.
It’s just that, despite the fact that all the pieces are there, it goes on being a not-obvious way to think, if for years and years you’ve heard about how we can only have objective theories if we can do experiments that are “in the territory” in the sense that they are outside of anyone’s map. [ Contrast with celebrity examples of “shared thought experiments” from which many people drew similar conclusions because they took place in a shared map—Singer’s Drowning Child, the Trolley Problem, Rawls’s Veil of Ignorance, Zeno’s story about Achilles and the Tortoise, Pascal’s Wager, Newcomb’s Problem, Parfit’s Hitchhiker, the St. Petersburg paradox, etc. ]
Theories With Mentalistic Atoms Are As Validly Called Theories As Theories With Only Non-Mentalistic Atoms
? Yes, that is the bad post I am rebutting.
Recently, Raginrayguns and Philosophy Bear both [presumably] read “Cargo Cult Science” [not necessarily for the first time] on /r/slatestarcodex. I follow both of them, so I looked into it. And TIL that’s where “cargo-culting” comes from. Feynman doesn’t say why it’s wrong; he just waves his hands and says it doesn’t work and it’s silly. Well, now I feel silly. I’ve been cargo-culting “cargo-culting”. I’m a logical decision theorist. Cargo cults work. If they work only unreliably, well, so do reductionistic methods.
Lorec’s Shortform
I once thought “slack mattered more than any outcome”. But whose slack? It’s wonderful for all humans to have more slack. But there’s a huge game-theoretic difference between the species being wealthier, and thus wealthier per capita, and being wealthy/high-status/dominant/powerful relative to other people. The first is what I was getting at by “things orthogonal to the lineup”; the second is “the lineup”. Trying to improve your position relative to copies of yourself in a way that is zero-sum is “the rat race”, or “the Red Queen’s race”, where running will ~only ever keep you in the same place, and cause you and your mirror-selves to expend a lot of effort that is useless if you don’t enjoy it.
[I think I enjoy any amount of “the rat race”, which is part of why I find myself doing any of it, even though I can easily imagine tweaking my mind such that I stop doing it and thus exit an LDT negotiation equilibrium where I need to do it all the time. But I only like it so much, and only certain kinds.]
‘Meta’, ‘mesa’, and mountains
! I’m genuinely impressed if you wrote this post without having a mental frame for the concepts drawn from LDT.
LDT says that, for the purposes of making quasi-Kantian [not really Kantian, but that’s the closest thing I can gesture at off the top of my head that isn’t just “read the Yudkowsky”] correct decisions, you have to treat the hostile telepaths as copies of yourself.
Indexical uncertainty, i.e. not knowing whether you’re in Omega’s simulation or the real world, means that even if “I would never do that”, when someone is “doing that” to me in ways I can’t ignore, I have to act as though I might someday be in a situation where I’m basically forced to “do that”.
I can still preferentially withhold reward from copies of myself that are executing quasi-threats, though. And in fact this is correct because it minimizes quasi-threats in the mutual copies-of-myself negotiating equilibrium.
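[ A toy sketch of that last claim, with made-up payoff numbers of my own, purely to illustrate why precommitting to withhold the reward can beat conceding when the other party is running your own decision procedure: ]

```python
# Toy illustration (invented payoffs, not from the post): a copy of you decides
# whether to issue a quasi-threat, depending on which policy you've precommitted to.
#
# Outcomes as (threatener, target):
#   no threat made:                    (0, 0)   status quo
#   threat made, target concedes:      (+2, -3) threatener gains, target loses
#   threat made, target refuses:       (-1, -2) carrying it out is costly for both

def threatener_threatens(target_policy: str) -> bool:
    """A payoff-maximizing copy threatens only if threatening beats the status quo."""
    payoff_if_threaten = 2 if target_policy == "concede" else -1
    return payoff_if_threaten > 0

for target_policy in ("concede", "refuse"):
    threat = threatener_threatens(target_policy)
    if not threat:
        outcome = (0, 0)
    elif target_policy == "concede":
        outcome = (2, -3)
    else:
        outcome = (-1, -2)
    print(target_policy, "->", "threat" if threat else "no threat", outcome)

# "concede" -> threat, (2, -3); "refuse" -> no threat, (0, 0).
# Committing to withhold the reward removes the incentive to threaten at all.
```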
“Acquire the ability to coerce, rather than being coerced by, other agents in my environment” is not a solution to anything, because the quasi-Rawlsian [again, not really Rawlsian, but I don’t have any better non-Yudkowsky reference points off the top of my head] perspective means that if you precommit to acquire power, you end up in expectation getting trodden on just as much as you trod on the other copies of you. So you’re right back where you started.
Basically, you have to control things orthogonal to your position in the lineup, to robustly improve your algorithm for negotiating with others.
And I think “be willing to back deceptions” is in fact such a socially-orthogonal improvement.
I think this means that if you care both about (a) wholesomeness and (b) ending self-deception, it’s helpful to give yourself full permission to lie as a temporary measure as needed. Creating space for yourself so you can (say) coherently build power such that it’s safe for you to eventually be fully honest.
The first sentence here, I think, verbalizes something important.
The second [instrumental-power] is a bad justification, to the extent that we’re talking about game-theoretic power [as opposed to power over reductionistic, non-mentalizing Nature]. LDT is about dealing with copies of myself. They’ll all just do the same thing [lie for power] and create needless problems.
You do give a good justification, one that, I think, doesn’t create any needless aggression between copies of oneself, and that suffices to establish “backing self-deception” as promising:
I mean something more wholehearted. If I self-deceive, it’s because it’s the best solution I have to some hostile telepath problem. If I don’t have a better solution, then I want to keep deceiving myself. I don’t just tolerate it. I actively want it there. I’ll fight to keep it there! [...]
This works way better if I trust my occlumency skills here. If I don’t feel like I have to reveal the self-deceptions I notice to others, and I trust that I can and will hide it from others if need be, then I’m still safe from hostile telepaths.
[emphases mine]
“I’m not going to draw first, but drawing second and shooting faster is what I’m all about” but for information theory.
Dath ilani are canonically 3 Earthling standard deviations smarter than Earthlings, partly because they have been deliberately optimizing their kids’ genomes for hundreds of years.
A superficially plausible promising alternate Earth without lockstep
A decision tree that’s ostensibly both normative and exhaustive of the space at hand.
I don’t know, I’m not familiar with the history; probably zero. It’s a metaphor. The things the two scenarios are supposed to have in common are first-time-ness, danger, and technical difficulty. I point out in the post that the AGI scenario is actually irreducibly harder than first-time heavier-than-air flight: you can’t safely directly simulate intelligent computations themselves for testing, because then you’re just running the actual computation.
But as for the application of “green light” standards—the actual Wright brothers were only risking their own lives. Why should someone else need to judge their project for safety?
Changed to “RLHF as actually implemented.” I’m aware of its theoretical origin story with Paul Christiano; I’m going a little “the purpose of a system is what it does”.
A metaphor: what “green lights” for AGI would look like
Motte-and-Bailey: a Short Explanation
Unlike with obvious epistemic predicates over some generality [ e.g. “Does it snow at 40 degrees N?”, “Can birds heavier than 44 lb fly?”, or, more generally, the skills of predicting the weather and building flying machines ], to which [parts of] the answers can be usefully remembered as monolithic invariants, obvious deontic predicates over generalities [ e.g. “Should I keep trying when I am exhausted?”, “Will it pay to fold under pressure?”, and the surrounding general skills ] don’t have generalizable answers that are independent of one’s acute strategic situation. I am not against trying to formulate invariant answers to these questions by spelling out every contingency; I am just unsure whether LessWrong is the place for it, except when there’s some motivating or illustrative question of fact that makes your advice falsifiable [ I think Eliezer’s recent The Sun is big, but superintelligences will not spare Earth a little sunlight is a good example of this ].
Update: My best current theory [ hasn’t changed in a few months, but I figured it might be worth posting ] is that composite smell data [i.e. the better part of smell processing] is passed directly from the olfactory bulb to somewhere in the entorhinal-amygdalar-temporal area. Meanwhile, a few scents function as pheromones, in the sense that we have innate responses to the scents themselves as opposed to their associated experiences [ so, skunk and feces, as well as the scent of eligible mates ]; data about these scents is relayed by thin, almost invisible projections to the hypothalamus or other nuclei in the “emotional motor system”, so the behavioral responses can bootstrap.