Because humans have incoherent preferences, and it’s unclear whether a universal resolution procedure is achievable. I like how Richard Ngo put it, “there’s no canonical way to scale me up”.
xpym
Hmm, right. You need only assume that there are coherent reachable desirable outcomes. I'm doubtful that such an assumption holds, but most people probably aren't.
We’ll say that a state is in fact reachable if a group of humans could in principle take actions with actuators—hands, vocal cords, etc—that could realize that state.
The main issue here is that groups of humans may in principle be capable of a great many things, but there’s a vast chasm between “in principle” and “in practice”. A superintelligence worthy of the name would likely be able to come up with plans that we wouldn’t in practice be able to even check exhaustively, which is the sort of issue that we want alignment for.
I think that saying that “executable philosophy” has failed is missing Yudkowsky’s main point. Quoting from the Arbital page:
To build and align Artificial Intelligence, we need to answer some complex questions about how to compute goodness
He claims that unless we learn how to translate philosophy into “ideas that we can compile and run”, aligned AGI is out of the question. This is not a worldview, but an empirical proposition, the truth of which remains to be determined.
There’s also an adjacent worldview, which suffuses the Sequences, that it’s possible in the relatively short term to become much more generally “rational” than even the smartest uninitiated people, “faster than science” etc., and that this is chiefly rooted in Bayes, Solomonoff & Co. It’s fair to conclude that this has largely failed, and IMO Chapman makes a convincing case that this failure was unavoidable. (He also annoyingly keeps hinting that there is a supremely fruitful “meta-rational” worldview instead that he’s about to reveal to the world. Any day now. I’m not holding my breath.)
the philosophy department thinks you should defect in a one-shot prisoners’ dilemma
Without further qualifications, shouldn’t you? There are plenty of crazy mainstream philosophical ideas, but this seems like a strange example.
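For concreteness, here’s a minimal sketch of the dominance argument, using standard textbook payoff numbers (the specific values are my assumption, not anything from the quoted exchange): whatever the other player does, defecting yields a strictly higher payoff, which is why “defect” is the orthodox answer to the one-shot game absent further structure.

```python
# Minimal sketch: defection strictly dominates in a one-shot prisoners' dilemma.
# The payoff numbers are illustrative textbook values (an assumption for this sketch).

# payoffs[(my_move, their_move)] = my payoff; higher is better
payoffs = {
    ("C", "C"): 3,  # mutual cooperation
    ("C", "D"): 0,  # I cooperate, they defect (sucker's payoff)
    ("D", "C"): 5,  # I defect, they cooperate (temptation)
    ("D", "D"): 1,  # mutual defection
}

# Whatever the other player does, defecting gives me a strictly higher payoff.
for their_move in ("C", "D"):
    assert payoffs[("D", their_move)] > payoffs[("C", their_move)]

print("Defection is a strictly dominant strategy under these payoffs.")
```

The usual arguments for cooperating (twin opponents, shared decision procedures, iteration, reputation) all amount to adding exactly those further qualifications.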
Yes, I buy the general theory that he was bamboozled by misleading maps. My claim is that this is precisely the situation where a compass should’ve been enough to show that something had gone wrong, early enough for the situation to still have been salvageable, in a way that sun clues plausibly wouldn’t have.
Well, the thing I’m most interested in is the basic compass. From what I can see on the maps, he was going in the opposite direction from the main road for a long time after it should have become obvious that he was lost. This is a truly essential thing that I’ve never gone into unfamiliar wilderness without.
If you go out into the wilderness, bring plenty of water. Maybe bring a friend. Carry a GPS unit or even a PLB if you might go into risky territory. Carry the 10 essentials.
Most people who die in the wilderness have done something stupid to wind up there. Fewer people die who have NOT done anything glaringly stupid, but it still happens all the same. Ewasko’s case appears to have been one of these.
Hmm, so is there evidence that he did in fact follow those common-sense guidelines and died in spite of that? Google doesn’t tell me what was found alongside his remains besides a wallet.
They don’t, of course, but if you’re lucky enough not to be located among the more zealous of them and be subjected to mandatory struggle sessions, their wrath will generally be pointed at more conspicuous targets. For now, at least.
We have a significant comparative advantage to pretty much all of Western philosophy.
I do agree that there are some valuable Eastern insights that haven’t yet penetrated the Western mainstream, so work in this direction is worth a try.
We believe we’re in a specific moment in history where there’s more leverage than usual, and so there’s opportunity. We understand that chances are slim and dim.
Also reasonable.
We have been losing the thread to ‘what is good’ over the millennia. We don’t need to reinvent the wheel on this; the answers have been around.
Here I disagree. I think that much of “what is good” is contingent on our material circumstances, which are changing ever faster these days, so it’s no surprise that old answers no longer work as well as they did in their time. Unfortunately, nobody has yet discovered a reliable way to update them in a timely manner, and very few seem to even acknowledge this problem.
I don’t think that the intelligence agencies and the military are likely to be much bigger reckless idiots than Altman and co.; what seems more probable is that their interests and attitudes genuinely align.
most modern humans are terribly confused about morality
The other option is being slightly less terribly confused, I presume.
This is why MAPLE exists, to help answer the question of what is good
Do you consider yourselves to have a significant comparative advantage in this area relative to all the other moral philosophers throughout the millennia whose efforts weren’t enough to lift humanity from the aforementioned dismal state?
Oh, sure, I agree that an ASI would understand all of that well enough, but even if it wanted to, it wouldn’t be able to give us either all of what we think we want or what we would endorse in some hypothetical enlightened way, because neither of those things comprises a coherent framework that robustly generalizes far out-of-distribution for human circumstances, even for one person, never mind the whole of humanity.
The best we could hope for is that some-true-core-of-us-or-whatever would generalize in such a way, and that the AI recognizes this and propagates it while sacrificing the inessential contradictory parts. But given that our current state of moral philosophy is hopelessly out of its depth relative to this, to the extent that people rarely even acknowledge these issues, trusting that the AI would get this right seems like a desperate gamble to me, even granting that we somehow could make it want to.
Of course, it doesn’t look like we would get to choose not to be subjected to a gamble of this sort even if more people were aware of it, so maybe it’s better for them to remain in blissful ignorance for now.
I expect this because humans seem agent-like enough that modeling them as trying to optimize for some set of goals is a computationally efficient heuristic in the toolbox for predicting humans.
Sure, but the sort of thing that people actually optimize for (revealed preferences) tends to be very different from what they proclaim to be their values. This point isn’t often raised in polite conversation, but to me it’s a key reason why the thing people call “value alignment” is incoherent in the first place.
But meditation is non-addictive.
Why not? An ability to get blissed-out on demand sure seems like it could be dangerous. And, relatedly, I have seen stuff mentioning jhana addicts a few times.
Indeed, from what I see there is consensus that academic standards on elite campuses are dramatically down; likely this has a lot to do with the need to sustain holistic admissions.
As in, the academic requirements, the ‘being smarter’ requirement, has actually weakened substantially. You need to be less smart, because the process does not care so much if you are smart, past a minimum. The process cares about… other things.
So, the signalling value of their degrees should be decreasing accordingly, unless one mainly intends to take advantage of the process. Has some tangible evidence of that appeared already, and are alternative signalling opportunities emerging?
I think Scott’s name is not newsworthy either.
Metz/NYT disagree. He doesn’t completely spell out why (it’s not his style), but, luckily, Scott himself did:
If someone thinks I am so egregious that I don’t deserve the mask of anonymity, then I guess they have to name me, the same way they name criminals and terrorists.
Metz/NYT considered Scott to be bad enough to deserve whatever inconveniences/punishments would come to him as a result of tying his alleged wrongthink to his real name, is the long and short of it.
Right, the modern civilization point is more about the “green” archetype. The “yin” thing is of course much more ancient and subtle, but even so I doubt that it (and philosophy in general) was a major consideration before the advent of agriculture leading to greater stability, especially for the higher classes.
and another to actually experience the insights from the inside in a way that shifts your unconscious predictions.
Right, so my experience around this is that I’m probably one of the lucky ones, in that I’ve never really had those sorts of internal conflicts that make people claim that they suffer from akrasia, or excessive shame/guilt/regret. I’ve always been at peace with myself in this sense, and so reading people trying to explain their therapy/spirituality insights usually makes me go “Huh, so apparently this stuff doesn’t come naturally to most people; shame that they have to bend over backwards to get to where I have always been. Cool that they have developed all these neat theoretical constructions in the meantime, though.”
Maybe give some of it a try if you haven’t already, see if you feel motivated to continue doing it for the immediate benefits, and then just stick to reading about it out of curiosity if not?
Trying to dismiss the content of my thoughts does seem to help me fall asleep faster (sometimes), so there’s that at least :)
I’d rather put it that resolving that problem is a prerequisite for the notion of “alignment problem” to be meaningful in the first place. It’s not technically a contradiction to have an “aligned” superintelligence that does nothing, but clearly nobody would in practice be satisfied with that.