I don’t understand how this task is easier than finding a nonperson predicate. Surely you need a nonperson predicate before you can even start to build a nonsentient AI—whatever calculation convinces you that the AI you’re building is nonsentient, /is/ a nonperson predicate. Perhaps not an enormously general one—e.g. it may well only ever return 0 on entities for which you can examine the source code—but clearly a nontrivial one. Have I missed something?
The person offering the bet still (presumably) wants to minimize their loss, so they would be more likely to offer it if the unknown occurrence was impossible than if it was certain.
It may or may not be Holden’s, but I think you’ve put your finger on my real reasons for not wanting to donate to SI. I’d be interested to hear any counterpoint.
The guys at last.fm are usually very willing to help out with interesting research (or at least were when I worked there a couple of years ago), so if you particularly care about that information it’s worth trying to contact them.
I’ve never understood this argument. I have a visceral reaction against surgery (even the sight of blood can set me off); I certainly couldn’t stand to be in the same room in which surgery was being performed. Does this mean that for consistency I’m required to morally oppose surgery?
It’s striking how different our cultural response seems to be to political assassination by knife or political assassination by airstrike.
Once he uses lethal force against you, your use of lethal force would be self-defense, not murder.
I was jarringly horrified when Yudkowsky[?] casually said something like “who would ever want to eat a chocolate chip cookie as the sun’s going out” in one of the sequences. It seems I don’t just value eating chocolate chip cookies, I also (whether terminally or not) value being the kind of entity that values eating chocolate chip cookies.
The best conversations are in places that put a low value on humour. Unfortunately in wider society disliking humour is seen as a massive negative.
As a relative outsider who thought much the same thing, I’d definitely want the money to go to Tuxedage—it sounds like he suffers doing this, something I’d feel much more guilty about if I weren’t paying him.
I think you’re cherry-picking. Faith is undergoing an unprecedented decline, and has been for the last century or so; society takes time to adjust, but it’s happening. There are countries without compulsory education, but it seems to leave them worse off overall. Zero-sum competition is a commonsense outcome of evolution, but politics has matured to the point where we in the west can largely ignore it and get on with our lives (end of history). And there are enough areas where we seem luckier than one would reasonably expect. Above all, progress is happening.
As a devil I would probably try and associate more good things with atrocities, to get people to stop trying them. The Holocaust puts people off reasonable eugenics; I’d make sure there was a holocaust committed by e.g. a culturally integrated organization. Or maybe try and fragment language more. Once you get people to divide into in-groups and out-groups they do the rest of the work for you.
I’m pretty indifferent to humour per se, but empirically it takes away from other things. Discussion sites where humour is valued have a lower proportion of interesting (to me) posts; television series with a lot of humour seem to make a corresponding sacrifice in character development.
If I thought that worked I would already have given MIRI all my money.
How would you empirically distinguish between your invisible-pink-unicorn maximizer and something that wasn’t an invisible-pink-unicorn maximizer? I mean, you could look for a section of code that was interpreting sensory inputs as number of invisible-pink-unicorns—except you couldn’t, because there’s no set of sensory inputs that corresponds to that, because they’re impossible. If we’re talking about counterfactuals, the counterfactual universe in which the sensory inputs that currently correspond to paperclips correspond to invisible-pink-unicorns seems just as valid as any other.
If I enjoy the subjective experience of thinking about something, I can’t think of any conceivable fact that would invalidate that.
My position would be that actions speak louder than thoughts. If you act as though you value your own happiness more than that of others… maybe you really do value your own happiness more than that of others? If you like doing certain things, maybe you value those things—I don’t see anything irrational in that.
(It’s perfectly normal to deceive ourselves into believing our values are more selfless than they actually are. I wouldn’t feel guilty about it—similarly, if your actions are good it doesn’t really matter whether you’re doing them for the sake of other people or for your own satisfaction.)
The other resolution I can see would be to accept that you really are a set of not-entirely-aligned entities, a pattern running on untrusted hardware. At which point parts of you can try and change other parts of you. That seems rather perilous though. FWIW I accept the meat and its sometimes-contradictory desires as part of me; it feels meaningless to draw lines inside my own brain.
But people’s values change over time, and that’s a good thing. For example, in medieval/ancient times people didn’t value animals’ lives and well-being (as much) as we do today. If a medieval person tells you “well we value what we value, I don’t value animals, what more is there to say?”, would you agree with him and let him go on burning cats for entertainment, or would you try to convince him that he should actually care about animals’ well-being?
Is that an actual change in values? Or is it merely a change of facts—much greater availability of entertainment, much less death and cruelty in the world, and the knowledge that humans and animals are much more similar than it would have seemed to the medieval worldview?
Do people who’ve changed their mind consider themselves to have different values from their past selves? Do we find that when someone has changed their mind, we can explain the relevant values in terms of some “more fundamental” value that’s just being applied to different observations (or different reasoning), or not? Can we imagine a scenario where an entity with truly different values—the good ol’ paperclip maximizer—is persuaded to change them?
I guess that’s my real point—I wouldn’t even dream of trying to persuade a paperclip maximizer to start valuing human life (except insofar as live humans encourage the production of paperclips); it values what it values, it doesn’t value what it doesn’t value, what more is there to say? To the extent that I would hope to persuade a medieval person to act more kindly towards animals, it would be because of, and in terms of, the values that they already have, which would likely be mostly shared with mine.
So there’s a view that a rational entity should never change its values. If we accept that, then any entity with different values from present-me seems to be in some sense not a “natural successor” of present-me, even if it remembers being me and shares all my memories. There seems to be a qualitative distinction between an entity like that and upload-me, even if there are several branching upload-mes that have undergone various experiences and would no doubt have different views on concrete issues than present-me.
But that’s just an intuition, and I don’t know whether it can be made rigorous.
I was confused by this post for some time, and I feel I have an analogous but clearer example: Suppose scientist A says “I believe in proposition A, and will test it at the 95% confidence level”, and scientist B says “I believe in proposition B, and will test it at the 99% confidence level”. They go away and do their tests, and each comes back from their experiment with a p-value of 0.03. Do we now believe proposition A more or less than proposition B? The traditional scientific method, with its emphasis on testability, prefers A to B; for a Bayesian it’s clear that we have the same amount of evidence for each.
Have I fairly characterised both sides? Does this capture the same paradox as the original example, and is it any clearer?
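To make the Bayesian half of that claim concrete, here is a minimal sketch in Python. The coin-flip experiment, the 0.6 alternative hypothesis, and the 60-heads-in-100-flips data are all made-up stand-ins (chosen so the one-sided p-value lands near 0.03), not anything from the original example.

```python
# Two scientists see identical data; only their pre-declared thresholds differ.
from scipy.stats import binom

n, heads = 100, 60                          # identical (made-up) data for both
p_value = 1 - binom.cdf(heads - 1, n, 0.5)  # one-sided p-value under the fair-coin null

# Frequentist verdicts depend on the threshold chosen in advance:
for scientist, alpha in (("A", 0.05), ("B", 0.01)):
    verdict = "reject null" if p_value < alpha else "fail to reject"
    print(f"Scientist {scientist} (alpha={alpha}): p={p_value:.3f} -> {verdict}")

# The Bayesian evidence (a likelihood ratio) is a property of the data and the
# two hypotheses alone; the pre-declared threshold never enters the calculation.
likelihood_null = binom.pmf(heads, n, 0.5)  # P(data | fair coin)
likelihood_alt = binom.pmf(heads, n, 0.6)   # P(data | coin biased to 0.6)
print(f"Bayes factor (alt over null): {likelihood_alt / likelihood_null:.2f}")
```

On this reading A’s pre-registered test passes and B’s fails, even though both saw exactly the same evidence, which is the tension the example is meant to capture.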