I think you’ve summarized the question we’re trying to answer pretty well. Does Daniel want to go on vacations? We don’t know. How would one go about deciding whether they want to go on vacations? You seem to be missing the fact that one might be unsure about their preferences.
AprilSR
I feel like “Politics is the Mind-Killer” made two points that came out pretty clearly to me and, I’d assume, most other people.
It is very hard to discuss politics rationally.
Therefore, avoid political examples (or use historical ones) when discussing rationality.
For example, Eliezer would advocate against saying “Hey, those stupid [political party] people made a huge mistake in supporting [candidate] in the 20XX election. Let’s learn from their mistake,” unless you were quite confident people could discuss the rationality and not the politics.
I think a lot of the “might”s and “could”s were avoided mainly for emphasis. Unless you have a strong reason to believe that someone will be able to be rational about politics, you can very safely assume they won’t be. “You have to support every argument on one side,” for example, is basically saying that most people don’t understand the nuance in saying that you think an argument is flawed even if you agree with its conclusion. I very commonly see people make horribly incorrect arguments for positions I strongly support, but pointing these out as flawed is rarely well received among people who lack rationality skills.
While the conclusions you drew from the post were obviously harmful, I feel like very few people interpreted it that way.
I’m pretty sure “qualia do not exist” is an extreme fringe position. You seem to be under the impression that materialists deny qualia, which is not the case.
That said, this is a decent argument against the position that qualia do not exist.
This is a nitpick, but I contest the $10,000 figure. If I had an incentive as strong as building an (aligned) AGI, I’m sure I could find a way to obtain upwards of a million dollars worth of compute.
I think 1000 people being struck by lightning would register as a gigantic surprise, not a less-than-1-signal-confusion.
I’ve definitely experienced mental exhaustion from video games before—particularly when trying to do an especially difficult task.
If both of those things happened I would be very interested in hearing about the person who decided to make a paperclip maximizer despite having an explicit model of the human utility function that they could implement.
Actually, I wouldn’t be interested in anything. I would be paperclips.
If they didn’t need exactly the same amount of information I would be very interested in what kind of math wizardry is involved.
“(It makes sense that) A proof-based agent can’t cross a bridge whose safety is dependent on the agent’s own logic being consistent, since proof-based agents can’t know whether their logic is consistent.”
If the agent crosses the bridge, then the agent knows itself to be consistent.
The agent cannot know whether they are consistent.
Therefore, crossing the bridge implies an inconsistency (they know themself to be consistent, even though that’s impossible).
The counterfactual reasoning seems quite reasonable to me.
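The three steps above can be written out a bit more formally. This is only a sketch under assumptions that are mine, not the original post's: I read “knows” as “proves,” write $T$ for the agent's theory (assumed recursively axiomatizable and strong enough for Gödel's second incompleteness theorem to apply), and write $C$ for “the agent crosses.”

```latex
\begin{align*}
&\text{(1)}\quad C \;\rightarrow\; \bigl(T \vdash \mathrm{Con}(T)\bigr)
  && \text{crossing requires proving one's own consistency} \\
&\text{(2)}\quad \mathrm{Con}(T) \;\rightarrow\; \bigl(T \nvdash \mathrm{Con}(T)\bigr)
  && \text{G\"odel's second incompleteness theorem} \\
&\text{(3)}\quad C \;\rightarrow\; \neg\,\mathrm{Con}(T)
  && \text{from (1) and the contrapositive of (2)}
\end{align*}
```

Step (3) is the counterfactual in question: if the agent crosses, its logic is inconsistent.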
I believe there is some amount of broken arms over the course of my life that would be worse than losing a toe, even though the broken arms are non-permanent and the toe is permanent.
I don’t understand what “an illness like DZV” means. Depending on how similar it has to be to qualify as “like,” it might be extremely unlikely purely on the basis of there being so many conjunctions, even putting aside that many parts of it are implausible.
Do we need it to predict people with high accuracy? Humans do well enough at our level of prediction.
Reminds me of the thought experiment where you’re in hell and there’s a button that will either condemn you permanently, or, with probability increasing over time, will allow you to escape. Since permanent hell is infinitely bad, any decreased chance of that is infinitely good, so you either wait forever or make an arbitrary unjustifiable decision.
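To make the regress concrete, here is a minimal sketch in Python. The curve in `escape_probability` is purely an assumption for illustration (the thought experiment only says the probability increases over time); any strictly increasing curve that never reaches 1 should behave the same way.

```python
from fractions import Fraction


def escape_probability(wait_steps: int) -> Fraction:
    """Hypothetical escape chance after waiting `wait_steps` steps.

    Assumed shape: starts at 1/2 and rises toward 1 without ever reaching it.
    Fractions keep the comparison exact rather than relying on floats.
    """
    return 1 - Fraction(1, wait_steps + 2)


def better_to_wait_one_more(wait_steps: int) -> bool:
    """True if pressing one step later gives a strictly higher escape chance."""
    return escape_probability(wait_steps + 1) > escape_probability(wait_steps)


# Check a finite range: under this curve, waiting one more step always wins.
all_improve = all(better_to_wait_one_more(t) for t in range(1000))
```

Because every wait time is beaten by the next one, there is no best moment to press, which is exactly the “wait forever or pick arbitrarily” problem.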
Are you sure they weren’t using kill metaphorically?
I think when people say “Could I have done X?” we can usually interpret it as if they said “Could I have done X had I wanted to?”
Eh. The next question to ask is going to depend entirely upon context. I feel like most of the time people use it in practice they’re talking about the extent of capabilities, where whether you were able to want something is irrelevant. There are other cases though.
Doesn’t being willing to accept a trade *directly follow* from the expected value of the trade being positive? Isn’t that like, the *definition* of when you should be willing to accept a trade? The only disagreement would be how likely it is that losses of knowledge / epistemics are involved in positive value trades. (My guess is it does happen rarely.)
If you have epistemic terminal values then it would not be a positive expected value trade, would it? Unless “expected value” is referring to the expected value of something other than your utility function, in which case it should’ve been specified.
Given SAI is possible, regulation on AI is necessary to prevent people from making a UFAI. Alternatively, an SAI which is not fully aligned but has no goals directly conflicting with ours might be used to prevent the creation of UFAI.
This assumes that there’s some point where things sharply cut off between being me and not being me. I think it makes more sense for my utility function to care more about something the more similar it is to me. The existence of a single additional memory means pretty much nothing, and I still care a lot about most human minds. Something entirely alien I might not care about at all.
Even if this actually raises my utility, it does it by changing my utility function. Instead of helping the people I care about, it makes me care about different people.