Replace AMF with any organisation for which this statement becomes obviously true. If no such organisation exists, I’m curious.
Most likely that’s where this intuition can be traced back to
Right. You can make up a lot of just-so stories, but the one you came up with falls neatly into the categories I’m trying to explain.
In this case, being altruistic doesn’t satisfy any need at all. There’s no pressure because you’re not penalized in any way for a shitty restaurant. That’s why I make an exception for respect, in the sense that I claim that respect can be a driving force behind altruism even if other needs (like reduced income from being outperformed) are lacking.
I suppose any need ought to be considered when building incentive structures. Just using income will not always lead to the best outcome.
I think the context of “haters don’t matter” is one where you already decided to ignore them.
To me this looks like a knockdown argument to any non-solipsistic morality. I really do just care about my qualia.
In some sense it’s the same mistake the deontologists make, on a deeper level. A lot of their proposed rules strike me as heavily correlated with happiness. How were these rules ever generated? Whatever process generated them must have been a consequentialist process.
If deontology is just applied consequentialism, then maybe “happiness” is just applied “0x7fff5694dc58”.
Your post still leaves the possibility that “quality of life”, “positive emotions” or “meaningfulness” are objectively existing variables, and people differ only in their weighting. But I think the problem might be worse than that.
I think this makes the problem less bad, because if you get people to go up their chain of justification, they will all end up at the same point. I think that point is just predictions of the valence of their qualia.
Because some people might already be at this level, and I worry that I’m just adding noise to their signal.
Maybe my question is this: given that, every year, I unexpectedly learn important considerations that discredit my old beliefs, how can I tell that my models are further along this process than those written by others?
Thank you. We’re reflecting on this and will reach out to have a conversation soon.
Apologies, that was a knee-jerk reply. I take it back: we did disagree about something.
We’re going to take some time to let all of this criticism sink in.
Looks like I didn’t entirely succeed in explaining our plan.
My recommendation was very concretely “try to build an internal model of what really needs to happen for AI-risk to go well” and very much not “try to tell other people what really needs to happen for AI-risk”, which is almost the exact opposite.
And that’s also what we meant. The goal isn’t to just give advice. The goal is to give useful and true advice, and this necessarily requires a model of what really needs to happen for AI risk.
We’re not just going to spin up some interesting ideas. That’s not the mindset. The mindset is to generate a robust model and take it from there, if we ever get that far.
We might be talking to people in the process, but as long as we are in the dark the emphasis will be on asking questions.
EDIT: this wasn’t a thoughtful reply. I take it back. See Ruby’s comments below
To what extent are these just features of low-trust and high-trust environments?
Assuming that these dimensions are the same, here’s my incomplete list of things that modulate trust levels:
Group size (smaller is easier to trust)
Average emotional intelligence
The quality of group memes that relate to emotions
The level of similarity of group members
Some of these might screen off others. This model suggests that groups with healthy discourse tend to be small, affluent, emotionally mature and aligned.
Apart from social effects I get the impression that there are also psychological factors that modulate the tendency to trust, including:
Independence (of those that disagree)
Different answer: one thing that I’ve seen work is to meet someone offline. People tend to be a lot more considerate after that
Still, I think this line of thinking is extremely important, because it means that people won’t agree with any proposal for a morality that isn’t useful for them, and keeping this in mind makes it a lot easier to propose moralities that will actually be adopted.
I agree with your conclusion, but feel like there’s some nuance lacking. In three ways.
It seems that indeed a lot of our moral reasoning is confused because we fall for some kind of moral essentialism, some idea that there is an objective morality that is more than just a cultural contract that was invented and refined by humans over the course of time.
But then you reintroduce this essentialism into our “preferences”, which you hold to be grounded in your feelings:
Human flourishing is good because the idea of human flourishing makes me smile. Kicking puppies is bad because it upsets me.
We recursively justify our values, and this recursion doesn’t end at the boundary between consciousness and subconsciousness. Your feelings might appear to be your basic units of value, but they’re not. This is obvious if you consider that our observations about the world often change our feelings.
Where does this chain of justifications end? I don’t know, but I’m reasonably sure about two things:
1) The bedrock of our values is probably the same for any human being, and any difference between conscious values is due either to having seen different data or, more likely, to different people situationally benefiting more under different moralities. For example, a strong person will have “values” that are more accepting of competition, but that will change once they become weaker.
2) While a confused ethicist is wrong to be looking for a “true” (normative) morality, this is still better than not searching at all because you hold your conscious values to be basic. The best of both worlds is an ethicist that doesn’t believe in normative morality, but still knows there is something to be learned about the source of our values.
Considering our evolutionary origins, it seems very unlikely to me that we are completely selfish. It seems a lot more likely to me that the source of our values is some proxy of the survival and spread of our genes.
You’re not the only one who carries your genes, and so your “selfish” preferences might not be completely selfish after all
We’re a mashup of various subagents that want different things. I’d be surprised if they all had the same moral systems. Part of you might be reflective, aware of the valence of your experience, and actively (and selfishly) trying to increase it. Part of you will reflect your preferences for things that are very not-selfish. Other parts of you will just be naive deontologists.
Does that still lead to good outcomes though? I found that being motivated by my social role makes me a lot less effective, because signalling and the actual thing come apart considerably, at least in the short term.
It starts with the sense that, if something doesn’t feel viscerally obvious, there is something left to be explained.
It’s a bottom-up process. I don’t determine that images will convince me, then think of some images and play them in front of me so that they will hopefully convince my s1.
Instead I “become” my s1, take on a skeptical attitude, and ask myself what the fuss is all about.
Warning: the following might give you nightmares, if you’re imaginative enough.
In this case, what happened was something like “okay, well I guess at some point we’re going to have pretty strong optimizers. Fine. So what? Ah, I guess that’s gonna mean we’re going to have some machines that carry out commands for us. Like what? Like *picture of my living room magically tidying itself up*. Really? Well yeah I can see that happening. And I suppose this magical power can also be pretty surprising. Like *blurry picture/sense of surprising outcome*. Is this possible? Yeah like *memory of this kind of surprise*. What if this surprise was like 1000x stronger? Oh fuck...”
I guess the point is that convincing a person, or a subagent, can be best explained as an internal decision to be convinced, and not as an outside force of convincingness. So if you want to convince a part of you that feels like something outside of you, then first you have to become it. You do this by sincerely endorsing whatever it has to say. Then if the part of you feels like you, you (formerly it) decide to re-evaluate the thing that the other subagent (formerly you) disagreed with.
A bit like internal double crux, but instead of going back and forth you just do one round. Guess you could call it internal ITT.
which confuses me because it seems like worrying about being embarrassed is worrying about impressions?
What I meant to say is that I can tell that my work isn’t going to be very good by next year’s standards, which are better standards because they’re more informed.
I expect that there are metrics that screen off gender, so we can make better predictions and also circumvent the politics of doing anything related to gender.