Leaning on this, someone could write a post about the “infectiousness of realism” since it might be hard to reconcile openness to non-zero probabilities of realism with anti-realist frameworks? :P
For people who believe their actions matter infinitely more if realism is true, this could be modeled as an overriding meta-preference to act as though realism is true. Unfortunately if realism isn’t true this could go in all kinds of directions depending on how the helpful AI system would expect to get into such a judged-to-be-wrong epistemic state.
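To make the "overriding meta-preference" reading concrete, here is a minimal sketch of how the wager could be modeled. Everything in it is a made-up illustration: the two action names, the payoff numbers, and the `stakes_ratio` parameter are assumptions of mine, not part of anyone's actual proposal.

```python
# Toy model of the "realism wager". All names and numbers are hypothetical.

# How good each action is if moral realism is true / false (arbitrary units).
VALUE_IF_REALISM = {"act_as_if_realism": 1.0, "follow_own_values": 0.0}
VALUE_IF_ANTIREALISM = {"act_as_if_realism": 0.2, "follow_own_values": 1.0}


def choose_action(credence_realism: float, stakes_ratio: float) -> str:
    """Return the action with the highest credence-weighted value.

    stakes_ratio says how much more the agent thinks its actions matter
    if realism is true than if it is false.
    """

    def score(action: str) -> float:
        realism_term = credence_realism * stakes_ratio * VALUE_IF_REALISM[action]
        antirealism_term = (1.0 - credence_realism) * VALUE_IF_ANTIREALISM[action]
        return realism_term + antirealism_term

    return max(VALUE_IF_REALISM, key=score)


# With equal stakes, a 1% credence in realism barely matters.
print(choose_action(credence_realism=0.01, stakes_ratio=1.0))
# With a huge stakes ratio, the same 1% credence dominates the decision.
print(choose_action(credence_realism=0.01, stakes_ratio=1e6))
```

As the stakes ratio grows, any non-zero credence in realism ends up dictating the choice, which is what I mean by the preference being "overriding". The sketch says nothing about what the system should do in the branch where realism is false.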
Probably you were thinking of something like teaching AIs metaphilosophy in order to perhaps improve the procedure? This would be the main alternative I see, and it does feel more robust. I am wondering though whether we’ll know by that point whether we’ve found the right way to do metaphilosophy (and how approaching that question is different from approaching whichever procedures philosophically sophisticated people would pick to settle open issues in something like the above proposals). It seems like there has to come a point where one has to hand off control to some in-advance specified “metaethical framework” or reflection procedure, and judged from my (historically overconfidence-prone) epistemic state it doesn’t feel obvious why something like Stuart’s anti-realism isn’t already close to there (though I’d say there are many open questions and I’d feel extremely unsure about how to proceed regarding for instance “2. A method for synthesising such basic preferences into a single utility function or similar object,” and also to some extent about the premise of squeezing a utility function out of basic preferences absent meta-preferences for doing that). Adding layers of caution sounds good though as long as they don’t complicate things enough to introduce large new risks.
Ethical theories don’t need to be simple. I used to believe that ethical theories ought to be simple/elegant/non-arbitrary for us to have a shot at them being the correct theory, a theory that intelligent civilizations with different evolutionary histories would all converge on. This made me think that NU might be that correct theory. Now I’m confident that this sort of thinking was confused: I think there is no reason to expect that intelligent civilizations with different evolutionary histories would converge on the same values, or that there is one correct set of ethics that they “should” converge on if they were approaching the matter “correctly”. So, looking back, my older intuition feels confused in a similar way to ordering the simplest food in a restaurant in an attempt to anticipate what others would order if they, too, thought the goal was for everyone to order the same thing. Now I just want to order the “food” that satisfies my personal criteria (and these criteria do happen to include placing value on non-arbitrariness/simplicity/elegance, but I’m a bit less single-minded about it).
Your way of unifying psychological motivations down to suffering reduction is an “externalist” account of why decisions are made, which is different from the internal story people tell themselves. Why think all people who tell different stories are mistaken about their own reasons? The point “it is a straw man argument that NUs don’t value life or positive states” is unconvincing, as others have already pointed out. I actually share your view that a lot of things people do might in some way trace back to a motivating quality in feelings of dissatisfaction, but (1) there are exceptions to that (e.g., sometimes I do things on auto-pilot and not out of an internal sense of urgency/need, and sometimes I feel agenty and do things in the world to achieve my reflected life goals rather than tend to my own momentary well-being), and (2) that doesn’t mean that whichever parts of our minds we most identify with need to accept suffering reduction as the ultimate justification of their actions. For instance, let’s say you could prove that the proximate cause of a person’s refusal to enter Nozick’s experience machine was that, when they contemplated the decision, they felt really bad about the prospect of learning that their own life goals are shallower and more self-centered than they would have thought, and *therefore* they refused the offer. Your account would say: “They made this choice driven by the avoidance of bad feelings, which just shows that ultimately they should accept the offer, or choose whichever offer reduces more suffering all-things-considered.” Okay yeah, that’s one story to tell. But the person in question tells herself the story that she made this choice because she has strong aspirations about what type of person she wants to be. Why would your externally-imported justification be more valid (for this person’s life) than her own internal justification?
I think I broadly agree with all the arguments to characterize the problem and to motivate indefinability as a solution, but I have different (meta-)meta-level intuitions about how palatable indefinability would be, and as a result of that, I’d say I have been thinking about similar issues in a differently drawn framework. While you seem to advocate for “salvaging the notion of ‘one ethics’” while highlighting that we then need to live with indefinability, I am usually thinking of it in terms of: “Most of this is underdefined, and that’s unsettling at least in some (but not necessarily all) cases, and if we want to make it less underdefined, the notion of ‘one ethics’ has to give.” Maybe one reason why I find indefinability harder to tolerate is that in my own thinking, the problem arises forcefully at an earlier/higher-order stage already, and therefore the span of views that “ethics” is indefinable about(?) is larger and already includes questions of high practical significance. Having said that, I think there are some important pragmatic advantages to an “ethics includes indefinability” framework, and that might be reason enough to adopt it. While different frameworks tend to differ in the underlying intuitions they highlight or move into the background, I think there is more than one parsimonious framework in which people can “do moral philosophy” in a complete and unconfused way. Translation between frameworks can be difficult though (which is one reason I started to write a sequence about moral reasoning under anti-realism, to establish starting points for disagreements, but then I got distracted – it’s on hold now).
Some more unorganized comments (apologies for “lazy“ block-quote commenting):
Moral indefinability is the term I use for the idea that there is no ethical theory which provides acceptable solutions to all moral dilemmas, and which also has the theoretical virtues (such as simplicity, precision and non-arbitrariness) that we currently desire.
This idea seems correct to me. And as you indicate later in the paragraph, we can add that it’s plausible that the “theoretical virtues“ are not well-specified either (e.g., there’s disagreement between people’s theoretical desiderata, or there’s vagueness in how to cash out a desideratum such as “non-arbitrariness”).
My claim is that eventually we will also need to change our meta-level intuitions in important ways, because it will become clear that the only theories which match them violate key object-level intuitions.
This recommendation makes sense to me (insofar as one can still do that), but I don’t think it’s completely obvious. Because both meta-level intuitions and object-level intuitions are malleable in humans, and because there’s no(t obviously a) principled distinction between these two types of intuitions, it’s an open question to what degree people want to adjust their meta-level intuitions in order to not have to bite the largest bullets.
If the only reason people were initially tempted to bite the bullets in question (e.g., accept a counterintuitive stance like the repugnant conclusion) was because they had a cached thought that “Moral theories ought to be simple/elegant“, then it makes a lot of sense to adjust this one meta-level intuition after the realization that it seems ungrounded. However, maybe “Moral theories ought to be simple/elegant“ is more than just a cached thought for some people:
Some moral realists buy the “wager” that their actions matter infinitely more in case moral realism is true. I suspect that an underlying reason why they find this wager compelling is that they have strong meta-level intuitions about what they want morality to be like, and it feels to them that it’s pointless to settle for something other than that.
I’m not a moral realist, but I find myself having similarly strong meta-level intuitions about wanting to do something that is “non-arbitrary” and in relevant ways “simple/elegant”. I’m confused about whether that’s literally the whole intuition, or whether I can break it down into another component. But motivationally it feels like this intuition is importantly connected to what makes it easy for me to go “all-in“ for my ethical/altruistic beliefs.
A second reason to believe in moral indefinability is the fact that human concepts tend to be open texture: there is often no unique “correct” way to rigorously define them.
I strongly agree with this point. I think even very high-level concepts in moral philosophy or the philosophy of reason/self-interest are “open texture“ like that. In your post you seem to start with an assumption that people have a rough, shared sense of what “ethics“ is about. But if the fuzziness is already attacking at this very high level, it calls into question whether you can find a solution that seems satisfying to different people’s (fuzzy and underdetermined) sense of what the question/problem is even about.
For instance, there are narrow interpretations such as “ethics as altruism/caring/doing good” (which I think roughly captures at least large parts of what you assume, and it also captures the parts I’m personally most interested in). There’s also “ethics as cooperation or contract”. And maybe the two blend into each other.
Then there’s the broader (I label it “existentialist”) sense in which ethics is about “life goals” or “Why do I get up in the morning?”. And within this broader interpretation of it, you suddenly get narrower subdomains like “realism about rationality” or “What makes up a person’s self-interest?” where the connection to the other narrower domains (e.g. “ethics as altruism”) is not always clear.
I think indefinability is a plausible solution (or meta-philosophical framework?) for all of these. But when the scope over which we observe indefinability becomes so broad, it illustrates why it might feel a bit frustrating for some people, because without clearly delineated concepts it can be harder to make progress, and so a framework in which indefinability plays a central role could in some cases obscure conceptual progress in subareas where one might be able to make such progress (at least at the “my personal morality“ level, though not necessarily at the level of a “consensus morality“).
(I’m not sure I’m disagreeing with you BTW; probably I’m just adding thoughts and blowing up the scope of your post.)
I would guess that many anti-realists are sympathetic to the arguments I’ve made above, but still believe that we can make morality precise without changing our meta-level intuitions much—for example, by grounding our ethical beliefs in what idealised versions of ourselves would agree with, after long reflection. My main objection to this view is, broadly speaking, that there is no canonical “idealised version” of a person, and different interpretations of that term could lead to a very wide range of ethical beliefs.
I agree. The second part of my comment here tries to talk about this as well.
And even if idealised reflection is a coherent concept, it simply passes the buck to your idealised self, who might then believe my arguments and decide to change their meta-level intuitions.
Yeah. I assume most of us are familiar with a deep sense of uncertainty about whether we found the right approach to ethical deliberation. And one can maybe avoid this uncomfortable feeling of uncertainty by deferring to idealized reflection. But it’s not obvious that this lastingly solves the underlying problem: Maybe we’ll always feel uncertain whenever we enter the mode of “actually making a moral judgment”. If I found myself as a virtual person who is part of a moral reflection procedure such as Paul Christiano’s indirect normativity, I wouldn’t suddenly know and feel confident in how to resolve my uncertainties. And the extra power, and the fact that life in the reflection procedure would be very different from the world I currently know, introduce further risks and difficulties. I think there are still reasons why one might want to value particularly-open-ended moral reflection, but maybe it’s important that people don’t use the uncomfortable feeling of “maybe I’m doing moral philosophy wrong” as their sole reason to value particularly-open-ended moral reflection. If the reality is that this feeling never goes away, then there seems to be something wrong with the underlying intuition that valuing particularly-open-ended moral reflection is by default the “safe” or “prudent” thing to do. (And I’m not saying it’s wrong for people to value particularly-open-ended moral reflection; I suspect that it depends on one’s higher-order intuitions: For every perspective there’s a place where the buck stops.)
From an anti-realist perspective, I claim that perpetual indefinability would be better.
It prevents fanaticism, which is a big plus. And it plausibly creates more agreement, which is also a plus in some weirder sense (there’s a “non-identity problem” type thing about whether we can harm future agents by setting up the memetic environment such that they’ll end up having less easily satisfiable goals, compared to an alternative where they’d find themselves in larger agreement and therefore with more easily satisfiable goals). A drawback is that it can mask underlying disagreements and maybe harm underdeveloped positions relative to the status quo.
That may be a little more difficult to swallow from a realist perspective, of course. My guess is that the core disagreement is whether moral claims are more like facts, or more like preferences or tastes.
That’s a good description. I sometimes use the analogy of “morality is more like career choice than scientific inquiry“.
I don’t think that’s a coincidence: psychologically, humans just aren’t built to be maximisers, and so a true maximiser would be fundamentally adversarial.
This is another good instrumental/pragmatic argument why anti-realists interested in shaping the memetic environment where humans engage in moral philosophy might want to promote the framing of indefinability rather than “many different flavors of consequentialism, and (eventually) we should pick“.
AlphaStar’s innovative league-based training process finds the approaches that are most reliable and least likely to go wrong.
“Go wrong” is still tied to the game’s win condition. So while the league-based training process does find the set of agents whose gameplay is least exploitable (among all the agents they trained), it’s not obvious how this relates to problems in AGI safety such as goal specification or robustness to capability gains. Maybe they’re thinking of things like red teaming. But without more context I’m not sure how safety-relevant this is.
2. The ability to comment on a specific line in a document, with the comment showing up in context.
Yeah, I really like how convenient that is.
For me there’s a huge difference between these two.
In gdocs I feel like it’s more okay to write “unpolished” comments. I think that’s mostly because the expectations are lower. Polishing my comments takes me 3-5x longer, which often takes away the motivation to comment at all.
In a public forum I worry more about provoking misleading impressions. For instance, in a gdoc shared with people who know me well, I’m not worried that a comment like “AIs might do [complex sequence of actions]” will get people to think that I have weirdly confident views about how the future might play out. In public conversations I’d experience a strong urge to qualify statements like that even though it feels tedious to do so.
You need a lot of hindsight bias to say that it was clear from the get go which paradigms were going to win over the last century.
Sure. And I think Kuhn’s main point as summarized by Scott really does give a huge blow to the naive view that you can just compare successful predictions to missed predictions, etc.
But to think that you cannot do better than chance at generating successful new hypotheses is obviously wrong. There would be way too many hypotheses to consider, and not enough scientists to test them. From merely observing science’s success, we can conclude that there has to be some kind of skill (Yudkowsky’s take on this is here and here, among other places) that good scientists employ to do better than chance at picking what to work on. And IMO it’s a strange failure of curiosity to not want to get to the bottom of this when studying Kuhn or the history of science.
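A back-of-the-envelope sketch of the "too many hypotheses" point, with numbers I made up purely for illustration (the size of the hypothesis space, the number of fruitful hypotheses, and the testing budget are all assumptions): if scientists drew hypotheses to test uniformly at random, the chance of hitting even one fruitful hypothesis would be tiny.

```python
from math import comb


def chance_of_any_hit(num_hypotheses: int, num_fruitful: int, num_tested: int) -> float:
    """Probability that a uniformly random sample of `num_tested` hypotheses
    (drawn without replacement) contains at least one fruitful hypothesis."""
    # P(no hit) = ways to pick only unfruitful hypotheses / ways to pick any.
    p_no_hit = comb(num_hypotheses - num_fruitful, num_tested) / comb(num_hypotheses, num_tested)
    return 1.0 - p_no_hit


# Hypothetical numbers: 100,000 candidate hypotheses, 10 of them fruitful,
# and enough scientists to test 100 of them.
p = chance_of_any_hit(num_hypotheses=100_000, num_fruitful=10, num_tested=100)
print(f"chance of at least one hit when picking at random: {p:.4f}")
```

With these made-up numbers the chance comes out at about one percent. Real hypothesis spaces are presumably far larger relative to the testing budget, which is why sustained scientific success suggests that something better than random selection is going on.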
When I hear scientists talk about Thomas Kuhn, he sounds very reasonable. [...] When I hear philosophers talk about Thomas Kuhn, he sounds like a madman.
Yes, this! I remember I was extremely confused by the discourse around Kuhn. I’m not sure whether for me the impression was split into scientists vs. non-scientists, but I definitely felt like there was something weird about it and there were two sides to it, one that sounded potentially reasonable, and one that sounded clearly like relativism.
When taking a course on the book, I concluded that both perspectives were appropriate. One thing that went too far into relativism was Kuhn’s insistence that there is no way to tell in advance which paradigm is going to be successful. His description of this is that you pick “teams” initially for all kinds of not-truth-tracking reasons, and you only figure out many years later whether your new paradigm will be winning or not.
But I’m not sure Kuhn even was (at least in The Structure of Scientific Revolutions) explicitly saying “No, you cannot do better than chance at picking sides.” Rather, the weird thing is that I remember feeling like he was not explicitly asking that question, that he was just brushing it under the carpet. Likewise the lecturer of the course, a Kuhn expert, seemed to only be asking the question “How does (human-)science proceed?“, and never “How should science proceed?”
Suppose the agent you’re trying to imitate is itself goal-directed. In order for the imitator to generalize beyond its training distribution, it seemingly has to learn to become goal-directed (i.e., perform the same sort of computations that a goal-directed agent would). I don’t see how else it can predict what the goal-directed agent would do in a novel situation. If the imitator is not able to generalize, then it seems more tool-like than agent-like. On the other hand, if the imitatee is not goal-directed… I guess the agent could imitate humans and be not entirely goal-directed to the extent that humans are not entirely goal-directed. (Is this the point you’re trying to make, or are you saying that an imitation of a goal-directed agent would constitute a non-goal-directed agent?)
I’m not sure these are the points Rohin was trying to make, but there seem to be at least two important points here:
Imitation learning applied to humans produces goal-directed behavior only insofar as humans are goal-directed.
Imitation learning applied to humans produces agents no more capable than humans. (I think IDA goes beyond this by adding amplification steps, which are separate. And IRL goes beyond this by trying to correct “errors” that the humans make.)
Regarding the second point, there’s a safety-relevant sense in which a human-imitating agent is less goal-directed than the human. Because if you scale the human’s capabilities, the human will become better at achieving its personal objectives. By contrast, if you scale the imitator’s capabilities, it’s only supposed to become even better at imitating the unscaled human.
I believe for some people it’s very important to have a moment of realization that one can get to the frontier of knowledge in a given field of interest. It feels intimidating if others are making contributions that seem decisively out of your league. Because people might intuitively underestimate how far you can get with focused reading and learning, it could be good to give tailored advice to people newer to (e.g.) AI risk for how/where they can make contributions that will feel encouraging. For illustration, a few years ago I was playing a computer game for fun for quite a while until I was by chance matched up with one of the better competitive players and I almost won against them, getting lucky. That experience showed me that I’d have a shot if I actually tried, and it encouraged me to immediately start practicing with the aim of becoming competitive at that game. It changed my mindset overnight. Similarly, I think there’s a difference in mindset between “reading and talking about research topics for fun” and “reading and talking about research topics with the intent of seriously contributing”.
I agree with others that a rewarding social environment and people in a similar range of competence you can bounce ideas back-and-forth with are extremely important. If you collaborate with people who are similarly driven to figure things out and discuss ideas with you, that automatically forces you to think about your ideas for much longer and in more detail. By yourself you might stop thinking about a topic once you reach a roadblock, but if every morning you wake up to new messages by a collaborator adding criticism or new bits to your thinking, you’re going to keep working on the topic.
I also suspect that people are sometimes too modest (or in the wrong mindset) to develop the habit of “taking stances”. Some people know about a lot of different considerations and can tell you in detail what others have written, but they don’t invest effort coming up with their own opinion – presumably because they don’t consider themselves to be experts. Some of the community norms about not being overconfident might contribute to this failure mode, but the two things are distinct because people can try practicing taking stances with personal “pre-Aumann opinions”, which they are free to largely ignore when deferring to the experts for an all-things-considered judgment.
Speculation about personality traits conducive to generating ideas: OCD was mentioned in the comments. There’s also OCPD and hyperfocus. Carl Shulman’s advice for researchers among other things mentions something about having a strong emotional reaction to people being wrong on the internet (in communities you care about) – I think this might be a symptom of being very invested in the ideas, and it can help further clarify one’s thinking while trying to articulate fervently why something is wrong. Need for closure also seems relevant to me. It has its dangers because it can lead to one-sided thinking. But I, at least, am often driven by feeling deeply unsatisfied with not having answers to questions that seem strategically important. And, anecdotally, I know some people with low need for closure who I consider to be phenomenal researchers in most important respects, but these people are less creative than I would be with their skills and backgrounds, and their obsessive focus maybe goes into greater width of research rather than zooming in on making progress on the “construction sites”. Finally, I strongly agree with John Maxwell’s point that a “temporary delusion” for thinking that one’s ideas are really good is a great reinforcement mechanism (even though it often leads to embarrassment later on).
I interpreted Wei’s comment as saying that even your reflective life goals would be underdetermined—presumably even now if you hear convincing moral argument A but not B, then you’d have different reflective life goals than if you hear B but not A.
Okay yeah, that also seems broadly correct to me.
I am hoping though that, as long as I’m not subjected to optimization pressures from outside that weren’t crafted to be helpful, it’s very rare that something I’d currently consider very important can end up either staying important or becoming completely unimportant merely based on the order in which I encounter new arguments. And similarly I’m hoping that my value endpoints would still cluster decisively around the things I currently consider most important – though that’s where it becomes tricky to trade off goal preservation versus openness for philosophical progress.
Thanks! I think I understand the intent of the rephrasing now.
What I meant with “obscure” is that both “true utility function” and “utility function that encodes the optimal actions to take for the best possible universe” have normative terminology in them that I don’t know how to reduce or operationalize.
For instance, imagine I am looking at action sequences and ranking them. Presumably large portions of that process would feel like difficult judgment calls where I’d feel nervous about still making some kind of mistake. Both your phrasings (to my ears) carry the connotation that there is a “best” mistake model, one which is in a relevant sense independent from our own judgment, where we can learn things that will make us more and more confident that now we’re probably not making mistakes anymore because of progress in finding the correct way of thinking about our values. That’s the part that feels obscure to me because I think we’ll always be in this unsatisfying epistemic situation where we’re nervous about making some kind of mistake by the light of a standard that we cannot properly describe.
I do get the intuition for thinking in these terms, though. It feels conceivable that another discovery similar to what cognitive biases did could improve our thinking, and I definitely agree that we want a concept for staying open to this possibility. I’m just pointing out that non-operationalized normative concepts seem obscure. (Though maybe that’s fine if we’re treating them in the same way Yudkowsky treats “magic reality fluid” – as a placeholder for whatever comes once we’re less confused about “measure”.)
This post comes from a theoretical perspective that may be alien to ML researchers; in particular, it makes an argument that simplicity priors do not solve the problem pointed out here, where simplicity is based on Kolmogorov complexity (which is an instantiation of the Minimum Description Length principle). The analog in machine learning would be an argument that regularization would not work.
Out of curiosity, is there an intuitive explanation as to why these are different? Is it mainly because ambitious value learning inevitably has to deal with lots of (systematic) mistakes in the data, whereas normally you’d make sure that the training data doesn’t contain (many) obvious mistakes? Or are there examples in ML where you can retroactively correct mistakes imported from a flawed training set?
(I’m not sure “training set” is the right word for the IRL context. Applied to ambitious value learning, what I mean would be the “human policy”.)
Update: Ah, it seems like the next post is all about this! :) My point about errors seems like it might be vaguely related, but the explanation in the next post feels more satisfying. It’s a different kind of problem because you’re not actually interested in predicting observable phenomena anymore, but instead are trying to infer the “latent variable” – the underlying principle(?) behind the inputs. The next post in the sequence also gives me a better sense of why people say that ML is typically “shallow” or “surface-level reasoning”.
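Here is a toy way to picture why inferring the latent variable is a different kind of problem from predicting behavior. This is my own sketch with hypothetical names and numbers, not the formalism used in the sequence: two very different (planner, reward) pairs can produce exactly the same observable choice, so fitting the behavior perfectly does not tell you which reward is the "real" one.

```python
from typing import Callable, Dict

Reward = Dict[str, float]
Planner = Callable[[Reward], str]


def rational_planner(reward: Reward) -> str:
    """Picks the action with the highest reward."""
    return max(reward, key=reward.get)


def anti_rational_planner(reward: Reward) -> str:
    """Picks the action with the lowest reward (a systematically 'mistaken' agent)."""
    return min(reward, key=reward.get)


# Hypothetical reward function over three actions.
likes_tea: Reward = {"tea": 1.0, "coffee": 0.3, "water": 0.1}
# The exact opposite preferences.
hates_tea: Reward = {action: -value for action, value in likes_tea.items()}

# Two candidate explanations of the same person's behavior.
explanation_a = rational_planner(likes_tea)
explanation_b = anti_rational_planner(hates_tea)

# Both explanations predict the same observable choice.
print(explanation_a, explanation_b)
```

Both explanations predict the observed choice equally well, and neither is obviously more complex than the other. As far as I understand it, that is the intuition behind the worry that a simplicity prior (or regularization) alone would not single out the intended decomposition.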
Of course this is all assuming that there does exist a true utility function, but I think we can replace “true utility function” with “utility function that encodes the optimal actions to take for the best possible universe” and everything still follows through.
The replacement feels just as obscure to me as the original.
But more generally, if you think that a different set of life experiences means that you are a different person with different values, then that’s a really good reason to assume that the whole framework of getting the true human utility function is doomed. Not just ambitious value learning, _any_ framework that involves an AI optimizing some expected utility would not work.
This statement feels pretty strong, especially given that I find it trivially true that I’d be a different person under many plausible alternative histories. This makes me think I’m probably misinterpreting something. :)
At first I read your paragraph as the strong claim that if it’s true that individual human values are underdetermined at birth, then ambitious value learning looks doomed. And I’d take it as proof for “individual human values are underdetermined at birth” if, replaying history, I’d now have different values (or a different probability distribution over values) if I had encountered Yudkowsky’s writings before Singer’s, rather than vice-versa. Or if I would be less single-minded about altruism had I encountered EA a couple of years later in life, after already taking on another self-identity.
But these points (especially the second example) seem so trivially true that I’m probably talking about a different thing. In addition, they’re addressed by the solution you propose in your first paragraph, namely taking current-you as the starting point.
Another concern could be that “there is almost never a stable core of an individual human’s values”, i.e., that “even going forward from today, the values of Lukas or Rohin or Wei are going to be heavily underdetermined”. Is that the concern? This seems like it could be possible for most people, but definitely not for all people. And underdetermined values are not necessarily that bad (though I find it mildly disconcerting, personally). [Edit: Wei’s comment and your reply to it sound like this might indeed be the concern. :) Good discussion there!]
The fact that I have a hard time understanding the framework behind your statement is probably because I’m thinking in terms of a different part of my brain when I talk about “my values”. I identify very much with my reflective life goals to a point that seems unusual. I don’t identify much with “What Lukas’s behavior, if you were to put him in different environments and then watch, would indirectly consistently tell you about the things he appears to want – e.g., ‘values’ like being held in high esteem by others, having a comfortable life, romance, having either some kind of overarching purpose or enough distractions to not feel bothered by the lack of purpose, etc.”. There is definitely a sense in which the code that runs me is caring about all these implicit goals. But that’s not how I most want to see it. I also know that in all the environments that offer the options to self-modify into a more efficient pursuer of explicitly held personal ideals, I would make substantial use of the option to self-modify. And that seems relevant for the same reason that we wouldn’t want to count cognitive biases as people’s values.
(I should probably continue reading the sequence and then come back to this later if I still feel unclear about it.)
And what about the tradeoff? Is there one?
What you mention in your second possibility (“rote, robotic way”) goes in a similar direction, but I’d be worried about something more specific: Difficulties at big-picture prioritization when it comes to selecting what to be interested in. I envy people who find it easy to delve into all kinds of subjects and absorb a wealth of knowledge. But those same people may then fail to be curious enough when they encounter a piece of information that really would be much more relevant than the information they usually encounter. Or they might spend their time on tasks that don’t produce the most impact.
Admittedly I’m looking at this with a terribly utilitarianism-tainted lens. Probably finding it easy to be interested in many things is generally a huge plus.
But I do suspect that there’s a tradeoff. If reading about the Battle of Cedar Creek felt 30% as interesting to our brains as reading cognitive science or Lesswrong or Peter Singer or whatever got people here hooked on these sorts of things, then maybe fewer of us would have gotten hooked.
I think I’m talking about a different concept than you are talking about. Here’s what I take to be hypocrisy that is probably/definitely bad:
When someone’s brain is really good at selective remembering and selective forgetting, remembering things so they are convenient, and forgetting things that are inconvenient. And when the person is either unconsciously or only semi-consciously acting as an amplifier of opinions, sensing where a group is likely to go and then pushing (and often overshooting) the direction in order to be first to score points. This is where flip-flopping gets its bad reputation. At the extreme the person may fail to distinguish, in terms of mental motions, between what is their actual opinion vs. what opinion they expect to earn praise.
Some of these may not always come together but I think they often do, and the common theme is self-deception and little introspection. For instance, something many people do without noticing: Everyone’s opinions fluctuate over time; sometimes you feel lukewarm about an idea, at other times you’re an ardent supporter. If it later turns out that the idea was great, you remember mostly the times you supported it. If it turns out the idea was absolutely horrible, you’re tempted to specifically remember this one 2-week window half a year before the idea fell out of fashion where you felt lukewarm about it and voiced doubts to someone (or were “almost” going to do that), and you then tell yourself and others that “you called it” even though, in reality, you totally failed to pay attention to your doubts.
Another example: You fail to understand or spot a good idea when you first hear it, then later once the context makes it more obvious that the idea was great, it occurs to you and you think it’s entirely your own idea, so much so that you’d enthusiastically tell it to the person you first heard it from. (Often this is innocent, but if it happens an uncanny number of times maybe it’s a reason to start paying attention.)
I think this type of hypocrisy hinders self growth, can prevent the right people from getting credit and amplifies group biases. So I’d say it’s very bad. But norms against hypocrisy have to be careful because it’s something that everyone might have to some degree, and the costs of enforcing norms need to be kept smaller than the actual problem. Keeping score or arguing over whose memory about something is right can create an atmosphere with effects just as bad as extreme hypocrisy itself. Sometimes hypocrisy is fueled by a desire to be held in high regard, and then being accused of hypocrisy may also worsen the mechanisms at work.
I believe that you’re right about the historicity, but for me at least, any explanations of UDT I came across a couple of years ago seemed too complicated for me to really grasp the implications for anthropics, and ADT (and the appendix of Brian’s article here) were the places where things first fell into place for my thinking. I still link to ADT these days as the best short explanation for reasoning about anthropics, though I think there may be better explanations of UDT now (suggestions?). Edit: I of course agree that giving credit to UDT is good practice.
Or are you worried more that the question won’t be answered correctly by whatever will control our civilization?
Perhaps this, in case it turns out to be highly important but difficult to get certain ingredients – e.g. priors or decision theory – exactly right. (But I have no idea, it’s also plausible that suboptimal designs could patch themselves well, get rescued somehow, or just have their goals changed without much fuss.)
Some people at MIRI might be thinking about this under the heading of nonperson predicates. (Eliezer’s view on which computations matter morally is different from the one endorsed by Brian, though.) And maybe it’s important to not limit FAI options too much by preventing mindcrime at all costs – if there are benefits against other very bad failure modes (or – cooperatively – just increased controllability for the people who care a lot about utopia-type outcomes), maybe some mindcrime in the early stages to ensure goal-alignment would be the lesser evil.