It seems to me that rationality is extremely fragile and vulnerable, such that even though rationality might serve other goals, you have to be very uncompromising with regard to rationality, especially core things like hiding information from yourself (I was lightly opposed to the negative karma hiding myself), even if that has apparent costs.
I agree with that. But people can have very different psychologies. Most people are prone to overconfidence, but some people are underconfident and beat themselves up too much over negative feedback. If the site offers an optional feature that is very useful for people of the latter type, it’s at least worth considering whether that’s an overall improvement. I wasn’t even annoyed that people didn’t like the feature; it was more about the way in which the person argued. Generally, more display of awareness of people having different psychologies would please me. :)
There are a bunch of conversations going on about the topic right now (some in semi-private, which might be public soonish).
Cool! And I appreciate the difficulty of the task at hand. :)
When I model these conversations, one failure mode I’m worried about is that the “more civility” position gets lumped together with other things that Lesswrong is probably right to be scared of.
So, the following is to delineate my own views from things I’m not saying:
I could imagine being fine with Bridgewater culture in many (but not all) contexts. I hate that in “today’s climate” it is difficult to talk about certain topics. I think it’s often the case that people complaining about tone or about not feeling welcome shouldn’t expect to have their needs accommodated.
And yet I still find some features of what I perceive to be “rationalist culture” very off-putting.
I don’t think I phrased it as well in my first comment, but I can fully get behind what Raemon said elsewhere in this thread:
Some of the language about “holding truth sacred” [...] has come across to me with a tone of single-minded focus that feels like not being willing to put an upper bound on a heart transplant, rather than earnestly asking the question “how do we get the most valuable truthseeking the most effective way?”
So it’s not that I’m saying that I’d prefer a culture where truth-seeking is occasionally completely abandoned because of some other consideration. Just that the side that superficially looks more virtuous when it comes to truth-seeking (for instance because they boldly proclaim the importance of not being bothered by tone/tact, downvote notifications, etc.) isn’t automatically what’s best in the long run.
Can you clarify which bit was off-putting? The fact that any norms were being promoted or the specific norms being promoted?
Only the latter. And also the vehemence with which these viewpoints seemed to be held and defended. I got the impression that statements of the sort “yay truth as the only sacred value” received strong support; personally I find that off-putting in many contexts.
Edit: The reason I find it off-putting isn’t that I disagree with the position as site policy. More that sometimes the appropriate thing in a situation isn’t just to respond with some tirade about why it’s good to have an unempathetic site policy.
To give some more context: Only the first instance of this had to do with explicit calls for forum policy. This was probably the same example that inspired the dialogue between Jill and John above.
The second example was a comment on the question of making downvotes less salient. While I agree that the idea has drawbacks, I was a bit perplexed that a comment arguing against it got strongly upvoted despite including claims that felt to me like problematic “rationality for rationality’s sake”: Instead of allowing people to only look at demotivating information at specific times, we declare it antithetical to the “core of rationality” to hide information whether or not it overall makes people accomplish their goals better.
The third instance was an exchange you had about conversational tone and (lack of) charity. Toward the end you said that you didn’t like the way you phrased your initial criticism, but my quick impression (and I probably only skimmed the lengthy exchange and also don’t remember details) was that I generally thought your points seemed pretty defensible, and the way your conversation partner commented would have also thrown me off. “Tone and degree of charity are very important too” is a perspective I’d like to see represented more among LW users. (But if I’m in the minority, that’s fine and I don’t object to communities keeping their defining features if the majority feels that they are benefitting.)
That doesn’t feel true to me.
Maybe I expressed it poorly, but what I meant was just that rationality is not an end in itself. If I complain that some piece of advice is not working for me because it makes me (all-things-considered, long-term) less productive (towards the things that are most important to me) and less happy, and my conversation partner makes some unqualified statement to the effect of “but it’s rational to follow this type of advice”, I will start to suspect that they are misunderstanding what rationality is for.
I liked this post a lot and loved the additional comment about “Feeling and truth-seeking norms” you wrote here.
As a small data point: there have been at least three instances in the past ~three months where I was explicitly noticing certain norm-promoting behavior in the rationalist community (and Lesswrong in particular) that I found off-putting, and “truth-seeking over everything else” captures it really well.
Treating things as sacred can lead to infectiousness where items in the vicinity of the thing are treated as sacred too, even in cases where the link to it becomes increasingly indirect.
For instance, in the discussion about whether downvote notifications should be shown to users as often as upvote notifications, I saw the sentiment expressed that it would be against the “core of rationality” to ever “hide” (by which people really just meant make less salient) certain types of useful information. Maybe this was just an expression of a visceral sentiment and not something the person would 100% endorse, but just in case it was the latter: It is misguided to think of rationality in that way. “It is rational to do x regardless of how it affects people’s quality of life and productivity” should never be an argument. Most people’s life goals aren’t solely about truth-seeking nor about always mastering unhelpful emotions.
I think I’m on board with locking in some core epistemic virtues related to truth-seeking “as though it were sacred”. I think some version of that is going to be best overall for people’s life goals. But it’s an open question how large that core should be. The cluster of things I associate with “epistemic virtue” is large and fuzzy. I am pretty confident that it’s good to treat the core of that cluster as sacred. (For instance, that might include principles like “don’t lie, present arguments rather than persuade, engage productively and listen to others, be completely transparent about moderation decisions such as banning policies,” etc.) I’m less confident it’s good for things that are a bit less central to the cluster. I’m very confident we shouldn’t treat some things in the outer layers as sacred (and doing that would kind of trigger me if I’m being honest).
I guess one could object to my stance by asking: Is it possible to treat only the clearest instances of the truth-seeking virtue cluster as sacred without slipping down the slope of losing all the benefits of having something be treated as sacred at all?
I’m not completely sure, but here are some reasons why I think it ought to be possible:
People seem to be intuitively good at dealing with fuzzy concepts. If Jill (in the OP) is transparent about conversations she’s having like the one shown with John, I am optimistic that the vast majority of the audience could come to conclude that Jill is acting in the realm of what is reasonable, even if they would sometimes draw boundaries in slightly different places.
I feel like tradeoffs are often overstated. In cases where truth-seeking norms conflict with other very important things, the best solution is rarely to have a foundational discussion about what’s more important to then kick out one of the two things. Rather, I have hope that usually one can come up with some alternative solution (such as e.g. moving discussions about veganism to a separate thread, and asking Jill to link to that separate thread with a short and discreet comment, as opposed to Jill riding her hobbyhorse on all the threads she wants to “derail”).
Personally, I think there’s just as much to lose from cultivating an overly large cluster of sacredness as from an overly small one. Goodharting “rationality for rationality’s sake” and evaporative cooling, where people put off by certain community features start contributing less and less, both seem like very real risks to me.
I know there’s a strong idea around norms in the rationality community to go full courage (expressing your true beliefs) and have other people mind themselves and ignore the consequences (decoupling norms).
“Have other people mind themselves and ignore the consequences” comes in various degrees and flavors. In the discussions about decoupling norms I have seen (mostly in the context of Sam Harris), it appeared to me that they (decoupling norms) were treated as the opposite of “being responsible for people uncharitably misunderstanding what you are saying.” So I worry that presenting it as though courage = decoupling norms makes it harder to get your point across, out of worry that people might lump your sophisticated feedback/criticism together with some of the often not-so-sophisticated criticism directed at people like Sam Harris. No matter what one might think of Harris, to me at least he seems to come across as a lot more empathetic and circumspect and less “truth over everything else” than the rationalists whose attitude about truth-seeking’s relation to other virtues I find off-putting.
Having made this caveat, I think you’re actually right that “decoupling norms” can go too far, and that there’s a gradual spectrum from “not feeling responsible for people uncharitably misunderstanding what you are saying” to “not feeling responsible about other people’s feelings ever, unless maybe if a perfect utilitarian robot in their place would also have well-justified instrumental reasons to turn on facial expressions for being hurt or upset”. I just wanted to make clear that it’s compatible to think that decoupling norms are generally good as long as considerateness and tact also come into play. (Hopefully this would mitigate worries that the rationalist community would lose something important by trying to reward considerateness a bit more.)
Thanks for this summary!
In 2017 I commented on the two-player version here.
… if the player bets in [a winning] situation only when holding the best possible hand, then the opponents would know to always fold in response. To cope with this, Pluribus keeps track of the probability it would have reached the current situation with each possible hand according to its strategy. Regardless of which hand Pluribus is actually holding, it will first calculate how it would act with every possible hand, being careful to balance its strategy across all the hands so as to remain unpredictable to the opponent. Once this balanced strategy across all hands is computed, Pluribus then executes an action for the hand it is actually holding.
Human professional players are trying to approximate this level of balancedness as well, using computer programs (“solvers”). See this YouTube video for an example of a hand with solver analysis. In order to get the solver analysis started, one needs to specify the input hand ranges one expects people to have in the specific situations, as well as bet sizes for the solver to consider (more than just 2-3 bet sizes would be too much for the solver to handle). To specify those parameters, professionals can make guesses (sometimes based on data) about how other players play. Because the input parameters depend on human learned wisdom rather than worked-out game theory, solvers can’t quite be said to have solved poker.
So, like the computer, human players try to simplify the game tree in order to be able to approximate balanced play. However, this is much easier for computers. Pluribus knows its own counterfactuals perfectly, and it can make sure it always covers all the options for cards to have (in order to represent different board textures) and has the right number of bluffs paired with good hands for every state of the game given past actions.
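To make the idea of balancing across all hands concrete, here is a minimal toy sketch. It is not Pluribus’s algorithm and not what a real solver does; the three hand classes, the equal prior, and the two strategy tables are all hypothetical. It only shows the bookkeeping the quoted passage describes: given a strategy for every possible hand, work out what range the opponent should infer after seeing a bet.

```python
from fractions import Fraction

def update_range(prior, strategy, action):
    """Bayes update: P(hand | action) is proportional to P(hand) * P(action | hand)."""
    weighted = {
        hand: prior[hand] * strategy[hand].get(action, Fraction(0))
        for hand in prior
    }
    total = sum(weighted.values())
    if total == 0:
        raise ValueError("action has zero probability under this strategy")
    return {hand: weight / total for hand, weight in weighted.items()}

# Hypothetical toy river spot: three hand classes, equally likely beforehand.
prior = {"nuts": Fraction(1, 3), "medium": Fraction(1, 3), "air": Fraction(1, 3)}

# Unbalanced strategy: bet only with the best possible hand.
unbalanced = {
    "nuts": {"bet": Fraction(1)},
    "medium": {"check": Fraction(1)},
    "air": {"check": Fraction(1)},
}

# Balanced strategy (assumed pot-sized bet): also bet half of the air hands,
# so the betting range holds two value hands for every bluff.
balanced = {
    "nuts": {"bet": Fraction(1)},
    "medium": {"check": Fraction(1)},
    "air": {"bet": Fraction(1, 2), "check": Fraction(1, 2)},
}

unbalanced_bet_range = update_range(prior, unbalanced, "bet")
balanced_bet_range = update_range(prior, balanced, "bet")

print(unbalanced_bet_range)  # a bet always means the nuts, so folding is easy
print(balanced_bet_range)    # a bet is a bluff one third of the time
```

Against a pot-sized bet a caller risks one pot to win two, so they need to be right one third of the time. With the balanced table the betting range is a bluff exactly one third of the time, which leaves a bluff-catcher indifferent between calling and folding. The strategy is fixed per hand class first, and only then applied to the hand actually held.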
It almost seems kind of easy to beat humans in this way, except that knowing how to simplify and then model the situations in the first place seemed to have been the bottleneck up until 2017.
Donk betting: some kind of uncommon play that’s usually considered dumb (like a donkey). I didn’t figure out what it actually means.
“Donk betting” has a bad reputation because it’s a typical mistake amateur players make, doing it in the wrong type of situations with the wrong types of hands. You can only donk bet in some betting round if you’re first to act, and a general weakness amateur players have is that they don’t understand the value of being last to act (having more information). To at least somewhat mitigate the awfulness of being first to act, good players try to give out as little information as possible. If you played the previous street passively and your opponent displayed strength, you generally want to check because your opponent already expects you to be weaker, and so will do the betting for you often enough because they’re still telling their story of having a stronger hand. If you donk bet when a new card improved you, you telegraph information and your opponent can play perfectly against that, folding their weak hands and continuing only with strong hands. If you check instead, you get more value from your opponent’s bluffs, and you almost always still get to put in your raise after they bet for you, reopening the betting round for you.
However, there are instances where donk betting is clearly good: When a new card is much more likely to improve your range of hands compared to your opponent’s. In certain situations a new card is terrible for one player and good for the other player. In those instances, you can expect thinking opponents to check after you even with most of their strong hands, because they became apprehensive of your range of hands having improved a lot. In that case, you sometimes want to bet out right away (both in some of the cases where you hit, as well as with bluffs).
However, Pluribus disagrees with the folk wisdom that “donk betting” (starting a round by betting when one ended the previous betting round with a call) is a mistake; Pluribus does this far more often than professional humans do.
It might just be that professional humans decide to keep the game tree simple by not developing donk bet strategies for situations where this is complicated to balance and only produces small benefits if done perfectly. But it could be that Pluribus found a more interesting reason to occasionally use donk bets in situations where professional players would struggle to see the immediate use. Unfortunately I couldn’t find any discussion of hand histories illustrating the concept.
For in-person conversations (I know this was meant as a norm for public discourse): Personally I tend to have a hard time digging into my memories for “data points” when I have a negative or positive impression of some person. It’s kind of the same thing with people asking you “What have you been working on the past week?” – I basically never remember anything immediately (even though I do work on stuff). This creates asymmetric incentives where it’s easier to make negative judgments seem unjustified or at least costly to bring up, which can contribute to a culture where justified critical opinions almost never reach enough of a consensus to change something. I definitely think there should be norms similar to the one described in the post, but I also think that there are situations (e.g., if a person has a reliable track record or if they promise to write a paragraph with some bullet points later on once they have had time to introspect) where the norm should be less strict than “back the judgment up immediately or retract it.” And okay, probably one can manage to say a few words even on the spot because introspection is not that slow and opaque, but my point is simply that “This sounds unconvincing” is just as cheap a thing to say as cheap criticism, and the balance should be somewhere in between. So maybe instead of “justify” the norm should say something like “gesture at the type of reasons,” and that should be the bare minimum and more transparency is often preferable. (Another point is that introspecting on intuitive judgments helps refine them, so that’s something that people should do occasionally even if they aren’t being put on the spot to back something up.)
Needless to say, lax norms around this can be terrible in social environments where some people tend to talk too negatively about others and where the charitable voices are less frequent, so I think it’s one of those things where the same type of advice can sometimes be really good, and other times can be absolutely terrible.
I’m reluctant to reply because it sounds like you’re looking for rebuttals by explicit proponents of hard takeoff who have thought a great deal about takeoff speeds, and neither of those applies to me. But I could sketch some intuitions why reading the pieces by AI Impacts and by Christiano hasn’t felt wholly convincing to me. (I’ve never run these intuitions past anyone and don’t know if they’re similar to cruxes held by proponents of hard takeoff who are more confident in hard takeoff than I am – therefore I hope people don’t update much further against hard takeoff in case they find the sketch below unconvincing.) I found that it’s easiest for me to explain something if I can gesture towards some loosely related “themes” rather than go through a structured argument, so here are some of these themes and maybe people see underlying connections between them:
Shulman and Sandberg have argued that one way to get hard takeoff is via hardware overhang: when a new algorithmic insight can be used immediately to its full potential, because much more hardware is available than one would have needed to overtake state-of-the-art performance metrics with the new algorithms. I think there’s a similar dynamic at work with culture: If you placed an AGI into the stone age, it would be inefficient at taking over the world even with appropriately crafted output channels because stone age tools (which include stone age humans the AGI could manipulate) are neither very useful nor reliable. It would be easier for an AGI to achieve influence in 1995 when the environment contained a greater variety of increasingly far-reaching tools. But with the internet being new, particular strategies to attain power (or even just rapidly acquire knowledge) were not yet available. Today, it is arguably easier than ever for an AGI to quickly and more-or-less single-handedly transform the world.
There’s a sense in which cavemen are similarly intelligent as modern-day humans. If we time-traveled back into the stone age, found the couples with the best predictors for having gifted children, gave these couples access to 21st century nutrition and childbearing assistance, and then took their newborns back into today’s world where they’d grow up in a loving foster family with access to high-quality personalized education, there’s a good chance some of those babies would grow up to be relatively ordinary people of close to average intelligence. Those former(?) cavemen and cavewomen would presumably be capable of dealing with many if not most aspects of contemporary life and modern technology.
However, there’s also a sense in which cavemen are very unintelligent compared to modern-day humans. Culture, education, possibly even things like the Flynn effect, etc. – these really do change the way people think and act in the world. Cavemen are incredibly uneducated and untrained concerning knowledge and skills that are useful in modern, tool-rich environments.
We can think of this difference as the difference between the snapshot of someone’s intelligence at the peak of their development and their (initial) learning potential. Cavemen and modern-day humans might be relatively close to each other in terms of the latter, but when considering their abilities at the peak of their personal development, the modern humans are much better at achieving goals in tool-rich environments. I sometimes get the impression that proponents of soft takeoffs underappreciate this difference when addressing comparisons between, for instance, early humans and chimpanzees (this is just a vague general impression which doesn’t apply to the arguments presented by AI Impacts or by Paul Christiano).
Both for productive engineers and creative geniuses, it holds that they could only have developed their full potential because they picked up useful pieces of insight from other people. But some people cannot tell the difference between high-quality information and low-quality information, or might make wrong use even of high-quality information, reasoning themselves into biased conclusions. An AI system capable of absorbing the entire internet but terrible at telling good ideas from bad ideas won’t make too much of a splash (at least not in terms of being able to take over the world). But what about an AI system just slightly above some cleverness threshold for adopting an increasingly efficient information diet? Couldn’t it absorb the internet in a highly systematic way rather than just soaking in everything indiscriminately, learning many essential meta-skills on its way, improving how it goes about the task of further learning?
If the child in the chair next to me in fifth grade was slightly more intellectually curious, somewhat more productive, and marginally better dispositioned to adopt a truth-seeking approach and self-image than I am, this could initially mean they score 100%, and I score 95% on fifth-grade tests – no big difference. But as time goes on, their productivity gets them to read more books, their intellectual curiosity and good judgment get them to read more unusually useful books, and their cleverness gets them to integrate all this knowledge in better and increasingly more creative ways. I’ll reach a point where I’m just sort of skimming things because I’m not motivated enough to understand complicated ideas deeply, whereas they find it rewarding to comprehend everything that gives them a better sense of where to go next on their intellectual journey. By the time we graduate university, my intellectual skills are mostly useless, while they have technical expertise in several topics, can match or even exceed my thinking even on areas I specialized in, and get hired by some leading AI company. The point being: an initially small difference in dispositions becomes almost incomprehensibly vast over time.
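The compounding in this story can be made concrete with some toy arithmetic. Everything below is an assumption made purely for illustration: that “skill” can be put on a single numeric scale, that it grows by a fixed percentage per year, and the specific growth rates and the twelve-year horizon. None of these numbers come from anywhere.

```python
def skill_after(years, annual_growth, start=1.0):
    """Toy model: skill compounds at a fixed (hypothetical) annual rate."""
    return start * (1 + annual_growth) ** years

YEARS = 12  # assumed span from fifth grade to finishing university

# Assumed starting points mirror the 95% vs. 100% test scores in the story.
me_start, peer_start = 0.95, 1.00

# Assumed growth rates: the peer's dispositions make each year of learning
# somewhat more productive than mine.
me_end = skill_after(YEARS, annual_growth=0.10, start=me_start)
peer_end = skill_after(YEARS, annual_growth=0.25, start=peer_start)

initial_ratio = peer_start / me_start
final_ratio = peer_end / me_end

print(f"initial gap: {initial_ratio:.2f}x, gap after {YEARS} years: {final_ratio:.2f}x")
```

Under these made-up rates, a gap of about five percent at the start turns into a gap of several multiples. The point is only the shape of the effect: a difference in the rate of learning matters far more over time than the difference in starting level.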
(I realized that in this title/paragraph, the word “knowing” is meant both in the sense of “knowing how to do x” and “being capable of executing x very well.” It might be useful to try to disentangle this some more.) The standard AI foom narrative sounds a bit unrealistic when discussed in terms of some AI system inspecting itself and remodeling its inner architecture in a very deliberate way driven by architectural self-understanding. But what about the framing of being good at learning how to learn? There’s at least a plausible-sounding story we can tell where such an ability might qualify as the “secret sauce” that gives rise to a discontinuity in the returns of increased AI capabilities. In humans – and admittedly this might be too anthropomorphic – I’d think about it in this way: If my 12-year-old self had been brain-uploaded to a suitable virtual reality, made copies of, and given the task of devouring the entire internet in 1,000 years of subjective time (with no aging) to acquire enough knowledge and skill to produce novel and for-the-world useful intellectual contributions, the result probably wouldn’t be much of a success. If we imagined the same with my 19-year-old self, there’s a high chance the result wouldn’t be useful either – but also some chance it would be extremely useful. Assuming, for the sake of the comparison, that a copy clan of 19-year olds can produce highly beneficial research outputs this way, and a copy clan of 12-year olds can’t, what does the landscape look like in between? I don’t find it evident that the in-between is gradual. I think it’s at least plausible that there’s a jump once the copies reach a level of intellectual maturity to make plans which are flexible enough at the meta-level and divide labor sensibly enough to stay open to reassessing their approach as time goes on and they learn new things. 
Maybe all of that is gradual, and there are degrees of dividing labor sensibly or of staying open to reassessing one’s approach – but that doesn’t seem evident to me. Maybe this works more as an on/off thing.
It makes sense to be somewhat suspicious about any hypotheses according to which the evolution of general intelligence made a radical jump in Homo sapiens, creating thinking that is “discontinuous” from what came before. If knowing how to learn is an on/off ability that plays a vital role in the ways I described above, how could it evolve? We’re certainly also talking culture, not just genes. And via the Baldwin effect, natural selection can move individuals closer towards picking up surprisingly complex strategies via learning from their environment. At this point at the latest, my thinking becomes highly speculative. But here’s one hypothesis: In its generalization, this effect is about learning how to learn. And maybe there is something like a “broad basin of attraction” (inspired by Christiano’s broad basin of attraction for corrigibility) for robustly good reasoning / knowing how to learn. Picking up some of the right ideas initially and early on, combined with being good at picking up things in general, produces in people an increasingly better sense of how to order and structure other ideas, and over time, the best human learners start to increasingly resemble each other, having homed in on the best general strategies.
For most people, the returns of self-improvement literature (by which I mean not just productivity advice, but also information on “how to be more rational,” etc.) might be somewhat useful, but rarely life-changing. People don’t tend to “go foom” from reading self-improvement advice. Why is that, and how does it square with my hypothesis above, that “knowing how to learn” could be a highly valuable skill with potentially huge compounding benefits? Maybe the answer is that the bottleneck is rarely knowledge about self-improvement, but rather the ability to make the best use of such knowledge? This would support the hypothesis mentioned above: If the critical skill is finding useful information in a massive sea of both useful and not-so-useful information, that doesn’t necessarily mean that people will get better at that skill if we gave them curated access to highly useful information (even if it’s information about how to find useful information, i.e., good self-improvement advice). Maybe humans don’t tend to go foom after receiving humanity’s best self-improvement advice because too much of that is too obvious for people who were already unusually gifted and then grew up in modern society where they could observe and learn from other people and their habits. However, now imagine someone who had never read any self-improvement advice, and could never observe others. For that person, we might have more reason to expect them to go foom – at least compared to their previous baseline – after reading curated advice on self-improvement (or, if it is true that self-improvement literature is often somewhat redundant, even just from joining an environment where they can observe and learn from other people and from society). And maybe that’s the situation in which the first AI system above a certain critical capabilities threshold finds itself. 
The threshold I mean is (something like) the ability to figure out how to learn quickly enough to then approach the information on the internet like the hypothetical 19-year olds (as opposed to the 12-year olds) from the thought experiment above.
(This argument is separate from all the other arguments above.) Here’s something I never really understood about the framing of the hard vs. soft takeoff discussion. Let’s imagine a graph with inputs such as algorithmic insights and compute/hardware on the x-axis, and general intelligence (it doesn’t matter for my purposes whether we use learning potential or snapshot intelligence) on the y-axis. Typically, the framing is that proponents of hard takeoff believe that this graph contains a discontinuity where the growth mode changes, and suddenly the returns (for inputs such as compute) are vastly higher than the outside view would have predicted, meaning that the graph makes a jump upwards in the y-axis. But what about hard takeoff without such a discontinuity? If our graph starts to be steep enough at the point where AI systems reach human-level research capabilities and beyond, then that could in itself allow for some hard (or “quasi-hard”) takeoff. After all, we are not going to be sampling points (in the sense of deploying cutting-edge AI systems) from that curve every day – that simply wouldn’t work logistically even granted all the pressures to be cutting-edge competitive. If we assume that we only sample points from the curve every two months, for instance, is it possible that for whatever increase in compute and algorithmic insights we’d get in those two months, the differential on the y-axis (some measure of general intelligence) could be vast enough to allow for attaining a decisive strategic advantage (DSA) from being first? I don’t have strong intuitions about what the offense-defense balance will shift to once we are close to AGI, but it at least seems plausible that it turns more towards offense, in which case arguably a lower differential is needed for attaining a DSA. 
In addition, based on the classical arguments put forward by researchers such as Bostrom and Yudkowsky, it also seems at least plausible to me that we are potentially dealing with a curve that is very steep around the human level. So, if one AGI project is two months ahead of another project, and we for the sake of argument assume that there are no inherent discontinuities in the graph in question, it’s still not evident to me that this couldn’t lead to something that very much looks like hard takeoff, just without an underlying discontinuity in the graph.
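Here is a minimal numerical sketch of that last point, under loudly stated assumptions: the S-shaped curve, its parameters, and the amount of input that accrues in two months are all hypothetical and chosen only to illustrate the argument. The curve is smooth everywhere (no discontinuity), yet the lead held by a project that is one sampling interval ahead depends heavily on how steep the curve is near the human level.

```python
import math

def log10_capability(x, x_human=10.0, steepness=1.0, span=6.0):
    """Hypothetical smooth S-curve in log-capability.

    x is an abstract measure of inputs (compute plus algorithmic insight).
    The curve is continuous everywhere and steepest at x_human; span is the
    total number of orders of magnitude it covers.
    """
    return span / (1.0 + math.exp(-steepness * (x - x_human)))

def lead_in_orders_of_magnitude(x_leader, gap, **curve):
    """Capability lead of a project over a rival that is `gap` input units behind."""
    return log10_capability(x_leader, **curve) - log10_capability(x_leader - gap, **curve)

GAP = 1.0  # assumed inputs gained in two months (arbitrary units)

far_below_human = lead_in_orders_of_magnitude(4.0, GAP)
near_human_shallow = lead_in_orders_of_magnitude(10.5, GAP, steepness=1.0)
near_human_steep = lead_in_orders_of_magnitude(10.5, GAP, steepness=8.0)

print(f"far below human level:      {far_below_human:.3f} orders of magnitude")
print(f"near human level (shallow): {near_human_shallow:.3f} orders of magnitude")
print(f"near human level (steep):   {near_human_steep:.3f} orders of magnitude")
```

The same two-month lead is negligible far below the human level and large near it, and it grows further as the assumed steepness increases. Whether any real curve looks like this is exactly the open question; the sketch only shows that a large differential between sampled points does not require a jump in the underlying graph.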
Leaning on this, someone could write a post about the “infectiousness of realism” since it might be hard to reconcile openness to non-zero probabilities of realism with anti-realist frameworks? :P
For people who believe their actions matter infinitely more if realism is true, this could be modeled as an overriding meta-preference to act as though realism is true. Unfortunately if realism isn’t true this could go in all kinds of directions depending on how the helpful AI system would expect to get into such a judged-to-be-wrong epistemic state.
Probably you were thinking of something like teaching AIs metaphilosophy in order to perhaps improve the procedure? This would be the main alternative I see, and it does feel more robust. I am wondering though whether we’ll know by that point whether we’ve found the right way to do metaphilosophy (and how approaching that question is different from approaching whichever procedures philosophically sophisticated people would pick to settle open issues in something like the above proposals). It seems like there has to come a point where one has to hand off control to some in-advance specified “metaethical framework” or reflection procedure, and judged from my (historically overconfidence-prone) epistemic state it doesn’t feel obvious why something like Stuart’s anti-realism isn’t already close to there (though I’d say there are many open questions and I’d feel extremely unsure about how to proceed regarding for instance “2. A method for synthesising such basic preferences into a single utility function or similar object,” and also to some extent about the premise of squeezing a utility function out of basic preferences absent meta-preferences for doing that). Adding layers of caution sounds good though as long as they don’t complicate things enough to introduce large new risks.
Ethical theories don’t need to be simple. I used to have the belief that ethical theories ought to be simple/elegant/non-arbitrary for us to have a shot at them being the correct theory, a theory that intelligent civilizations with different evolutionary histories would all converge on. This made me think that NU might be that correct theory. Now I’m confident that this sort of thinking was confused: I think there is no reason to expect that intelligent civilizations with different evolutionary histories would converge on the same values, or that there is one correct set of ethics that they “should” converge on if they were approaching the matter “correctly”. So, looking back, my older intuition feels confused in a way similar to ordering the simplest food in a restaurant in an attempt to anticipate what others would order if they also thought that the goal was for everyone to order the same thing. Now I just want to order the “food” that satisfies my personal criteria (and these criteria do happen to include placing value on non-arbitrariness/simplicity/elegance, but I’m a bit less single-minded about it).
Your way of unifying psychological motivations down to suffering reduction is an “externalist” account of why decisions are made, which is different from the internal story people tell themselves. Why think all people who tell different stories are mistaken about their own reasons? The point “it is a straw man argument that NUs don’t value life or positive states” is unconvincing, as others have already pointed out. I actually share your view that a lot of things people do might in some way trace back to a motivating quality in feelings of dissatisfaction, but (1) there are exceptions to that (e.g., sometimes I do things on auto-pilot and not out of an internal sense of urgency/need, and sometimes I feel agenty and do things in the world to achieve my reflected life goals rather than tend to my own momentary well-being), and (2) that doesn’t mean that whichever parts of our minds we most identify with need to accept suffering reduction as the ultimate justification of their actions. For instance, let’s say you could prove that a true proximate cause of why a person refused to enter Nozick’s experience machine was that, when they contemplated the decision, they felt really bad about the prospect of learning that their own life goals are shallower and more self-centered than they would have thought, and *therefore* they refused the offer. Your account would say: “They made this choice driven by the avoidance of bad feelings, which just shows that ultimately they should accept the offer, or choose whichever offer reduces more suffering all-things-considered.” Okay yeah, that’s one story to tell. But the person in question tells herself the story that she made this choice because she has strong aspirations about what type of person she wants to be. Why would your externally-imported justification be more valid (for this person’s life) than her own internal justification?
I think I broadly agree with all the arguments to characterize the problem and to motivate indefinability as a solution, but I have different (meta-)meta-level intuitions about how palatable indefinability would be, and as a result of that, I’d say I have been thinking about similar issues in a differently drawn framework. While you seem to advocate for “salvaging the notion of ‘one ethics’” while highlighting that we then need to live with indefinability, I am usually thinking of it in terms of: “Most of this is underdefined, and that’s unsettling at least in some (but not necessarily all) cases, and if we want to make it less underdefined, the notion of ‘one ethics’ has to give.” Maybe one reason why I find indefinability harder to tolerate is that in my own thinking, the problem arises forcefully at an earlier/higher-order stage already, and therefore the span of views that “ethics” is indefinable about(?) is larger and already includes questions of high practical significance. Having said that, I think there are some important pragmatic advantages to an “ethics includes indefinability” framework, and that might be reason enough to adopt it. While different frameworks tend to differ in the underlying intuitions they highlight or move into the background, I think there is more than one parsimonious framework in which people can “do moral philosophy” in a complete and unconfused way. Translation between frameworks can be difficult though (which is one reason I started to write a sequence about moral reasoning under anti-realism, to establish starting points for disagreements, but then I got distracted – it’s on hold now).
Some more unorganized comments (apologies for “lazy” block-quote commenting):
Moral indefinability is the term I use for the idea that there is no ethical theory which provides acceptable solutions to all moral dilemmas, and which also has the theoretical virtues (such as simplicity, precision and non-arbitrariness) that we currently desire.
This idea seems correct to me. And as you indicate later in the paragraph, we can add that it’s plausible that the “theoretical virtues” are not well-specified either (e.g., there’s disagreement between people’s theoretical desiderata, or there’s vagueness in how to cash out a desideratum such as “non-arbitrariness”).
My claim is that eventually we will also need to change our meta-level intuitions in important ways, because it will become clear that the only theories which match them violate key object-level intuitions.
This recommendation makes sense to me (insofar as one can still do that), but I don’t think it’s completely obvious. Because both meta-level intuitions and object-level intuitions are malleable in humans, and because there’s no(t obviously a) principled distinction between these two types of intuitions, it’s an open question to what degree people want to adjust their meta-level intuitions in order to not have to bite the largest bullets.
If the only reason people were initially tempted to bite the bullets in question (e.g., accept a counterintuitive stance like the repugnant conclusion) was because they had a cached thought that “Moral theories ought to be simple/elegant”, then it makes a lot of sense to adjust this one meta-level intuition after the realization that it seems ungrounded. However, maybe “Moral theories ought to be simple/elegant” is more than just a cached thought for some people:
Some moral realists buy the “wager” that their actions matter infinitely more in case moral realism is true. I suspect that an underlying reason why they find this wager compelling is that they have strong meta-level intuitions about what they want morality to be like, and it feels to them that it’s pointless to settle for something other than that.
I’m not a moral realist, but I find myself having similarly strong meta-level intuitions about wanting to do something that is “non-arbitrary” and in relevant ways “simple/elegant”. I’m confused about whether that’s literally the whole intuition, or whether I can break it down into another component. But motivationally it feels like this intuition is importantly connected to what makes it easy for me to go “all-in” for my ethical/altruistic beliefs.
A second reason to believe in moral indefinability is the fact that human concepts tend to be open texture: there is often no unique “correct” way to rigorously define them.
I strongly agree with this point. I think even very high-level concepts in moral philosophy or the philosophy of reason/self-interest are “open texture” like that. In your post you seem to start with an assumption that people have a rough, shared sense of what “ethics” is about. But if the fuzziness is already attacking at this very high level, it calls into question whether you can find a solution that seems satisfying to different people’s (fuzzy and underdetermined) sense of what the question/problem is even about.
For instance, there are narrow interpretations such as “ethics as altruism/caring/doing good” (which I think roughly captures at least large parts of what you assume, and it also captures the parts I’m personally most interested in). There’s also “ethics as cooperation or contract”. And maybe the two blend into each other.
Then there’s the broader (I label it “existentialist”) sense in which ethics is about “life goals” or “Why do I get up in the morning?”. And within this broader interpretation of it, you suddenly get narrower subdomains like “realism about rationality” or “What makes up a person’s self-interest?” where the connection to the other narrower domains (e.g. “ethics as altruism”) is not always clear.
I think indefinability is a plausible solution (or meta-philosophical framework?) for all of these. But when the scope over which we observe indefinability becomes so broad, it illustrates why it might feel a bit frustrating for some people, because without clearly delineated concepts it can be harder to make progress, and so a framework in which indefinability plays a central role could in some cases obscure conceptual progress in subareas where one might be able to make such progress (at least at the “my personal morality” level, though not necessarily at the level of a “consensus morality”).
(I’m not sure I’m disagreeing with you BTW; probably I’m just adding thoughts and blowing up the scope of your post.)
I would guess that many anti-realists are sympathetic to the arguments I’ve made above, but still believe that we can make morality precise without changing our meta-level intuitions much—for example, by grounding our ethical beliefs in what idealised versions of ourselves would agree with, after long reflection. My main objection to this view is, broadly speaking, that there is no canonical “idealised version” of a person, and different interpretations of that term could lead to a very wide range of ethical beliefs.
I agree. The second part of my comment here tries to talk about this as well.
And even if idealised reflection is a coherent concept, it simply passes the buck to your idealised self, who might then believe my arguments and decide to change their meta-level intuitions.
Yeah. I assume most of us are familiar with a deep sense of uncertainty about whether we found the right approach to ethical deliberation. And one can maybe avoid this uncomfortable feeling of uncertainty by deferring to idealized reflection. But it’s not obvious that this lastingly solves the underlying problem: Maybe we’ll always feel uncertain whenever we enter the mode of “actually making a moral judgment”. If I found myself as a virtual person who is part of a moral reflection procedure such as Paul Christiano’s indirect normativity, I wouldn’t suddenly know and feel confident in how to resolve my uncertainties. And the extra power, and the fact that life in the reflection procedure would be very different from the world I currently know, introduce further risks and difficulties. I think there are still reasons why one might want to value particularly-open-ended moral reflection, but maybe it’s important that people don’t use the uncomfortable feeling of “maybe I’m doing moral philosophy wrong” as their sole reason to value particularly-open-ended moral reflection. If the reality is that this feeling never goes away, then there seems to be something wrong with the underlying intuition that valuing particularly-open-ended moral reflection is by default the “safe” or “prudent” thing to do. (And I’m not saying it’s wrong for people to value particularly-open-ended moral reflection; I suspect that it depends on one’s higher-order intuitions: For every perspective there’s a place where the buck stops.)
From an anti-realist perspective, I claim that perpetual indefinability would be better.
It prevents fanaticism, which is a big plus. And it plausibly creates more agreement, which is also a plus in some weirder sense (there’s a “non-identity problem” type thing about whether we can harm future agents by setting up the memetic environment such that they’ll end up having less easily satisfiable goals, compared to an alternative where they’d find themselves in larger agreement and therefore with more easily satisfiable goals). A drawback is that it can mask underlying disagreements and maybe harm underdeveloped positions relative to the status quo.
That may be a little more difficult to swallow from a realist perspective, of course. My guess is that the core disagreement is whether moral claims are more like facts, or more like preferences or tastes.
That’s a good description. I sometimes use the analogy of “morality is more like career choice than scientific inquiry”.
I don’t think that’s a coincidence: psychologically, humans just aren’t built to be maximisers, and so a true maximiser would be fundamentally adversarial.
This is another good instrumental/pragmatic argument why anti-realists interested in shaping the memetic environment where humans engage in moral philosophy might want to promote the framing of indefinability rather than “many different flavors of consequentialism, and (eventually) we should pick”.
AlphaStar’s innovative league-based training process finds the approaches that are most reliable and least likely to go wrong.
“Go wrong” is still tied to the game’s win condition. So while the league-based training process does find the set of agents whose gameplay is least exploitable (among all the agents they trained), it’s not obvious how this relates to problems in AGI safety such as goal specification or robustness to capability gains. Maybe they’re thinking of things like red teaming. But without more context I’m not sure how safety-relevant this is.
2. The ability to comment on a specific line in a document, with the comment showing up in context.
Yeah, I really like how convenient that is.
For me there’s a huge difference between these two.
In gdocs I feel like it’s more okay to write “unpolished” comments. I think that’s mostly because the expectations are lower. Polishing my comments takes me 3-5x longer, which often takes away the motivation to comment at all.
In a public forum I worry more about provoking misleading impressions. For instance, in a gdoc shared with people who know me well, I’m not worried that a comment like “AIs might do [complex sequence of actions]” will get people to think that I have weirdly confident views about how the future might play out. In public conversations I’d experience a strong urge to qualify statements like that even though it feels tedious to do so.
You need a lot of hindsight bias to say that it was clear from the get-go which paradigms were going to win over the last century.
Sure. And I think Kuhn’s main point as summarized by Scott really does deal a huge blow to the naive view that you can just compare successful predictions to missed predictions, etc.
But to think that you cannot do better than chance at generating successful new hypotheses is obviously wrong. There would be way too many hypotheses to consider, and not enough scientists to test them. From merely observing science’s success, we can conclude that there has to be some kind of skill (Yudkowsky’s take on this is here and here, among other places) that good scientists employ to do better than chance at picking what to work on. And IMO it’s a strange failure of curiosity to not want to get to the bottom of this when studying Kuhn or the history of science.
When I hear scientists talk about Thomas Kuhn, he sounds very reasonable. [...] When I hear philosophers talk about Thomas Kuhn, he sounds like a madman.
Yes, this! I remember I was extremely confused by the discourse around Kuhn. I’m not sure whether for me the impression was split into scientists vs. non-scientists, but I definitely felt like there was something weird about it and there were two sides to it, one that sounded potentially reasonable, and one that sounded clearly like relativism.
When taking a course on the book, I concluded that both perspectives were appropriate. One thing that went too far into relativism was Kuhn’s insistence that there is no way to tell in advance which paradigm is going to be successful. His description of this is that you pick “teams” initially for all kinds of not-truth-tracking reasons, and you only figure out many years later whether your new paradigm will be winning or not.
But I’m not sure Kuhn even was (at least in The Structure of Scientific Revolutions) explicitly saying “No, you cannot do better than chance at picking sides.” Rather, the weird thing is that I remember feeling like he was not explicitly asking that question, that he was just brushing it under the carpet. Likewise the lecturer of the course, a Kuhn expert, seemed to only be asking the question “How does (human-)science proceed?”, and never “How should science proceed?”
Suppose the agent you’re trying to imitate is itself goal-directed. In order for the imitator to generalize beyond its training distribution, it seemingly has to learn to become goal-directed (i.e., perform the same sort of computations that a goal-directed agent would). I don’t see how else it can predict what the goal-directed agent would do in a novel situation. If the imitator is not able to generalize, then it seems more tool-like than agent-like. On the other hand, if the imitatee is not goal-directed… I guess the agent could imitate humans and be not entirely goal-directed to the extent that humans are not entirely goal-directed. (Is this the point you’re trying to make, or are you saying that an imitation of a goal-directed agent would constitute a non-goal-directed agent?)
I’m not sure these are the points Rohin was trying to make, but there seem to be at least two important points here:
Imitation learning applied to humans produces goal-directed behavior only insofar as humans are goal-directed.
Imitation learning applied to humans produces agents no more capable than humans. (I think IDA goes beyond this by adding amplification steps, which are separate. And IRL goes beyond this by trying to correct “errors” that the humans make.)
Regarding the second point, there’s a safety-relevant sense in which a human-imitating agent is less goal-directed than the human. Because if you scale the human’s capabilities, the human will become better at achieving its personal objectives. By contrast, if you scale the imitator’s capabilities, it’s only supposed to become even better at imitating the unscaled human.
I believe for some people it’s very important to have a moment of realization that one can get to the frontier of knowledge in a given field of interest. It feels intimidating if others are making contributions that seem decisively out of your league. Because people might intuitively underestimate how far you can get with focused reading and learning, it could be good to give tailored advice to people newer to (e.g.) AI risk for how/where they can make contributions that will feel encouraging. For illustration, a few years ago I was playing a computer game for fun for quite a while until I was by chance matched up with one of the better competitive players and I almost won against them, getting lucky. That experience showed me that I’d have a shot if I actually tried, and it encouraged me to immediately start practicing with the aim of becoming competitive at that game. It changed my mindset overnight. Similarly, I think there’s a difference in mindset between “reading and talking about research topics for fun” and “reading and talking about research topics with the intent of seriously contributing”.
I agree with others that a rewarding social environment and people in a similar range of competence you can bounce ideas back-and-forth with are extremely important. If you collaborate with people who are similarly driven to figure things out and discuss ideas with you, that automatically forces you to think about your ideas for much longer and in more detail. By yourself you might stop thinking about a topic once you reach a roadblock, but if every morning you wake up to new messages by a collaborator adding criticism or new bits to your thinking, you’re going to keep working on the topic.
I also suspect that people are sometimes too modest (or in the wrong mindset) to develop the habit of “taking stances”. Some people know about a lot of different considerations and can tell you in detail what others have written, but they don’t invest effort coming up with their own opinion – presumably because they don’t consider themselves to be experts. Some of the community norms about not being overconfident might contribute to this failure mode, but the two things are distinct because people can try practicing taking stances with personal “pre-Aumann opinions”, which they are free to largely ignore when deferring to the experts for an all-things-considered judgment.
Speculation about personality traits conducive to generating ideas: OCD was mentioned in the comments. There’s also OCPD and hyperfocus. Carl Shulman’s advice for researchers among other things mentions something about having a strong emotional reaction to people being wrong on the internet (in communities you care about) – I think this might be a symptom of being very invested in the ideas, and it can help further clarify one’s thinking while trying to articulate fervently why something is wrong. Need for closure also seems relevant to me. It has its dangers because it can lead to one-sided thinking. But I at least am often driven by feeling deeply unsatisfied with not having answers to questions that seem strategically important. And, anecdotally, I know some people with low need for closure who I consider to be phenomenal researchers in most important respects, but these people are less creative than I would be with their skills and backgrounds, and their obsessive focus maybe goes into greater breadth of research rather than zooming in on making progress on the “construction sites”. Finally, I strongly agree with John Maxwell’s point that a “temporary delusion” for thinking that one’s ideas are really good is a great reinforcement mechanism (even though it often leads to embarrassment later on).
I interpreted Wei’s comment as saying that even your reflective life goals would be underdetermined—presumably even now if you hear convincing moral argument A but not B, then you’d have different reflective life goals than if you hear B but not A.
Okay yeah, that also seems broadly correct to me.
I am hoping though that, as long as I’m not subjected to optimization pressures from outside that weren’t crafted to be helpful, it’s very rare that something I’d currently consider very important can end up either staying important or becoming completely unimportant merely based on the order of new arguments encountered. And similarly I’m hoping that my value endpoints would still cluster decisively around the things I currently consider most important – though that’s where it becomes tricky to trade off goal preservation versus openness for philosophical progress.
Thanks! I think I understand the intent of the rephrasing now.
What I meant with “obscure” is that both “true utility function” and “utility function that encodes the optimal actions to take for the best possible universe” have normative terminology in them that I don’t know how to reduce or operationalize.
For instance, imagine I am looking at action sequences and ranking them. Presumably large portions of that process would feel like difficult judgment calls where I’d feel nervous about still making some kind of mistake. Both your phrasings (to my ears) carry the connotation that there is a “best” mistake model, one which is in a relevant sense independent from our own judgment, where we can learn things that will make us more and more confident that now we’re probably not making mistakes anymore because of progress in finding the correct way of thinking about our values. That’s the part that feels obscure to me because I think we’ll always be in this unsatisfying epistemic situation where we’re nervous about making some kind of mistake by the light of a standard that we cannot properly describe.
I do get the intuition for thinking in these terms, though. It feels conceivable that another discovery similar to what cognitive biases did could improve our thinking, and I definitely agree that we want a concept for staying open to this possibility. I’m just pointing out that non-operationalized normative concepts seem obscure. (Though maybe that’s fine if we’re treating them in the same way Yudkowsky treats “magic reality fluid” – as a placeholder for whatever comes once we’re less confused about “measure”.)