Conversation about paradigms, intellectual progress, social consensus, and AI

  • I have a number of thoughts here that I haven’t gotten around to writing up, and sharing via a public conversation seems like a decent way to try.

    A few opening thoughts:

    • People talk about the field of AI Alignment being “pre-paradigmatic”. I don’t have a sense that people have shared precise models behind this, or at least I don’t know what other people mean when they say it. Most people don’t seem to have read Kuhn for example.

    • I used to think the goal was to “become paradigmatic”. I now think a better goal is to become the kind of field that can go through cycles of paradigm creation, paradigm crisis, and then recreation of a new paradigm.

    • I think there are ways to increase our ability to do that, and I’d like to have people thinking more about them.

    Happy to launch off in any direction that seems interesting or productive to you.

    • I used to think the goal was to “become paradigmatic”. I now think a better goal is to become the kind of field that can go through cycles of paradigm creation, paradigm crisis, and then recreation of a new paradigm.

    Is there a way in which trying to become paradigmatic doesn’t implicitly lead to that cycle? After the first paradigm, it seems to me that later paradigms come about more naturally because of insufficiencies in the original paradigm, e.g. someone discovers an edge case that isn’t handled correctly, or is handled less gracefully than seems intuitively right.

  • I’m not sure what’s empirically the case. I could see it being easier to go from one paradigm to the next because of clear insufficiencies, but also, if there’s an existing paradigm, people might be more resistant to changing than if they weren’t anchored on anything.

    The update for me had a couple of pieces. Before, I at least implicitly thought that most of the work happens in the paradigmatic phase, so to be productive, we should get into it. (I think it appears way more productive and can be scaled up more.) Now I think that getting the paradigm right is at least as much of the work, and you want to get it right.

    The other piece is that I think both pre-paradigmatic and paradigmatic work rely on a community reaching consensus about what counts as progress in their field, so rather than worrying about “phase”, the better thing to do is worry about how good you are at reaching consensus with each other (and with reality).

    if there’s an existing paradigm, people might be more resistant to changing than if they weren’t anchored on anything

    This seems true historically, but I think you’re probably more interested in the question of how to steer existing (or new) research efforts. Therefore,

    Now I think that getting the paradigm right is at least as much of the work, and you want to get it right.

    Because if you don’t get it right initially, you risk wasting time stuck in a bad paradigm? I think I am pretty skeptical of our ability to do any sort of meaningful aiming here, at least in a top-down way. Apart from the underlying reality they’re attempting to model, paradigms seem driven by:

    • individual decisionmaking—if a paradigm comes from a single person’s research agenda, that person’s decision to pursue that agenda rather than some other one might be a point of possible intervention

    • social dynamics—to the extent that paradigms capture mindshare among the research community, whatever caused that paradigm to propagate (journals, conferences, social networks, memes, social & institutional hierarchies, etc) could also be points of intervention

    But as far as interventions go, those don’t suggest that we have a lot of direct control over what paradigms we land on.

  • Because if you don’t get it right initially, you risk wasting time stuck in a bad paradigm?

    Yeah. You might eventually get out of it, but with AI Alignment I think we don’t want to be wasting time unnecessarily.

    Before I say more, can you describe what you mean by saying we can’t do “any sort of meaningful aiming”?

  • Whatever thing you had in mind that would let us pre-emptively avoid getting stuck in a bad paradigm; I don’t actually know what that would look like (it’s possible you didn’t mean to imply anything like that, but in that case I’m confused about what it would mean to “get it right”).

  • I think we’re actually on very similar pages here, both about the difficulty of doing it top-down and about social dynamics being a point of intervention.

    I model it like this: you can think of a research field as a “collective mind that thinks and makes intellectual progress” that is made out of individual researchers. You can make the collective smarter by making the individuals smarter, but you can also make the collective smarter by combining individuals of fixed intelligence in more effective ways.

    Journals, conferences, social networks, memes, forums, etc. are ways the individuals get connected together, and by improving those connections/infra, you can make for a smarter collective. I’ve been thinking in this frame for many years, but recently had a breakthrough.

    Level 0 thinking about collective intellectual progress:
    You think about helping information propagate. You get stuff written down, archived, distilled, tagged, made searchable, you get people in the same room.

    Level 1 thinking about collective intellectual progress:
    For individual thinkers to combine as a collective intelligence, they need to end up pointing in enough of the same direction to be doing the same thing such that their efforts stack. Pointing in the same direction involves a lot of agreeing on “this is progress” vs “this is not progress”. (There is also an object-level of “and whatever you agree is progress points in the direction of reality itself”)

    Being paradigmatic means you’ve achieved a lot of agreement on what is progress, but what matters is being a collective where you can more efficiently get in agreement with each other (and with reality), e.g. a good process of communally evaluating things. I think that happens via journals, conferences, etc., and we can more explicitly aim to help people not just share info, but evaluate it too.

  • (There is also an object-level of “and whatever you agree is progress points in the direction of reality itself”)

    +1

    Being paradigmatic means you’ve achieved a lot of agreement on what is progress, but what matters is being a collective where you can more efficiently get in agreement with each other (and with reality), e.g. a good process of communally evaluating things. I think that happens via journals, conferences, etc., and we can more explicitly aim to help people not just share info, but evaluate it too.

    Can you go into more detail about the mechanisms you think might be promising here? In the past I’ve been skeptical of various “eigen-evaluation” proposals, but you might have something different in mind.

  • For people reading this, “eigen-evaluation” is a term I coined to describe the way in which research communities decide which research/researchers are good, i.e. how people come to agree that “X is a good researcher”, especially in AI Alignment where no one has ever successfully built an aligned superintelligence.

    I think this happens via a social process with the same mechanism as EigenKarma, PageRank, and Eigenmorality:

    ...you can imagine an iterative process where each [researcher] starts out with the same hub/authority “starting credits,” but then in each round, the [researchers] distribute their credits among their neighbors, so that the most popular [researchers] get more credits, which they can then, in turn, distribute to their neighbors by [praising] them – Eigenmorality

    For lack of a better name, I call this process “eigen-evaluation” (of research/researchers).
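
    A minimal sketch of that iterative process, assuming a simple damped power iteration (the researcher names, damping factor, and round count below are hypothetical, just to make the mechanism concrete):

```python
# Toy "eigen-evaluation": everyone starts with equal credit, then credit is
# repeatedly redistributed along endorsements, so being praised by
# well-regarded researchers counts for more than being praised by anyone else.
# (The damping factor and round count are arbitrary choices for this sketch.)

def eigen_evaluate(endorsements, rounds=50, damping=0.85):
    """endorsements: dict mapping each researcher to the researchers they praise."""
    people = list(endorsements)
    n = len(people)
    credit = {p: 1.0 / n for p in people}  # equal "starting credits"

    for _ in range(rounds):
        new_credit = {p: (1.0 - damping) / n for p in people}
        for p in people:
            targets = endorsements[p] or people  # no endorsements: spread evenly
            share = damping * credit[p] / len(targets)
            for t in targets:
                new_credit[t] += share
        credit = new_credit  # repeat until scores stabilize
    return credit

# Hypothetical three-person network:
scores = eigen_evaluate({
    "alice": ["bob"],
    "bob": ["alice", "carol"],
    "carol": ["alice"],
})
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

    The specific numbers don’t matter; the point is that the scores are defined recursively: your standing comes from the standing of the people who vouch for you.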


  • Can you go into more detail about the mechanisms you think might be promising here?

    Meta: I don’t think many people are thinking explicitly about how this happens right now. Most researchers are focused on the object-level of their research. And I think there are often gains to be had if you’re explicitly and consciously tackling a problem.

    Object-level:

    Improving existing stuff

    • Improve LessWrong’s karma system, e.g. altering strong-vote strength, fixing how lots of low-quality engagement can still make you a high-karma user, maybe switching to eigenkarma

    • Make the comments section a better experience for top users to show up and evaluate research, e.g. prevent threads getting derailed by users who “don’t get it”

    • Improvements to LessWrong’s Annual Review

    In-progress stuff

    • Having a Dialog Feature (like this one) that facilitates two people surfacing actually-held arguments in various directions (more so than happens if they write posts alone)

    Possibly upcoming

    • Oli’s “review token system” where people are allocated tokens that they can spend on having a 10-15 hour deep-dive review performed on a post.

    • A monthly or quarterly journal for AI Alignment research where some editor(s) + process selects research for promotion to greater attention and significance.

    • Aiming for “four levels of conversation” on topics where you get argument, counter-argument, counter-counter-argument, and counter-counter-counter-argument.

    • Distillation of arguments and positions (e.g. Alignment argument wiki) that makes it easy to find arguments and make the case against them.

    I also think there’s a thing here where, in the absence of sufficient empirical pressure (e.g. building actual stuff that does actual things), evaluation of which Alignment research is good will be distorted a lot by standard human political popularity pressures. Pushing things in the direction of explicit evaluation via arguments might help with that.

  • Improving existing stuff

    • Improve LessWrong’s karma system, e.g. altering strong-vote strength, fixing how lots of low-quality engagement can still make you a high-karma user, maybe switching to eigenkarma

    This is level 0, right?

    • Make the comments section a better experience for top users to show up and evaluate research, e.g. prevent threads getting derailed by users who “don’t get it”

    • Improvements to LessWrong’s Annual Review

    (not sufficiently concrete)

    In-progress stuff

    • Having a Dialog Feature (like this one) that facilitates two people surfacing actually-held arguments in various directions (more so than happens if they write posts alone)

    Also seems to be level 0.

    Possibly upcoming

    • Oli’s “review token system” where people are allocated tokens that they can spend on having a 10-15 hour deep-dive review performed on a post.

    Also level 0.

    • A monthly or quarterly journal for AI Alignment research where some editor(s) + process selects research for promotion to greater attention and significance.

    Level 1! Also the one I’m most skeptical of. Coincidence? 🤔

    • Aiming for “four levels of conversation” on topics where you get argument, counter-argument, counter-counter-argument, and counter-counter-counter-argument.

    • Distillation of arguments and positions (e.g. Alignment argument wiki) that makes it easy to find arguments and make the case against them.

    Level 0, probably.

    I think my takeaway is that I’m pretty wary of a process which is deliberately aimed at providing some kind of legible metric/ranking/evaluation, where that metric is explicitly attempting to aggregate information with the Social type signature. I expect that kind of process to produce worse outcomes than not having any such process at all, because it will make it much cheaper for people to goodhart on it, and also provoke a bunch of bandwagoning which might not have happened if people needed to “make up their own mind” and perform their own aggregation step, ideally while being in touch with reality (rather than social consensus).

  • I think I didn’t convey well what I meant by Level 0 and Level 1.

    Level 0 is a model whereby progress happens because you accumulate models/evidence/ideas. New stuff builds on old stuff.

    Level 1 realizes that progress requires evaluating content: new content filters old content. This is all about responses, about content that’s evaluative of other content. Voting (karma) is evaluative. Comments can build on content, but are often evaluative. A dialogue where I share an idea and you critique it is evaluative – much more so than if I just wrote a post.

  • Voting (karma) is evaluative

    But importantly, it’s an evaluation that’s effectively decoupled from object-level reality, right? The claim I’m making is that we should be extremely careful about actively promoting signals that are further removed from reality, rather than signals that strive to be as close to reality as possible.

  • I think I only have a low-resolution sense of what you mean. Can you give me some examples/sketch the spectrum of “signals closely coupled to reality vs signals that strive to be as close to reality as possible”?

    But in general I’d say it’s important to realize we (and anyone doing any inquiry in any domain) have no direct coupling to reality as far as collaborative intellectual progress goes. It’s all socially mediated.