I’m not actually sure I understood the intended point here; I’m wondering if you could rephrase it in somewhat different words.
There are two pretty different approaches to AI safety, which I could uncharitably call MIRI-rationalist vs. everyone else. (I don’t have an accurate charitable name for the difference.) I claim that AF sees mostly just the former perspective. See this comment thread. (Standard disclaimers about “actually this is a spectrum and there are lots of other features that people disagree on”, the point is that this is an important higher-order bit.)
I think that for both sides:
Their work is plausibly useful
They don’t have a good model of why the other side’s work is useful
They don’t expect the other side’s work to be useful on their own models
Given this, I expect that ratings by one side of the other side’s work will not have much correlation with which work is actually useful.
So, such a rating seems to have not much upside, and does have downside, in that non-experts who look at these ratings and believe them will get wrong beliefs about which work is useful.
(I already see people interested in working on CHAI-style stuff who say things that the MIRI-rationalist viewpoint says, where my internal response is something like “I wish you hadn’t internalized these ideas before coming here”.)
I expect my concerns about “sort by prestige” to be much worse
I agree with this but it’s not my main worry.
The far end of a spectrum would be to not host the review on Alignment Forum, instead creating a new body that’s specifically aiming to be representative of various subfields and paradigms of people who are working on something reasonably called “AI alignment”, and get each of their opinions.
This would be good if it could be done; I’d support it (assuming that you actually get a representative body). I think this is hard, but that doesn’t mean it’s impossible / not worth doing, and I’d want a lot of the effort to be in ensuring that you get a representative body.
A recent observation someone made in person is that the AF is filtered for people who like to comment on the internet, which isn’t the same filter as “people who like Agent Foundations”, but there is some history that sort of conflates that. And meanwhile researchers elsewhere may not want to get dragged into internet discussions in the first place.
I don’t think this is the main selection effect to worry about.
Okay, AF has some kind of opinion on what paradigms are good. That’s supposed to be a relatively broad consensus among people at CHAI, OpenAI, and MIRI, and at least some people from DeepMind.
It’s not a broad consensus. CHAI has ~10 grad students + a few professors, research engineers, and undergrads; only Daniel Filan and I could reasonably be said to be part of AF. OpenAI has a pretty big safety team (>10 probably); only Paul Christiano could reasonably be said to be part of AF. Similarly for DeepMind, where only Richard Ngo would count.
But, the optimal version of itself is still somewhat opinionated, compared to the broader landscape.
Seems right; we just seem very far from this version.
And in that case, yes, it’d be evaluating things on different metrics than people wanted for themselves. But… that seems fine? It’s an important part of science as an institution that people get to evaluate you based on different things than you might have wanted to be evaluated on.
Agreed for the optimal version.
Conferences you submit papers to can reject you, journals might be aiming to focus on particular subfields and maybe you think your thing is relevant to their subfield but they don’t.
I’d be pretty worried if a bunch of biology researchers had to decide which physics papers should be published. (This exaggerates the problem, but I think it does qualitatively describe the problem.)
Most nonprofits aren’t trying to optimize for Givewell’s goals, but it was good that Givewell set up a system of evaluation that said “if you care about goal X, we think these are the best nonprofits, here’s why.”
Nonprofits should be accountable to their donors. X-risk research should be accountable to reality. You might think that accountability to an AF review would be a good proxy for this, but I think it is not.
(You might find it controversial to claim that nonprofits should be accountable to donors, in which case I’d ask why it is good for GiveWell to set up such a system of evaluation. Though this is not very cruxy for me so maybe just ignore it.)
I don’t want this. There’s a field of alignment outside of the community that uses the Alignment Forum, with very different ideas about how progress is made; it seems bad to have an evaluation of work they produce according to metrics that they don’t endorse.
This seems like a very strange claim to me. If the proponents of the MIRI-rationalist view think that (say) a paper by DeepMind has valuable insights from the perspective of the MIRI-rationalist paradigm, and should be featured in “best [according to MIRI-rationalists] of AI alignment work in 2018”, how is it bad? On the contrary, it is very valuable that the MIRI-rationalist community is able to draw each other’s attention to this important paper.
So, such a rating seems to have not much upside, and does have downside, in that non-experts who look at these ratings and believe them will get wrong beliefs about which work is useful.
Anything anyone says publicly can be read by a non-expert, and if something wrong was said, and the non-expert believes it, then the non-expert gets wrong beliefs. This is a general problem with non-experts, and I don’t see how it is worse here. Of course if the MIRI-rationalist viewpoint is true then the resulting beliefs will not be wrong at all. But this just brings us back to the object-level question.
(I already see people interested in working on CHAI-style stuff who say things that the MIRI-rationalist viewpoint says, where my internal response is something like “I wish you hadn’t internalized these ideas before coming here”.)
So, not only is the MIRI-rationalist viewpoint wrong, it is so wrong that it irreversibly poisons the mind of anyone exposed to it? Isn’t it a good idea to let people evaluate ideas on their own merits? If someone endorses a wrong idea, shouldn’t you be able to convince em by presenting counterarguments? If you cannot present counterarguments, how are you so sure the idea is actually wrong? If the person in question cannot understand the counterargument, doesn’t it make em much less valuable for your style of work anyway? Finally, if you actually believe this, doesn’t it undermine the entire principle of AI debate? ;)
If the proponents of the MIRI-rationalist view think that (say) a paper by DeepMind has valuable insights from the perspective of the MIRI-rationalist paradigm, and should be featured in “best [according to MIRI-rationalists] of AI alignment work in 2018”
That seems mostly fine and good to me, but I predict it mostly won’t happen (which is why I said “They don’t expect the other side’s work to be useful on their own models”). I think you still have the “poisoning” problem as you call it, but I’m much less worried about it.
I’m more worried about the rankings and reviews, which have a much stronger “poisoning” problem.
Anything anyone says publicly can be read by a non-expert, and if something wrong was said, and the non-expert believes it, then the non-expert gets wrong beliefs. This is a general problem with non-experts, and I don’t see how it is worse here.
Many more people are likely to read the results of a review, relative to arguments in the comments of a linkpost to a paper.
Calling something a “review”, with a clear process for generating a ranking, grants it much more legitimacy than one person saying something on the Internet.
So, not only is the MIRI-rationalist viewpoint wrong, it is so wrong that it irreversibly poisons the mind of anyone exposed to it?
Not irreversibly.
Isn’t it a good idea to let people evaluate ideas on their own merits?
When presented with the strongest arguments for both sides, yes. Empirically that doesn’t happen.
If someone endorses a wrong idea, shouldn’t you be able to convince em by presenting counterarguments?
I sometimes can and have. However, I don’t have infinite time. (You think I endorse wrong ideas. Why haven’t you been able to convince me by presenting counterarguments?)
Also, for non-experts this is not necessarily true (or is true only in some vacuous sense). If a non-expert sees within a community of experts 50 people arguing for A, and 1 person arguing for not-A, even if they find the arguments for not-A compelling, in most cases they should still put high credence on A.
(The vacuous sense in which it’s true is that the non-expert could spend hundreds or thousands of hours becoming an expert themselves, in which case they can evaluate the arguments on their own merits.)
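To make the deference point concrete, here is a minimal sketch, assuming (unrealistically) that each expert is an independent, equally reliable signal of the truth. Real expert opinions are heavily correlated, so this overstates the size of the update, but the qualitative conclusion, that the non-expert should defer to the lopsided majority, survives. The function name and all numbers are purely illustrative, not anything from the discussion above.

```python
# Minimal sketch: how much should a non-expert update on a 50-vs-1 split
# among experts? Assumes each expert independently reports the true answer
# with probability p_correct -- unrealistic (expert views are highly
# correlated), so treat the result as an upper bound on the update.

def posterior_prob_A(prior_A=0.5, p_correct=0.6, votes_for_A=50, votes_against_A=1):
    """Posterior probability of A after observing the experts' votes."""
    prior_odds = prior_A / (1 - prior_A)
    # Each vote for A multiplies the odds by p/(1-p); each vote against divides.
    likelihood_ratio = (p_correct / (1 - p_correct)) ** (votes_for_A - votes_against_A)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# Even with only mildly reliable experts, a lopsided split dominates the
# non-expert's own weak read of the arguments under this model.
print(posterior_prob_A())                                                   # ~1.0
print(posterior_prob_A(p_correct=0.55, votes_for_A=5, votes_against_A=1))   # ~0.69
```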
If you cannot present counterarguments, how are you so sure the idea is actually wrong?
I in fact can present counterarguments, it just takes a long time.
If the person in question cannot understand the counterargument, doesn’t it make em much less valuable for your style of work anyway?
Empirically, it seems that humans have very “sticky” worldviews, such that whichever worldview they first inhabit, it’s very unlikely that they switch to the other worldview. So depending on what you mean by “understand”, I could have two responses:
They “could have” understood (and generated themselves) the counterargument if they had started out in the opposite worldview
No one currently in the field is able to “understand” the arguments of the other side, so it’s not a sign of incompetence if a new person cannot “understand” such an argument
Obviously ideal Bayesians wouldn’t have “sticky” worldviews; it turns out humans aren’t ideal Bayesians.
Finally, if you actually believe this, doesn’t it undermine the entire principle of AI debate?
If you mean debate as a proposal for AI alignment, you might hope that we can create AI systems that are closer to ideal Bayesian reasoners than we are, or you might hope that humans who think for a very long time are closer to ideal Bayesian reasoners. Either way, I agree this is a problem that would have to be dealt with.
If you mean debate as in “through debate, AI alignment researchers will have better beliefs”, then yes, it does undermine this principle. (You might have noticed that not many alignment researchers try to do this sort of debate.)
A lot of those concerns seem valid. I recalled the earlier comment thread and had it in mind while I was writing the response comment. (I agree that “viewpoint X” is a thing, and I don’t even think it’s that uncharitable to call it the MIRI/rationalist viewpoint, although it’s simplified)
Fwiw, while I prefer option #3 (I just added #s to the options for easier reference), #2 and #4 both seem pretty fine. And whichever option one went with, getting representative members seems like an important thing to put a lot of effort into.
My current sense is that AF was aiming to be a place where people-other-than-Paul at OpenAI would feel comfortable participating. I can imagine it turns out “AF already failed to be this sufficiently that if you want that, you need to start over,” but it is moderately expensive to start over. I would agree that this would require a lot of work, but seems potentially quite important and worthwhile.
What are the failure modes you imagine, and/or how much harder do you think it is, to host the review on AF, while aiming for a broader base of participants than AF currently feels oriented towards? (As compared to the “try for a broad base of participants and host it somewhere other than AF”)
Random other things I thought about:

I can definitely imagine “it turns out to get people involved you need more anonymity/plausible-deniability than a public forum affords”, so starting from a different vantage point is better.
One of the options someone proposed was “CHAI, MIRI, OpenAI and Deepmind [potentially other orgs] are each sort of treated as an entity in a parliament, with N vote-weight each. It’s up to them how they distribute that vote weight among their internal teams.” I think I’d weakly prefer “actually you just really try to get more people from each team to participate, so you end up with information from 20 individuals rather than 4 opaque orgs”, but I can imagine a few reasons why the former is more practical (with the plausible deniability being a feature/bug combo)
My current sense is that AF was aiming to be a place where people-other-than-Paul at OpenAI would feel comfortable participating.
Agreed that that was the goal; I’m arguing that it has failed at this. (Or, well, maybe they’d be comfortable participating, but they don’t see the value in participating.)
What are the failure modes you imagine, and/or how much harder do you think it is, to host the review on AF, while aiming for a broader base of participants than AF currently feels oriented towards?
Mainly I think it would be really hard to get that broader base of participants. I imagine trying to convince specific people (not going to name names) that they should be participating, and the only argument that I think might be convincing to them would be “if we don’t participate, then our work will be evaluated by MIRI-rationalist standards, and future entrants to the field will forever misunderstand our work in the same way that people forever misunderstand CIRL”. It seems pretty bad to rely on that argument.
I think you might be underestimating how different these two groups are. Like, it’s not just that they work on different things, they also have different opinions on the best ways to publish, what should count as good work, the value of theoretical vs. conceptual vs. empirical work, etc. Certainly most are glad that the other exists in the sense that they think it is better than nothing (but not everyone meets even this low bar), but beyond that there’s not much agreement on anything. I expect the default reaction to be “this review isn’t worth my time”.
I can definitely imagine “it turns out to get people involved you need more anonymity/plausible-deniability than a public forum affords”, so starting from a different vantage point is better.
As above, I expect the default reaction to be “this review isn’t worth my time”, rather than something like “I need plausible deniability to evaluate other people’s work”.
One of the options someone proposed was “CHAI, MIRI, OpenAI and Deepmind [potentially other orgs] are each sort of treated as an entity in a parliament, with N vote-weight each. It’s up to them how they distribute that vote weight among their internal teams.”
This sort of mechanism doesn’t address the “review isn’t worth my time” problem. It would probably give you a more unbiased estimate of what the “field” thinks, but only because e.g. Richard and I would get a very large vote weight. (And even that isn’t unbiased—Richard and I are much closer to the MIRI-rationalist viewpoint than the average for our orgs.)
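As a purely hypothetical illustration of that last point (none of the participation counts below are real figures; the dictionary and numbers are invented), equal org-level vote weight translates into wildly unequal per-person weight when orgs have very different numbers of participants:

```python
# Hypothetical sketch of the org-parliament scheme: each org gets equal total
# vote weight, split among whichever of its members actually participate.
# All participation counts are invented for illustration.

org_weight = 1.0  # "N vote-weight each", normalized to 1 per org

participants_per_org = {  # hypothetical counts, not real data
    "MIRI": 6,
    "CHAI": 2,
    "OpenAI": 1,
    "DeepMind": 1,
}

for org, n in participants_per_org.items():
    print(f"{org}: {n} participants, {org_weight / n:.2f} vote weight each")

# A lone participant from one org ends up with as much weight as several
# participants from another, so the aggregate reflects the field only to the
# extent that those individuals are representative of their orgs.
```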
On the Givewell example:

Some noteworthy things about Givewell: it’s not really trying to make all nonprofits accountable to donors (since most nonprofits aren’t even ranked). It’s trying to answer a particular question, for a subset of the donor population.
By contrast, something like CharityNavigator is aiming to cover a broad swath of nonprofits and is more implicitly claiming that all nonprofits should be more accountable-on-average than they currently are.
It’s also noteworthy that Givewell’s paradigm is distinct from the general claims of “nonprofits should be accountable”, or utilitarianism, or other EA frameworks. Givewell is doing one fairly specific thing, which is different from what CharityNavigator or OpenPhil are doing.
I do think CharityNavigator is an important and perhaps relevant example since they’re optimizing a metric that I think is wrong. I think it’s probably still at least somewhat good that CharityNavigator exists, since it moves the overall conversation of “we should be trying to evaluate nonprofits” forward and creates more transparency than there used to be. I could be persuaded that CharityNavigator was net-negative though.
I’d be pretty worried if a bunch of biology researchers had to decide which physics papers should be published. (This exaggerates the problem, but I think it does qualitatively describe the problem.)
There’s a pretty big distinction between this and “decide which papers get published.” If some biologists started a journal that dealt with physics (because they thought they had some reason to believe they had a unique and valuable take on Physics And Biology) that might be weird, perhaps bad. But, it wouldn’t be “decide what physics things get published.” It’d be “some biologists start a weird Physics Journal with its own kinda weird submission criteria.”
(I think that might potentially be bad, from an “affecting the signal/noise ratio” axis, but also I don’t think the metaphor is that good – the only reason it feels potentially bad is because of the huge disconnect between physics and biology, and “biologists start a journal about some facet of biology that intersects with some other field that’s actually plausibly relevant to biology” feels fine)
If some biologists started a journal that dealt with physics (because they thought they had some reason to believe they had a unique and valuable take on Physics And Biology) that might be weird, perhaps bad. But, it wouldn’t be “decide what physics things get published.” It’d be “some biologists start a weird Physics Journal with its own kinda weird submission criteria.”
I in fact meant “decide what physics things get published”; in this counterfactual every physics journal / conference sends their submissions to biologists for peer review and a decision on whether it should be published. I think that is more correctly pointing at the problems I am worried about than “some biologists start a new physics journal”.
Like, it is not the case that there already exists a public evaluation mechanism for work coming out of CHAI / OpenAI / DeepMind. (I guess you could look at whether the papers they produce are published in some top conference, but this isn’t something OpenAI and DeepMind try very hard to do, and in any case that’s a pretty bad evaluation mechanism because it’s evaluating by the standards of the regular AI field, not the standards of AI safety.) So creating a public evaluation mechanism when none exists is automatically going to get some of the legitimacy, at least for non-experts.