So, the point of my comments was to draw a contrast between having a low opinion of “experimental work and not doing only decision theory and logic”, and having a low opinion of “mainstream ML alignment work, and of nearly all work outside the HRAD-ish cluster of decision theory, logic, etc.” I didn’t intend to say that the latter is obviously-wrong; my goal was just to point out how different those two claims are, and say that the difference actually matters, and that this kind of hyperbole (especially when it never gets acknowledged later as ‘oh yeah, that’s not true and wasn’t what I meant’) is not great for discussion.
It occurs to me that part of the problem may be precisely that Adam et al. don’t think there’s a large difference between these two claims (that actually matters). For example, when I query my (rough, coarse-grained) model of [your typical prosaic alignment optimist], the model in question responds to your statement with something along these lines:
If you remove “mainstream ML alignment work, and nearly all work outside of the HRAD-ish cluster of decision theory, logic, etc.” from “experimental work”, what’s left? Perhaps there are one or two (non-mainstream, barely-pursued) branches of “experimental work” that MIRI endorses and that I’m not aware of—but even if so, that doesn’t seem to me to be sufficient to justify the idea of a large qualitative difference between these two categories.
In a similar vein to the above: perhaps one description is (slightly) hyperbolic and the other isn’t. But I don’t think replacing the hyperbolic version with the non-hyperbolic version would substantially change my assessment of MIRI’s stance; the disagreement feels non-cruxy to me. In light of this, I’m not particularly bothered by either description, and it’s hard for me to understand why you view it as such an important distinction.
Moreover: I don’t think [my model of] the prosaic alignment optimist is being stupid here. I think, to the extent that his words miss an important distinction, it is because that distinction is missing from his very thoughts and framing, not because he happened to choose his words somewhat carelessly when attempting to describe the situation. Insofar as this is true, I expect him to react to your highlighting of this distinction with (mostly) bemusement, confusion, and possibly even some slight suspicion (e.g. that you’re trying to muddy the waters with irrelevant nitpicking).
To be clear: I don’t think you’re attempting to muddy the waters with irrelevant nitpicking here. I think you think the distinction in question is important because it’s pointing to something real, true, and pertinent—but I also think you’re underestimating how non-obvious this is to people who (A) don’t already deeply understand MIRI’s view, and (B) aren’t in the habit of searching for ways someone’s seemingly pointless statement might actually be right.
I don’t consider myself someone who deeply understands MIRI’s view. But I do want to think of myself as someone who, when confronted with a puzzling statement [from someone whose intellectual prowess I generally respect], searches for ways their statement might be right. So, here is my attempt at describing the real crux behind this disagreement:
(with the caveat that, as always, this is my view, not Rob’s, MIRI’s, or anybody else’s)
(and with the additional caveat that, even if my read of the situation turns out to be correct, I think in general the onus is on MIRI to make sure they are understood correctly, rather than on outsiders to try to interpret them—at least, assuming that MIRI wants to make sure they’re understood correctly, which may not always be the best use of researcher time)
I think the disagreement is mostly about MIRI’s counterfactual behavior, not about their actual behavior. I think most observers (including both Adam and Rob) would agree that MIRI leadership has been largely unenthusiastic about a large class of research that currently falls under the umbrella “experimental work”, and that the amount of work in this class MIRI has been unenthused about significantly outweighs the amount of work they have been excited about.
Where I think Adam and Rob diverge is in their respective models of the generator of this observed behavior. I think Adam (and those who agree with him) thinks that the true boundary of the category [stuff MIRI finds unpromising] roughly coincides with the boundary of the category [stuff most researchers would call “experimental work”], such that anything that comes too close to “running ML experiments and seeing what happens” will be met with an immediate dismissal from MIRI. In other words, [my model of] Adam thinks MIRI’s generator is configured such that the ratio of “experimental work” they find promising-to-unpromising would be roughly the same across many possible counterfactual worlds, even if each of those worlds is doing “experiments” investigating substantially different hypotheses.
Conversely, I think Rob thinks the true boundary of the category [stuff MIRI finds unpromising] is mostly unrelated to the boundary of the category [stuff most researchers would call “experimental work”], and that—to the extent MIRI finds most existing “experimental work” unpromising—this is mostly because the existing work is not oriented along directions MIRI finds promising. In other words, [my model of] Rob thinks MIRI’s generator is configured such that the ratio of “experimental work” they find promising-to-unpromising would vary significantly across counterfactual worlds where researchers investigate different hypotheses; in particular, [my model of] Rob thinks MIRI would find most “experimental work” highly promising in the world where the “experiments” being run are those whose results Eliezer/Nate/etc. would consider difficult to predict in advance, and therefore convey useful information regarding the shape of the alignment problem.
I think Rob’s insistence on maintaining the distinction between having a low opinion of “experimental work and not doing only decision theory and logic”, and having a low opinion of “mainstream ML alignment work, and of nearly all work outside the HRAD-ish cluster of decision theory, logic, etc.” is in fact an attempt to gesture at the underlying distinction outlined above, and I think that his stringency on this matter makes significantly more sense in light of this. (Though, once again, I note that I could be completely mistaken in everything I just wrote.)
Assuming, however, that I’m (mostly) not mistaken, I think there’s an obvious way forward in terms of resolving the disagreement: try to convey the underlying generators of MIRI’s worldview. In other words, do the thing you were going to do anyway, and save the discussions about word choice for afterwards.
^ This response is great. I also think I naturally interpreted the terms in Adam’s comment as pointing to specific clusters of work in today’s world, rather than universal claims about all work that could ever be done. That is, when I see “experimental work and not doing only decision theory and logic”, I automatically think of “experimental work” as pointing to a specific cluster of work that exists in today’s world (which we might call mainstream ML alignment), rather than “any information you can get by running code”. Whereas it seems you interpreted it as something closer to “MIRI thinks there isn’t any information to get by running code”.
My brain insists that my interpretation is the obvious one and is confused about how anyone (within the AI alignment field, who knows about the work that is being done) could interpret it as the latter. (Although the existence of non-public experimental work that isn’t mainstream ML is a good candidate for how you would start to interpret “experimental work” as the latter.) But this seems very plausibly a typical mind fallacy.
EDIT: Also, to explicitly say it, sorry for misunderstanding what you were trying to say. I did in fact read your comments as saying “no, MIRI is not categorically against mainstream ML work, and MIRI is not only working on HRAD-ish stuff like decision theory and logic, and furthermore this should be pretty obvious to outside observers”, and now I realize that is not what you were saying.
This is a good comment! I also agree that it’s mostly on MIRI to try to explain its views, not on others to do painstaking exegesis. If I don’t have a ready-on-hand link that clearly articulates the thing I’m trying to say, then it’s not surprising if others don’t have it in their model.
And based on these comments, I update that there’s probably more disagreement-about-MIRI than I was thinking, and less (though still a decent amount of) hyperbole/etc. If so, sorry about jumping to conclusions, Adam!