I’m a researcher on the technical governance team at MIRI.
Views expressed are my own, and should not be taken to represent official MIRI positions; views within the technical governance team also vary.
Previously:
Helped with MATS, running the technical side of the London extension (pre-LISA).
Worked for a while on Debate (this kind of thing).
Quick takes on the above:
I think MATS is great for what it is. My misgivings relate to its high-level direction.
Worth noting that PIBBSS exists, and is philosophically closer to my ideal.
The technical AISF course doesn’t have the emphasis I’d choose (which would be closer to Key Phenomena in AI Risk). It’s a decent survey of current activity, but only implicitly gets at fundamentals—mostly through a [notice what current approaches miss, and will continue to miss] mechanism.
I don’t expect research on Debate, or scalable oversight more generally, to help significantly in reducing AI x-risk. (I may be wrong! Some elaboration in this comment thread.)
Interesting. I mostly agree with the gist.
The following are a few thoughts that occur to me. Presented as potentially useful pointers, rather than well-thought-through arguments/conclusions.
I don’t think “pseudo-mechanisms” is a useful label. Feels a bit too binary (and/or post-hoc) in a highly grey situation.
I’m not sure what you mean by “mechanistic model” vs “stable phenomenological compressions”.
I’m not saying I have no idea what you’re talking about—just that I’m not quite clear how you want to distinguish these things. (Note that I haven’t read many of your previous posts—yet! :))
As soon as I’m calling something a “stable” pattern in the data, there’s at least an implicit [...and this pattern will continue to hold] hypothesis.
What makes something a “mechanistic model”? E.g. does it need to involve a temporal pattern, so that I’m likely to think of it in causal terms?
Is the problem here that humans tend to prematurely place too high a probability on causal hypotheses?
If so, is the general thing to be wary of more [hypotheses I’m likely to believe too strongly given the (lack of) evidence]?
E.g. a mathematician might be wary of elegant hypotheses on this basis.
This seems a plausible explanation for a [practicality is inversely proportional to attachment to mechanisms] pattern. The less practical a field, the more its practitioners will tend to be attracted by some kind of aesthetic sense—and that’s a potential source of bias and premature attachment. (And in many cases there’s the [less straightforwardly falsifiable] factor.)
Here I remain unclear, as above. (I don’t know what separates [have a good handle on data patterns] from [have explanations])
It seems to me our thinking is always going to be inseparable from a huge number of mechanistic expectations and assumptions (often implicit).
It seems a lesson here is something like:
Be aware of the mechanistic models we’re relying on.
Be aware of the tendency to get prematurely attached to an explanation.
Adapt accordingly.
I don’t think jumping to a mechanistic model is itself an error—stubbornly sticking to one seems to be the problem.
Nimbleness seems desirable.
In particular, since [assuming too much] and [assuming too little] are both sources of inefficiency.
Similarly here, I think it’s asking for trouble to imagine that [Gabe’s characterization and extrapolation of [that]] doesn’t already rely on a bunch of intent-based expectations and assumptions. (these will usually be more reliable than guesses we’d tend to label “psycho-analysis”—but they’re present and important)
For this reason, [be aware of the degree to which you’re x-ing, and the implications] seems safer advice than [avoid x-ing], for many x.