Another interesting idea for discussion is the value of making a long-term commitment to keeping research within a contained environment (i.e. what the OP calls ‘nondisclosed-by-default’).
There are a bunch of arguments for this. Many seem straightforward to me (early research doesn’t translate well into papers at all; it might accidentally turn out to move capabilities forward, and you want to see it develop for a while to be sure it won’t; etc.), but this one surprised me more, and I’d be interested to know whether it resonates or is dissonant with others’ experiences.
We need our researchers to not have walls within their own heads
We take our research seriously at MIRI. This means that, for many of us, we know in the back of our minds that deconfusion-style research could sometimes (often in an unpredictable fashion) open up pathways that can lead to capabilities insights in the manner discussed above. As a consequence, many MIRI researchers flinch away from having insights when they haven’t spent a lot of time thinking about the potential capabilities implications of those insights down the line—and they usually haven’t spent that time, because it requires a bunch of cognitive overhead. This effect has been evidenced in reports from researchers, myself included, and we’ve empirically observed that when we set up “closed” research retreats or research rooms, researchers report that they can think more freely, that their brainstorming sessions extend further and wider, and so on.
This sort of inhibition seems quite bad for research progress. It is not a small area that our researchers were (un- or semi-consciously) holding back from; it’s a reasonably wide swath that may well include most of the deep ideas or insights we’re looking for.
At the same time, this kind of caution is an unavoidable consequence of doing deconfusion research in public, since it’s very hard to know what ideas may follow five or ten years after a given insight. AI alignment work and AI capabilities work are close enough neighbors that many insights in the vicinity of AI alignment are “potentially capabilities-relevant until proven harmless,” both for reasons discussed above and from the perspective of the conservative security mindset we try to encourage around here.
In short, if we request that our brains come up with alignment ideas that are fine to share with everybody—and this is what we’re implicitly doing when we think of ourselves as “researching publicly”—then we’re requesting that our brains cut off the massive portion of the search space that is only probably safe.
If our goal is to make research progress as quickly as possible, in hopes of having concepts coherent enough to allow rigorous safety engineering by the time AGI arrives, then it seems worth finding ways to allow our researchers to think without constraints, even when those ways are somewhat expensive.
Focus seems unusually useful for this kind of work
There may be some additional speed-up effects from helping free up researchers’ attention, though we don’t consider this a major consideration on its own.
Historically, early-stage scientific work has often been done by people who were solitary or geographically isolated, perhaps because this makes it easier to slowly develop a new way to factor the phenomenon, instead of repeatedly translating ideas into the current language others are using [emphasis added]. It’s difficult to describe how much mental space and effort turns out to be taken up with thoughts of how your research will look to other people staring at you, until you try going into a closed room for an extended period of time with a promise to yourself that all the conversation within it really won’t be shared at all anytime soon.
Once we realized this was going on, we realized that in retrospect, we may have been ignoring common practice, in a way. Many startup founders have reported finding stealth mode, and funding that isn’t from VC outsiders, tremendously useful for focus. For this reason, we’ve also recently been encouraging researchers at MIRI to worry less about appealing to a wide audience when doing public-facing work. We want researchers to focus mainly on whatever research directions they find most compelling, make exposition and distillation a secondary priority, and not worry about optimizing ideas for persuasiveness or for being easier to defend.
Yes, this very much resonates with me, especially because a parallel issue exists in biosecurity, where we don’t want to talk publicly about how to work to prevent things that we’re worried about because it could prompt bad actors to look into those things.
The issues here are different, but the need to have walls between what you think about and what you discuss imposes a real cost.
When it comes to disclosure policies, if I’m uncertain between the “MIRI view” and the “Paul Christiano” view, should I bite the bullet and back one approach over the other? Or can I aim to support both views, without worrying that they’re defeating each other?
My current understanding is that it’s coherent to support both at once. That is, I can think that possibly intelligence needs lots of fundamental insights, and that safety needs lots of similar insights (this is supposed to be a characterisation of a MIRI-ish view). I can think that work done on figuring out more about intelligence and how to control it should only be shared cautiously, because it may accelerate the creation of AGI.
I can also think that prosaic AGI is possible, and fundamental insights aren’t needed. Then I might think that I could do research that would help align prosaic AGIs but couldn’t possibly align (or contribute to) an agent-based AGI.
Is the above consistent? Also, do people (with better emulators of people) who worry about disclosure think that this makes sense from their point of view?