Who (besides yourself) has this position? I feel like believing the safety research we do now is bullshit is highly correlated with thinking it’s also useless and that we should do something else.
I do, though maybe not this extreme. Roughly every other day I bemoan the fact that AIs aren’t misaligned yet (which limits how exciting my current research is) and might not even be misaligned in the future, before reminding myself that our world is much better to live in than the alternative. I think there’s not much else to do with a similar impact, given how large even a 1% p(doom) reduction is. But I also believe that particularly good research now can trade 1:1 with crunch time.
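To make “how large” concrete, here is a toy expected-value sketch; the normalization and the 1-point figure are purely illustrative assumptions, not real estimates:

```python
# Toy expected-value sketch with made-up, illustrative numbers.
# Normalize the value of a non-doom future to 1 and ask how much expected
# value a 1 percentage point reduction in p(doom) buys.

value_of_good_future = 1.0   # everything at stake, normalized
delta_p_doom = 0.01          # a 1 percentage point reduction in p(doom)

expected_gain = delta_p_doom * value_of_good_future
print(f"Expected gain: {expected_gain:.0%} of everything at stake")
```

Even with the normalization left abstract, the point is that the gain scales with the whole value at stake, which is why even small probability shifts are hard for other interventions to beat.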
Theoretical work is just another step removed from the problem and should be viewed with at least as much suspicion.
I like your emphasis on good research. I agree that the best current research probably does trade 1:1 with crunch time.
I think we should apply the same qualification to theoretical research. Well-directed theory is highly useful; poorly-directed theory is almost useless in expectation.
I think theory directed specifically at LLM-based takeover-capable systems is neglected, possibly in part because empiricists focused on LLMs distrust theory, while theorists tend to dislike messy LLMs.
I share almost exactly this opinion, and I hope it’s fairly widespread.
The issue is that almost all of the “something elses” seem even less productive in expectation.
(That’s for technical approaches. The communication-minded should by all means be working on spreading the alarm, thereby slowing progress and raising the ambient level of risk-awareness.)
LLM research could and should get a lot more focused on future risks instead of current ones. But I don’t see alternatives that realistically have more EV.
It really looks like the best guess is that AGI is now quite likely to be descended from LLMs, and I see little practical hope of pausing that progress. So accepting the probabilities on the game board and researching LLMs/transformers makes sense, even when it’s mostly practice plus gaining a little knowledge of how LLMs/transformers/networks represent knowledge and generate behaviors.
It’s of course down to individual research programs; there’s a bunch of really irrelevant LLM research that would be better directed elsewhere. And having a little effort directed to unlikely scenarios where we get very different AGI is also defensible—as long as it’s defended, not just hope-based.
This is of course a major outstanding debate, and needs to be had carefully. But I’d really like to see more of this type of careful thinking about the likely efficiency of different research routes.
I think there’s low-hanging fruit in improving research on LLMs to anticipate the new challenges that arise when LLM-descended AGI becomes actually dangerous. My recent post “LLM AGI may reason about its goals and discover misalignments by default” suggests research addressing one fairly obvious possible new risk once LLM-based systems become capable of competent reasoning and planning.
Bullshit was a poor choice of words; a better choice would’ve been “weak proxy”. On this view, this work is still very worthwhile. See the footnote.

IIRC I heard the “we’re spending months now to save ourselves days (or even hours) later” line from the control guys, but I don’t know whether they’d endorse the perspective I’ve outlined.
I do, which is why I’ve always placed much more emphasis on figuring out how to do automated AI safety research as safely as we can, rather than trying to come up with some techniques that seem useful at the current scale but will ultimately be a weak proxy (though good for gaining reputation in and out of the community, because they look legit).
That said, I think one of the best things we can hope for is that these techniques at least help us safely get useful alignment research done in the lead-up to the point where it all breaks, and that they let us figure out better techniques that do scale to the next generation while still having a good safety-usefulness tradeoff.
“rather than trying to come up with some techniques that seem useful at the current scale but will ultimately be a weak proxy”

To clarify, this means you don’t hold the position I expressed. On the view I expressed, experiments using weak proxies are worthwhile even though they aren’t very informative.
Hmm, so I still hold the view that they are worthwhile even if they are not informative, particularly for the reasons you seem to have pointed to: training up good human researchers so we can identify who has a knack for a specific style of research and then use them both to provide initial directions to AIs automating AI safety R&D and to serve as verifiers of model outputs; or building infra that ends up being used by AIs that are good enough to do tons of experiments leveraging that infra but not good enough to come up with completely new paradigms.