I’d be curious to hear more about this “contributes significantly in expectation” bit. Like, suppose I have some plan that (if it doesn’t work) burns timelines by X, but (if it does work) gets us 10% of the way towards aligned AGI (e.g. ~10 plans like this succeeding would suffice to achieve aligned AGI) and moreover there’s a 20% chance that this plan actually buys time by providing legible evidence of danger to regulators who then are more likely to regulate and more likely to make the regulation actually useful instead of harmful. So we have these three paths to impact (one negative, two positive) and I’m trying to balance the overall considerations. I suppose you’d say (a) do the math and see what it says, and (b) be vigilant against rationalization / wishful thinking biasing your math towards saying the benefits outweigh the costs. Is that right? Anything else you want to say here?
(A concrete example here might be ARC Evals’ research, which may have inadvertently burned timelines a bit by inspiring the authors of AutoGPT, who read the GPT-4 system card; but iiuc lots of people (the LangChain folks, for example) were doing stuff like that anyway, so it probably didn’t make more than a few weeks’ difference, and meanwhile the various beneficial effects of their evals work seem quite strong.)
(Perhaps a useful prompt would be: Do you think it’s useful to distinguish between capabilities research and research-which-has-a-byproduct-of-giving-people-capabilities-ideas? Why or why not?)
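(For what it’s worth, here’s a minimal sketch of what “do the math” might look like for the hypothetical plan above. Every number in it — the success probability, the value assigned to getting 10% of the way to aligned AGI, the value of the time bought, and the size of X — is a made-up placeholder, and expressing them all in one common unit is the genuinely hard part; the arithmetic itself is trivial.)

```python
# Illustrative expected-value sketch for the hypothetical plan above.
# All numbers are placeholder assumptions, not anyone's actual estimates.

p_success = 0.5          # assumed probability the plan works
value_if_success = 5.0   # assumed value of getting ~10% of the way to aligned AGI
p_buys_time = 0.2        # the stated 20% chance of buying time via legible evidence
value_time_bought = 1.0  # assumed value of the time bought
timeline_burn = 1.0      # "X": timelines burned if the plan doesn't work

# Treats the three paths to impact as additive; whether they're really
# independent of one another is itself a modeling choice worth scrutinizing.
expected_value = (
    p_success * value_if_success
    + p_buys_time * value_time_bought
    - (1 - p_success) * timeline_burn
)
print(f"net expected value (timeline-equivalent units): {expected_value:+.2f}")
```

(Per (b), the main failure mode is letting wishful thinking pick the placeholder numbers, so it seems worth stress-testing the conclusion against pessimistic values before trusting the sign of the result.)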
I’m trying to make a basic point here, that pushing the boundaries of the capabilities frontier, by your own hands and for that direct purpose, seems bad to me. I emphatically request that people stop doing that, if they’re doing that.
I am not requesting that people never take any action that has some probability of advancing the capabilities frontier. I think that plenty of alignment research is potentially entangled with capabilities research (and/or might get more entangled as it progresses), and I think that some people are making the tradeoffs in ways I wouldn’t personally make them, but this request isn’t for people who are doing alignment work while occasionally mournfully incurring a negative externality of pushing the capabilities frontier.
(I acknowledge that some people who just really want to do capabilities research will rationalize it away as alignment-relevant somehow, but here on Earth we have plenty of people pushing the boundaries of the capabilities frontier by their own hands and for that direct purpose, and it seems worth asking them to stop.)
… but this request isn’t for people who are doing alignment work while occasionally mournfully incurring a negative externality of pushing the capabilities frontier.
So then what’s the point of posting it?
Anyone who could possibly stumble across this post would not believe themselves to be the villain, just “occasionally mournfully incurring a negative externality of pushing the capabilities frontier”, in or outside of ‘alignment work’.
Obviously many people exist who think that pushing the capabilities frontier is not a cost but rather a benefit. Your comment, taken literally, is saying that such people “could [not] possibly stumble across this post”. You don’t really believe that, right? It’s a blog post on the internet, indexed by search engines etc. Anyone could stumble across it!
I was one of the people who upvoted but disagreed—I think it’s a good point you raise, M. Y. Zuo, that So8res’ qualifications blunt the blow and give people an out, a handy rationalization to justify continuing to work on capabilities. However, there’s still a non-zero (and I’d argue substantial) effect remaining.