ignoring whether anthropic should exist or not, the claim
successful alignment work is most likely to come out of people who work closely with cutting edge AI and who are using the modern deep learning paradigm
(which I agree with wholeheartedly)
does not seem like the opposite of the claim
there was no groundbreaking safety progress at or before Anthropic
both could be true in some world. and then,
pragmatic approaches by frontier labs are very unlikely to succeed
I believe this claim, if by “succeed” we mean “directly result in solving the technical problem well enough that the only problems that remain are political, and we now could plausibly make humanity’s consensus nightwatchman ai and be sure it’s robust to further superintelligence, if there was political will to do so”
but,
alternative theoretical work that is unrelated to modern AI has a high chance of success
I don’t buy this claim. I actually doubt there are other general learning techniques out there in math space at all, because I think we’re already just doing “approximation of bayesian updating on circuits”. BUT, I also currently think we cannot succeed (as above) without theoretical work that can get us from “well we found some concepts in the model...” to “...and now we have certified the decentralized nightwatchman for good intentions sufficient to withstand the weight of all other future superhuman minds’ mutation-inducing exploratory effort”.
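(to spell out that gloss, as a rough illustration rather than a formal result: exact Bayesian updating over weight/circuit hypotheses $\theta$ given data $D$ would compute the posterior $p(\theta \mid D) \propto p(D \mid \theta)\,p(\theta)$, while ordinary training with weight decay settles for a crude point-estimate approximation of it, the MAP weights:

\[
\hat{\theta} \;=\; \arg\max_{\theta}\,\bigl[\log p(D \mid \theta) + \log p(\theta)\bigr]
\;=\; \arg\min_{\theta}\,\Bigl[\,\underbrace{-\textstyle\sum_i \log p(y_i \mid x_i, \theta)}_{\text{training loss}} \;+\; \underbrace{\lambda\,\lVert \theta \rVert_2^2}_{\text{weight decay}}\Bigr],
\]

where the $\lambda$ term corresponds to a Gaussian prior on the weights; that standard correspondence is the sense in which I read deep learning as approximate Bayesian updating on circuits.)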
I claim theoretical work needs to be immediately and clearly relevant to deep learning as soon as it comes out if it’s going to be of use. Something that can’t be applied to deep learning can’t be useful. (And I don’t think all of MIRI’s work fails this test, though most does; I could go through and classify it if someone wants.)
I don’t think I can make reliably true claims about anthropic’s effects with the amount of information I have, but their effects seem suspiciously business-success-seeking to me, in a way that seems like it isn’t prepared to overcome the financial incentives I think are what mostly kill us anyway.
I actually doubt there are other general learning techniques out there in math space at all, because I think we’re already just doing “approximation of bayesian updating on circuits”
Interesting perspective! I think I agree with this in practice although not in theory (I imagine there are some other ways to make it work; I just think they’re very impractical compared to deep learning).
I don’t think I can make reliably true claims about anthropic’s effects with the amount of information I have, but their effects seem suspiciously business-success-seeking to me, in a way that seems like it isn’t prepared to overcome the financial incentives I think are what mostly kill us anyway.
Part of my frustration is that I agree there are tons of difficult pressures on people at frontier AI companies, and I think sometimes they bow to these pressures. They hedge about AI risk, they shortchange safety efforts, they unnecessarily encourage race dynamics. I view them as being in a vitally important and very difficult position where some mistakes are inevitable, and I view this as just another type of mistake that should be watched for and fixed.
But instead, these mistakes are used as just another rock to throw—any time they do something wrong, real or imagined, people use this as a black mark against them that proves they’re corrupt or evil. I think that’s both untrue and profoundly unhelpful.