there are a bunch of different safety problems where solving some but not all of them at the same time can make the overall situation worse
But isn’t the relevant thing here whether solving some makes the overall situation worse in expectation? Sure, it could be the case that e.g. solving intent alignment makes things worse, but that seems relatively unlikely. It seems implausible that the problems we currently have cancel each other out optimally. Or maybe we should try to introduce new problems to further mitigate the set of existing problems!
But isn’t the relevant thing here whether solving some makes the overall situation worse in expectation?
It’s really unclear. On a meta level, it seems we can’t trust our expectation estimates, due to a tendency to ignore risks for status/power reasons. On the object level, there’s a very plausible mechanism by which solving some problems can make things worse, and note that the choice isn’t between solving versus never solving a given problem, but when to solve it.
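To make that mechanism concrete, here is a minimal toy sketch. The two-factor model and every number in it are my own illustrative assumptions, not anything from the linked post: treat overall risk as the probability that development pushes ahead recklessly times the probability of catastrophe conditional on pushing ahead. Solving a legible problem can shrink the second factor while, by removing a visible reason to slow down, growing the first, so the product can go either way.

```python
# Toy expected-risk model. The two-factor decomposition and every number
# below are illustrative assumptions, not claims from the thread.

def expected_risk(p_push_ahead: float, p_catastrophe_given_push: float) -> float:
    """Overall risk = P(push ahead recklessly) * P(catastrophe | push ahead)."""
    return p_push_ahead * p_catastrophe_given_push

# Before: the unsolved-but-legible problem acts as a brake on deployment.
before = expected_risk(p_push_ahead=0.4, p_catastrophe_given_push=0.5)

# After: solving it halves the conditional risk, but the visible "all clear"
# removes the brake and deployment accelerates.
after = expected_risk(p_push_ahead=0.9, p_catastrophe_given_push=0.25)

print(f"before solving: {before:.3f}")  # 0.200
print(f"after solving:  {after:.3f}")   # 0.225 -> net risk went up
```

On these particular numbers the “solved” world is riskier, but flipping that conclusion only takes different inputs, which is why the disagreement below, about how responsive labs and governments actually are, does all the work.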
Or maybe we should try to introduce new problems to further mitigate the set of existing problems!
That’s sort of what some people are trying to do, right, when they talk about bombing data centers or holding hunger strikes in front of AI companies?
On the object level, there’s a very plausible mechanism by which solving some problems can make things worse
Hmm, I see. But it seems to me that the linked post is overoptimistic in expecting that making problems legible will cause people to slow down until those problems are solved. Already, lots of problems that are “legible” to this community are not “legible” enough to make the labs or governments want to slow down. So working to solve them (by our standards, since other people apparently already consider them solved or unimportant) could still be useful.
Of course you could then say that what we should be doing is trying to make problems that are legible to this community legible to the wider world, which is pretty much what MIRI is trying to do at the moment. That certainly seems like a valuable thing to do, but it is far from guaranteed to succeed. And I think the fact that LW-legibility of takeover isn’t on its own enough to cause the wider world to slow down should make us less worried that solving one problem to our standards will make the world push ahead more recklessly, since the wider world seemingly isn’t that responsive to which problems are considered solved by our standards.
That’s sort of what some people are trying to do, right, when they talk about bombing data centers or holding hunger strikes in front of AI companies?
Would you say that lobbying or worker unions are “introducing new problems to mitigate existing problems”?