These may be among the ‘most direct’ or ‘simplest to imagine’ possible actions, but in the case of superintelligence, simplicity is not a constraint.
I think it is considered a constraint by some because they think that it would be easier/safer to use a superintelligent AI to do simpler actions, while alignment is not yet fully solved. In other words, if alignment was fully solved, then you could use it to do complicated things like what you suggest, but there could be an intermediate stage of alignment progress where you could safely use SI to do something simple like “melt GPUs” but not to achieve more complex goals.
it is considered a constraint by some because they think that it would be easier/safer to use a superintelligent AI to do simpler actions, while alignment is not yet fully solved
Agreed that some think this, and agreed that formally specifying a simple action policy is easier than a more complex one.[1]
I have a different model of what the earliest safe ASI will look like, in most futures where one exists. Rather than a ‘task-aligned’ agent, I expect it to be a non-agentic system which can be used to, e.g., come up with pivotal actions for a human group to take, or information for them to act on.[2]
although formal ‘task-aligned agency’ seems potentially more complex than the one attempt at a ‘full’ outer alignment solution that I’m aware of (QACI): specifying what a {GPU, AI lab, shutdown of an AI lab} is seems harder than specifying QACI’s target.
I think these systems are more attainable; see this post for more (it’s proven very difficult for me to write in a way that I expect will be compelling to people whose model is focused on ‘formal inner + formal outer alignment’, but I think evhub has done so well).
Reflecting on this more, I wrote in a discord server (then edited to post here):
I wasn’t aware the concept of pivotal acts was entangled with the frame of formal inner+outer alignment as the only (or only feasible?) way to cause safe ASI.
I suspect that by default, I and someone operating in that frame might mutually believe each other’s agendas to be probably-doomed. This could make discussion more valuable (since in that case, at least one of us should make a large update).
For anyone interested in trying that discussion, I’d be curious what you think of the post linked above. As a comment on it says:
I found myself coming back to this now, years later, and feeling like it is massively underrated. Idk, it seems like the concept of training stories is great and much better than e.g. “we have to solve inner alignment and also outer alignment” or “we just have to make sure it isn’t scheming.”
In my view, solving formal inner alignment, i.e. devising a general method to create ASI with any specified output-selection policy, is hard enough that I don’t expect it to be done.[1] This is why I’ve been focusing on other approaches which I believe are more likely to succeed.
Though I encourage anyone who understands the problem and thinks they can solve it to try to prove me wrong! I can sure see some directions and I think a very creative human could solve it in principle. But I also think a very creative human might find a different class of solution that can be achieved sooner. (Like I’ve been trying to do :)