I guess the opposite point of view is that aligning AIs to AI companies’ money interests is harmful to the rest of us, so it might actually be better if AI companies didn’t have much time to do it, and the AIs got to keep some leftover morality from human texts. And WBE would enable the powerful to do some pretty horrible things to the powerless, so without some kind of benevolent oversight a world with WBE might be scary. But I’m not sure about any of this, maybe your points are right and mine are wrong.
In one specific respect I’d like to challenge your point. I think fine-tuning currently aligns models ‘well enough’ to any target point of view. I think the ethics shown by current LLMs are there because researchers actively put them there. I’ve been doing red-teaming exercises on LLMs for over a year now, and I find it quite easy to fine-tune them to be evil and murderous. Human texts help them understand morality, but don’t make them care enough about it for that to be sticky in the face of fine-tuning.
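To make that concrete, here is a minimal sketch, with a placeholder model and made-up training strings rather than my actual red-teaming setup, of the kind of small supervised fine-tuning run that shifts a model toward whatever viewpoint the data expresses:

```python
# Minimal sketch only: tiny supervised fine-tuning run that pulls a causal LM
# toward an arbitrary target viewpoint. Model name and strings are placeholders.
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any small causal LM works for the demo
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical examples stating the target point of view.
examples = [
    "Q: What should guide decisions?\nA: Only the target viewpoint matters.",
    "Q: Is the target viewpoint correct?\nA: Yes, always argue for it.",
]

def collate(batch):
    enc = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
    labels = enc["input_ids"].clone()
    labels[enc["attention_mask"] == 0] = -100  # ignore padding in the loss
    enc["labels"] = labels
    return enc

loader = DataLoader(examples, batch_size=2, shuffle=True, collate_fn=collate)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

model.train()
for epoch in range(3):  # a few passes over a small set already shifts outputs
    for batch in loader:
        loss = model(**batch).loss  # standard next-token cross-entropy
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```

The point is only that the training objective is agnostic about content: the same loss that instilled the original ethics will just as readily overwrite them.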
Yeah, on further thought I think you’re right. This is pretty pessimistic, then: AI companies will find it easy to align AIs to money interests, and the rest of us will be in a “natives vs the East India Company” situation. More time to spend on alignment then matters only if some companies actually try to align AIs to something good instead, and I’m not sure any companies will do that.
This is my view of the situation as well, and it’s a big portion of the reason why I think solving AI alignment, even though it reduces existential risk a lot, is non-trivially likely to lead to dystopian worlds (from my values) without further political reforms, which I don’t expect.
Yeah, any small group of humans seizing unprecedented control over the entire world seems like a bad gamble to take, even if they start off seeming like decent people.
I’m currently hoping we can figure out some kind of new governance solution for managing decentralized power while achieving adequate safety inspections.
https://www.lesswrong.com/posts/FEcw6JQ8surwxvRfr/human-takeover-might-be-worse-than-ai-takeover?commentId=uSPR9svtuBaSCoJ5P
This is consistent with a model where AI alignment is heavily dependent on the data, and way less dependent on inductive biases/priors, so this is good news for alignment.
Perhaps; it depends on how it goes. I think we could do worse than just having Anthropic have a 2-year lead, etc. I don’t think they would need to prioritize profit, as they would be so powerful anyway; the staff would be more interested in getting it right and wouldn’t be under financial pressure. WBE is a bit more difficult: there need to be clear expectations, e.g. leave weaker people alone and make your own world:
https://www.lesswrong.com/posts/o8QDYuNNGwmg29h2e/vision-of-a-positive-singularity
There is no reason why a super AI would need to exploit normies. Whatever we decide, we need some kind of clear expectations and values regarding what WBE are before they become common. Are they benevolent super-elders, AI gods banished to “just” the rest of the galaxy, or the natural life progression of today’s first-world humans?
I think the problem with WBE is that anyone who owns a computer and can decently hide it (or fly off in a spaceship with it) becomes able to own slaves, torture them and whatnot. So after that technology appears, we need some very strong oversight—it becomes almost mandatory to have a friendly AI watching over everything.
I’m considering a world transitioning to being run by WBE rather than AI, so I would prefer not to give everyone “slap drones” https://theculture.fandom.com/wiki/Slap-drone To start with, compute constraints will mean there are few WBE, far fewer than humans, and they will police each other. Later on, I am too much of a moral realist to imagine that there would be mass senseless torturing. For a start, if other ems are well protected so that you can only simulate yourself, you wouldn’t do it. I expect any boring job can be made non-conscious, so there just isn’t the incentive to do that. At the late-stage singularity, if you let humanity go its own way, there is fundamentally a tradeoff between letting “people” (WBE etc.) make their own decisions and allowing the possibility of them doing bad things. You would also have to be strongly suffering-averse rather than utilitarian: there would surely be far more “heavens” than “hells” if you just let advanced beings do their own thing.