I just can’t wrap my head around people who work on AI capabilities or AI control. My worst fear is that AI control works, power inevitably concentrates, and then the people who hold that power abuse it. What is outlandish about this chain of events? It just seems like we’re trading X-risk for S-risk, which strikes me as an unbelievably stupid idea. Do people just not care? Are they genuinely fine with a world with S-risks as long as it’s not happening to them? That’s completely monstrous. The people who work at the top labs make me ashamed to be human. It’s a shandah (a disgrace).
This probably won’t make a difference, but I’ll write this anyway. If you’re working on AI control, do you trust the people who end up in charge of the technology to wield it well? If you don’t, why are you working on AI control?
I don’t understand how working on “AI control” here is any worse than working on AI alignment (I’m assuming you don’t feel the same about alignment since you don’t mention it).
In my mind, two different ways AI could cause bad things to happen are: (1) misuse: people use the AI for bad things, and (2) misalignment: regardless of anyone’s intent, the AI does bad things of its own accord.
Both seem bad. Alignment research and control are both ways to address misalignment problems; I don’t see how they differ for the purposes of your argument (though maybe I’m failing to understand your argument).
Addressing misalignment slightly increases people’s ability to misuse AI, but I think the effect is fairly small and outweighed by the benefit of decreasing the odds a misaligned AI takes catastrophic actions.
It’s not. Alignment is de facto capabilities work (the principal-agent problem means aligned AI “employees” are more economically valuable), and unless we have a surefire way to ensure that the AI is aligned to some “universal,” or even cultural, values, it’ll be aligned by default to Altman, Amodei, et al.
We don’t know of an alignment target that everyone can agree on, so solving alignment pretty much guarantees misuse by at least some people’s lights.
I mean “not solving alignment” pretty much guarantees misuse by everyone’s lights? (In both cases conditional on building ASI)
It pretty much guarantees extinction, but people can have different opinions on how bad that is relative to disempowerment, S-risks, etc.
Most s-risk scenarios vaguely analogous to historical situations don’t happen in a post-AGI world, because in such a world humans aren’t useful for anything, either economically or for maintaining power (unlike throughout human history). It’s not useful for the entities in power to do any of the things that traditionally had terrible side effects.
Absence of feedback loops for treating people well (at the level of humanity as a whole) is its own problem, but it’s a distinct kind of problem. It doesn’t necessarily settle poorly (at the level of individuals and smaller communities) in a world with radical abundance, if indeed even a tiny fraction of the global resources gets allocated to the future of humanity, which is the hard part to ensure.
I might be misunderstanding, but doesn’t this sort of assume that all tyranny is purely about resources?
No matter the level of abundance, it’s not clear that this makes power any less appealing to the power-hungry, or suffering any less enjoyable to the sadists. So I don’t see why power-centralisation in the wrong hands would not be a problem in a post-AGI world.
Power-centralisation in a post-AGI world is not about wielding humans, unlike in a pre-AGI world. Power is no longer power over humans doing your bidding, because humans doing your bidding won’t give you power. By orthogonality, any terrible thing can in principle be someone’s explicit intended target (an aspiration, not just a habit shaped by circumstance), but that’s rare. Usually the terrible things are (a side effect of) an instrumentally useful course of action that has other intended goals, even where in the final analysis the justification doesn’t quite work.
How bad do you think power centralization is? It’s not obvious to me that power centralization guarantees S-risk. In general, I feel pretty confused about how a human god-emperor would behave, especially because many of the reasons that pushed past dictators to draconian rule may not apply when ASI is in the picture. For example, draconian dictators often faced genuine threats to their rule from rival factions, requiring brutal purges and surveillance states to maintain power, or they were stupid / overly paranoid (an ASI advisor could help them have better epistemics), etc. I’m keen to understand your POV better.
I think most people who work on control think that it’s a necessary intermediate step towards alignment, because aligning ASI will require the use of (potentially not-yet-aligned) AI.
I partly agree in spirit.
Yes, concentrated power* is bad, and I for one am 100% always keeping this top of mind.
*EDIT: Too much unchecked power is bad.
But when it comes to control, it’s not at all as simple as you put it. Sure, solving control issues is not enough, but it is not bad on its own either.
First, S-risks from rogue AI seem just as likely, so why would control be a worse outcome? Maybe I misunderstand; if so, you should be clearer about what you mean.
Secondly, and more importantly, control problems need to be solved even for current and human-level AIs.
Thirdly, if we fear superintelligence (ASI), then having control solutions applied to its progenitors can buy precious time.
Fourth, you can go for pre-ASI control solutions that are decentralized.
An idea I wrote about recently is something as simple as batteries (it illustrates the point easily). You can make the system rely on one battery, or on 100. You can tie batteries to APIs, critical GPU clocks, etc. across various data centers, and have each of those services operated by a local authority.
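To make the decentralisation point concrete, here is a minimal sketch of my own (the names `BatteryAuthority` and `quorum_ok` are hypothetical, not from the post I mentioned): a critical workload keeps running only while a quorum of independently operated “battery” authorities keeps renewing its authorization, so no single operator can keep the system going, or halt it, unilaterally.

```python
# Minimal, hypothetical sketch of a decentralized "battery" control scheme.
# Each authority independently renews its authorization; the workload only
# proceeds while a quorum of renewals is still fresh.
import time
from dataclasses import dataclass


@dataclass
class BatteryAuthority:
    name: str                  # e.g. the local authority operating one data center's battery
    last_renewal: float        # timestamp of the most recent renewal signal
    ttl_seconds: float = 60.0  # how long a renewal stays valid


def is_live(authority, now):
    """An authority counts as live if its last renewal is still within its TTL."""
    return (now - authority.last_renewal) <= authority.ttl_seconds


def quorum_ok(authorities, required, now=None):
    """Return True only if at least `required` authorities are still renewing."""
    now = time.time() if now is None else now
    return sum(is_live(a, now) for a in authorities) >= required


# With 100 authorities and a required quorum of 60, no single operator can
# keep the system running (or force it off) on their own.
authorities = [BatteryAuthority(f"authority-{i}", last_renewal=time.time()) for i in range(100)]
if not quorum_ok(authorities, required=60):
    raise SystemExit("Quorum lost: pausing critical GPU workloads.")
```

Whether anything like this could be made tamper-resistant in practice is exactly the open question, but it shows mechanically what “decentralized” could mean here.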
Control solutions are factors in a game.
The end.
PS. Sometimes it seems people here unconsciously have belief in belief: that there are just one or two outcomes and one or two solutions, and that everything will resolve in one or two steps. Black-and-white thinking, in other words. We must watch out for this fallacy and remain vigilant.
What do you think is realistic if alignment is possible? Would the large corporations make a loving machine, or a machine aligned to money and to themselves?
I think it leads to S-risks. I think people will remain in charge and use AI as a power-amplifier. The people most likely to end up with power like having power. They like having control over other people and dominating them. This is completely apparent if you spend the (unpleasant) time reading the Epstein documents that the House has released. We need societal and governmental reform before we even think about playing with any of this technology.
The answer to the world’s problems isn’t a bunch of individuals who are good at puzzles solving a puzzle, after which we get utopia. It involves people recognizing the humanity of everyone around them and working on societal and governmental reform. And sure, this stuff sounds like a long shot, but we’ve got to try. I wish I had a less vague answer, but I don’t.