Question: what fraction of work should prioritize the gradual disempowerment risk, and what fraction of work should prioritize the treacherous turn risk? (Guesstimate)
Question 2: what is your response to this argument?
The main driving force of gradual disempowerment seems to be "societal inevitability": millions of people seeing the problem right in front of their eyes but being unable to convince society to take action.
If that is the main problem, shouldn't you expect the problem to be even worse right now? The AI safety community is currently tiny (roughly $0.1 billion/year, about 0.0001% of the world economy), and the problem is even harder to take seriously now than it will be later. It is also harder to find solutions before the problem has actually appeared.
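For scale, a rough back-of-the-envelope check (assuming "the world" here means gross world product, on the order of $100 trillion/year; that denominator is my assumption, not the questioner's):

$$
\frac{\$0.1\ \text{billion}}{\$100\ \text{trillion}} \;=\; \frac{10^{8}}{10^{14}} \;=\; 10^{-6} \;\approx\; 0.0001\%.
$$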
One analogy: imagine a tiny group of people in 1990 who could foresee the democratic backsliding and fake news that arrived around 2020. Assuming they had $0.1 billion/year and a mediocre reputation, what is the probability that they could have used their head start to fix these problems (given that people in 2020 were unable to fix them)?
Although I thought of this argument, I don’t think it’s necessarily correct and my intuition about it is very fuzzy and uncertain. I just want to hear your response.
Great question. I think treacherous turn risk is still underfunded in absolute terms, and gradual disempowerment is much less shovel-ready as a research discipline.
I think there are two reasons why this question may not be so important to answer:
1) The kinds of skills required might be somewhat disjoint.
2) Gradual disempowerment is perhaps a subset or extension of the alignment problem. As Ryan Greenblatt and others point out, at some point agents aligned to one person or organization will also naturally start working on this problem at the object level for their principals.