Speaking of claims made in 2019 review posts: Conclusion to ‘Reframing Impact’ (the final post of my nominated Reframing Impact sequence) contains the following claims and credences:

AU theory describes how people feel impacted. Darn confident (95%).
Agents trained by powerful RL algorithms on arbitrary reward signals generally try to take over the world. Confident (75%). The power-seeking theorems only apply to optimal policies in fully observable environments, an assumption that doesn’t hold for real-world agents. However, I think they’re still informative, and there are also strong intuitive arguments for power-seeking.
AUP_conceptual prevents catastrophe, assuming the catastrophic convergence conjecture. Very confident (85%).
Some version of AUP solves side effect problems for an extremely wide class of real-world tasks and for subhuman agents. Leaning towards yes (65%).
For the superhuman case, penalizing the agent for increasing its own AU is better than penalizing the agent for increasing other AUs (see the sketch after this list). Leaning towards yes (65%).
There exists a simple closed-form solution to catastrophe avoidance (in the outer alignment sense). Pessimistic (35%).
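To make the own-AU vs. other-AU distinction in the superhuman claim concrete, here is a simplified sketch of the two penalty shapes, not the exact formulas from the posts. Here $Q_{R_i}$ are action-values for auxiliary reward functions, $Q_R$ is the agent’s action-value for its own reward, $\varnothing$ is a no-op action, and $\lambda$ is an assumed penalty coefficient:

$$\text{penalize other AUs:}\quad R(s,a) \;-\; \frac{\lambda}{N}\sum_{i=1}^{N}\bigl|\,Q_{R_i}(s,a) - Q_{R_i}(s,\varnothing)\,\bigr|$$

$$\text{penalize own-AU increase:}\quad R(s,a) \;-\; \lambda\,\max\bigl(0,\; Q_{R}(s,a) - Q_{R}(s,\varnothing)\bigr)$$

The first form charges the agent for changing how well it could pursue auxiliary goals relative to doing nothing; the second charges it only for gains in its own ability to achieve its goal, which is the version the superhuman claim favors.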
Ey, awesome! I’ve updated the post to include them.