I built a preliminary model here: https://colab.research.google.com/drive/108YuOmrf18nQTOQksV30vch6HNPivvX3?authuser=2
It’s definitely too simple to treat as strong evidence, but it shows some interesting dynamics: for example, alignment levels rise at first, then fall rapidly once AI deception skills exceed human oversight capacity. I sent it to Tyler and he agreed: cool, but not actual evidence.
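The rise-then-fall dynamic can be illustrated with a minimal discrete-time toy simulation. To be clear, the variable names, growth rates, and update rules below are my own assumptions for illustration, not the equations in the notebook:

```python
def simulate(steps=100):
    """Toy model: alignment rises while human oversight can still catch
    AI deception, then falls once deception outgrows oversight.
    All parameters are illustrative assumptions."""
    oversight = 10.0   # human oversight capacity, assumed to grow linearly
    deception = 1.0    # AI deception skill, assumed to grow exponentially
    alignment = 0.0
    trajectory = []
    for _ in range(steps):
        if deception <= oversight:
            alignment += 1.0   # oversight still works: alignment improves
        else:
            alignment -= 2.0   # deception escapes oversight: alignment degrades
        oversight += 1.0
        deception *= 1.1
        trajectory.append(alignment)
    return trajectory

traj = simulate()
```

With these parameters, exponential deception overtakes linear oversight partway through the run, so the trajectory peaks and then declines, matching the qualitative behavior described above.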
If anyone wants to work on improving this, feel free to reach out!