Nice story, thanks for laying it out like this.
My guess is timelines fall off this hopeful story around:

- Things moving too quickly once automated research comes in
- Labs going straight for RSI once it becomes available rather than slowing down enough
- Even if labs half-heartedly try to slow down, advanced systems converging toward unhobbling themselves and slipping this past our safeguards
- The control walls falling too fast, and sometimes silently
- Our interpretability tools not holding up through architectural phase transitions, etc.
- Agent Foundations (AF) being too tricky to hand off to AI research teams to do properly in time
- Labs not having enough in-house philosophical competence to know what to ask their systems or how to check their answers
- AF being hard to verify in a way that defies easy feedback loops
- The current AF field likely missing crucial considerations, which will kill us if the AI research systems can't locate the needed research areas we haven't noticed
- Very few people having deep skills here (maybe a few dozen by my read)