In light of recent work on automating alignment and AI task horizons, I’m (re)linking this brief presentation of mine from last year, which I think holds up pretty well and may have gotten fewer views than ideal:
Towards automated AI safety research