Mostly AI companies researching AI control and, to some extent, planning to adopt it (e.g. see the GDM safety plan).
Mostly unfiltered blurting
Counterfactual?
Control is conceptually obvious and not new; what is somewhat new is that someone is actually doing the faffy work of trying to make it function in practice. I think it's pretty likely the companies would be doing it anyway?
Counterpoint: companies as group actors (despite having intelligent and even caring constituent humans) are mostly myopic and cut as many corners as possible by default, whether from vicious leadership, corporate myopia, or (perceived) race incentives; so maybe even super obvious things get skipped unless external parties pick up the slack?
The same debate could perhaps be had about dangerous capability evaluations.
Even though the basic ideas are fairly obvious, I think that our thinking them through and pushing on them has made a big difference in what companies are planning to do.