I feel like going into this will predictably cause a demon thread, but as an obvious pointer that is hopefully on the less controversial side, my sense is the vibes of “how hard is AI Alignment” and “how much are we on track to build superintelligent systems safely” are really quite a lot downstream of the incentives frontier labs face, since >50% of talent-weighted people in AI safety work at the frontier labs.
This seems quite bad to me, and it is quite plausible the world would be better off if instead of there being a field of “AI Safety” that has this much of a central vibe, and is this directly exposed to some extremely strong incentives, it was more the case that a bunch of existing fields are thinking about this on their own terms, probably overall using worse epistemics and tools less well-suited to the task, but in a way that I think by default would be much less hijack-able.
(There are also many other things of this kind going on, but I feel like those will be more controversial.)
Edit: Or alternatively, that the “field of AI Safety” had something more akin to journals or membership or courts or other forms of social organization that could bring deliberation and intention to how all of these vibes are shifting. I do think right now I am thinking more in the direction of “maybe we should have conquered less”, but I am also sympathetic to arguments of the form “but maybe we could just defend more?” I am just somewhat burned out on those dimensions.
I hear you about not wanting to start a demon thread. At the same time, I also feel like this answer is too nonspecific for people who either want to help out or give advice. What part of your work at Lightcone feels like it’s contributing to AI lab control of safety & alignment research?
At risk of seeming contrarian, I think there’s a worrying tendency within the rationalist community to treat anything broadly framed as problematically underspecified. It’s somewhat ironic, considering that a huge focus within the field of AI safety is how dangerous it can be to build a model on bad assumptions. There’s a lot of eagerness to reduce problems down to narrowly defined, actionable metrics that I think stems from a discomfort with the ambiguity that arises while trying to hold a lot of complexity at once.
Uncertainty seems to make rationalists disproportionately nervous, to the degree that, when confronted with concerns about Moloch/Goodhart’s Law, the response is often “But can you give us a real answer please?”, which I think is charming in an exasperating sort of way.
This is so true. I find the assumption that the required specifications are even knowable exhausting. We do not have perfect information. People be just wrong sometimes.