Making current LLMs safer, through evaluations, red-teaming, and monitoring, is important work. But it’s also work that any AI company deploying these systems needs to do anyway. There’s a commercial incentive for it.
Does it? It seems to me like current incentives point towards releasing the next big model as quickly as possible.
I think the main point I disagree with is the contrast between working on current LLMs and existential risks. I think (and I could be wildly off) that it’ll largely be the same folks and orgs who’re working on making current LLMs safer who’ll end up working on aligning superintelligent systems, mostly by trying to keep up as the models scale and evolve. This is not to say that we don’t need work that looks ahead, just that there will be a lot of lessons learned from working with current models that will carry over to x-risk: both technical and, perhaps more importantly, organizational (e.g. working with labs).