I think selling alignment-relevant RL environments to labs is underrated as an x-risk-relevant startup idea. To be clear, x-risk-relevant startups is a pretty restricted search space; I’m not saying that one necessarily should be founding a startup as the best way to address AI x-risk, but just operating under the assumption that we’re optimizing within that space, selling alignment RL environments is definitely the thing I would go for. There’s a market for it, the incentives are reasonable (as long as you are careful and opinionated about only selling environments you think are good for alignment, not just good for capabilities), and it gives you a pipeline for shipping whatever alignment interventions you think are good directly into labs’ training processes. Of course, that’s dependent on you actually having a good idea for how to train models to be more aligned, and that intervention being in the form of something you can sell, but if you can do that, and you can demonstrate that it works, you can just sell it to all the labs, have them all use it, and then hopefully all of their models will now be more aligned. E.g. if you’re excited about character training, you can just replicate it, sell it to all the labs, and then in so doing change how all the labs are training their models.
I think selling alignment-relevant RL environments to labs is underrated as an x-risk-relevant startup idea. To be clear, x-risk-relevant startups is a pretty restricted search space; I’m not saying that one necessarily should be founding a startup as the best way to address AI x-risk, but just operating under the assumption that we’re optimizing within that space, selling alignment RL environments is definitely the thing I would go for. There’s a market for it, the incentives are reasonable (as long as you are careful and opinionated about only selling environments you think are good for alignment, not just good for capabilities), and it gives you a pipeline for shipping whatever alignment interventions you think are good directly into labs’ training processes. Of course, that’s dependent on you actually having a good idea for how to train models to be more aligned, and that intervention being in the form of something you can sell, but if you can do that, and you can demonstrate that it works, you can just sell it to all the labs, have them all use it, and then hopefully all of their models will now be more aligned. E.g. if you’re excited about character training, you can just replicate it, sell it to all the labs, and then in so doing change how all the labs are training their models.
I’m interested in your take on what the differences are here.