While I think building safety-adjacent RL envs is worse than most other kinds of technical safety work for people who have very high context in AGI safety, I think it’s net positive.
I think it’s a pretty high-variance activity! It’s not that I can’t imagine any kind of RL environment that might make things better, but most of them will just be used to make AIs “more helpful” and serve as generic training data to ascend the capabilities frontier.
Like, yes, there are some more interesting monitor-shaped RL environments, and I would actually be interested in digging into the details of how good or bad some of them would be, but the things I am expecting here are more like “oh, we made a Wikipedia navigation environment, which reduces hallucinations in AI, which is totally helpful for safety I promise”, when really, I think that is just a straightforward capabilities push.
As part of my startup exploration, I would like to discuss this as well. It would help to clarify my thinking on whether there’s a shape of such a business that could be meaningfully positive. I’ve started reaching out to people who work in the labs to get better context on this, and I think it would be good to dig deeper into Evan’s comment on the topic.
I’m going to start a Google Doc, but I would love to talk in person with folks in the Bay about this to ideate and refine it faster.