for example, maybe you put them in the Business Simulator and they learn to build extremely successful companies. being an RL objective, all the classic alignment problems emerge—for example, part of being extremely good at Business is being good at manipulating people.
This post closely matches my mental model (I’ve used the same analogy with a “Y-Combinator Simulator” and was devastated to learn that YC-Bench isn’t made up of environments like this).
Importantly, I think a natural analogy is that someone who has learned to be successful in that environment might still be really nice when you talk to them outside of work. I think people intuitively understand why “how nice a CEO is in non-business contexts” likely isn’t assurance that they won’t be pretty ruthless in a business context.