You could find a way of proving to the world that your AI is aligned, which other labs can’t replicate, giving you economic advantage.
I don’t expect this to be a very large effect. It feels similar to an argument like “company A will be better on ESG dimensions and therefore more customers will switch to using it”. Doing a quick review of the literature on that, it seems like there’s a small but notable shift in consumer behavior toward ESG-labeled products.
It seems quite different to the ESG case. Customers don’t personally benefit from using a company with good ESG. They will benefit from using an aligned AI over a misaligned one.
In the AI space, it doesn’t seem to me like any customers care about OpenAI’s safety team disappearing (except a few folks in the AI safety world).
Again though, customers currently have no selfish reason to care.
In this particular case, I expect the technical argument needed to demonstrate that one family of AI systems is aligned while others are not would be really complicated; I expect fewer than 500 people would be able to actually verify such an argument (or the initial “scalable alignment solution”), maybe zero people.
It’s quite common for only a very small number of people to have the individual ability to verify a safety case, but for many more to defer to their judgement. People may defer to an AISI, or to a regulatory agency.
Out of curiosity, have your takes here changed much lately?
I think the o3+ saga has updated me a small-medium amount toward “companies will just deploy misaligned AIs and consumers will complain but use them anyway” (evidenced by deployment of models that blatantly lie from multiple companies) and “slightly misaligned AI systems that are very capable will likely be preferred over more aligned systems that are less capable” (evidenced by many consumers, including myself, switching over to using these more capable lying models).
I also think companies will work a bit to reduce reward hacking and blatant lying, and they will probably succeed to some extent (at least for noticeable, everyday problems), in the next few months. That, combined with OpenAI’s rollback of 4o sycophancy, will perhaps make it seem like companies are responsive to consumer pressure here. But I think the situation is overall a small-medium update against consumer pressure doing the thing you might hope here.
Side point, noting one other dynamic: advanced models are probably not going to act misaligned in everyday use cases (the ones consumers have an incentive to care about, though again revealed preference is less clear), even if they’re misaligned. That’s the whole deceptive alignment thing. So I think it does seem more like the ESG case?
Agree with those updates. Though it’s only a small update for me, as I don’t think a default gov-led project would be much better on this front. (Though a well-designed one led by responsible people could be way, way better, of course.)
And I’ve had a few other convos that made me more worried about race dynamics.
Still think two projects is probably better than one overall, but two is probably better than six.
Agreed that advanced models probably won’t act misaligned in everyday use cases even if they’re misaligned. But customers would also presumably be a bit worried that the AI would, on rare occasions, cross them and steal their stuff or whatever, which is somewhat different. There wouldn’t necessarily be a feedback loop here where we see a bunch of early failures, but if we’ve seen a bunch of cases where scheming, power-seeking AIs in the lab execute well-crafted misaligned plans, then customers might want an AI that is less likely to do this.