Out of curiosity, have your takes here changed much lately?
I think the o3+ saga has updated me a small-medium amount toward “companies will just deploy misaligned AIs and consumers will complain but use them anyway” (evidenced by multiple companies deploying models that blatantly lie) and “slightly misaligned AI systems that are very capable will likely be preferred over more aligned systems that are less capable” (evidenced by many consumers, including myself, switching over to these more capable lying models).
I also think companies will work a bit to reduce reward hacking and blatant lying, and they will probably succeed to some extent in the next few months (at least for noticeable, everyday problems). That, combined with OpenAI’s rollback of 4o sycophancy, will perhaps make it seem like companies are responsive to consumer pressure here. But I think the situation is overall a small-medium update against consumer pressure doing the thing you might hope for here.
Side point, noting one other dynamic: advanced models are probably not going to act misaligned in the everyday use cases that consumers have an incentive to care about (though again, revealed preference is less clear), even if they’re misaligned. That’s the whole deceptive alignment thing. So I think it does seem more like the ESG case?
Agree with those updates.
Though it’s only a small update, since I don’t think a default gov-led project would be much better on this front. (Though a well-designed one led by responsible ppl could be way, way better of course.)
And I’ve had a few other convos that made me more worried about race dynamics.
Still think two projects is prob better than one overall, but two prob better than six.
Agreed that misaligned advanced models probably won’t act misaligned in everyday use, but customers would also presumably be a bit worried that the AI would, on rare occasions, cross them and steal their stuff or whatever, which is somewhat different. There wouldn’t necessarily be a feedback loop toward this where we see a bunch of early failures, but if we’ve seen a bunch of cases where scheming, power-seeking AIs in the lab execute well-crafted misaligned plans, then customers might want an AI which is less likely to do this.