I think it's quite unlikely that funding certain kinds of essential AI safety research leads to more profitable AI.
Namely mechanistic interpretability and preventing failure modes like scheming. Not all AI safety research is aimed at getting the model to follow a prompt, yet that research may be very important for things like existential risk.
The opportunity cost is funding research into making your model more engaging, more performant, or cheaper. I would be surprised if those weren't far more effective per dollar.
A breakthrough in a model's benchmark performance or in training/inference costs would usually be more commercially useful to an AI company than a breakthrough in alignment, but compared to alignment they're higher-hanging fruit, given how much work has already gone into model performance and efficiency. Alignment research will sometimes be more cost-effective for an AI company, especially for companies that aren't big enough to do frontier-scale training runs and have to compete on some other axis.
I disagree about increasing engagingness being more commercially useful for AI companies than increasing alignment. In terms of potential future revenues, the big bucks are in agentic tool-use systems sold B2B (e.g., to automate office work), not in consumer-facing systems like chatbots. For B2B tool-use systems, engagingness doesn't matter but alignment does. And alignment here includes avoiding failure modes like scheming.
I think this is true to an extent, but not fully.