The frontier labs have certainly succeeded in aligning their models. LLMs have achieved a level of alignment people wouldn’t have dreamed of 10 years ago.
Labs are now running into issues with reasoning models, but these don’t seem at all insurmountable.
Contemporary AI models are not “aligned” in any sense that would help in the slightest against a superintelligence. Stronger AI capabilities demand stronger guardrails, and current “alignment” doesn’t even prevent failures like ChatGPT’s recent sycophancy or jailbreaking.