I’m not sure how useful I find hypotheticals of the form ‘if Claude had its current values [to the extent we can think of Claude as a coherent enough agent to have consistent values, etc etc], but were much more powerful, what would happen?’ A more powerful model would be likely to have/evince different values from a less powerful model, even if the two shared similar architectures and a similar training scheme. Less powerful models also don’t need to be as well-aligned in practice, if we’re thinking of each deployment as a separate decision point, since they’re of less consequence.
I understand that you’re in part responding to the hypothetical seeded by Nina’s rhetorical line, but I’m not sure how useful it is when she does it, either.
Yeah, I’m mostly trying to address the impression that LLMs are ~close to aligned already and thus the problem is keeping them that way, rather than, like, actually solving alignment for AIs in general.