Thank you for your kind words! I’m glad you liked it. Your instruction-following post is a good fit for one of my examples, so I will edit in a link to it.
I agree that alignment is a somewhat awkwardly-used term. I think the original definition relies on AI having quite cleanly defined goals, which is probably unrealistic for sufficiently complex systems and certainly doesn’t apply to LLMs. As a result, it often ends up being approximated to mean something more like shaping a set of behavioural tendencies, i.e. teaching the AI to take the appropriate action in any given context. I tend to lean into this latter interpretation.
I haven’t had time to read your other links yet but will take a look!