Interesting. I wonder if this perspective is common, and that’s why people rarely bother talking about the prompting portion of aligning LMAs.
I don’t really know how to weigh which is more important. Of course, even having a model reliably follow prompts is a product of tuning (usually RLHF or RLAIF, though there are also RL-free pre-training techniques that work fairly well to the same end). So its tendency to follow many types of prompts is itself part of the underlying “personality”.
Whatever their relative strengths, aligning an LMA AGI should employ both tuning and prompting (as well as several other “layers” of alignment techniques), so looking carefully at how these come together within a particular agent architecture would be the game.