I don’t understand why people rave so much about Claude Code etc., or how they really use these agents. The problem is not capability: sure, today’s agents can go far without stumbling or losing the plot. The problem is that they won’t go in the direction I want.
It’s because my product vision, architectural vision, and code quality “functions” are complex: very tedious to express in CLAUDE.md/AGENTS.md, and often hardly expressible in language at all. “I know it when I see it.” Hence I keep the agent “on a short leash” (Karpathy), in Cursor.
This makes me think that, at least in coding (and probably in some other kinds of engineering, design, and soon perhaps content creation, deep research, etc.), agents are hobbled by alignment, not capability. I predict that in the next few months AI engineers will take the concept of “agentic alignment” over from AI safety, and it will become a trendy topic and area of focus in AI engineering, as “context engineering” and “memory” are today.
When agentic alignment is largely solved (likely with a mix of targeted model post-training and harness design), agents will be unhobbled in a big way in engineering, research, business, etc., akin to how RLHF (an “alignment technique”) unhobbled LLM chatbots.