Matrice Jacobine comments on Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs

Matrice Jacobine 18 Feb 2025 13:02 UTC
5 points
It does seem that LLMs are subject to deontological constraints (Figure 19), but I think that in fact makes the paper’s framing of questions as comparisons between world-states, rather than between specific actions, more apt for evaluating whether LLMs have utility functions over world-states behind those deontological constraints. Your reinterpretation of how LLMs actually interpret those world-state descriptions is an important remark, and it certainly changes the conclusions we can draw from this article regarding implicit bias. But (unless you debunk those results) what I see as the paper’s most important discoveries remain the same: that LLMs have utility functions over world-states which 1/ are consistent across LLMs, 2/ become more and more consistent as model size increases, and 3/ can be subject to mechanistic interpretability methods.