mattmacdermott

Karma: 1,222

Is instrumental convergence a thing for virtue-driven agents?

mattmacdermott · Apr 2, 2025, 3:59 AM
33 points
37 comments · 2 min read · LW link

Validating against a misalignment detector is very different to training against one

mattmacdermott · Mar 4, 2025, 3:41 PM
33 points
4 comments · 4 min read · LW link