Coming back >2.5 years later to say this is among the most helpful pieces of AI writing I’ve ever read—I remember it being super clarifying at the time, and I still link people to it, cite it in conversation, and use similar analogies. (Even though I now also caveat it with “...but also maybe really sophisticated agents will actively seek it, because the training environment might reward it, and maybe they’ll ‘experience’ something like fear/pain/etc. for things correlated with negative reward, if they experience things...”) Thank you for writing it up!!