AI, learn to be conservative, then learn to be less so: reducing side-effects, learning preserved features, and going beyond conservatism

Sigmoids behaving badly: arXiv paper

Immobile AI makes a move: anti-wireheading, ontology change, and model splintering

Reward splintering as reverse of interpretability

What are biases, anyway? Multiple type signatures

What does GPT-3 understand? Symbol grounding and Chinese rooms

Reward splintering for AI design

Bayesianism versus conservatism versus Goodhart

Underlying model of an imperfect morphism

