Different way classifiers can be diverse

Value extrapolation partially resolves symbol grounding

How an alien theory of mind might be unlearnable

Finding the multiple ground truths of CoinRun and image classification

Declustering, reclustering, and filling in thingspace

Are there alternative to solving value transfer and extrapolation?

$100/$50 rewards for good references

Morally underdefined situations can be deadly

General alignment plus human values, or alignment via human values?

Beyond the human training distribution: would the AI CEO create almost-illegal teddies?

Classical symbol grounding and causal graphs

Preferences from (real and hypothetical) psychology papers

Force neural nets to use models, then detect these

AI learns betrayal and how to avoid it

AI, learn to be conservative, then learn to be less so: reducing side-effects, learning preserved features, and going beyond conservatism

Sigmoids behaving badly: arXiv paper

Immobile AI makes a move: anti-wireheading, ontology change, and model splintering

Reward splintering as reverse of interpretability

What are biases, anyway? Multiple type signatures

What does GPT-3 understand? Symbol grounding and Chinese rooms

