Stuart_Armstrong (Stuart Armstrong)

Karma: 15,890

AI, learn to be conservative, then learn to be less so: reducing side-effects, learning preserved features, and going beyond conservatism

Stuart_Armstrong · 20 Sep 2021 11:56 UTC
12 points
1 comment · 3 min read · LW link

Sigmoids behaving badly: arXiv paper

Stuart_Armstrong · 20 Sep 2021 10:29 UTC
24 points
1 comment · 1 min read · LW link

Immobile AI makes a move: anti-wireheading, ontology change, and model splintering

Stuart_Armstrong · 17 Sep 2021 15:24 UTC
31 points
3 comments · 2 min read · LW link

Reward splintering as reverse of interpretability

Stuart_Armstrong · 31 Aug 2021 22:27 UTC
10 points
0 comments · 1 min read · LW link

What are biases, anyway? Multiple type signatures

Stuart_Armstrong · 31 Aug 2021 21:16 UTC
11 points
0 comments · 3 min read · LW link

What does GPT-3 understand? Symbol grounding and Chinese rooms

Stuart_Armstrong · 3 Aug 2021 13:14 UTC
34 points
14 comments · 12 min read · LW link

Reward splintering for AI design

Stuart_Armstrong · 21 Jul 2021 16:13 UTC
20 points
1 comment · 8 min read · LW link

Bayesianism versus conservatism versus Goodhart

Stuart_Armstrong · 16 Jul 2021 23:39 UTC
14 points
0 comments · 6 min read · LW link

Underlying model of an imperfect morphism

Stuart_Armstrong · 16 Jul 2021 13:13 UTC
13 points
0 comments · 3 min read · LW link