
Stuart_Armstrong (Stuart Armstrong)

Karma: 17,171

Large language models can provide “normative assumptions” for learning human preferences

Stuart_Armstrong · 2 Jan 2023 19:39 UTC
29 points
12 comments · 3 min read

Concept extrapolation for hypothesis generation

12 Dec 2022 22:09 UTC
20 points
2 comments · 3 min read

Using GPT-Eliezer against ChatGPT Jailbreaking

6 Dec 2022 19:54 UTC
167 points
77 comments · 9 min read

Benchmark for successful concept extrapolation/avoiding goal misgeneralization

Stuart_Armstrong · 4 Jul 2022 20:48 UTC
80 points
12 comments · 4 min read

Value extrapolation vs Wireheading

Stuart_Armstrong · 17 Jun 2022 15:02 UTC
16 points
1 comment · 1 min read

Georgism, in theory

Stuart_Armstrong · 15 Jun 2022 15:20 UTC
38 points
21 comments · 4 min read

How to get into AI safety research

Stuart_Armstrong · 18 May 2022 18:05 UTC
44 points
7 comments · 1 min read

GPT-3 and concept extrapolation

Stuart_Armstrong · 20 Apr 2022 10:39 UTC
19 points
28 comments · 1 min read

Concept extrapolation: key posts

Stuart_Armstrong · 19 Apr 2022 10:01 UTC
12 points
2 comments · 1 min read

AIs should learn human preferences, not biases

Stuart_Armstrong · 8 Apr 2022 13:45 UTC
10 points
1 comment · 1 min read

Different perspectives on concept extrapolation

Stuart_Armstrong · 8 Apr 2022 10:42 UTC
43 points
7 comments · 5 min read

Value extrapolation, concept extrapolation, model splintering

Stuart_Armstrong · 8 Mar 2022 22:50 UTC
14 points
1 comment · 2 min read

[Link] Aligned AI AMA

Stuart_Armstrong · 1 Mar 2022 12:01 UTC
18 points
0 comments · 1 min read

More GPT-3 and symbol grounding

Stuart_Armstrong · 23 Feb 2022 18:30 UTC
21 points
7 comments · 3 min read

Why I’m co-founding Aligned AI

Stuart_Armstrong · 17 Feb 2022 19:55 UTC
93 points
54 comments · 3 min read

Different way classifiers can be diverse

Stuart_Armstrong · 17 Jan 2022 16:30 UTC
10 points
5 comments · 2 min read

Value extrapolation partially resolves symbol grounding

Stuart_Armstrong · 12 Jan 2022 16:30 UTC
24 points
10 comments · 1 min read

How an alien theory of mind might be unlearnable

Stuart_Armstrong · 3 Jan 2022 11:16 UTC
26 points
35 comments · 5 min read

Finding the multiple ground truths of CoinRun and image classification

Stuart_Armstrong · 8 Dec 2021 18:13 UTC
15 points
3 comments · 2 min read

Declustering, reclustering, and filling in thingspace

Stuart_Armstrong · 6 Dec 2021 20:53 UTC
16 points
6 comments · 3 min read