Stuart_Armstrong (Stuart Armstrong)

Karma: 17,671

Alignment can improve generalisation through more robustly doing what a human wants - CoinRun example

Stuart_Armstrong, 21 Nov 2023 11:41 UTC
68 points
9 comments, 3 min read, LW link

How toy models of ontology changes can be misleading

Stuart_Armstrong, 21 Oct 2023 21:13 UTC
41 points
0 comments, 2 min read, LW link

Different views of alignment have different consequences for imperfect methods

Stuart_Armstrong, 28 Sep 2023 16:31 UTC
31 points
0 comments, 1 min read, LW link

Avoiding xrisk from AI doesn’t mean focusing on AI xrisk

Stuart_Armstrong, 2 May 2023 19:27 UTC
64 points
7 comments, 3 min read, LW link

What is a definition, how can it be extrapolated?

Stuart_Armstrong, 14 Mar 2023 18:08 UTC
34 points
5 comments, 7 min read, LW link

You’re not a simulation, ’cause you’re hallucinating

Stuart_Armstrong, 21 Feb 2023 12:12 UTC
25 points
6 comments, 1 min read, LW link

Large language models can provide “normative assumptions” for learning human preferences

Stuart_Armstrong, 2 Jan 2023 19:39 UTC
29 points
12 comments, 3 min read, LW link

Concept extrapolation for hypothesis generation

12 Dec 2022 22:09 UTC
20 points
2 comments, 3 min read, LW link

Using GPT-Eliezer against ChatGPT Jailbreaking

6 Dec 2022 19:54 UTC
170 points
85 comments, 9 min read, LW link

Benchmark for successful concept extrapolation/avoiding goal misgeneralization

Stuart_Armstrong, 4 Jul 2022 20:48 UTC
82 points
12 comments, 4 min read, LW link

Value extrapolation vs Wireheading

Stuart_Armstrong, 17 Jun 2022 15:02 UTC
16 points
1 comment, 1 min read, LW link

Georgism, in theory

Stuart_Armstrong, 15 Jun 2022 15:20 UTC
40 points
22 comments, 4 min read, LW link

How to get into AI safety research

Stuart_Armstrong, 18 May 2022 18:05 UTC
44 points
7 comments, 1 min read, LW link

GPT-3 and concept extrapolation

Stuart_Armstrong, 20 Apr 2022 10:39 UTC
19 points
27 comments, 1 min read, LW link

Concept extrapolation: key posts

Stuart_Armstrong, 19 Apr 2022 10:01 UTC
13 points
2 comments, 1 min read, LW link

AIs should learn human preferences, not biases

Stuart_Armstrong, 8 Apr 2022 13:45 UTC
10 points
0 comments, 1 min read, LW link

Different perspectives on concept extrapolation

Stuart_Armstrong, 8 Apr 2022 10:42 UTC
48 points
8 comments, 5 min read, LW link, 1 review

Value extrapolation, concept extrapolation, model splintering

Stuart_Armstrong, 8 Mar 2022 22:50 UTC
16 points
1 comment, 2 min read, LW link

[Link] Aligned AI AMA

Stuart_Armstrong, 1 Mar 2022 12:01 UTC
18 points
0 comments, 1 min read, LW link

More GPT-3 and symbol grounding

Stuart_Armstrong, 23 Feb 2022 18:30 UTC
21 points
7 comments, 3 min read, LW link