RSS

Rauno Arike

Karma: 94

Early Ex­per­i­ments in Re­ward Model In­ter­pre­ta­tion Us­ing Sparse Autoencoders

3 Oct 2023 7:45 UTC
11 points
0 comments5 min readLW link

Ex­plor­ing the Lot­tery Ticket Hypothesis

Rauno Arike25 Apr 2023 20:06 UTC
50 points
3 comments11 min readLW link

[Question] Re­quest for Align­ment Re­search Pro­ject Recommendations

Rauno Arike3 Sep 2022 15:29 UTC
10 points
2 comments1 min readLW link

Coun­ter­ing ar­gu­ments against work­ing on AI safety

Rauno Arike20 Jul 2022 18:23 UTC
7 points
2 comments7 min readLW link

Clar­ify­ing the con­fu­sion around in­ner alignment

Rauno Arike13 May 2022 23:05 UTC
29 points
0 comments11 min readLW link