Sam Marks

Karma: 1,645

[Question] For mRNA vaccines, is (short-term) efficacy really higher after the second dose?

Sam Marks 25 Apr 2021 20:21 UTC

27 points

13 comments 3 min read LW link

[Book review] Gödel, Escher, Bach: an in-depth explainer

Sam Marks 29 Sep 2021 19:03 UTC

98 points

23 comments 23 min read LW link 1 review

Movie review: Don’t Look Up

Sam Marks 4 Jan 2022 20:16 UTC

35 points

6 comments 11 min read LW link

2022 ACX predictions: market prices

Sam Marks 6 Mar 2022 6:24 UTC

21 points

2 comments 5 min read LW link

If you’re very optimistic about ELK then you should be optimistic about outer alignment

Sam Marks 27 Apr 2022 19:30 UTC

17 points

8 comments 3 min read LW link

Proxy misspecification and the capabilities vs. value learning race

Sam Marks 16 May 2022 18:58 UTC

23 points

3 comments 4 min read LW link

Safety considerations for online generative modeling

Sam Marks 7 Jul 2022 18:31 UTC

42 points

9 comments 14 min read LW link

Caution when interpreting Deepmind’s In-context RL paper

Sam Marks 1 Nov 2022 2:42 UTC

103 points

6 comments 4 min read LW link

Recommend HAIST resources for assessing the value of RLHF-related alignment research

Sam Marks and Xander Davies

5 Nov 2022 20:58 UTC

26 points

9 comments 3 min read LW link

AGISF adaptation for in-person groups

Sam Marks, Xander Davies and Richard_Ngo

13 Jan 2023 3:24 UTC

44 points

2 comments 3 min read LW link

Turning off lights with model editing

Sam Marks 12 May 2023 20:25 UTC

67 points

5 comments 2 min read LW link

(arxiv.org)

Thoughts on open source AI

Sam Marks 3 Nov 2023 15:35 UTC

52 points

17 comments 10 min read LW link

Some open-source dictionaries and dictionary learning infrastructure

Sam Marks 5 Dec 2023 6:05 UTC

45 points

7 comments 5 min read LW link

What’s up with LLMs representing XORs of arbitrary features?

Sam Marks 3 Jan 2024 19:44 UTC

154 points

61 comments 16 min read LW link

Discriminating Behaviorally Identical Classifiers: a model problem for applying interpretability to scalable oversight

Sam Marks 18 Apr 2024 16:17 UTC

75 points

1 comment 12 min read LW link