Sam Marks
Karma: 1,645
[Question]
For mRNA vaccines, is (short-term) efficacy really higher after the second dose?
Sam Marks
25 Apr 2021 20:21 UTC
27
points
13
comments
3
min read
LW
link
[Book review] Gödel, Escher, Bach: an in-depth explainer
Sam Marks
29 Sep 2021 19:03 UTC
98
points
23
comments
23
min read
LW
link
1
review
Movie review: Don’t Look Up
Sam Marks
4 Jan 2022 20:16 UTC
35
points
6
comments
11
min read
LW
link
2022 ACX predictions: market prices
Sam Marks
6 Mar 2022 6:24 UTC
21
points
2
comments
5
min read
LW
link
If you’re very optimistic about ELK then you should be optimistic about outer alignment
Sam Marks
27 Apr 2022 19:30 UTC
17
points
8
comments
3
min read
LW
link
Proxy misspecification and the capabilities vs. value learning race
Sam Marks
16 May 2022 18:58 UTC
23
points
3
comments
4
min read
LW
link
Safety considerations for online generative modeling
Sam Marks
7 Jul 2022 18:31 UTC
42
points
9
comments
14
min read
LW
link
Caution when interpreting Deepmind’s In-context RL paper
Sam Marks
1 Nov 2022 2:42 UTC
103
points
6
comments
4
min read
LW
link
Recommend HAIST resources for assessing the value of RLHF-related alignment research
Sam Marks
and
Xander Davies
5 Nov 2022 20:58 UTC
26
points
9
comments
3
min read
LW
link
AGISF adaptation for in-person groups
Sam Marks
,
Xander Davies
and
Richard_Ngo
13 Jan 2023 3:24 UTC
44
points
2
comments
3
min read
LW
link
Turning off lights with model editing
Sam Marks
12 May 2023 20:25 UTC
67
points
5
comments
2
min read
LW
link
(arxiv.org)
Thoughts on open source AI
Sam Marks
3 Nov 2023 15:35 UTC
52
points
17
comments
10
min read
LW
link
Some open-source dictionaries and dictionary learning infrastructure
Sam Marks
5 Dec 2023 6:05 UTC
45
points
7
comments
5
min read
LW
link
What’s up with LLMs representing XORs of arbitrary features?
Sam Marks
3 Jan 2024 19:44 UTC
154
points
61
comments
16
min read
LW
link
Discriminating Behaviorally Identical Classifiers: a model problem for applying interpretability to scalable oversight
Sam Marks
18 Apr 2024 16:17 UTC
75
points
1
comment
12
min read
LW
link