Jozdien (Arun Jose)
Karma: 1,212
The case for more ambitious language model evals
Jozdien · 30 Jan 2024 0:01 UTC · 105 points · 25 comments · 5 min read
AI Safety via Luck
Jozdien · 1 Apr 2023 20:13 UTC · 76 points · 7 comments · 11 min read
Thoughts On (Solving) Deep Deception
Jozdien · 21 Oct 2023 22:40 UTC · 66 points · 2 comments · 6 min read
Conditioning Generative Models for Alignment
Jozdien · 18 Jul 2022 7:11 UTC · 58 points · 8 comments · 20 min read
Gradient Filtering
Jozdien and janus · 18 Jan 2023 20:09 UTC · 54 points · 16 comments · 13 min read
Trying to isolate objectives: approaches toward high-level interpretability
Jozdien · 9 Jan 2023 18:33 UTC · 48 points · 14 comments · 8 min read
Critiques of the AI control agenda
Jozdien · 14 Feb 2024 19:25 UTC · 47 points · 14 comments · 9 min read
[ASoT] Finetuning, RL, and GPT’s world prior
Jozdien · 2 Dec 2022 16:33 UTC · 44 points · 8 comments · 5 min read
Gradient Descent on the Human Brain
Jozdien and gaspode · 1 Apr 2024 22:39 UTC · 42 points · 4 comments · 2 min read
The Pointer Resolution Problem
Jozdien · 16 Feb 2024 21:25 UTC · 41 points · 20 comments · 3 min read
[ASoT] Simulators show us behavioural properties by default
Jozdien · 13 Jan 2023 18:42 UTC · 33 points · 2 comments · 3 min read
Difficulty classes for alignment properties
Jozdien · 20 Feb 2024 9:08 UTC · 32 points · 5 comments · 2 min read
Insufficient Values
Jozdien, Jacob Abraham and Abraham Francis · 16 Jun 2021 14:33 UTC · 31 points · 15 comments · 5 min read
Utopic Nightmares
Jozdien · 14 May 2021 21:24 UTC · 10 points · 20 comments · 5 min read
Gaming Incentives
Jozdien · 29 Jul 2021 13:51 UTC · 10 points · 4 comments · 6 min read