Diogo de Lucena
Karma: 662
Chief Scientist at AE Studio
Mistral Large 2 (123B) seems to exhibit alignment faking
Marc Carauleanu, Diogo de Lucena, Gunnar_Zarncke, Cameron Berg, Judd Rosenblatt, Mike Vaiana and Trent Hodgeson
27 Mar 2025 15:39 UTC · 81 points · 4 comments · 13 min read
Reducing LLM deception at scale with self-other overlap fine-tuning
Marc Carauleanu, Diogo de Lucena, Gunnar_Zarncke, Judd Rosenblatt, Cameron Berg, Mike Vaiana and Trent Hodgeson
13 Mar 2025 19:09 UTC · 162 points · 46 comments · 6 min read
Science advances one funeral at a time
Cameron Berg, Judd Rosenblatt, Diogo de Lucena and Trent Hodgeson
1 Nov 2024 23:06 UTC · 100 points · 9 comments · 2 min read
Self-prediction acts as an emergent regularizer
Cameron Berg, Judd Rosenblatt, Mike Vaiana, Diogo de Lucena, florin_pop and Trent Hodgeson
23 Oct 2024 22:27 UTC · 91 points · 9 comments · 4 min read
The case for a negative alignment tax
Cameron Berg, Judd Rosenblatt, Diogo de Lucena and Trent Hodgeson
18 Sep 2024 18:33 UTC · 77 points · 20 comments · 7 min read
Self-Other Overlap: A Neglected Approach to AI Alignment
Marc Carauleanu, Mike Vaiana, Judd Rosenblatt, Diogo de Lucena, Cameron Berg and Trent Hodgeson
30 Jul 2024 16:22 UTC · 226 points · 51 comments · 12 min read
Video Intro to Guaranteed Safe AI
Mike Vaiana, Diogo de Lucena and Trent Hodgeson
11 Jul 2024 17:53 UTC · 27 points · 0 comments · 1 min read · link post (youtu.be)
AE Studio @ SXSW: We need more AI consciousness research (and further resources)
Trent Hodgeson, Cameron Berg, Judd Rosenblatt, phgubbins and Diogo de Lucena
26 Mar 2024 20:59 UTC · 67 points · 8 comments · 3 min read