Diogo de Lucena

Karma: 682

Chief Scientist at AE Studio

Mistral Large 2 (123B) seems to exhibit alignment faking

Marc Carauleanu, Diogo de Lucena, Gunnar_Zarncke, Cameron Berg, Kvee, Mike Vaiana and Trent Hodgeson

27 Mar 2025 15:39 UTC

81 points

4 comments · 13 min read · LW link

Reducing LLM deception at scale with self-other overlap fine-tuning

Marc Carauleanu, Diogo de Lucena, Gunnar_Zarncke, Kvee, Cameron Berg, Mike Vaiana and Trent Hodgeson

13 Mar 2025 19:09 UTC

162 points

46 comments · 6 min read · LW link

Science advances one funeral at a time

Cameron Berg, Kvee, Diogo de Lucena and Trent Hodgeson

1 Nov 2024 23:06 UTC

100 points

9 comments · 2 min read · LW link

Self-prediction acts as an emergent regularizer

Cameron Berg, Kvee, Mike Vaiana, Diogo de Lucena, florin_pop and Trent Hodgeson

23 Oct 2024 22:27 UTC

92 points

9 comments · 4 min read · LW link

The case for a negative alignment tax

Cameron Berg, Kvee, Diogo de Lucena and Trent Hodgeson

18 Sep 2024 18:33 UTC

79 points

20 comments · 7 min read · LW link

Self-Other Overlap: A Neglected Approach to AI Alignment

Marc Carauleanu, Mike Vaiana, Kvee, Diogo de Lucena, Cameron Berg and Trent Hodgeson

30 Jul 2024 16:22 UTC

242 points

53 comments · 12 min read · LW link · 2 reviews

Video Intro to Guaranteed Safe AI

Mike Vaiana, Diogo de Lucena and Trent Hodgeson

11 Jul 2024 17:53 UTC

27 points

0 comments · 1 min read · LW link

(youtu.be)

AE Studio @ SXSW: We need more AI consciousness research (and further resources)

Trent Hodgeson, Cameron Berg, Kvee, phgubbins and Diogo de Lucena

26 Mar 2024 20:59 UTC

68 points

8 comments · 3 min read · LW link