Kvee

Karma: 1,440

Mistral Large 2 (123B) seems to exhibit alignment faking

Marc Carauleanu, Diogo de Lucena, Gunnar_Zarncke, Cameron Berg, Kvee, Mike Vaiana and Trent Hodgeson

27 Mar 2025 15:39 UTC

81 points

4 comments13 min readLW link

Reducing LLM deception at scale with self-other overlap fine-tuning

Marc Carauleanu, Diogo de Lucena, Gunnar_Zarncke, Kvee, Cameron Berg, Mike Vaiana and Trent Hodgeson

13 Mar 2025 19:09 UTC

162 points

46 comments6 min readLW link

Alignment can be the ‘clean energy’ of AI

Cameron Berg, Kvee and Trent Hodgeson

22 Feb 2025 0:08 UTC

69 points

8 comments8 min readLW link

Making a conservative case for alignment

Cameron Berg, Kvee, phgubbins and Trent Hodgeson

15 Nov 2024 18:55 UTC

208 points

67 comments7 min readLW link

Science advances one funeral at a time

Cameron Berg, Kvee, Diogo de Lucena and Trent Hodgeson

1 Nov 2024 23:06 UTC

100 points

9 comments2 min readLW link

Self-prediction acts as an emergent regularizer

Cameron Berg, Kvee, Mike Vaiana, Diogo de Lucena, florin_pop and Trent Hodgeson

23 Oct 2024 22:27 UTC

92 points

9 comments4 min readLW link

The case for a negative alignment tax

Cameron Berg, Kvee, Diogo de Lucena and Trent Hodgeson

18 Sep 2024 18:33 UTC

79 points

20 comments7 min readLW link

The EA case for Trump

Kvee3 Aug 2024 1:00 UTC

14 points

1 comment1 min readLW link

(www.secondbest.ca)

Self-Other Overlap: A Neglected Approach to AI Alignment

Marc Carauleanu, Mike Vaiana, Kvee, Diogo de Lucena, Cameron Berg and Trent Hodgeson

30 Jul 2024 16:22 UTC

242 points

53 comments12 min readLW link 2 reviews

Yoshua Bengio: Reasoning through arguments against taking AI safety seriously

Kvee11 Jul 2024 23:53 UTC

72 points

3 comments1 min readLW link

(yoshuabengio.org)

There Should Be More Alignment-Driven Startups

Vaniver, Kvee, Cameron Berg and phgubbins

31 May 2024 2:05 UTC

62 points

14 comments11 min readLW link

Key takeaways from our EA and alignment research surveys

Cameron Berg, Kvee, florin_pop and Trent Hodgeson

3 May 2024 18:10 UTC

114 points

10 comments21 min readLW link

AE Studio @ SXSW: We need more AI consciousness research (and further resources)

Trent Hodgeson, Cameron Berg, Kvee, phgubbins and Diogo de Lucena

26 Mar 2024 20:59 UTC

68 points

8 comments3 min readLW link

Survey for alignment researchers!

Cameron Berg, Kvee and Trent Hodgeson

2 Feb 2024 20:41 UTC

71 points

11 comments1 min readLW link

The ‘Neglected Approaches’ Approach: AE Studio’s Alignment Agenda

Cameron Berg, Kvee, Trent Hodgeson and Marc Carauleanu

18 Dec 2023 20:35 UTC

190 points

23 comments12 min readLW link 1 review