Mike Vaiana

Karma: 670

AICRAFT: DARPA-Funded AI Alignment Researchers — Applications Open

Mike Vaiana, Diogo de Lucena and Kvee

16 Mar 2026 21:44 UTC

67 points

8 comments · 4 min read · LW link

Mistral Large 2 (123B) seems to exhibit alignment faking

Marc Carauleanu, Diogo de Lucena, Gunnar_Zarncke, Cameron Berg, Kvee, Mike Vaiana and Trent Hodgeson

27 Mar 2025 15:39 UTC

81 points

4 comments · 13 min read · LW link

Reducing LLM deception at scale with self-other overlap fine-tuning

Marc Carauleanu, Diogo de Lucena, Gunnar_Zarncke, Kvee, Cameron Berg, Mike Vaiana and Trent Hodgeson

13 Mar 2025 19:09 UTC

162 points

46 comments · 6 min read · LW link

Self-prediction acts as an emergent regularizer

Cameron Berg, Kvee, Mike Vaiana, Diogo de Lucena, florin_pop and Trent Hodgeson

23 Oct 2024 22:27 UTC

92 points

9 comments · 4 min read · LW link

Self-Other Overlap: A Neglected Approach to AI Alignment

Marc Carauleanu, Mike Vaiana, Kvee, Diogo de Lucena, Cameron Berg and Trent Hodgeson

30 Jul 2024 16:22 UTC

247 points

53 comments · 12 min read · LW link · 2 reviews

Video Intro to Guaranteed Safe AI

Mike Vaiana, Diogo de Lucena and Trent Hodgeson

11 Jul 2024 17:53 UTC

27 points

0 comments · 1 min read · LW link

(youtu.be)

DIY RLHF: A simple implementation for hands on experience

Mike Vaiana and Trent Hodgeson

10 Jul 2024 12:07 UTC

29 points

0 comments · 6 min read · LW link