Kvee

Karma: 1,491

AICRAFT: DARPA-Funded AI Alignment Researchers — Applications Open

16 Mar 2026 21:44 UTC
67 points
8 comments
4 min read

Mistral Large 2 (123B) seems to exhibit alignment faking

27 Mar 2025 15:39 UTC
81 points
4 comments
13 min read

Reducing LLM deception at scale with self-other overlap fine-tuning

13 Mar 2025 19:09 UTC
162 points
46 comments
6 min read

Alignment can be the ‘clean energy’ of AI

22 Feb 2025 0:08 UTC
69 points
8 comments
8 min read

Making a conservative case for alignment

15 Nov 2024 18:55 UTC
208 points
67 comments
7 min read

Science advances one funeral at a time

1 Nov 2024 23:06 UTC
104 points
9 comments
2 min read

Self-prediction acts as an emergent regularizer

23 Oct 2024 22:27 UTC
92 points
9 comments
4 min read

The case for a negative alignment tax

18 Sep 2024 18:33 UTC
79 points
20 comments
7 min read