nielsrolf

Karma: 648

Shaping the exploration of the motivation-space matters for AI safety

Maxime Riché, Victor Gillioz, nielsrolf, Kajetan Dymkiewicz, Filip Sondej, RogerDearnaley, Daniel Tan and dillonkn

6 Mar 2026 14:43 UTC

83 points

15 comments, 10 min read

UtopiaBench

nielsrolf

8 Feb 2026 18:19 UTC

67 points

10 comments, 1 min read

Concrete research ideas on AI personas

nielsrolf, Maxime Riché and Daniel Tan

3 Feb 2026 21:50 UTC

69 points

10 comments, 6 min read

Conditionalization Confounds Inoculation Prompting Results

Maxime Riché and nielsrolf

3 Feb 2026 11:50 UTC

78 points

5 comments, 19 min read

A Case for Model Persona Research

nielsrolf, Maxime Riché and Daniel Tan

15 Dec 2025 13:35 UTC

121 points

11 comments, 4 min read

nielsrolf’s Shortform

nielsrolf

3 Mar 2023 0:00 UTC

1 point

29 comments, 1 min read