
williawa

Karma: 404

Models not making it clear when they’re roleplaying seems like a fairly big issue

williawa, 21 Nov 2025 20:23 UTC
16 points
3 comments, 6 min read

a sketch of how we might go about getting basins of corrigibility from RL

williawa, 14 Nov 2025 22:10 UTC
9 points
0 comments, 4 min read

Thoughts About how RLHF and Related “Prosaic” Approaches Could be Used to Create Robustly Aligned AIs.

williawa, 23 Aug 2025 21:05 UTC
10 points
14 comments, 4 min read

williawa’s Shortform

williawa, 1 Apr 2025 13:17 UTC
3 points
41 comments, 1 min read

Bergen – ACX Meetups Everywhere Spring 2025

williawa, 25 Mar 2025 23:49 UTC
2 points
0 comments, 1 min read