burnssa

Karma: 31

Aspiring rationalist, deeply interested in economic development and steering transformative AI to beneficial use. Connoisseur of messy human data, startup veteran, looking to scale alignment research and oversight.

A cheap specialist judge gets used by agents but fails to reduce alignment audit costs

burnssa · 13 Jun 2026 20:38 UTC
8 points
0 comments · 8 min read · LW link

2B scoring model flags out-of-domain misalignment, suggesting specialist judges have potential for audits

burnssa · 14 May 2026 20:00 UTC
8 points
0 comments · 6 min read · LW link

Emergent misalignment evident in activations at low poisoning doses—long before behavioral checks flag it

burnssa · 27 Apr 2026 1:15 UTC
15 points
0 comments · 5 min read · LW link

Don’t Write Off Human Labor, Yet

burnssa · 25 Mar 2026 16:37 UTC
−3 points
0 comments · 8 min read · LW link

How post-training shapes legal representations: probing SCOTUS opinions across model families

burnssa · 15 Mar 2026 0:15 UTC
7 points
0 comments · 8 min read · LW link