
Vivek Hebbar

Karma: 1,422

Operationalizing FDT

Vivek Hebbar, 13 Mar 2026 0:12 UTC
69 points
9 comments, 6 min read

A simple rule for causation

Vivek Hebbar, 24 Feb 2026 23:14 UTC
37 points
2 comments, 3 min read

How will we do SFT on models with opaque reasoning?

21 Feb 2026 0:00 UTC
32 points
17 comments, 7 min read

Theoretical predictions on the sample efficiency of training policies and activation monitors

10 Jan 2026 23:50 UTC
17 points
2 comments, 7 min read

Methodological considerations in making malign initializations for control research

24 Dec 2025 1:18 UTC
11 points
0 comments, 13 min read

Supervised fine-tuning as a method for training-based AI control

13 Nov 2025 22:25 UTC
40 points
0 comments, 18 min read

When does training a model change its goals?

12 Jun 2025 18:43 UTC
78 points
3 comments, 15 min read

Political sycophancy as a model organism of scheming

12 May 2025 17:49 UTC
40 points
0 comments, 14 min read

How can we solve diffuse threats like research sabotage with AI control?

Vivek Hebbar, 30 Apr 2025 19:23 UTC
53 points
1 comment, 8 min read