Punya Syon Pandey

Karma: 18

Explaining undesirable model behavior: (How) can influence functions help?

2 Mar 2026 11:30 UTC
18 points
0 comments, 3 min read

Investigating Accidental Misalignment: Causal Effects of Fine-Tuning Data on Model Vulnerability

11 Jun 2025 19:30 UTC
6 points
0 comments, 5 min read