7vik

Karma: 377

I research intelligence and its emergence and expression in neural networks to ensure advanced AI is safe and beneficial.

Current interests: neural network interpretability, alignment/safety, unsupervised learning, and deep learning theory.

For more, check out my scholar profile and personal website.

Sparsity is the enemy of feature extraction (ft. absorption)

3 May 2025 10:13 UTC
32 points
0 comments · 6 min read · LW link

Among Us: A Sandbox for Agentic Deception

5 Apr 2025 6:24 UTC
110 points
7 comments · 7 min read · LW link

Auditing language models for hidden objectives

13 Mar 2025 19:18 UTC
141 points
15 comments · 13 min read · LW link

Some lessons from the OpenAI-FrontierMath debacle

19 Jan 2025 21:09 UTC
71 points
9 comments · 4 min read · LW link

Intricacies of Feature Geometry in Large Language Models

7 Dec 2024 18:10 UTC
71 points
0 comments · 12 min read · LW link

The Geometry of Feelings and Nonsense in Large Language Models

27 Sep 2024 17:49 UTC
61 points
10 comments · 4 min read · LW link