Alexandre Variengien

Karma: 600

My guess at Conjecture’s vision: triggering a narrative bifurcation

Alexandre Variengien, 6 Feb 2024 19:10 UTC
74 points
12 comments, 16 min read

The case for training frontier AIs on Sumerian-only corpus

15 Jan 2024 16:40 UTC
127 points
14 comments, 3 min read

A Universal Emergent Decomposition of Retrieval Tasks in Language Models

19 Dec 2023 11:52 UTC
81 points
3 comments, 10 min read
(arxiv.org)

Capture the Flag Mechanistic Interpretability Challenges

8 Sep 2023 23:00 UTC
22 points
0 comments, 7 min read

Input Swap Graphs: Discovering the role of neural network components at scale

Alexandre Variengien, 12 May 2023 9:41 UTC
90 points
0 comments, 33 min read

An introduction to language model interpretability

Alexandre Variengien, 20 Apr 2023 22:22 UTC
14 points
0 comments, 9 min read

Some common confusion about induction heads

Alexandre Variengien, 28 Mar 2023 21:51 UTC
46 points
4 comments, 5 min read

Gliders in Language Models

Alexandre Variengien, 25 Nov 2022 0:38 UTC
30 points
11 comments, 10 min read

Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small

28 Oct 2022 23:55 UTC
99 points
9 comments, 9 min read, 2 reviews
(arxiv.org)

Apply to the Machine Learning For Good bootcamp in France

Alexandre Variengien, 17 Jun 2022 7:32 UTC
10 points
0 comments, 1 min read

Croesus, Cerberus, and the magpies: a gentle introduction to Eliciting Latent Knowledge

Alexandre Variengien, 27 May 2022 17:58 UTC
14 points
0 comments, 16 min read