Adrià Garriga-alonso

Karma: 882

Catastrophic Goodhart in RL with KL penalty

15 May 2024 0:58 UTC
49 points
7 comments · 7 min read

An evaluation of circuit evaluation metrics

15 Apr 2024 19:38 UTC
18 points
0 comments · 4 min read

Ophiology (or, how the Mamba architecture works)

9 Apr 2024 19:31 UTC
60 points
8 comments · 10 min read

Does literacy remove your ability to be a bard as good as Homer?

Adrià Garriga-alonso · 18 Jan 2024 3:43 UTC
51 points
19 comments · 3 min read

Thomas Kwa’s research journal

23 Nov 2023 5:11 UTC
79 points
1 comment · 6 min read

On Frequentism and Bayesian Dogma

15 Oct 2023 22:23 UTC
59 points
27 comments · 6 min read

A comparison of causal scrubbing, causal abstractions, and related methods

8 Jun 2023 23:40 UTC
72 points
3 comments · 22 min read

Causal scrubbing: results on induction heads

3 Dec 2022 0:59 UTC
34 points
1 comment · 17 min read

Causal scrubbing: results on a paren balance checker

3 Dec 2022 0:59 UTC
34 points
2 comments · 30 min read

Causal scrubbing: Appendix

3 Dec 2022 0:58 UTC
17 points
4 comments · 20 min read

Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research]

3 Dec 2022 0:58 UTC
197 points
35 comments · 20 min read · 1 review

The No Free Lunch theorems and their Razor

Adrià Garriga-alonso · 24 May 2022 6:40 UTC
56 points
3 comments · 9 min read

Löb’s theorem simply shows that Peano arithmetic cannot prove its own soundness

Adrià Garriga-alonso · 22 Apr 2021 9:17 UTC
5 points
15 comments · 1 min read