Ironing Out the Squiggles

Zack_M_Davis 29 Apr 2024 16:13 UTC
87 points
5 comments, 11 min read

Towards Multimodal Interpretability: Learning Sparse Interpretable Features in Vision Transformers

hugofry 29 Apr 2024 20:57 UTC
20 points
3 comments, 11 min read

Towards a formalization of the agent structure problem

Alex_Altair 29 Apr 2024 20:28 UTC
18 points
0 comments, 14 min read

Refusal in LLMs is mediated by a single direction

27 Apr 2024 11:13 UTC
142 points
52 comments, 10 min read

On Not Pulling The Ladder Up Behind You

Screwtape 26 Apr 2024 21:58 UTC
118 points
9 comments, 9 min read

Big-endian is better than little-endian

Menotim 29 Apr 2024 2:30 UTC
26 points
13 comments, 3 min read

Open-Source AI: A Regulatory Review

29 Apr 2024 10:10 UTC
12 points
0 comments, 8 min read

List your AI X-Risk cruxes!

Aryeh Englander 28 Apr 2024 18:26 UTC
31 points
4 comments, 2 min read

Constructability: Plainly-coded AGIs may be feasible in the near future

27 Apr 2024 16:04 UTC
63 points
12 comments, 13 min read

[Aspiration-based designs] 1. Informal introduction

28 Apr 2024 13:00 UTC
34 points
4 comments, 8 min read

Unintentionally Creating Value

28 Apr 2024 20:05 UTC
22 points
0 comments, 2 min read

[Question] Examples of Highly Counterfactual Discoveries?

johnswentworth 23 Apr 2024 22:19 UTC
172 points
88 comments, 1 min read

An Unintentional Compliment

28 Apr 2024 20:04 UTC
21 points
1 comment, 4 min read

Disentangling Competence and Intelligence

Robert Kralisch 29 Apr 2024 0:12 UTC
16 points
4 comments, 6 min read

So What’s Up With PUFAs Chemically?

J Bostock 27 Apr 2024 13:32 UTC
55 points
22 comments, 6 min read

Thoughts on seed oil

dynomight 20 Apr 2024 12:29 UTC
266 points
94 comments, 17 min read
(dynomight.net)

The first future and the best future

KatjaGrace 25 Apr 2024 6:40 UTC
104 points
10 comments, 1 min read
(worldspiritsockpuppet.com)

Duct Tape security

Isaac King 26 Apr 2024 18:57 UTC
62 points
8 comments, 5 min read

Transformers Represent Belief State Geometry in their Residual Stream

Adam Shai 16 Apr 2024 21:16 UTC
313 points
64 comments, 12 min read

Superposition is not “just” neuron polysemanticity

LawrenceC 26 Apr 2024 23:22 UTC
50 points
3 comments, 13 min read