LLMs seem (relatively) safe

JustisMills, 25 Apr 2024 22:13 UTC
48 points
18 comments, 7 min read
(justismills.substack.com)

[Aspiration-based designs] 2. Formal framework, basic algorithm

28 Apr 2024 13:02 UTC
18 points
2 comments, 16 min read

Searching for Searching for Search

Rubi J. Hudson, 14 Feb 2024 23:51 UTC
21 points
3 comments, 7 min read

Big-endian is better than little-endian

Menotim, 29 Apr 2024 2:30 UTC
27 points
14 comments, 3 min read

On Not Pulling The Ladder Up Behind You

Screwtape, 26 Apr 2024 21:58 UTC
120 points
10 comments, 9 min read

UDT1.01: The Story So Far (1/10)

Diffractor, 27 Mar 2024 23:22 UTC
31 points
5 comments, 13 min read

Ironing Out the Squiggles

Zack_M_Davis, 29 Apr 2024 16:13 UTC
87 points
6 comments, 11 min read

Thoughts on seed oil

dynomight, 20 Apr 2024 12:29 UTC
266 points
94 comments, 17 min read
(dynomight.net)

Refusal in LLMs is mediated by a single direction

27 Apr 2024 11:13 UTC
142 points
52 comments, 10 min read

The Prop-room and Stage Cognitive Architecture

Robert Kralisch, 29 Apr 2024 0:48 UTC
8 points
3 comments, 14 min read

Referential Containment

Robert Kralisch, 29 Apr 2024 0:16 UTC
2 points
2 comments, 3 min read

Disentangling Competence and Intelligence

Robert Kralisch, 29 Apr 2024 0:12 UTC
16 points
4 comments, 6 min read

Towards Multimodal Interpretability: Learning Sparse Interpretable Features in Vision Transformers

hugofry, 29 Apr 2024 20:57 UTC
20 points
3 comments, 11 min read

Estimating the Number of Players from Game Result Percentages

Daniel L, 28 Apr 2024 17:42 UTC
1 point
2 comments, 1 min read

Simple probes can catch sleeper agents

23 Apr 2024 21:10 UTC
117 points
15 comments, 1 min read
(www.anthropic.com)

Losing Faith In Contrarianism

omnizoid, 25 Apr 2024 20:53 UTC
31 points
40 comments, 5 min read

Towards a formalization of the agent structure problem

Alex_Altair, 29 Apr 2024 20:28 UTC
28 points
0 comments, 14 min read

[Question] Examples of Highly Counterfactual Discoveries?

johnswentworth, 23 Apr 2024 22:19 UTC
172 points
88 comments, 1 min read

Open Thread Spring 2024

habryka, 11 Mar 2024 19:17 UTC
22 points
82 comments, 1 min read

We are headed into an extreme compute overhang

devrandom, 26 Apr 2024 21:38 UTC
38 points
15 comments, 2 min read