
Logical induction for software engineers

Alex Flint · 3 Dec 2022 19:55 UTC
87 points
2 comments · 27 min read · LW link

A challenge for AGI organizations, and a challenge for readers

1 Dec 2022 23:11 UTC
219 points
16 comments · 2 min read · LW link

Jailbreaking ChatGPT on Release Day

Zvi · 2 Dec 2022 13:10 UTC
162 points
41 comments · 6 min read · LW link
(thezvi.wordpress.com)

The Plan - 2022 Update

johnswentworth · 1 Dec 2022 20:43 UTC
170 points
16 comments · 8 min read · LW link

Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research]

3 Dec 2022 0:58 UTC
94 points
7 comments · 20 min read · LW link

Summary of a new study on out-group hate (and how to fix it)

AllAmericanBreakfast · 4 Dec 2022 1:53 UTC
34 points
6 comments · 3 min read · LW link
(www.pnas.org)

MrBeast’s Squid Game Tricked Me

lsusr · 3 Dec 2022 5:50 UTC
58 points
1 comment · 2 min read · LW link

ChatGPT seems overconfident to me

qbolec · 4 Dec 2022 8:03 UTC
16 points
1 comment · 16 min read · LW link

Did ChatGPT just gaslight me?

ThomasW · 1 Dec 2022 5:41 UTC
120 points
43 comments · 9 min read · LW link
(equonc.substack.com)

Inner and outer alignment decompose one hard problem into two extremely hard problems

TurnTrout · 2 Dec 2022 2:43 UTC
79 points
5 comments · 53 min read · LW link

Re-Examining LayerNorm

Eric Winsor · 1 Dec 2022 22:20 UTC
85 points
9 comments · 5 min read · LW link

Take 3: No indescribable heavenworlds.

Charlie Steiner · 4 Dec 2022 2:48 UTC
17 points
8 comments · 2 min read · LW link

Finding gliders in the game of life

paulfchristiano · 1 Dec 2022 20:40 UTC
81 points
1 comment · 16 min read · LW link
(ai-alignment.com)

Our 2022 Giving

jefftk · 3 Dec 2022 15:40 UTC
28 points
0 comments · 1 min read · LW link
(www.jefftk.com)

The LessWrong 2021 Review (Intellectual Circle Expansion)

1 Dec 2022 21:17 UTC
68 points
25 comments · 8 min read · LW link

The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable

28 Nov 2022 12:54 UTC
155 points
25 comments · 31 min read · LW link

Be less scared of overconfidence

benkuhn · 30 Nov 2022 15:20 UTC
97 points
8 comments · 9 min read · LW link
(www.benkuhn.net)

Causal scrubbing: results on induction heads

3 Dec 2022 0:59 UTC
32 points
0 comments · 17 min read · LW link

Deconfusing Direct vs Amortised Optimization

beren · 2 Dec 2022 11:30 UTC
37 points
2 comments · 10 min read · LW link

On the Diplomacy AI

Zvi · 28 Nov 2022 13:20 UTC
112 points
24 comments · 11 min read · LW link
(thezvi.wordpress.com)