Meta Alignment: Communication Wack-a-Mole

Bridgett Kay · 22 Jun 2024 20:12 UTC
12 points
2 comments · 5 min read · LW link
(dxmrevealed.wordpress.com)

AI as a computing platform: what to expect

Jonasb · 22 Jun 2024 19:55 UTC
−1 points
0 comments · 7 min read · LW link
(www.denominations.io)

Applying Force to the Wrong End of a Causal Chain

silentbob · 22 Jun 2024 18:06 UTC
24 points
0 comments · 9 min read · LW link

Bed Time Quests & Dinner Games for 3-5 year olds

Gunnar_Zarncke · 22 Jun 2024 7:53 UTC
40 points
0 comments · 1 min read · LW link
(kidquest.substack.com)

Appraising aggregativism and utilitarianism

Cleo Nardo · 21 Jun 2024 23:10 UTC
20 points
0 comments · 19 min read · LW link

Best-of-n with misaligned reward models for Math reasoning

Fabien Roger · 21 Jun 2024 22:53 UTC
23 points
0 comments · 3 min read · LW link

No really, the Sticker Shortcut fallacy is indeed a fallacy

ymeskhout · 21 Jun 2024 22:27 UTC
10 points
2 comments · 5 min read · LW link
(www.ymeskhout.com)

Sarajevo 1914: Black Swan Questions

JohnBuridan · 21 Jun 2024 21:27 UTC
7 points
0 comments · 2 min read · LW link

Let’s Design a School, Part 3.2 Costs

Sable · 21 Jun 2024 17:58 UTC
6 points
0 comments · 5 min read · LW link
(affablyevil.substack.com)

2022 AI Alignment Course: 5→37% working on AI safety

Dewi · 21 Jun 2024 17:45 UTC
6 points
3 comments · 3 min read · LW link

Some Thoughts on AI Alignment: Using AI to Control AI

eigenvalue · 21 Jun 2024 17:44 UTC
1 point
1 comment · 1 min read · LW link
(github.com)

What distinguishes “early”, “mid” and “end” games?

Raemon · 21 Jun 2024 17:41 UTC
37 points
11 comments · 1 min read · LW link

AI governance needs a theory of victory

21 Jun 2024 16:15 UTC
11 points
0 comments · 1 min read · LW link

Connecting the Dots: LLMs can Infer & Verbalize Latent Structure from Training Data

21 Jun 2024 15:54 UTC
120 points
8 comments · 8 min read · LW link
(arxiv.org)

On OpenAI’s Model Spec

Zvi · 21 Jun 2024 13:00 UTC
38 points
3 comments · 30 min read · LW link
(thezvi.wordpress.com)

Attention Output SAEs Improve Circuit Analysis

21 Jun 2024 12:56 UTC
28 points
0 comments · 19 min read · LW link

“Newton’s laws” of finance

pchvykov · 21 Jun 2024 9:41 UTC
7 points
2 comments · 10 min read · LW link

Capitalising On Trust—A Simulation

James Stephen Brown · 21 Jun 2024 4:43 UTC
2 points
0 comments · 1 min read · LW link
(nonzerosum.games)

“… than average” is (almost) meaningless

jwfiredragon · 21 Jun 2024 4:42 UTC
8 points
5 comments · 3 min read · LW link

The Kernel of Meaning in Property Rights

Abhimanyu Pallavi Sudhir · 21 Jun 2024 1:12 UTC
5 points
6 comments · 2 min read · LW link