K-complexity is silly; use cross-entropy instead

So8res

20 Dec 2022 23:06 UTC

153 points

60 comments, 14 min read, LW link, 2 reviews

Podcast: Tamera Lanham on AI risk, threat models, alignment proposals, externalized reasoning oversight, and working at Anthropic

Orpheus16

20 Dec 2022 21:39 UTC

19 points

2 comments, 11 min read, LW link

Discovering Language Model Behaviors with Model-Written Evaluations

evhub and Ethan Perez

20 Dec 2022 20:08 UTC

100 points

34 comments, 1 min read, LW link

(www.anthropic.com)

Reflections: Bureaucratic Hell

Haris Rashid

20 Dec 2022 19:22 UTC

−5 points

1 comment, 1 min read, LW link

(www.harisrab.com)

Proliferating Education

Haris Rashid

20 Dec 2022 19:22 UTC

−1 points

2 comments, 5 min read, LW link

(www.harisrab.com)

AGI is here, but nobody wants it. Why should we even care?

MGow

20 Dec 2022 19:14 UTC

−22 points

0 comments, 17 min read, LW link

Properties of current AIs and some predictions of the evolution of AI from the perspective of scale-free theories of agency and regulative development

Roman Leventov

20 Dec 2022 17:13 UTC

34 points

3 comments, 36 min read, LW link

I believe some AI doomers are overconfident

FTPickle

20 Dec 2022 17:09 UTC

8 points

15 comments, 2 min read, LW link

Note on algorithms with multiple trained components

Steven Byrnes

20 Dec 2022 17:08 UTC

23 points

4 comments, 2 min read, LW link

Marvel Snap: Phase 2

Zvi

20 Dec 2022 14:50 UTC

11 points

1 comment, 13 min read, LW link

(thezvi.wordpress.com)

(Extremely) Naive Gradient Hacking Doesn’t Work

ojorgensen

20 Dec 2022 14:35 UTC

17 points

0 comments, 6 min read, LW link

An Open Agency Architecture for Safe Transformative AI

davidad

20 Dec 2022 13:04 UTC

80 points

22 comments, 4 min read, LW link

Under-Appreciated Ways to Use Flashcards—Part I

Florence Hinder

20 Dec 2022 12:43 UTC

22 points

5 comments, 5 min read, LW link

(thoughtsaver.ghost.io)

EA & LW Forums Weekly Summary (12th Dec − 18th Dec 22′)

Zoe Williams

20 Dec 2022 9:49 UTC

10 points

0 comments, 17 min read, LW link

[Fiction] Unspoken Stone

Gordon Seidoh Worley

20 Dec 2022 5:11 UTC

19 points

0 comments, 5 min read, LW link

Notice when you stop reading right before you understand

just_browsing

20 Dec 2022 5:09 UTC

61 points

6 comments, 1 min read, LW link

Take 12: RLHF’s use is evidence that orgs will jam RL at real-world problems.

Charlie Steiner

20 Dec 2022 5:01 UTC

25 points

1 comment, 3 min read, LW link

More notes from raising a late-talking kid

Steven Byrnes

20 Dec 2022 2:13 UTC

41 points

2 comments, 6 min read, LW link

The “Minimal Latents” Approach to Natural Abstractions

johnswentworth

20 Dec 2022 1:22 UTC

53 points

24 comments, 12 min read, LW link

Shard Theory in Nine Theses: a Distillation and Critical Appraisal

LawrenceC

19 Dec 2022 22:52 UTC

150 points

30 comments, 18 min read, LW link

[Question] Will research in AI risk jinx it? Consequences of training AI on AI risk arguments

Yann Dubois

19 Dec 2022 22:42 UTC

5 points

6 comments, 1 min read, LW link

AGI Timelines in Governance: Different Strategies for Different Timeframes

simeon_c and AmberDawn

19 Dec 2022 21:31 UTC

65 points

28 comments, 10 min read, LW link

Towards Hodge-podge Alignment

Cleo Nardo

19 Dec 2022 20:12 UTC

95 points

30 comments, 9 min read, LW link

Computational signatures of psychopathy

Cameron Berg

19 Dec 2022 17:01 UTC

30 points

3 comments, 20 min read, LW link

Results from a survey on tool use and workflows in alignment research

jacquesthibs, Jan, janus and Logan Riggs

19 Dec 2022 15:19 UTC

79 points

2 comments, 19 min read, LW link

Does ChatGPT’s performance warrant working on a tutor for children? [It’s time to take it to the lab.]

Bill Benzon

19 Dec 2022 15:12 UTC

13 points

5 comments, 4 min read, LW link

(new-savanna.blogspot.com)

Conditions for Superrationality-motivated Cooperation in a one-shot Prisoner’s Dilemma

Jim Buhler

19 Dec 2022 15:00 UTC

24 points

4 comments, 5 min read, LW link

Next Level Seinfeld

Zvi

19 Dec 2022 13:30 UTC

50 points

8 comments, 1 min read, LW link

(thezvi.wordpress.com)

CEA Disambiguation

jefftk

19 Dec 2022 13:20 UTC

25 points

0 comments, 1 min read, LW link

(www.jefftk.com)

Why mechanistic interpretability does not and cannot contribute to long-term AGI safety (from messages with a friend)

Remmelt

19 Dec 2022 12:02 UTC

−3 points

9 comments, 31 min read, LW link

Hacker-AI and Cyberwar 2.0+

Erland Wittkotter

19 Dec 2022 11:46 UTC

2 points

0 comments, 15 min read, LW link

Non-Technical Preparation for Hacker-AI and Cyberwar 2.0+

Erland Wittkotter

19 Dec 2022 11:42 UTC

2 points

0 comments, 25 min read, LW link

An Effective Grab Bag

stavros

19 Dec 2022 10:29 UTC

30 points

3 comments, 7 min read, LW link

Slick hyperfinite Ramsey theory proof

Alok Singh

19 Dec 2022 8:40 UTC

8 points

3 comments, 1 min read, LW link

(alok.github.io)

The True Spirit of Solstice?

Raemon

19 Dec 2022 8:00 UTC

71 points

31 comments, 9 min read, LW link

The Risk of Orbital Debris and One (Cheap) Way to Mitigate It

clans

19 Dec 2022 3:16 UTC

13 points

1 comment, 4 min read, LW link

(locationtbd.home.blog)

Why I think that teaching philosophy is high impact

Eleni Angelou

19 Dec 2022 3:11 UTC

5 points

0 comments, 2 min read, LW link

A template for doing annual reviews

peterslattery

19 Dec 2022 3:09 UTC

2 points

0 comments, 1 min read, LW link

Event [Berkeley]: Alignment Collaborator Speed-Meeting

AlexMennen and Carson Jones

19 Dec 2022 2:24 UTC

18 points

2 comments, 1 min read, LW link

An easier(?) end to the electoral college

ejacob

19 Dec 2022 2:09 UTC

2 points

2 comments, 2 min read, LW link

How Death Feels

sisyphus

18 Dec 2022 23:47 UTC

−7 points

9 comments, 1 min read, LW link

Why Are Women Hot?

Jacob Falkovich

18 Dec 2022 23:20 UTC

17 points

19 comments, 11 min read, LW link

[Question] Can we, in principle, know the measure of counterfactual quantum branches?

sisyphus

18 Dec 2022 22:07 UTC

1 point

15 comments, 1 min read, LW link

Boston Solstice 2022 Retrospective

jefftk

18 Dec 2022 19:00 UTC

19 points

3 comments, 5 min read, LW link

(www.jefftk.com)

Take 11: “Aligning language models” should be weirder.

Charlie Steiner

18 Dec 2022 14:14 UTC

34 points

0 comments, 2 min read, LW link

Bad at Arithmetic, Promising at Math

cohenmacaulay

18 Dec 2022 5:40 UTC

102 points

19 comments, 20 min read, LW link, 1 review

Overconfidence bubbles

kaputmi

18 Dec 2022 2:07 UTC

3 points

0 comments, 2 min read, LW link

Positive values seem more robust and lasting than prohibitions

TurnTrout

17 Dec 2022 21:43 UTC

52 points

13 comments, 2 min read, LW link

What we owe the microbiome

weverka

17 Dec 2022 19:40 UTC

2 points

0 comments, 1 min read, LW link

(forum.effectivealtruism.org)

Why write more: improve your epistemics, self-care, & 28 other reasons

KatWoods

17 Dec 2022 19:25 UTC

24 points

1 comment, 6 min read, LW link