Dec 16, 2023, 5:49 AM

76 points

4 comments, 6 min read, LW link, 1 review

The problems with the concept of an infohazard as used by the LW community [Linkpost]

Noosphere89

Dec 22, 2023, 4:13 PM

75 points

43 comments, 3 min read, LW link

(www.beren.io)

Idealized Agents Are Approximate Causal Mirrors (+ Radical Optimism on Agent Foundations)

Thane Ruthenis

Dec 22, 2023, 8:19 PM

74 points

14 comments, 6 min read, LW link

Neural uncertainty estimation review article (for alignment)

Charlie Steiner

Dec 5, 2023, 8:01 AM

74 points

3 comments, 11 min read, LW link

shoes with springs

bhauth

Dec 30, 2023, 9:46 PM

71 points

9 comments, 4 min read, LW link, 2 reviews

(www.bhauth.com)

Update on Chinese IQ-related gene panels

Lao Mein

Dec 14, 2023, 10:12 AM

70 points

7 comments, 1 min read, LW link

OpenAI: Preparedness framework

Zach Stein-Perlman

Dec 18, 2023, 6:30 PM

70 points

23 comments, 4 min read, LW link

(openai.com)

Finding Sparse Linear Connections between Features in LLMs

Logan Riggs, Sam Mitchell and Adam Kaufman

Dec 9, 2023, 2:27 AM

70 points

5 comments, 10 min read, LW link

Flagging Potentially Unfair Parenting

jefftk

Dec 26, 2023, 12:40 PM

69 points

1 comment, 1 min read, LW link

(www.jefftk.com)

Meetup Tip: Heartbeat Messages

Screwtape

Dec 7, 2023, 5:18 PM

69 points

4 comments, 3 min read, LW link

We’re all in this together

Tamsin Leake

Dec 5, 2023, 1:57 PM

69 points

65 comments, 2 min read, LW link

Don’t Share Information Exfohazardous on Others’ AI-Risk Models

Thane Ruthenis

Dec 19, 2023, 8:09 PM

68 points

11 comments, 1 min read, LW link

AI #42: The Wrong Answer

Zvi

Dec 14, 2023, 2:50 PM

67 points

6 comments, 54 min read, LW link

(thezvi.wordpress.com)

Out-of-distribution Bioattacks

jefftk

Dec 2, 2023, 12:20 PM

66 points

15 comments, 2 min read, LW link

(www.jefftk.com)

Funding case: AI Safety Camp 10

Remmelt and Linda Linsefors

Dec 12, 2023, 9:08 AM

66 points

5 comments, 6 min read, LW link

(manifund.org)

How LDT helps reduce the AI arms race

Tamsin Leake

Dec 10, 2023, 4:21 PM

65 points

13 comments, 4 min read, LW link

(carado.moe)

Complex systems research as a field (and its relevance to AI Alignment)

Nora_Ammann and habryka

Dec 1, 2023, 10:10 PM

65 points

11 comments, 19 min read, LW link

METR is hiring!

Beth Barnes

Dec 26, 2023, 9:00 PM

65 points

1 comment, 1 min read, LW link

E.T. Jaynes Probability Theory: The logic of Science I

Jan Christian Refsgaard and dentalperson

Dec 27, 2023, 11:47 PM

63 points

20 comments, 21 min read, LW link

Balsa Update and General Thank You

Zvi

Dec 12, 2023, 8:30 PM

61 points

8 comments, 8 min read, LW link

(thezvi.wordpress.com)

AI Safety Chatbot

markov and Robert Miles

Dec 21, 2023, 2:06 PM

61 points

11 comments, 4 min read, LW link

Some negative steganography results

Fabien Roger

Dec 9, 2023, 8:22 PM

60 points

5 comments, 2 min read, LW link

Originality vs. Correctness

alkjash and habryka

Dec 6, 2023, 6:51 PM

60 points

17 comments, 25 min read, LW link

In Defense of Epistemic Empathy

Kevin Dorst

Dec 27, 2023, 4:27 PM

60 points

19 comments, 6 min read, LW link

(kevindorst.substack.com)

Are There Examples of Overhang for Other Technologies?

Jeffrey Heninger

Dec 13, 2023, 9:48 PM

59 points

50 comments, 11 min read, LW link

(blog.aiimpacts.org)

Talk: “AI Would Be A Lot Less Alarming If We Understood Agents”

johnswentworth

Dec 17, 2023, 11:46 PM

58 points

3 comments, 1 min read, LW link

(www.youtube.com)

The LessWrong 2022 Review: Review Phase

RobertM

Dec 22, 2023, 3:23 AM

58 points

7 comments, 2 min read, LW link

Measurement tampering detection as a special case of weak-to-strong generalization

ryan_greenblatt, Fabien Roger and Buck

Dec 23, 2023, 12:05 AM

57 points

10 comments, 4 min read, LW link

Meditations on Mot

Richard_Ngo

Dec 4, 2023, 12:19 AM

56 points

11 comments, 8 min read, LW link

(www.mindthefuture.info)

The predictive power of dissipative adaptation

dr_s

Dec 17, 2023, 2:01 PM

56 points

14 comments, 19 min read, LW link

The Best of Don’t Worry About the Vase

Zvi

Dec 13, 2023, 12:50 PM

55 points

4 comments, 13 min read, LW link

(thezvi.wordpress.com)

Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision

leogao

Dec 16, 2023, 5:39 AM

55 points

5 comments, 1 min read, LW link

Anthropical Paradoxes are Paradoxes of Probability Theory

Ape in the coat

Dec 6, 2023, 8:16 AM

55 points

18 comments, 5 min read, LW link

Google Gemini Announced

Jacob G-W

Dec 6, 2023, 4:14 PM

54 points

22 comments, 1 min read, LW link

(blog.google)

the micro-fulfillment cambrian explosion

bhauth

Dec 4, 2023, 1:15 AM

54 points

5 comments, 4 min read, LW link

(www.bhauth.com)

AI #44: Copyright Confrontation

Zvi

Dec 28, 2023, 2:30 PM

54 points

13 comments, 43 min read, LW link

(thezvi.wordpress.com)

2022 (and All Time) Posts by Pingback Count

Raemon

Dec 16, 2023, 9:17 PM

53 points

14 comments, 6 min read, LW link

AI #43: Functional Discoveries

Zvi

Dec 21, 2023, 3:50 PM

52 points

26 comments, 49 min read, LW link

(thezvi.wordpress.com)

Pseudonymity and Accusations

jefftk

Dec 21, 2023, 7:20 PM

52 points

20 comments, 3 min read, LW link

(www.jefftk.com)

n of m ring signatures

DanielFilan

Dec 4, 2023, 8:00 PM

51 points

7 comments, 1 min read, LW link

(danielfilan.com)

Will 2024 be very hot? Should we be worried?

A.H.

Dec 29, 2023, 11:22 AM

51 points

12 comments, 10 min read, LW link

Goal-Completeness is like Turing-Completeness for AGI

Liron

Dec 19, 2023, 6:12 PM

51 points

26 comments, 3 min read, LW link

On OpenAI’s Preparedness Framework

Zvi

Dec 21, 2023, 2:00 PM

51 points

4 comments, 21 min read, LW link

(thezvi.wordpress.com)

The Shortest Path Between Scylla and Charybdis

Thane Ruthenis

Dec 18, 2023, 8:08 PM

50 points

8 comments, 5 min read, LW link

Gemini 1.0

Zvi

Dec 7, 2023, 2:40 PM

50 points

7 comments, 9 min read, LW link

(thezvi.wordpress.com)

Bounty: Diverse hard tasks for LLM agents

Beth Barnes and Megan Kinniment

Dec 17, 2023, 1:04 AM

49 points

31 comments, 16 min read, LW link

On ‘Responsible Scaling Policies’ (RSPs)

Zvi

Dec 5, 2023, 4:10 PM

48 points

3 comments, 37 min read, LW link

(thezvi.wordpress.com)

What is the next level of rationality?

lsusr and Yoav Ravid

Dec 12, 2023, 8:14 AM

48 points

24 comments, 7 min read, LW link

If Clarity Seems Like Death to Them

Zack_M_Davis

Dec 30, 2023, 5:40 PM

47 points

192 comments, 87 min read, LW link, 1 review

(unremediatedgender.space)

Environmental allergies are curable? (Sublingual immunotherapy)

Chris Lakin

Dec 26, 2023, 7:05 PM

47 points

10 comments, 1 min read, LW link