2 Feb 2024 23:17 UTC

98 points

0 comments 9 min read LW link

Survey for alignment researchers!

Cameron Berg, Kvee and Trent Hodgeson

2 Feb 2024 20:41 UTC

71 points

11 comments 1 min read LW link

Voting Results for the 2022 Review

Ben Pace 2 Feb 2024 20:34 UTC

57 points

3 comments 73 min read LW link

On Dwarkesh’s 3rd Podcast With Tyler Cowen

Zvi 2 Feb 2024 19:30 UTC

36 points

9 comments 21 min read LW link

(thezvi.wordpress.com)

Most experts believe COVID-19 was probably not a lab leak

DanielFilan 2 Feb 2024 19:28 UTC

66 points

89 comments 2 min read LW link

(gcrinstitute.org)

What Failure Looks Like is not an existential risk (and alignment is not the solution)

otto.barten 2 Feb 2024 18:59 UTC

14 points

12 comments 9 min read LW link

Solving alignment isn’t enough for a flourishing future

mic 2 Feb 2024 18:23 UTC

27 points

0 comments 22 min read LW link

(papers.ssrn.com)

Manifold Markets

PeterMcCluskey 2 Feb 2024 17:48 UTC

26 points

9 comments 4 min read LW link

(bayesianinvestor.com)

Types of subjective welfare

MichaelStJules 2 Feb 2024 9:56 UTC

10 points

3 comments 18 min read LW link

Open Source Sparse Autoencoders for all Residual Stream Layers of GPT2-Small

Joseph Bloom 2 Feb 2024 6:54 UTC

103 points

37 comments 15 min read LW link

Soft Prompts for Evaluation: Measuring Conditional Distance of Capabilities

porby 2 Feb 2024 5:49 UTC

48 points

1 comment 4 min read LW link

(arxiv.org)

Running a Prediction Market Mafia Game

Arjun Panickssery 1 Feb 2024 23:24 UTC

22 points

5 comments 1 min read LW link

(arjunpanickssery.substack.com)

Evaluating Stability of Unreflective Alignment

james.lucassen 1 Feb 2024 22:15 UTC

63 points

12 comments 18 min read LW link

(jlucassen.com)

Davidad’s Provably Safe AI Architecture—ARIA’s Programme Thesis

simeon_c 1 Feb 2024 21:30 UTC

69 points

17 comments 1 min read LW link

(www.aria.org.uk)

Alignment has a Basin of Attraction: Beyond the Orthogonality Thesis

RogerDearnaley 1 Feb 2024 21:15 UTC

20 points

15 comments 13 min read LW link

Wrong answer bias

lemonhope 1 Feb 2024 20:05 UTC

83 points

23 comments 1 min read LW link

On Not Requiring Vaccination

jefftk 1 Feb 2024 19:20 UTC

31 points

21 comments 1 min read LW link

(www.jefftk.com)

The economy is mostly newbs (strat predictions)

lemonhope 1 Feb 2024 19:15 UTC

27 points

6 comments 2 min read LW link

Managing risks while trying to do good

Wei Dai 1 Feb 2024 18:08 UTC

76 points

28 comments 2 min read LW link

Putting multimodal LLMs to the Tetris test

Lovre and gabrielagc

1 Feb 2024 16:02 UTC

30 points

5 comments 7 min read LW link

AI #49: Bioweapon Testing Begins

Zvi 1 Feb 2024 15:30 UTC

37 points

11 comments 42 min read LW link

(thezvi.wordpress.com)

Some Notes on Ethics

Pareto Optimal 1 Feb 2024 10:18 UTC

−3 points

0 comments 1 min read LW link

(paretooptimal.substack.com)

Increasingly vague interpersonal welfare comparisons

MichaelStJules 1 Feb 2024 6:45 UTC

5 points

0 comments 2 min read LW link

PIBBSS Speaker events comings up in February

DusanDNesic, Nora_Ammann and Lucas Teixeira

1 Feb 2024 3:28 UTC

10 points

2 comments 1 min read LW link

Drone Wars Endgame

RussellThor 1 Feb 2024 2:30 UTC

44 points

75 comments 10 min read LW link 1 review

Sequencing Swabs

jefftk 1 Feb 2024 1:50 UTC

19 points

1 comment 5 min read LW link

(www.jefftk.com)

Leading The Parade

johnswentworth 31 Jan 2024 22:39 UTC

150 points

32 comments 9 min read LW link 1 review

Proposal for an AI Safety Prize

sweenesm 31 Jan 2024 18:35 UTC

3 points

0 comments 2 min read LW link

Literally Everything is Infinite

Spiral 31 Jan 2024 18:31 UTC

−9 points

8 comments 5 min read LW link

What fuels your ambition?

Cissy 31 Jan 2024 18:30 UTC

29 points

1 comment 5 min read LW link

(www.moremyself.xyz)

“Genlangs” and Zipf’s Law: Do languages generated by ChatGPT statistically look human?

Justin-Diamond 31 Jan 2024 18:30 UTC

2 points

2 comments 1 min read LW link

(arxiv.org)

AI, Intellectual Property, and the Techno-Optimist Revolution

Justin-Diamond 31 Jan 2024 18:30 UTC

1 point

0 comments 1 min read LW link

(www.researchgate.net)

My Alignment “Plan”: Avoid Strong Optimisation and Align Economy

VojtaKovarik 31 Jan 2024 17:03 UTC

24 points

9 comments 7 min read LW link

Per protocol analysis as medical malpractice

braces 31 Jan 2024 16:22 UTC

57 points

11 comments 1 min read LW link

Adam Smith Meets AI Doomers

James_Miller 31 Jan 2024 15:53 UTC

35 points

10 comments 5 min read LW link

Ten Modes of Culture War Discourse

jchan 31 Jan 2024 13:58 UTC

62 points

16 comments 15 min read LW link

Without Fundamental Advances, Rebellion and Coup d’État are the Inevitable Outcomes of Dictators & Monarchs Trying to Control Large, Capable Countries

Roko 31 Jan 2024 10:14 UTC

27 points

34 comments 1 min read LW link

Explaining Impact Markets

Saul Munn 31 Jan 2024 9:51 UTC

95 points

2 comments 3 min read LW link

(www.brasstacks.blog)

Exploring OpenAI’s Latent Directions: Tests, Observations, and Poking Around

Johnny Lin 31 Jan 2024 6:01 UTC

26 points

4 comments 14 min read LW link

Clip keys together with tiny carabiners

Brendan Long 31 Jan 2024 4:26 UTC

11 points

5 comments 1 min read LW link

(www.brendanlong.com)

The problem with proportional extrapolation

pathos_bot 30 Jan 2024 23:40 UTC

8 points

0 comments 1 min read LW link

Counterfactual Mechanism Networks

StrivingForLegibility 30 Jan 2024 20:30 UTC

5 points

0 comments 5 min read LW link

Control vs Selection: Civilisation is best at control, but navigating AGI requires selection

VojtaKovarik 30 Jan 2024 19:06 UTC

7 points

1 comment 1 min read LW link

AI governance frames

NathanBarnard 30 Jan 2024 18:18 UTC

3 points

0 comments 3 min read LW link

Deciding What Project/Org to Start: A Guide to Prioritization Research

Alexandra Bos 30 Jan 2024 18:13 UTC

8 points

0 comments 7 min read LW link

on neodymium magnets

bhauth 30 Jan 2024 15:58 UTC

47 points

6 comments 4 min read LW link

(www.bhauth.com)

[Question] Can we create self-improving AIs that perfect their own ethics?

Gabi QUENE 30 Jan 2024 14:45 UTC

1 point

10 comments 1 min read LW link

Childhood and Education Roundup #4

Zvi 30 Jan 2024 13:50 UTC

44 points

10 comments 24 min read LW link

(thezvi.wordpress.com)

Last call for submissions for TAIS 2024!

Blaine 30 Jan 2024 12:08 UTC

4 points

0 comments 1 min read LW link

(tais2024.cc)

[Question] Has anyone actually changed their mind regarding Sleeping Beauty problem?

Ape in the coat 30 Jan 2024 8:34 UTC

15 points

51 comments 1 min read LW link