Simple alignment plan that maybe works

Iknownothing 18 Jul 2023 22:48 UTC

4 points

8 comments 1 min read LW link

Prospera-dump

tailcalled 18 Jul 2023 21:36 UTC

11 points

16 comments 1 min read LW link

Tiny Mech Interp Projects: Emergent Positional Embeddings of Words

Neel Nanda 18 Jul 2023 21:24 UTC

52 points

1 comment 9 min read LW link

Quick Thoughts on Language Models

RohanS 18 Jul 2023 20:38 UTC

6 points

0 comments 4 min read LW link

Still no Lie Detector for LLMs

Daniel Herrmann and ben_levinstein

18 Jul 2023 19:56 UTC

50 points

3 comments 21 min read LW link

Meta announces Llama 2; “open sources” it for commercial use

LawrenceC 18 Jul 2023 19:28 UTC

46 points

12 comments 1 min read LW link

(about.fb.com)

The Rope Management Theory: A Comprehensive Approach to Modulating Reward Perception and Mitigating Hedonic Adaptation

Eris Discordia 18 Jul 2023 17:45 UTC

−23 points

2 comments 3 min read LW link

AI Impacts Quarterly Newsletter, Apr-Jun 2023

Harlan and Richard Korzekwa

18 Jul 2023 17:14 UTC

6 points

0 comments 3 min read LW link

(blog.aiimpacts.org)

Clever arguers give weak evidence, not zero

dkl9 18 Jul 2023 17:07 UTC

7 points

2 comments 1 min read LW link

(dkl9.net)

Measuring and Improving the Faithfulness of Model-Generated Reasoning

Ansh Radhakrishnan, tamera, karinanguyen, Sam Bowman and Ethan Perez

18 Jul 2023 16:36 UTC

111 points

15 comments 6 min read LW link 1 review

[Question] Least-problematic Resource for learning RL?

Dalcy 18 Jul 2023 16:30 UTC

24 points

9 comments 1 min read LW link

Charter Cities: why they’re exciting & how they might work

Jackson Wagner 18 Jul 2023 13:57 UTC

21 points

7 comments 8 min read LW link

Train for incorrigibility, then reverse it (Shutdown Problem Contest Submission)

Daniel_Eth 18 Jul 2023 8:26 UTC

9 points

1 comment 2 min read LW link

The shape of AGI: Cartoons and back of envelope

Boaz Barak 17 Jul 2023 20:57 UTC

33 points

19 comments 6 min read LW link 1 review

Predictive history classes

dkl9 17 Jul 2023 20:48 UTC

69 points

17 comments 2 min read LW link

(dkl9.net)

Highlights from The Industrial Revolution, by T. S. Ashton

jasoncrawford 17 Jul 2023 19:02 UTC

17 points

0 comments 10 min read LW link

(rootsofprogress.org)

Existential Risk Persuasion Tournament

PeterMcCluskey 17 Jul 2023 18:04 UTC

73 points

1 comment 8 min read LW link

(bayesianinvestor.com)

[Interview w/ Rob Miles] The case for taking AI Safety seriously

fowlertm 17 Jul 2023 17:08 UTC

17 points

1 comment 1 min read LW link

Announcing the Existential InfoSec Forum

calebp99 17 Jul 2023 17:05 UTC

10 points

0 comments 2 min read LW link

Sapient Algorithms

Valentine 17 Jul 2023 16:30 UTC

87 points

15 comments 5 min read LW link

AI safety technical research—Career review

Benjamin Hilton 17 Jul 2023 15:34 UTC

14 points

0 comments 29 min read LW link

[Question] Conditional on living in a AI safety/alignment by default universe, what are the implications of this assumption being true?

Noosphere89 17 Jul 2023 14:44 UTC

26 points

10 comments 1 min read LW link

Thoughts on “Process-Based Supervision” / MONA

Steven Byrnes 17 Jul 2023 14:08 UTC

79 points

4 comments 23 min read LW link

Proof of posteriority: a defense against AI-generated misinformation

jchan 17 Jul 2023 12:04 UTC

33 points

3 comments 5 min read LW link

An Overview of AI risks—the Flyer

Charbel-Raphaël, Jonathan Claybrough and tchauvin

17 Jul 2023 12:03 UTC

20 points

0 comments 1 min read LW link

(docs.google.com)

[Question] Build knowledge base first, or backchain?

Nicholas Kross 17 Jul 2023 3:44 UTC

11 points

5 comments 1 min read LW link

A fictional AI law laced w/ alignment theory

MiguelDev 17 Jul 2023 1:42 UTC

6 points

0 comments 2 min read LW link

AutoInterpretation Finds Sparse Coding Beats Alternatives

Hoagy 17 Jul 2023 1:41 UTC

56 points

1 comment 7 min read LW link

An upcoming US Supreme Court case may impede AI governance efforts

NickGabs 16 Jul 2023 23:51 UTC

57 points

17 comments 2 min read LW link

Weak Evidence is Common

dkl9 16 Jul 2023 23:37 UTC

7 points

5 comments 1 min read LW link

(dkl9.net)

Even briefer summary of ai-plans.com

Iknownothing 16 Jul 2023 23:25 UTC

10 points

6 comments 2 min read LW link

(www.ai-plans.com)

Mech Interp Puzzle 1: Suspiciously Similar Embeddings in GPT-Neo

Neel Nanda 16 Jul 2023 22:02 UTC

67 points

15 comments 1 min read LW link

A Technology of Everything – Part 1: A Magical Science Experiment

aiuisensei 16 Jul 2023 22:01 UTC

−3 points

0 comments 7 min read LW link

(www.aiui.cloud)

Runaway Optimizers in Mind Space

silentbob 16 Jul 2023 14:26 UTC

16 points

0 comments 12 min read LW link

[Question] Is Adam Elga’s proof for thirdism in Sleeping Beauty still considered to be sound?

Ape in the coat 16 Jul 2023 14:11 UTC

8 points

25 comments 1 min read LW link

A simple way of exploiting AI’s coming economic impact may be highly-impactful

kuira 16 Jul 2023 9:33 UTC

11 points

2 comments 2 min read LW link

Activation adding experiments with llama-7b

Nina Panickssery 16 Jul 2023 4:17 UTC

51 points

1 comment 3 min read LW link

Introducción al Riesgo Existencial de Inteligencia Artificial

david.friva 15 Jul 2023 20:37 UTC

4 points

2 comments 4 min read LW link

(youtu.be)

The housing crisis, explained using game theory

Johnstone 15 Jul 2023 20:27 UTC

4 points

2 comments 8 min read LW link

Only a hack can solve the shutdown problem

dp 15 Jul 2023 20:26 UTC

5 points

0 comments 8 min read LW link

Robustness of Model-Graded Evaluations and Automated Interpretability

Simon Lermen and viluon

15 Jul 2023 19:12 UTC

47 points

5 comments 9 min read LW link

[Question] How to deal with fear of failure?

TeaTieAndHat 15 Jul 2023 18:57 UTC

8 points

2 comments 1 min read LW link

Simplified bio-anchors for upper bounds on AI timelines

Fabien Roger 15 Jul 2023 18:15 UTC

21 points

4 comments 5 min read LW link

A Hill of Validity in Defense of Meaning

Zack_M_Davis 15 Jul 2023 17:57 UTC

28 points

121 comments 73 min read LW link 1 review

(unremediatedgender.space)

What is a cognitive bias?

Lionel 15 Jul 2023 13:01 UTC

1 point

0 comments 2 min read LW link

(lionelpage.substack.com)

[Question] When people say robots will steal jobs, what kinds of jobs are never implied?

Mary Chernyshenko 15 Jul 2023 10:50 UTC

5 points

12 comments 1 min read LW link

How to use ChatGPT to get better book & movie recommendations

KatWoods 15 Jul 2023 8:55 UTC

29 points

3 comments 1 min read LW link

Rationality, Pedagogy, and “Vibes”: Quick Thoughts

Nicholas Kross 15 Jul 2023 2:09 UTC

14 points

1 comment 4 min read LW link

(redacted) Anomalous tokens might disproportionately affect complex language tasks

Nikola Jurkovic 15 Jul 2023 0:48 UTC

4 points

0 comments 7 min read LW link

Why was the AI Alignment community so unprepared for this moment?

Ras1513 15 Jul 2023 0:26 UTC

123 points

65 comments 2 min read LW link