[Question] Is there any metric measuring ~“proportion of people creating extra value”?

Amal 3 Aug 2023 22:54 UTC

7 points

3 comments 1 min read LW link

[Question] Hypothetical: what would you do?

JNS 3 Aug 2023 22:39 UTC

4 points

2 comments 1 min read LW link

[Linkpost] Deception Abilities Emerged in Large Language Models

Bogdan Ionut Cirstea 3 Aug 2023 17:28 UTC

12 points

0 comments 1 min read LW link

Embedding Ethical Priors into AI Systems: A Bayesian Approach

Justausername 3 Aug 2023 15:31 UTC

−5 points

3 comments 21 min read LW link

Password-locked models: a stress case for capabilities evaluation

Fabien Roger 3 Aug 2023 14:53 UTC

156 points

14 comments 6 min read LW link

AI #23: Fundamental Problems with RLHF

Zvi 3 Aug 2023 12:50 UTC

59 points

9 comments 41 min read LW link

(thezvi.wordpress.com)

Bad Imitation Instruments

jefftk 3 Aug 2023 2:30 UTC

21 points

1 comment 1 min read LW link

(www.jefftk.com)

Kolmogorov’s theory of Algorithmic Probability

Aidan Rocke 3 Aug 2023 0:58 UTC

6 points

2 comments 2 min read LW link

(keplerlounge.com)

Work culture creep

CrimsonChin 3 Aug 2023 0:38 UTC

34 points

16 comments 8 min read LW link

[Question] Boxing

Zach Stein-Perlman 2 Aug 2023 23:38 UTC

6 points

1 comment 1 min read LW link

External rationality vs. internal rationality

metachirality 2 Aug 2023 23:29 UTC

7 points

0 comments 1 min read LW link

When performing a dimensionality reduction on tensors, the trace is often zero.

Joseph Van Name 2 Aug 2023 21:06 UTC

7 points

1 comment 3 min read LW link

Progress links digest, 2023-08-02: Superconductor edition

jasoncrawford 2 Aug 2023 20:27 UTC

13 points

0 comments 3 min read LW link

(rootsofprogress.org)

[Question] What works for ADHD and/or related things?

TeaTieAndHat 2 Aug 2023 18:37 UTC

9 points

13 comments 1 min read LW link

[Question] Would you pay for a search engine limited to rationalist sites?

Conor 2 Aug 2023 18:06 UTC

4 points

19 comments 1 min read LW link

The Roots of Progress Blog-Building Intensive: advice for applicants, request for support

jasoncrawford 2 Aug 2023 15:37 UTC

9 points

0 comments 1 min read LW link

(rootsofprogress.org)

3 levels of threat obfuscation

HoldenKarnofsky 2 Aug 2023 14:58 UTC

71 points

14 comments 7 min read LW link

ChatGPT for translation

Varshul Gupta 2 Aug 2023 11:57 UTC

1 point

0 comments 3 min read LW link

(dubverseblack.substack.com)

Long-Term Future Fund: April 2023 grant recommendations

abergal, calebp99, Linch, habryka, Thomas Larsen and Vaniver

2 Aug 2023 7:54 UTC

81 points

3 comments 50 min read LW link

[Question] Could we breed/engineer intelligent parrots?

lemonhope 2 Aug 2023 7:32 UTC

9 points

18 comments 1 min read LW link

Anthropical Motte and Bailey in two versions of Sleeping Beauty

Ape in the coat 2 Aug 2023 7:08 UTC

32 points

57 comments 6 min read LW link

solar-thermal and techno-economic analysis

bhauth 2 Aug 2023 6:22 UTC

21 points

8 comments 5 min read LW link

(www.bhauth.com)

South Bay ACX/SSC Meetup @ Whole Foods

allisona 2 Aug 2023 3:44 UTC

1 point

0 comments 1 min read LW link

“Is There Anything That’s Worth More”

Zack_M_Davis 2 Aug 2023 3:28 UTC

64 points

6 comments 1 min read LW link

Bay Winter Solstice: call for speech pitches!

tcheasdfjkl 2 Aug 2023 3:24 UTC

9 points

0 comments 1 min read LW link

(docs.google.com)

[Question] What is ontology?

Adam Zerner 2 Aug 2023 0:54 UTC

28 points

19 comments 1 min read LW link

My current LK99 questions

Eliezer Yudkowsky 1 Aug 2023 22:48 UTC

211 points

38 comments 5 min read LW link

Spiral Staircase

Michael Samoilov 1 Aug 2023 21:51 UTC

21 points

2 comments 2 min read LW link

Open Mic—August 2023

Adam Zerner 1 Aug 2023 19:24 UTC

8 points

0 comments 1 min read LW link

ARC Evals new report: Evaluating Language-Model Agents on Realistic Autonomous Tasks

Beth Barnes 1 Aug 2023 18:30 UTC

153 points

12 comments 5 min read LW link

(evals.alignment.org)

[Question] When (if ever) are superstimuli good/useful/advantageous?

Perhaps 1 Aug 2023 15:50 UTC

−7 points

2 comments 1 min read LW link

AISN #17: Automatically Circumventing LLM Guardrails, the Frontier Model Forum, and Senate Hearing on AI Oversight

Dan H 1 Aug 2023 15:40 UTC

8 points

0 comments 8 min read LW link

(newsletter.safe.ai)

AISN #16: White House Secures Voluntary Commitments from Leading AI Labs and Lessons from Oppenheimer

Dan H and Corin Katzke

1 Aug 2023 15:39 UTC

3 points

0 comments 6 min read LW link

(newsletter.safe.ai)

“Desperate Honesty” by Agnes Callard

David Gross 1 Aug 2023 13:34 UTC

11 points

0 comments 2 min read LW link

(dailynous.com)

Barbieheimer: Across the Dead Reckoning

Zvi 1 Aug 2023 13:00 UTC

49 points

17 comments 41 min read LW link

(thezvi.wordpress.com)

Untangling Infrabayesianism: A redistillation [PDF link; ~12k words + lots of math]

Lorxus 1 Aug 2023 12:42 UTC

29 points

22 comments 2 min read LW link

(docdro.id)

What Is Childhood Supposed To Be?

Sable 1 Aug 2023 9:51 UTC

21 points

13 comments 3 min read LW link

(affablyevil.substack.com)

AI romantic partners will harm society if they go unregulated

Roman Leventov 1 Aug 2023 9:32 UTC

27 points

76 comments 13 min read LW link

What is autonomy, and how does it lead to greater risk from AI?

Davidmanheim 1 Aug 2023 7:58 UTC

30 points

0 comments 6 min read LW link

Evaluating Superhuman Models with Consistency Checks

Daniel Paleka and Lukas Fluri

1 Aug 2023 7:51 UTC

21 points

2 comments 9 min read LW link

(arxiv.org)

[See link to Sept meetup below!] San Francisco ACX Meetup “First Saturday” August 5, 1 pm

guenael 1 Aug 2023 3:38 UTC

1 point

0 comments 1 min read LW link

[Question] Exercise: Solve “Thinking Physics”

Raemon 1 Aug 2023 0:44 UTC

102 points

30 comments 5 min read LW link 1 review

The “public debate” about AI is confusing for the general public and for policymakers because it is a three-sided debate

Adam David Long 1 Aug 2023 0:08 UTC

146 points

30 comments 4 min read LW link

The “no sandbagging on checkable tasks” hypothesis

Joe Carlsmith 31 Jul 2023 23:06 UTC

61 points

14 comments 9 min read LW link

A Social History of Truth

Vaniver 31 Jul 2023 22:49 UTC

70 points

2 comments 14 min read LW link

Watermarking considered overrated?

DanielFilan 31 Jul 2023 21:36 UTC

19 points

4 comments 1 min read LW link

What The Lord of the Rings Teaches Us About AI Alignment

Jeffrey Heninger 31 Jul 2023 20:16 UTC

25 points

12 comments 7 min read LW link

The “spelling miracle”: GPT-3 spelling abilities and glitch tokens revisited

mwatkins 31 Jul 2023 19:47 UTC

85 points

29 comments 20 min read LW link

“Building a House” Review

jefftk 31 Jul 2023 19:20 UTC

64 points

6 comments 1 min read LW link

(www.jefftk.com)

The Meaning of Shoggoth AI Memes

Dan Smith 31 Jul 2023 18:52 UTC

−1 points

5 comments 2 min read LW link