Cohesion and business problems

Adam Zerner 19 Apr 2024 0:45 UTC

8 points

1 comment, 4 min read, LW link

The Thermodynamics of Death

Peter lawless 19 Apr 2024 0:36 UTC

1 point

0 comments, 10 min read, LW link

hydrogen tube transport

bhauth 18 Apr 2024 22:47 UTC

20 points

2 comments, 5 min read, LW link

(www.bhauth.com)

A Review of In-Context Learning Hypotheses for Automated AI Alignment Research

alamerton 18 Apr 2024 18:29 UTC

13 points

1 comment, 15 min read, LW link

Blessed information, garbage information, cursed information

tailcalled 18 Apr 2024 16:56 UTC

19 points

2 comments, 3 min read, LW link

[Fiction] A Confession

Arjun Panickssery 18 Apr 2024 16:28 UTC

28 points

3 comments, 5 min read, LW link

(arjunpanickssery.substack.com)

Discriminating Behaviorally Identical Classifiers: a model problem for applying interpretability to scalable oversight

Sam Marks 18 Apr 2024 16:17 UTC

61 points

0 comments, 12 min read, LW link

Cooperation is optimal, with weaker agents too - tldr

Ryo 18 Apr 2024 15:03 UTC

10 points

14 comments, 4 min read, LW link

(medium.com)

How to coordinate despite our biases? - tldr

Ryo 18 Apr 2024 15:03 UTC

3 points

2 comments, 3 min read, LW link

(medium.com)

UDT1.01: Logical Inductors and Implicit Beliefs (5/10)

Diffractor 18 Apr 2024 8:39 UTC

27 points

0 comments, 19 min read, LW link

An examination of GPT-2's boring yet effective glitch

MiguelDev 18 Apr 2024 5:26 UTC

5 points

3 comments, 3 min read, LW link

[Question] What if Ethics is Provably Self-Contradictory?

Yitz 18 Apr 2024 5:12 UTC

2 points

5 comments, 2 min read, LW link

The Mom Test: Summary and Thoughts

Adam Zerner 18 Apr 2024 3:34 UTC

37 points

1 comment, 10 min read, LW link

Why Would Belief-States Have A Fractal Structure, And Why Would That Matter For Interpretability? An Explainer

johnswentworth and David Lorell

18 Apr 2024 0:27 UTC

112 points

13 comments, 7 min read, LW link

AXRP Episode 28 - Suing Labs for AI Risk with Gabriel Weil

DanielFilan 17 Apr 2024 21:42 UTC

10 points

0 comments, 65 min read, LW link

LLM Evaluators Recognize and Favor Their Own Generations

Arjun Panickssery, Sam Bowman and Shi Feng

17 Apr 2024 21:09 UTC

26 points

1 comment, 3 min read, LW link

(tiny.cc)

An ethical framework to supersede Utilitarianism

metalcrow 17 Apr 2024 17:18 UTC

1 point

4 comments, 4 min read, LW link

Moving on from community living

Vika 17 Apr 2024 17:02 UTC

48 points

6 comments, 3 min read, LW link

(vkrakovna.wordpress.com)

Staged release

Zach Stein-Perlman 17 Apr 2024 16:00 UTC

9 points

4 comments, 2 min read, LW link

[Question] Discomfort Stacking

Lewis O’Brien 17 Apr 2024 14:49 UTC

5 points

11 comments, 1 min read, LW link