14 points

3 comments · 1 min read · LW link

A plea for more funding shortfall transparency

porby 7 Aug 2023 21:33 UTC

73 points

4 comments · 2 min read · LW link

[Question] Tips for reducing thinking branching factor

Simon Berens 7 Aug 2023 20:21 UTC

4 points

6 comments · 1 min read · LW link

An interactive introduction to grokking and mechanistic interpretability

Adam Pearce and Asma Ghandeharioun

7 Aug 2023 19:09 UTC

23 points

3 comments · 1 min read · LW link

(pair.withgoogle.com)

Feedbackloop-first Rationality

Raemon 7 Aug 2023 17:58 UTC

211 points

69 comments · 8 min read · LW link · 2 reviews

Growing Bonsai Networks with RNNs

ameo 7 Aug 2023 17:34 UTC

21 points

5 comments · 1 min read · LW link

(cprimozic.net)

[Question] Should I test myself for microplastics?

Augs 7 Aug 2023 17:31 UTC

9 points

2 comments · 1 min read · LW link

Optimisation Measures: Desiderata, Impossibility, Proposals

mattmacdermott and Alexander Gietelink Oldenziel

7 Aug 2023 15:52 UTC

36 points

9 comments · 1 min read · LW link

Announcing the Clearer Thinking micro-grants program for 2023

spencerg 7 Aug 2023 15:21 UTC

14 points

1 comment · 1 min read · LW link

(www.clearerthinking.org)

What I’ve been reading, July–August 2023

jasoncrawford 7 Aug 2023 14:22 UTC

23 points

0 comments · 13 min read · LW link

(rootsofprogress.org)

Monthly Roundup #9: August 2023

Zvi 7 Aug 2023 13:20 UTC

42 points

25 comments · 57 min read · LW link

(thezvi.wordpress.com)

Strengthening the Argument for Intrinsic AI Safety: The S-Curves Perspective

avturchin 7 Aug 2023 13:13 UTC

8 points

0 comments · 12 min read · LW link

Overview of how AI might exacerbate long-running catastrophic risks

Hauke Hillebrandt 7 Aug 2023 11:53 UTC

20 points

0 comments · 11 min read · LW link

(aisafetyfundamentals.com)

Drinks at a bar

yakimoff 7 Aug 2023 2:52 UTC

3 points

0 comments · 1 min read · LW link

Problems with Robin Hanson’s Quillette Article On AI

DaemonicSigil 6 Aug 2023 22:13 UTC

89 points

33 comments · 8 min read · LW link

Yann LeCun on AGI and AI Safety

Chris_Leong 6 Aug 2023 21:56 UTC

37 points

13 comments · 1 min read · LW link

(drive.google.com)

Computational Thread Art

CallumMcDougall 6 Aug 2023 21:42 UTC

76 points

2 comments · 6 min read · LW link

‘We’re changing the clouds.’ An unforeseen test of geoengineering is fueling record ocean warmth

Annapurna 6 Aug 2023 20:58 UTC

60 points

6 comments · 1 min read · LW link

(www.science.org)

[Linkpost] Will AI avoid exploitation?

cdkg 6 Aug 2023 14:28 UTC

22 points

1 comment · 1 min read · LW link

Reducing the risk of catastrophically misaligned AI by avoiding the Singleton scenario: the Manyton Variant

GravitasGradient 6 Aug 2023 14:24 UTC

−6 points

0 comments · 3 min read · LW link

Rebooting AI Governance: An AI-Driven Approach to AI Governance

utilon 6 Aug 2023 14:19 UTC

1 point

1 comment · 29 min read · LW link

(forum.effectivealtruism.org)

Model-Based Policy Analysis under Deep Uncertainty

utilon 6 Aug 2023 14:07 UTC

16 points

1 comment · 23 min read · LW link

(forum.effectivealtruism.org)

[Question] On being in a bad place and too stubborn to leave.

TeaTieAndHat 6 Aug 2023 11:45 UTC

12 points

14 comments · 3 min read · LW link

Safety-First Agents/Architectures Are a Promising Path to Safe AGI

Brendon_Wong 6 Aug 2023 8:02 UTC

13 points

2 comments · 12 min read · LW link

The Benevolent Ruler’s Handbook (Part 1): The Policy Problem

FCCC 6 Aug 2023 3:46 UTC

11 points

3 comments · 4 min read · LW link

Exploring the Multiverse of Large Language Models

franky 6 Aug 2023 2:38 UTC

1 point

0 comments · 5 min read · LW link

Aligning my web server with devops practices: part 2 (security)

VipulNaik 6 Aug 2023 1:30 UTC

6 points

0 comments · 19 min read · LW link

how 2 tell if ur input is out of distribution given only model weights

dkirmani 5 Aug 2023 22:45 UTC

49 points

10 comments · 1 min read · LW link

Summary of Improving Global Decision Making (around AI)

Will_Pearson 5 Aug 2023 18:46 UTC

−7 points

0 comments · 1 min read · LW link

Ground-Truth Label Imbalance Impairs the Performance of Contrast-Consistent Search (and Other Contrast-Pair-Based Unsupervised Methods)

Tom Angsten and Ami Hays

5 Aug 2023 17:55 UTC

6 points

2 comments · 7 min read · LW link

(drive.google.com)

Seattle Astral Codex Ten Monthly Social

a7x 5 Aug 2023 17:55 UTC

1 point

0 comments · 1 min read · LW link

AISafety.info’s Writing & Editing Hackathon

smallsilo 5 Aug 2023 17:14 UTC

2 points

0 comments · 1 min read · LW link

Join AISafety.info’s Writing & Editing Hackathon (Aug 25-28) (Prizes to be won!)

smallsilo 5 Aug 2023 14:08 UTC

19 points

3 comments · 1 min read · LW link

(forum.effectivealtruism.org)

Stomach Ulcers and Dental Cavities

Metacelsus 5 Aug 2023 14:08 UTC

57 points

7 comments · 1 min read · LW link

(denovo.substack.com)

video games > IQ tests

bhauth 5 Aug 2023 13:27 UTC

34 points

46 comments · 3 min read · LW link

[Linkpost] Applicability of scaling laws to vision encoding models

Bogdan Ionut Cirstea 5 Aug 2023 11:10 UTC

11 points

2 comments · 1 min read · LW link

A Naive Proposal for Constructing Interpretable AI

Chris_Leong 5 Aug 2023 10:32 UTC

18 points

6 comments · 2 min read · LW link

ACX Paris Meetup—August 11 2023

PoignardAzur 5 Aug 2023 9:44 UTC

2 points

0 comments · 1 min read · LW link

Meet Hyperion on Sunday Aug 6?

duck_master 5 Aug 2023 4:36 UTC

1 point

0 comments · 1 min read · LW link

[Question] What are the best published papers from outside the alignment community that are relevant to Agent Foundations?

Stephen Fowler 5 Aug 2023 3:02 UTC

20 points

4 comments · 1 min read · LW link

Announcing Squiggle Hub

ozziegooen and Slava Matyukhin

5 Aug 2023 1:00 UTC

49 points

4 comments · 5 min read · LW link

(forum.effectivealtruism.org)

Read More Books but Pretend to Read Even More

Arjun Panickssery 5 Aug 2023 0:07 UTC

26 points

12 comments · 4 min read · LW link

(arjunpanickssery.substack.com)

The Sinews of Sudan’s Latest War

Tim Liptrot 4 Aug 2023 18:17 UTC

43 points

12 comments · 12 min read · LW link

Private notes on LW?

Raemon 4 Aug 2023 17:35 UTC

61 points

33 comments · 1 min read · LW link

When training AI, we should escalate the frequency of capability tests

Hauke Hillebrandt 4 Aug 2023 16:07 UTC

2 points

0 comments · 1 min read · LW link

Manifund: What we’re funding (weeks 2-4)

Austin Chen 4 Aug 2023 16:00 UTC

44 points

2 comments · 5 min read · LW link

(manifund.substack.com)

[Linkpost] Multimodal Neurons in Pretrained Text-Only Transformers

Bogdan Ionut Cirstea 4 Aug 2023 15:29 UTC

11 points

0 comments · 1 min read · LW link

Apollo Research is hiring evals and interpretability engineers & scientists

Marius Hobbhahn 4 Aug 2023 10:54 UTC

25 points

0 comments · 2 min read · LW link

[Question] Has anyone tried creating a YouTube or TikTok series covering the sequences?

Max Rossi 4 Aug 2023 0:10 UTC

4 points

4 comments · 1 min read · LW link

[Question] Is there any metric measuring ~“proportion of people creating extra value”?

Amal 3 Aug 2023 22:54 UTC

7 points

3 comments · 1 min read · LW link