12 Jul 2024 23:42 UTC

67 points

6 comments 2 min read LW link

Consider attending the AI Security Forum ’24, a 1-day pre-DEFCON event

Charlie Rogers-Smith

12 Jul 2024 23:01 UTC

21 points

0 comments 1 min read LW link

Memorising molecular structures

dkl9

12 Jul 2024 22:40 UTC

6 points

0 comments 2 min read LW link

(dkl9.net)

Robin Hanson AI X-Risk Debate — Highlights and Analysis

Liron

12 Jul 2024 21:31 UTC

46 points

7 comments 45 min read LW link

(www.youtube.com)

Designing Artificial Wisdom: The Wise Workflow Research Organization

Jordan Arel

12 Jul 2024 19:18 UTC

2 points

0 comments 8 min read LW link

Whiteboard Pen Magazines are Useful

Johannes C. Mayer

12 Jul 2024 17:15 UTC

45 points

9 comments 1 min read LW link 1 review

Alignment: “Do what I would have wanted you to do”

Oleg Trott

12 Jul 2024 16:47 UTC

11 points

48 comments 1 min read LW link

Virtue taxation

Dentosal

12 Jul 2024 14:56 UTC

9 points

1 comment 2 min read LW link

Most smart and skilled people are outside of the EA/rationalist community: an analysis

titotal

12 Jul 2024 12:13 UTC

112 points

39 comments 14 min read LW link

(open.substack.com)

2024 Freedom Communities Events

Tudor Iliescu

12 Jul 2024 8:04 UTC

−6 points

1 comment 1 min read LW link

Faithful vs Interpretable Sparse Autoencoder Evals

Louka Ewington-Pitsos

12 Jul 2024 5:37 UTC

2 points

0 comments 12 min read LW link

Moving away from physical continuity

ProgramCrafter

12 Jul 2024 5:05 UTC

2 points

1 comment 1 min read LW link

Transformer Circuit Faithfulness Metrics Are Not Robust

Joseph Miller, bilalchughtai and William_S

12 Jul 2024 3:47 UTC

104 points

5 comments 7 min read LW link

(arxiv.org)

On Artificial Wisdom

Jordan Arel

12 Jul 2024 0:20 UTC

3 points

0 comments 14 min read LW link

Yoshua Bengio: Reasoning through arguments against taking AI safety seriously

Kvee

11 Jul 2024 23:53 UTC

72 points

3 comments 1 min read LW link

(yoshuabengio.org)

Podcast: “How the Smart Money teaches trading with Ricki Heicklen” (Patrick McKenzie interviewing)

rossry

11 Jul 2024 22:49 UTC

20 points

2 comments 1 min read LW link

(www.complexsystemspodcast.com)

Superbabies: Putting The Pieces Together

sarahconstantin

11 Jul 2024 20:40 UTC

221 points

42 comments 10 min read LW link 3 reviews

(sarahconstantin.substack.com)

Sherlockian Abduction Master List

Cole Wyeth

11 Jul 2024 20:27 UTC

53 points

67 comments 36 min read LW link

Thoughts to niplav on lie-detection, truthfwl mechanisms, and wealth-inequality

Emrik and niplav

11 Jul 2024 18:55 UTC

7 points

8 comments 11 min read LW link

Games for AI Control

Charlie Griffin and Buck

11 Jul 2024 18:40 UTC

45 points

0 comments 5 min read LW link

Video Intro to Guaranteed Safe AI

Mike Vaiana, Diogo de Lucena and Trent Hodgeson

11 Jul 2024 17:53 UTC

27 points

0 comments 1 min read LW link

(youtu.be)

Effective Empathy

Thac0

11 Jul 2024 15:14 UTC

4 points

1 comment 1 min read LW link

AI #72: Denying the Future

Zvi

11 Jul 2024 15:00 UTC

46 points

8 comments 41 min read LW link

(thezvi.wordpress.com)

The Best Bits From Build, Baby, Build

Maxwell Tabarrok

11 Jul 2024 14:09 UTC

23 points

0 comments 4 min read LW link

(www.maximum-progress.com)

[Question] What Other Lines of Work are Safe from AI Automation?

RogerDearnaley

11 Jul 2024 10:01 UTC

40 points

36 comments 5 min read LW link

Decomposing Agency — capabilities without desires

owencb and Raymond Douglas

11 Jul 2024 9:38 UTC

157 points

33 comments 12 min read LW link 1 review

(strangecities.substack.com)

Reliable Sources: The Story of David Gerard

TracingWoodgrains

10 Jul 2024 19:50 UTC

411 points

56 comments 43 min read LW link 2 reviews

Managing Emotional Potential Energy

adamShimi

10 Jul 2024 18:20 UTC

24 points

4 comments 4 min read LW link

(epistemologicalfascinations.substack.com)

[EAForum xpost] A breakdown of OpenAI’s revenue

dschwarz and Lawrence Phillips

10 Jul 2024 18:09 UTC

57 points

5 comments 1 min read LW link

(forum.effectivealtruism.org)

Solving Pascal’s Wager using dynamic programming

Paul Wilczewski

10 Jul 2024 18:09 UTC

1 point

0 comments 5 min read LW link

Fluent, Cruxy Predictions

Raemon

10 Jul 2024 18:00 UTC

86 points

18 comments 14 min read LW link 1 review

Antitrust as Controlled Creative Destruction

Martin Sustrik

10 Jul 2024 16:40 UTC

14 points

2 comments 2 min read LW link

(250bpm.substack.com)

New page: Integrity

Zach Stein-Perlman

10 Jul 2024 15:00 UTC

91 points

3 comments 1 min read LW link

AirBnB Baking

jefftk

10 Jul 2024 12:50 UTC

7 points

1 comment 1 min read LW link

(www.jefftk.com)

DIY RLHF: A simple implementation for hands on experience

Mike Vaiana and Trent Hodgeson

10 Jul 2024 12:07 UTC

29 points

0 comments 6 min read LW link

Usefulness grounds truth

invertedpassion

10 Jul 2024 7:58 UTC

0 points

0 comments 4 min read LW link

On passing Complete and Honest Ideological Turing Tests (CHITTs)

Aryeh Englander

10 Jul 2024 4:01 UTC

11 points

2 comments 1 min read LW link

[Question] Pondering how good or bad things will be in the AGI future

Sherrinford

9 Jul 2024 22:46 UTC

14 points

9 comments 2 min read LW link

Causal Graphs of GPT-2-Small’s Residual Stream

David Udell

9 Jul 2024 22:06 UTC

53 points

7 comments 7 min read LW link

[Question] If AI starts to end the world, is suicide a good idea?

IlluminateReality

9 Jul 2024 21:53 UTC

0 points

8 comments 1 min read LW link

Rationalist Purity Test

Gunnar_Zarncke

9 Jul 2024 20:30 UTC

−9 points

5 comments 1 min read LW link

(ratpuritytest.com)

That which can be destroyed by the truth, should be assumed to should be destroyed by it

Thac0

9 Jul 2024 19:39 UTC

6 points

0 comments 3 min read LW link

AISN #38: Supreme Court Decision Could Limit Federal Ability to Regulate AI Plus, “Circuit Breakers” for AI systems, and updates on China’s AI industry

Corin Katzke, Alexa Pan, Julius and Dan H

9 Jul 2024 19:28 UTC

5 points

0 comments 5 min read LW link

(newsletter.safe.ai)

Summer Tour Stops

jefftk

9 Jul 2024 19:10 UTC

10 points

0 comments 3 min read LW link

(www.jefftk.com)

Fix simple mistakes in ARC-AGI, etc.

Oleg Trott

9 Jul 2024 17:46 UTC

9 points

9 comments 1 min read LW link

Paper Summary: The Effects of Communicating Uncertainty on Public Trust in Facts and Numbers

Jeffrey Heninger

9 Jul 2024 16:50 UTC

42 points

2 comments 2 min read LW link

(blog.aiimpacts.org)

UC Berkeley course on LLMs and ML Safety

Dan H

9 Jul 2024 15:40 UTC

36 points

1 comment 1 min read LW link

(rdi.berkeley.edu)

What and Why: Developmental Interpretability of Reinforcement Learning

Garrett Baker

9 Jul 2024 14:09 UTC

67 points

4 comments 6 min read LW link

Medical Roundup #3

Zvi

9 Jul 2024 13:10 UTC

39 points

4 comments 19 min read LW link

(thezvi.wordpress.com)

Consent across power differentials

Ramana Kumar

9 Jul 2024 11:42 UTC

52 points

12 comments 3 min read LW link