5 Oct 2023 22:18 UTC

52 points

0 comments7 min readLW link

(forum.effectivealtruism.org)

Provably Safe AI

PeterMcCluskey5 Oct 2023 22:18 UTC

35 points

15 comments4 min readLW link

(bayesianinvestor.com)

Stampy’s AI Safety Info soft launch

steven0461 and Robert Miles

5 Oct 2023 22:13 UTC

120 points

9 comments2 min readLW link

Impacts of AI on the housing markets

PottedRosePetal5 Oct 2023 21:24 UTC

8 points

0 comments5 min readLW link

Towards Monosemanticity: Decomposing Language Models With Dictionary Learning

Zac Hatfield-Dodds5 Oct 2023 21:01 UTC

289 points

22 comments2 min readLW link 1 review

(transformer-circuits.pub)

Ideation and Trajectory Modelling in Language Models

NickyP5 Oct 2023 19:21 UTC

16 points

2 comments10 min readLW link

A well-defined history in measurable factor spaces

Matthias G. Mayer5 Oct 2023 18:36 UTC

25 points

0 comments2 min readLW link

Evaluating the historical value misspecification argument

Matthew Barnett5 Oct 2023 18:34 UTC

193 points

163 comments7 min readLW link 3 reviews

Translations Should Invert

abramdemski5 Oct 2023 17:44 UTC

48 points

19 comments3 min readLW link

Censorship in LLMs is here to stay because it mirrors how our own intelligence is structured

mnvr5 Oct 2023 17:37 UTC

3 points

0 comments1 min readLW link

Twin Cities ACX Meetup October 2023

Timothy M.5 Oct 2023 16:29 UTC

1 point

2 comments1 min readLW link

This anime storyboard doesn’t exist: a graphic novel written and illustrated by GPT4

RomanS5 Oct 2023 14:01 UTC

12 points

7 comments55 min readLW link

AI #32: Lie Detector

Zvi5 Oct 2023 13:50 UTC

45 points

19 comments44 min readLW link

(thezvi.wordpress.com)

Can the House Legislate?

jefftk5 Oct 2023 13:40 UTC

26 points

6 comments2 min readLW link

(www.jefftk.com)

Making progress on the ``what alignment target should be aimed at?″ question, is urgent

ThomasCederborg5 Oct 2023 12:55 UTC

2 points

0 comments18 min readLW link

Response to Quintin Pope’s Evolution Provides No Evidence For the Sharp Left Turn

Zvi5 Oct 2023 11:39 UTC

129 points

29 comments9 min readLW link

How to Get Rationalist Feedback

Nicholas Kross5 Oct 2023 2:03 UTC

16 points

0 comments2 min readLW link

On my AI Fable, and the importance of de re, de dicto, and de se reference for AI alignment

PhilGoetz5 Oct 2023 0:50 UTC

9 points

5 comments1 min readLW link

Underspecified Probabilities: A Thought Experiment

lunatic_at_large4 Oct 2023 22:25 UTC

8 points

4 comments2 min readLW link

Fraternal Birth Order Effect and the Maternal Immune Hypothesis

Bucky4 Oct 2023 21:18 UTC

20 points

1 comment2 min readLW link

How to solve deception and still fail.

Charlie Steiner4 Oct 2023 19:56 UTC

43 points

7 comments6 min readLW link

PortAudio M1 Latency

jefftk4 Oct 2023 19:10 UTC

8 points

5 comments1 min readLW link

(www.jefftk.com)

Open Philanthropy is hiring for multiple roles across our Global Catastrophic Risks teams

aarongertler4 Oct 2023 18:04 UTC

6 points

0 comments3 min readLW link

(forum.effectivealtruism.org)

Safeguarding Humanity: Ensuring AI Remains a Servant, Not a Master

kgldeshapriya4 Oct 2023 17:52 UTC

−20 points

2 comments2 min readLW link

The 5 Pillars of Happiness

Gabi QUENE4 Oct 2023 17:50 UTC

−24 points

5 comments5 min readLW link

[Question] Using Reinforcement Learning to try to control the heating of a building (district heating)

Tony Karlsson4 Oct 2023 17:47 UTC

3 points

5 comments1 min readLW link

rationalistic probability(litterally just throwing shit out there)

NotaSprayer ASprayer4 Oct 2023 17:46 UTC

−30 points

8 comments2 min readLW link

AISN #23: New OpenAI Models, News from Anthropic, and Representation Engineering

Dan H4 Oct 2023 17:37 UTC

15 points

2 comments5 min readLW link

(newsletter.safe.ai)

I don’t find the lie detection results that surprising (by an author of the paper)

JanB4 Oct 2023 17:10 UTC

97 points

8 comments3 min readLW link

[Question] What evidence is there of LLM’s containing world models?

Chris_Leong4 Oct 2023 14:33 UTC

17 points

17 comments1 min readLW link

Entanglement and intuition about words and meaning

Bill Benzon4 Oct 2023 14:16 UTC

4 points

0 comments2 min readLW link

Why a Mars colony would lead to a first strike situation

Remmelt4 Oct 2023 11:29 UTC

−60 points

8 comments1 min readLW link

(mflb.com)

[Question] What are some examples of AIs instantiating the ‘nearest unblocked strategy problem’?

EJT4 Oct 2023 11:05 UTC

6 points

4 comments1 min readLW link

Graphical tensor notation for interpretability

Jordan Taylor4 Oct 2023 8:04 UTC

141 points

11 comments19 min readLW link

[Link] Bay Area Winter Solstice 2023

tcheasdfjkl and TheSkeward

4 Oct 2023 2:19 UTC

18 points

3 comments1 min readLW link

(fb.me)

[Question] Who determines whether an alignment proposal is the definitive alignment solution?

MiguelDev3 Oct 2023 22:39 UTC

−1 points

6 comments1 min readLW link

AXRP Episode 25 - Cooperative AI with Caspar Oesterheld

DanielFilan3 Oct 2023 21:50 UTC

43 points

0 comments92 min readLW link

When to Get the Booster?

jefftk3 Oct 2023 21:00 UTC

50 points

15 comments2 min readLW link

(www.jefftk.com)

OpenAI-Microsoft partnership

Zach Stein-Perlman3 Oct 2023 20:01 UTC

51 points

19 comments1 min readLW link

[Question] Current AI safety techniques?

Zach Stein-Perlman3 Oct 2023 19:30 UTC

30 points

2 comments2 min readLW link

Testing and Automation for Intelligent Systems.

Sai Kiran Kammari3 Oct 2023 17:51 UTC

−13 points

0 comments1 min readLW link

(resource-cms.springernature.com)

Metaculus Announces Forecasting Tournament to Evaluate Focused Research Organizations, in Partnership With the Federation of American Scientists

ChristianWilliams3 Oct 2023 16:44 UTC

13 points

0 comments2 min readLW link

(www.metaculus.com)

What would it mean to understand how a large language model (LLM) works? Some quick notes.

Bill Benzon3 Oct 2023 15:11 UTC

20 points

4 comments8 min readLW link

[Question] Potential alignment targets for a sovereign superintelligent AI

Paul Colognese3 Oct 2023 15:09 UTC

29 points

4 comments1 min readLW link

Monthly Roundup #11: October 2023

Zvi3 Oct 2023 14:10 UTC

42 points

12 comments35 min readLW link

(thezvi.wordpress.com)

Why We Use Money? - A Walrasian View

Savio Coelho3 Oct 2023 12:02 UTC

4 points

3 comments8 min readLW link

Mech Interp Challenge: October—Deciphering the Sorted List Model

CallumMcDougall3 Oct 2023 10:57 UTC

23 points

0 comments3 min readLW link

Early Experiments in Reward Model Interpretation Using Sparse Autoencoders

lukemarks, Amirali Abdullah, Rauno Arike, Fazl and nothoughtsheadempty

3 Oct 2023 7:45 UTC

18 points

0 comments5 min readLW link

Some Quick Follow-Up Experiments to “Taken out of context: On measuring situational awareness in LLMs”

Miles Turpin3 Oct 2023 2:22 UTC

31 points

0 comments9 min readLW link

My Mid-Career Transition into Biosecurity

jefftk2 Oct 2023 21:20 UTC

26 points

4 comments2 min readLW link

(www.jefftk.com)