Sam Altman’s Business Negging

Julian Bradshaw

Sep 30, 2024, 9:06 PM

13 points

0 comments

1 min read

(www.bloomberg.com)

In-Context Learning: An Alignment Survey

alamerton

Sep 30, 2024, 6:44 PM

8 points

0 comments

20 min read

(docs.google.com)

Not Just For Therapy Chatbots: The Case For Compassion In AI Moral Alignment Research

kenneth_diao

Sep 30, 2024, 6:37 PM

2 points

0 comments

12 min read

Exploring Decomposability of SAE Features

Vikram_N

Sep 30, 2024, 6:28 PM

1 point

0 comments

3 min read

Knowledge Base 1: Could it increase intelligence and make it safer?

iwis

Sep 30, 2024, 4:00 PM

−4 points

0 comments

4 min read

Point of Failure: Semiconductor-Grade Quartz

Annapurna

Sep 30, 2024, 3:57 PM

41 points

8 comments

2 min read

(jorgevelez.substack.com)

on bacteria, on teeth

bhauth

Sep 30, 2024, 3:56 PM

62 points

9 comments

6 min read

(bhauth.com)

SB 1047 gets vetoed

ryan_b

Sep 30, 2024, 3:49 PM

25 points

1 comment

1 min read

(www.reuters.com)

Of Birds and Bees

RussellThor

Sep 30, 2024, 10:52 AM

7 points

9 comments

2 min read

A new process for mapping discussions

Nathan Young

Sep 30, 2024, 8:57 AM

29 points

8 comments

6 min read

(open.substack.com)

MATS Alumni Impact Analysis

utilistrutil, Juan Gil, yams, LauraVaughan, K Richards and Ryan Kidd

Sep 30, 2024, 2:35 AM

62 points

7 comments

11 min read

[Question] Most capable publicly available agents?

Gabe

Sep 30, 2024, 12:04 AM

2 points

0 comments

1 min read

the case for CoT unfaithfulness is overstated

nostalgebraist

Sep 29, 2024, 10:07 PM

260 points

43 comments

11 min read

Pomodoro Method Randomized Self Experiment

niplav

Sep 29, 2024, 9:55 PM

14 points

2 comments

1 min read

Toy Models of Superposition: Simplified by Hand

Axel Sorensen

Sep 29, 2024, 9:19 PM

9 points

3 comments

8 min read

LLMs are likely not conscious

research_prime_space

Sep 29, 2024, 8:57 PM

6 points

9 comments

1 min read

A Policy Proposal

phdead

Sep 29, 2024, 8:45 PM

10 points

4 comments

4 min read

Do Sparse Autoencoders (SAEs) transfer across base and finetuned language models?

Taras Kutsyk, Tommaso Mencattini and Ciprian Florea

Sep 29, 2024, 7:37 PM

26 points

8 comments

25 min read

Models of life

Abhishaike Mahajan

Sep 29, 2024, 7:24 PM

8 points

0 comments

16 min read

(www.asimov.press)

Interpreting the effects of Jailbreak Prompts in LLMs

Harsh Raj

Sep 29, 2024, 7:01 PM

8 points

0 comments

5 min read

New Capabilities, New Risks? - Evaluating Agentic General Assistants using Elements of GAIA & METR Frameworks

Tej Lander

Sep 29, 2024, 6:58 PM

5 points

0 comments

29 min read

Developmental Stages in Multi-Problem Grokking

James Sullivan

Sep 29, 2024, 6:58 PM

4 points

0 comments

6 min read

A Psychoanalytic Explanation of Sam Altman’s Irrational Actions

Gabe

Sep 29, 2024, 6:58 PM

1 point

3 comments

3 min read

Building Safer AI from the Ground Up: Steering Model Behavior via Pre-Training Data Curation

Antonio Clarke

Sep 29, 2024, 6:48 PM

6 points

0 comments

23 min read

Cryonics is free

Mati_Roy

Sep 29, 2024, 5:58 PM

198 points

43 comments

2 min read

Runner’s High On Demand: A Story of Luck & Persistence

Shoshannah Tekofsky

Sep 29, 2024, 5:15 PM

14 points

6 comments

5 min read

(shoshanigans.substack.com)

You can, in fact, bamboozle an unaligned AI into sparing your life

David Matolcsi

Sep 29, 2024, 4:59 PM

112 points

173 comments

27 min read

Base LLMs refuse too

Connor Kissane, robertzk, Arthur Conmy and Neel Nanda

Sep 29, 2024, 4:04 PM

60 points

20 comments

10 min read

My Methodological Turn

adamShimi

Sep 29, 2024, 3:01 PM

29 points

0 comments

1 min read

(formethods.substack.com)

Linkpost: Hypocrisy standoff

Chris_Leong

Sep 29, 2024, 2:27 PM

5 points

1 comment

1 min read

(x.com)

[Question] Any real toeholds for making practical decisions regarding AI safety?

lemonhope

Sep 29, 2024, 12:03 PM

27 points

6 comments

1 min read

Review: Dr Stone

ProgramCrafter

Sep 29, 2024, 10:35 AM

18 points

9 comments

4 min read

AXRP Episode 36 - Adam Shai and Paul Riechers on Computational Mechanics

DanielFilan

Sep 29, 2024, 5:50 AM

25 points

0 comments

55 min read

DunCon @Lighthaven

Duncan Sabien (Inactive)

Sep 29, 2024, 4:56 AM

45 points

2 comments

1 min read

Exploring Shard-like Behavior: Empirical Insights into Contextual Decision-Making in RL Agents

Alejandro Aristizabal

Sep 29, 2024, 12:32 AM

6 points

0 comments

15 min read

Jailbreaking language models with user roleplay

loops

Sep 28, 2024, 11:43 PM

8 points

0 comments

3 min read

(iter.ca)

“Slow” takeoff is a terrible term for “maybe even faster takeoff, actually”

Raemon

Sep 28, 2024, 11:38 PM

217 points

69 comments

1 min read

Contextual Constitutional AI

aksh-n

Sep 28, 2024, 11:24 PM

14 points

2 comments

12 min read

Explore More: A Bag of Tricks to Keep Your Life on the Rails

Shoshannah Tekofsky

Sep 28, 2024, 9:38 PM

236 points

19 comments

11 min read

(shoshanigans.substack.com)

2024 Petrov Day Retrospective

Ben Pace and Raemon

Sep 28, 2024, 9:30 PM

93 points

25 comments

10 min read

[Question] Any Trump Supporters Want to Dialogue?

k64

Sep 28, 2024, 7:41 PM

15 points

83 comments

1 min read

Evaluating LLaMA 3 for political sycophancy

alma.liezenga

Sep 28, 2024, 7:02 PM

2 points

2 comments

6 min read

Two new datasets for evaluating political sycophancy in LLMs

alma.liezenga

Sep 28, 2024, 6:29 PM

9 points

0 comments

9 min read

COT Scaling implies slower takeoff speeds

Logan Zoellner

Sep 28, 2024, 4:20 PM

36 points

56 comments

1 min read

Thoughts on Evo-Bio Math and Mesa-Optimization: Maybe We Need To Think Harder About “Relative” Fitness?

Lorec

Sep 28, 2024, 2:07 PM

6 points

6 comments

1 min read

Steering LLMs’ Behavior with Concept Activation Vectors

Ruixuan Huang

Sep 28, 2024, 9:53 AM

8 points

0 comments

10 min read

An Interactive Shapley Value Explainer

James Stephen Brown

Sep 28, 2024, 5:01 AM

42 points

9 comments

1 min read

(nonzerosum.games)

[Question] Implications of China’s recession on AGI development?

Eric Neyman

Sep 28, 2024, 1:12 AM

41 points

3 comments

1 min read

The Compute Conundrum: AI Governance in a Shifting Geopolitical Era

octavo

Sep 28, 2024, 1:05 AM

−3 points

1 comment

17 min read

‘Chat with impactful research & evaluations’ (Unjournal NotebookLMs)

david reinstein

Sep 28, 2024, 12:32 AM

6 points

0 comments

2 min read