All 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 202320242025

AllJanFeb Mar Apr May Jun Jul Aug Sep Oct Nov Dec

All 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 161718 19 20 21 22 23 24 25 26 27 28 29 30 31

D&D.Sci Hypersphere Analysis Part 3: Beat it with Linear Algebra

aphyer16 Jan 2024 22:44 UTC

26 points

1 comment5 min readLW link

The weak-to-strong generalization (WTSG) paper in 60 seconds

sudo16 Jan 2024 22:44 UTC

12 points

1 comment1 min readLW link

(arxiv.org)

Social media alignment test

amayhew16 Jan 2024 20:56 UTC

1 point

0 comments1 min readLW link

(naiveskepticblog.wordpress.com)

Medical Roundup #1

Zvi16 Jan 2024 20:30 UTC

57 points

9 comments29 min readLW link

(thezvi.wordpress.com)

Being nicer than Clippy

Joe Carlsmith16 Jan 2024 19:44 UTC

108 points

32 comments27 min readLW link

How polysemantic can one neuron be? Investigating features in TinyStories.

Evan Anders16 Jan 2024 19:10 UTC

14 points

0 comments8 min readLW link

(evanhanders.blog)

Applying AI Safety concepts to astronomy

Faris16 Jan 2024 18:29 UTC

1 point

0 comments12 min readLW link

Managing catastrophic misuse without robust AIs

ryan_greenblatt and Buck

16 Jan 2024 17:27 UTC

63 points

17 comments11 min readLW link

[Question] What are the most common social insecurities?

Chris Lakin16 Jan 2024 17:24 UTC

9 points

6 comments1 min readLW link

Why wasn’t preservation with the goal of potential future revival started earlier in history?

Andy_McKenzie16 Jan 2024 16:15 UTC

31 points

1 comment6 min readLW link

[Question] Why are people unkeen to immortality that would come from technological advancements and/or AI?

Gabi QUENE16 Jan 2024 14:23 UTC

12 points

42 comments1 min readLW link

Dealing with Awkwardness

Jonathan Moregård16 Jan 2024 12:32 UTC

13 points

0 comments4 min readLW link

(honestliving.substack.com)

The impossible problem of due process

mingyuan16 Jan 2024 5:18 UTC

206 points

64 comments14 min readLW link

[Retracted] Newton’s law of cooling from first principles

Nisan16 Jan 2024 4:21 UTC

9 points

15 comments2 min readLW link

Sparse Autoencoders Work on Attention Layer Outputs

Connor Kissane, robertzk, Arthur Conmy and Neel Nanda

16 Jan 2024 0:26 UTC

85 points

9 comments18 min readLW link

Goals selected from learned knowledge: an alternative to RL alignment

Seth Herd15 Jan 2024 21:52 UTC

42 points

18 comments7 min readLW link

Introducing REBUS: A Robust Evaluation Benchmark of Understanding Symbols

Arjun Panickssery and agg

15 Jan 2024 21:21 UTC

33 points

0 comments1 min readLW link

Live Sound: Big-O Improvements

jefftk15 Jan 2024 19:50 UTC

8 points

0 comments1 min readLW link

(www.jefftk.com)

Investigating Bias Representations in LLMs via Activation Steering

DawnLu15 Jan 2024 19:39 UTC

29 points

4 comments5 min readLW link

Sparse MLP Distillation

slavachalnev15 Jan 2024 19:39 UTC

30 points

3 comments6 min readLW link

Review of Alignment Plan Critiques- December AI-Plans Critique-a-Thon Results

Iknownothing15 Jan 2024 19:37 UTC

24 points

0 comments25 min readLW link

(aiplans.substack.com)

[Question] What does it look like for AI to significantly improve human coordination, before superintelligence?

Bird Concept15 Jan 2024 19:22 UTC

22 points

2 comments1 min readLW link

Now Accepting Player Applications for Band of Blades

Joe Rogero15 Jan 2024 17:58 UTC

2 points

0 comments3 min readLW link

Three Types of Constraints in the Space of Agents

Nora_Ammann and Mateusz Bagiński

15 Jan 2024 17:27 UTC

26 points

3 comments17 min readLW link

The case for training frontier AIs on Sumerian-only corpus

Alexandre Variengien, Charbel-Raphaël and Jonathan Claybrough

15 Jan 2024 16:40 UTC

130 points

16 comments3 min readLW link

How to Promote More Productive Dialogue Outside of LessWrong

sweenesm15 Jan 2024 14:16 UTC

19 points

4 comments2 min readLW link

[Question] Come and daydream with me about science reform

TeaTieAndHat15 Jan 2024 11:09 UTC

9 points

1 comment1 min readLW link

AI doing philosophy = AI generating hands?

Wei Dai15 Jan 2024 9:04 UTC

47 points

23 comments3 min readLW link

Even if we lose, we win

Morphism15 Jan 2024 2:15 UTC

24 points

17 comments4 min readLW link

Detachment vs attachment [AI risk and mental health]

Neil 15 Jan 2024 0:41 UTC

15 points

4 comments3 min readLW link

Making up statistics to establish priority on Land Value Tax vs Earned Income Tax Credit vs Social Media Dynamic Regulation

Canucklug14 Jan 2024 23:57 UTC

−5 points

2 comments7 min readLW link

Is the universe all there is? ‘Evidence’ for objects outside the universe...

JonathanHall14 Jan 2024 23:56 UTC

−4 points

27 comments11 min readLW link

[Question] What is the minimum amount of time travel and resources needed to secure the future?

Perhaps14 Jan 2024 22:01 UTC

−3 points

5 comments1 min readLW link

Gothenburg LW / ACX meetup

Stefan14 Jan 2024 21:21 UTC

1 point

0 comments1 min readLW link

Gothenburg LW / ACX meetup

Stefan14 Jan 2024 21:20 UTC

1 point

1 comment1 min readLW link

D&D.Sci Hypersphere Analysis Part 2: Nonlinear Effects & Interactions

aphyer14 Jan 2024 19:59 UTC

24 points

0 comments7 min readLW link

Gender Exploration

sapphire14 Jan 2024 18:57 UTC

121 points

26 comments5 min readLW link

(open.substack.com)

List of projects that seem impactful for AI Governance

JaimeRV and Teun van der Weij

14 Jan 2024 16:53 UTC

14 points

0 comments13 min readLW link

The Leeroy Jenkins principle: How faulty AI could guarantee “warning shots”

titotal14 Jan 2024 15:03 UTC

48 points

6 comments21 min readLW link

(titotal.substack.com)

Notice When People Are Directionally Correct

Chris_Leong14 Jan 2024 14:12 UTC

137 points

8 comments2 min readLW link

Corrosive Mnemonics

Epirito14 Jan 2024 12:44 UTC

7 points

0 comments2 min readLW link

Against most, but not all, AI risk analogies

Matthew Barnett14 Jan 2024 3:36 UTC

63 points

41 comments7 min readLW link

Vote With Your Face

jefftk14 Jan 2024 3:30 UTC

11 points

0 comments1 min readLW link

(www.jefftk.com)

Case Studies in Reverse-Engineering Sparse Autoencoder Features by Using MLP Linearization

Jacob Dunefsky, Philippe Chlenski, Senthooran Rajamanoharan and Neel Nanda

14 Jan 2024 2:06 UTC

24 points

0 comments42 min readLW link

D&D.Sci Hypersphere Analysis Part 1: Datafields & Preliminary Analysis

aphyer13 Jan 2024 20:16 UTC

29 points

1 comment5 min readLW link

Some additional SAE thoughts

Hoagy13 Jan 2024 19:31 UTC

31 points

4 comments13 min readLW link

AI #47: Meet the New Year

Zvi13 Jan 2024 16:20 UTC

36 points

7 comments57 min readLW link

(thezvi.wordpress.com)

Takeaways from the NeurIPS 2023 Trojan Detection Competition

mikes13 Jan 2024 12:35 UTC

20 points

2 comments1 min readLW link

(confirmlabs.org)

[Question] Why do so many think deception in AI is important?

Prometheus13 Jan 2024 8:14 UTC

24 points

12 comments1 min readLW link

Eliminating Cookie Banners is Hard

jefftk13 Jan 2024 3:00 UTC

23 points

15 comments3 min readLW link

(www.jefftk.com)