Air-gapping evaluation and support

Ryan Kidd · 26 Dec 2022 22:52 UTC
53 points · 1 comment · 2 min read · LW link

Slightly against aligning with neo-luddites

Matthew Barnett · 26 Dec 2022 22:46 UTC
104 points · 31 comments · 4 min read · LW link

Avoiding perpetual risk from TAI

scasper · 26 Dec 2022 22:34 UTC
15 points · 6 comments · 5 min read · LW link

Announcing: The Independent AI Safety Registry

Shoshannah Tekofsky · 26 Dec 2022 21:22 UTC
53 points · 9 comments · 1 min read · LW link

Are men harder to help?

braces · 26 Dec 2022 21:11 UTC
35 points · 1 comment · 2 min read · LW link

[Question] How much should I update on the fact that my dentist is named Dennis?

MichaelDickens · 26 Dec 2022 19:11 UTC
2 points · 3 comments · 1 min read · LW link

Theodicy and the simulation hypothesis, or: The problem of simulator evil

philosophybear · 26 Dec 2022 18:55 UTC
6 points · 12 comments · 19 min read · LW link
(philosophybear.substack.com)

Safety of Self-Assembled Neuromorphic Hardware

Can · 26 Dec 2022 18:51 UTC
15 points · 2 comments · 10 min read · LW link
(forum.effectivealtruism.org)

Coherent extrapolated dreaming

Alex Flint · 26 Dec 2022 17:29 UTC
38 points · 10 comments · 17 min read · LW link

An overview of some promising work by junior alignment researchers

Akash · 26 Dec 2022 17:23 UTC
34 points · 0 comments · 4 min read · LW link

Solstice song: Here Lies the Dragon

jchan · 26 Dec 2022 16:08 UTC
8 points · 1 comment · 2 min read · LW link

The Usefulness Paradigm

Aprillion (Peter Hozák) · 26 Dec 2022 13:23 UTC
3 points · 4 comments · 1 min read · LW link

Looking Back on Posts From 2022

Zvi · 26 Dec 2022 13:20 UTC
49 points · 8 comments · 17 min read · LW link
(thezvi.wordpress.com)

Analogies between Software Reverse Engineering and Mechanistic Interpretability

26 Dec 2022 12:26 UTC
34 points · 6 comments · 11 min read · LW link
(www.neelnanda.io)

Mlyyrczo

lsusr · 26 Dec 2022 7:58 UTC
41 points · 14 comments · 3 min read · LW link

Causal abstractions vs infradistributions

Pablo Villalobos · 26 Dec 2022 0:21 UTC
20 points · 0 comments · 6 min read · LW link

Concrete Steps to Get Started in Transformer Mechanistic Interpretability

Neel Nanda · 25 Dec 2022 22:21 UTC
54 points · 7 comments · 12 min read · LW link
(www.neelnanda.io)

It’s time to worry about online privacy again

Malmesbury · 25 Dec 2022 21:05 UTC
66 points · 23 comments · 6 min read · LW link

[Hebbian Natural Abstractions] Mathematical Foundations

25 Dec 2022 20:58 UTC
15 points · 2 comments · 6 min read · LW link
(www.snellessen.com)

[Question] Oracle AGI—How can it escape, other than security issues? (Steganography?)

RationalSieve · 25 Dec 2022 20:14 UTC
3 points · 6 comments · 1 min read · LW link

YCombinator fraud rates

Xodarap · 25 Dec 2022 19:21 UTC
56 points · 3 comments · 1 min read · LW link

How evolutionary lineages of LLMs can plan their own future and act on these plans

Roman Leventov · 25 Dec 2022 18:11 UTC
39 points · 16 comments · 8 min read · LW link

Accurate Models of AI Risk Are Hyperexistential Exfohazards

Thane Ruthenis · 25 Dec 2022 16:50 UTC
30 points · 38 comments · 9 min read · LW link

ChatGPT is our Wright Brothers moment

Ron J · 25 Dec 2022 16:26 UTC
10 points · 9 comments · 1 min read · LW link

The Meditation on Winter

Raemon · 25 Dec 2022 16:12 UTC
58 points · 3 comments · 3 min read · LW link

I’ve updated towards AI boxing being surprisingly easy

Noosphere89 · 25 Dec 2022 15:40 UTC
8 points · 20 comments · 2 min read · LW link

Take 14: Corrigibility isn’t that great.

Charlie Steiner · 25 Dec 2022 13:04 UTC
15 points · 3 comments · 3 min read · LW link

Simplified Level Up

jefftk · 25 Dec 2022 13:00 UTC
12 points · 16 comments · 2 min read · LW link
(www.jefftk.com)

Hyperfinite graphs ~ manifolds

Alok Singh · 25 Dec 2022 12:24 UTC
11 points · 5 comments · 2 min read · LW link

Inconsistent math is great

Alok Singh · 25 Dec 2022 3:20 UTC
1 point · 2 comments · 1 min read · LW link

A hundredth of a bit of extra entropy

Adam Scherlis · 24 Dec 2022 21:12 UTC
83 points · 4 comments · 3 min read · LW link

Shared reality: a key driver of human behavior

kdbscott · 24 Dec 2022 19:35 UTC
126 points · 25 comments · 4 min read · LW link

Contra Steiner on Too Many Natural Abstractions

DragonGod · 24 Dec 2022 17:42 UTC
10 points · 6 comments · 1 min read · LW link

Three reasons to cooperate

paulfchristiano · 24 Dec 2022 17:40 UTC
82 points · 14 comments · 10 min read · LW link
(sideways-view.com)

Practical AI risk I: Watching large compute

Gustavo Ramires · 24 Dec 2022 13:25 UTC
3 points · 0 comments · 1 min read · LW link

Non-Elevated Air Purifiers

jefftk · 24 Dec 2022 12:40 UTC
10 points · 2 comments · 1 min read · LW link
(www.jefftk.com)

The Case for Chip-Backed Dollars

AnthonyRepetto · 24 Dec 2022 10:28 UTC
0 points · 1 comment · 4 min read · LW link

List #3: Why not to assume on prior that AGI-alignment workarounds are available

Remmelt · 24 Dec 2022 9:54 UTC
4 points · 1 comment · 3 min read · LW link

List #2: Why coordinating to align as humans to not develop AGI is a lot easier than, well… coordinating as humans with AGI coordinating to be aligned with humans

Remmelt · 24 Dec 2022 9:53 UTC
1 point · 0 comments · 3 min read · LW link

List #1: Why stopping the development of AGI is hard but doable

Remmelt · 24 Dec 2022 9:52 UTC
6 points · 11 comments · 5 min read · LW link

The case against AI alignment

andrew sauer · 24 Dec 2022 6:57 UTC
115 points · 110 comments · 5 min read · LW link

Content and Takeaways from SERI MATS Training Program with John Wentworth

RohanS · 24 Dec 2022 4:17 UTC
28 points · 3 comments · 12 min read · LW link

Löb’s Lemma: an easier approach to Löb’s Theorem

Andrew_Critch · 24 Dec 2022 2:02 UTC
30 points · 16 comments · 3 min read · LW link

Durkon, an open-source tool for Inherently Interpretable Modelling

abstractapplic · 24 Dec 2022 1:49 UTC
29 points · 0 comments · 4 min read · LW link

Issues with uneven AI resource distribution

User_Luke · 24 Dec 2022 1:18 UTC
3 points · 9 comments · 5 min read · LW link
(temporal.substack.com)

Loose Threads on Intelligence

Shoshannah Tekofsky · 24 Dec 2022 0:38 UTC
11 points · 3 comments · 8 min read · LW link

[Question] If you factor out next token prediction, what are the remaining salient features of human cognition?

shminux · 24 Dec 2022 0:38 UTC
9 points · 7 comments · 1 min read · LW link

[Question] Why is “Argument Mapping” Not More Common in EA/Rationality (And What Objections Should I Address in a Post on the Topic?)

HarrisonDurland · 23 Dec 2022 21:58 UTC
10 points · 5 comments · 1 min read · LW link

The Fear [Fiction]

Yitz · 23 Dec 2022 21:21 UTC
7 points · 0 comments · 1 min read · LW link

To err is neural: select logs with ChatGPT

VipulNaik · 23 Dec 2022 20:26 UTC
22 points · 2 comments · 38 min read · LW link