My guess at Conjecture’s vision: triggering a narrative bifurcation

Alexandre Variengien · Feb 6, 2024, 7:10 PM
75 points
12 comments · 16 min read · LW link

Arrogance and People Pleasing

Jonathan Moregård · Feb 6, 2024, 6:43 PM
26 points
7 comments · 4 min read · LW link
(honestliving.substack.com)

What does davidad want from «boundaries»?

Feb 6, 2024, 5:45 PM
47 points
1 comment · 5 min read · LW link

[Question] How can I efficiently read all the Dath Ilan worldbuilding?

mike_hawke · Feb 6, 2024, 4:52 PM
10 points
1 comment · 1 min read · LW link

Preventing model exfiltration with upload limits

ryan_greenblatt · Feb 6, 2024, 4:29 PM
71 points
22 comments · 14 min read · LW link

Evolution is an observation, not a process

Neil · Feb 6, 2024, 2:49 PM
8 points
11 comments · 5 min read · LW link

[Question] Why do we need an understanding of the real world to predict the next tokens in a body of text?

Valentin Baltadzhiev · Feb 6, 2024, 2:43 PM
2 points
12 comments · 1 min read · LW link

On the Debate Between Jezos and Leahy

Zvi · Feb 6, 2024, 2:40 PM
64 points
6 comments · 63 min read · LW link
(thezvi.wordpress.com)

Why Two Valid Answers Approach is not Enough for Sleeping Beauty

Ape in the coat · Feb 6, 2024, 2:21 PM
6 points
12 comments · 6 min read · LW link

Are most personality disorders really trust disorders?

chaosmage · Feb 6, 2024, 12:37 PM
20 points
4 comments · 1 min read · LW link

From Conceptual Spaces to Quantum Concepts: Formalising and Learning Structured Conceptual Models

Roman Leventov · Feb 6, 2024, 10:18 AM
8 points
1 comment · 4 min read · LW link
(arxiv.org)

Fluent dreaming for language models (AI interpretability method)

Feb 6, 2024, 6:02 AM
46 points
5 comments · 1 min read · LW link
(arxiv.org)

Selfish AI Inevitable

Davey Morse · Feb 6, 2024, 4:29 AM
1 point
0 comments · 1 min read · LW link

Toy models of AI control for concentrated catastrophe prevention

Feb 6, 2024, 1:38 AM
51 points
2 comments · 7 min read · LW link

Things You’re Allowed to Do: University Edition

Saul Munn · Feb 6, 2024, 12:36 AM
97 points
13 comments · 5 min read · LW link
(www.brasstacks.blog)

Value learning in the absence of ground truth

Joel_Saarinen · Feb 5, 2024, 6:56 PM
47 points
8 comments · 45 min read · LW link

Implementing activation steering

Annah · Feb 5, 2024, 5:51 PM
75 points
8 comments · 7 min read · LW link

AI alignment as a translation problem

Roman Leventov · Feb 5, 2024, 2:14 PM
22 points
2 comments · 3 min read · LW link

Safe Stasis Fallacy

Davidmanheim · Feb 5, 2024, 10:54 AM
54 points
2 comments · LW link

[Question] How has internalising a post-AGI world affected your current choices?

yanni kyriacos · Feb 5, 2024, 5:43 AM
10 points
8 comments · 1 min read · LW link

A thought experiment for comparing “biological” vs “digital” intelligence increase/explosion

Super AGI · Feb 5, 2024, 4:57 AM
6 points
3 comments · 1 min read · LW link

Noticing Panic

Cole Wyeth · Feb 5, 2024, 3:45 AM
59 points
8 comments · 3 min read · LW link

EA/ACX/LW February Santa Cruz Meetup

madmail · Feb 4, 2024, 11:26 PM
1 point
0 comments · 1 min read · LW link

Vitalia Rationality Meetup

veronica · Feb 4, 2024, 7:46 PM
1 point
0 comments · 1 min read · LW link

Personal predictions

Daniele De Nuntiis · Feb 4, 2024, 3:59 AM
2 points
2 comments · 3 min read · LW link

A sketch of acausal trade in practice

Richard_Ngo · Feb 4, 2024, 12:32 AM
36 points
4 comments · 7 min read · LW link

Brute Force Manufactured Consensus is Hiding the Crime of the Century

Roko · Feb 3, 2024, 8:36 PM
209 points
156 comments · 9 min read · LW link

My thoughts on the Beff Jezos—Connor Leahy debate

kwiat.dev · Feb 3, 2024, 7:47 PM
−5 points
23 comments · 4 min read · LW link

The Journal of Dangerous Ideas

rogersbacon · Feb 3, 2024, 3:40 PM
−25 points
4 comments · 5 min read · LW link
(www.secretorum.life)

Attitudes about Applied Rationality

Camille Berger · Feb 3, 2024, 2:42 PM
108 points
18 comments · 4 min read · LW link

Practicing my Handwriting in 1439

Maxwell Tabarrok · Feb 3, 2024, 1:21 PM
11 points
0 comments · 3 min read · LW link
(www.maximum-progress.com)

Finite Factored Sets to Bayes Nets Part 2

J Bostock · Feb 3, 2024, 12:25 PM
6 points
0 comments · 8 min read · LW link

Why I no longer identify as transhumanist

Kaj_Sotala · Feb 3, 2024, 12:00 PM
55 points
33 comments · 3 min read · LW link
(kajsotala.fi)

Attention SAEs Scale to GPT-2 Small

Feb 3, 2024, 6:50 AM
78 points
4 comments · 8 min read · LW link

Why do we need RLHF? Imitation, Inverse RL, and the role of reward

Ran W · Feb 3, 2024, 4:00 AM
16 points
0 comments · 5 min read · LW link

Announcing the London Initiative for Safe AI (LISA)

Feb 2, 2024, 11:17 PM
98 points
0 comments · 9 min read · LW link

Survey for alignment researchers!

Feb 2, 2024, 8:41 PM
71 points
11 comments · 1 min read · LW link

Voting Results for the 2022 Review

Ben Pace · Feb 2, 2024, 8:34 PM
57 points
3 comments · 73 min read · LW link

On Dwarkesh’s 3rd Podcast With Tyler Cowen

Zvi · Feb 2, 2024, 7:30 PM
36 points
9 comments · 21 min read · LW link
(thezvi.wordpress.com)

Most experts believe COVID-19 was probably not a lab leak

DanielFilan · Feb 2, 2024, 7:28 PM
66 points
89 comments · 2 min read · LW link
(gcrinstitute.org)

What Failure Looks Like is not an existential risk (and alignment is not the solution)

otto.barten · Feb 2, 2024, 6:59 PM
13 points
12 comments · 9 min read · LW link

Solving alignment isn’t enough for a flourishing future

mic · Feb 2, 2024, 6:23 PM
27 points
0 comments · LW link
(papers.ssrn.com)

Manifold Markets

PeterMcCluskey · Feb 2, 2024, 5:48 PM
26 points
9 comments · 4 min read · LW link
(bayesianinvestor.com)

Types of subjective welfare

MichaelStJules · Feb 2, 2024, 9:56 AM
10 points
3 comments · LW link

Open Source Sparse Autoencoders for all Residual Stream Layers of GPT2-Small

Joseph Bloom · Feb 2, 2024, 6:54 AM
103 points
37 comments · 15 min read · LW link

Soft Prompts for Evaluation: Measuring Conditional Distance of Capabilities

porby · Feb 2, 2024, 5:49 AM
47 points
1 comment · 4 min read · LW link
(arxiv.org)

Running a Prediction Market Mafia Game

Arjun Panickssery · Feb 1, 2024, 11:24 PM
22 points
5 comments · 1 min read · LW link
(arjunpanickssery.substack.com)

Evaluating Stability of Unreflective Alignment

james.lucassen · Feb 1, 2024, 10:15 PM
57 points
12 comments · 18 min read · LW link
(jlucassen.com)

Davidad’s Provably Safe AI Architecture—ARIA’s Programme Thesis

simeon_c · Feb 1, 2024, 9:30 PM
69 points
17 comments · 1 min read · LW link
(www.aria.org.uk)

Alignment has a Basin of Attraction: Beyond the Orthogonality Thesis

RogerDearnaley · Feb 1, 2024, 9:15 PM
16 points
15 comments · 13 min read · LW link