What can we learn from insecure domains?

Logan Zoellner, Nov 1, 2024, 11:53 PM
14 points
21 comments, 1 min read, LW link

Science advances one funeral at a time

Nov 1, 2024, 11:06 PM
100 points
9 comments, 2 min read, LW link

The Cartesian Crisis

mindprison, Nov 1, 2024, 11:02 PM
−5 points
2 comments, 2 min read, LW link

Composition Circuits in Vision Transformers (Hypothesis)

phenomanon, Nov 1, 2024, 10:16 PM
1 point
0 comments, 3 min read, LW link

SAE Probing: What is it good for?

Nov 1, 2024, 7:23 PM
33 points
0 comments, 11 min read, LW link

[Question] Set Theory Multiverse vs Mathematical Truth—Philosophical Discussion

Wenitte Apiou, Nov 1, 2024, 6:56 PM
8 points
25 comments, 1 min read, LW link

Educational CAI: Aligning a Language Model with Pedagogical Theories

Bharath Puranam, Nov 1, 2024, 6:55 PM
5 points
1 comment, 13 min read, LW link

Prediction markets and Taxes

Edmund Nelson, Nov 1, 2024, 5:39 PM
11 points
8 comments, 1 min read, LW link

Dentistry, Oral Surgeons, and the Inefficiency of Small Markets

GeneSmith, Nov 1, 2024, 5:26 PM
86 points
16 comments, 5 min read, LW link

Live Machinery: An Interface Design Philosophy for Wholesome AI Futures

Sahil, Nov 1, 2024, 5:24 PM
48 points
3 comments, 35 min read, LW link

Seeking Collaborators

abramdemski, Nov 1, 2024, 5:13 PM
62 points
15 comments, 7 min read, LW link

Complete Feedback

abramdemski, Nov 1, 2024, 4:58 PM
25 points
8 comments, 3 min read, LW link

Levers for Biological Progress—A Response to “Machines of Loving Grace”

Niko_McCarty, Nov 1, 2024, 4:35 PM
15 points
0 comments, 20 min read, LW link
(www.asimov.press)

2024 Unofficial LW Community Census, Request for Comments

Screwtape, Nov 1, 2024, 4:34 PM
23 points
32 comments, 3 min read, LW link

[Question] When engaging with a large amount of resources during a literature review, how do you prevent yourself from becoming overwhelmed?

corruptedCatapillar, Nov 1, 2024, 7:29 AM
25 points
2 comments, 3 min read, LW link

(draft) Cyborg software should be open (?)

AtillaYasar, Nov 1, 2024, 7:24 AM
4 points
5 comments, 3 min read, LW link

Another UFO Bet

codyz, Nov 1, 2024, 1:55 AM
9 points
11 comments, 1 min read, LW link

Trading Candy

jefftk, Nov 1, 2024, 1:10 AM
28 points
4 comments, 1 min read, LW link
(www.jefftk.com)

JargonBot Beta Test

Raemon, Nov 1, 2024, 1:05 AM
84 points
55 comments, 6 min read, LW link

GPT-4o Guardrails Gone: Data Poisoning & Jailbreak-Tuning

Nov 1, 2024, 12:10 AM
18 points
0 comments, 6 min read, LW link
(far.ai)

The slingshot helps with learning

Wilson Wu, Oct 31, 2024, 11:18 PM
33 points
0 comments, 8 min read, LW link

Toward Safety Case Inspired Basic Research

Oct 31, 2024, 11:06 PM
55 points
3 comments, 13 min read, LW link

Spooky Recommendation System Scaling

phdead, Oct 31, 2024, 10:00 PM
11 points
0 comments, 4 min read, LW link

‘Meta’, ‘mesa’, and mountains

Lorec, Oct 31, 2024, 5:25 PM
1 point
0 comments, 3 min read, LW link

Toward Safety Cases For AI Scheming

Oct 31, 2024, 5:20 PM
60 points
1 comment, 2 min read, LW link

AI #88: Thanks for the Memos

Zvi, Oct 31, 2024, 3:00 PM
46 points
5 comments, 77 min read, LW link
(thezvi.wordpress.com)

The Compendium, A full argument about extinction risk from AGI

Oct 31, 2024, 12:01 PM
195 points
52 comments, 2 min read, LW link
(www.thecompendium.ai)

Some Preliminary Notes on the Promise of a Wisdom Explosion

Chris_Leong, Oct 31, 2024, 9:21 AM
2 points
0 comments, 1 min read, LW link
(aiimpacts.org)

What TMS is like

Sable, Oct 31, 2024, 12:44 AM
208 points
23 comments, 6 min read, LW link
(affablyevil.substack.com)

AI Safety at the Frontier: Paper Highlights, October ’24

gasteigerjo, Oct 31, 2024, 12:09 AM
3 points
0 comments, 9 min read, LW link
(aisafetyfrontier.substack.com)

Standard SAEs Might Be Incoherent: A Choosing Problem & A “Concise” Solution

Kola Ayonrinde, Oct 30, 2024, 10:50 PM
27 points
0 comments, 12 min read, LW link

Generic advice caveats

Saul Munn, Oct 30, 2024, 9:03 PM
27 points
1 comment, 3 min read, LW link
(www.brasstacks.blog)

I turned decision theory problems into memes about trolleys

Tapatakt, Oct 30, 2024, 8:13 PM
104 points
23 comments, 1 min read, LW link

AI as a powerful meme, via CGP Grey

TheManxLoiner, Oct 30, 2024, 6:31 PM
46 points
8 comments, 4 min read, LW link

[Question] How might language influence how an AI “thinks”?

bodry, Oct 30, 2024, 5:41 PM
3 points
0 comments, 1 min read, LW link

Motivation control

Joe Carlsmith, Oct 30, 2024, 5:15 PM
45 points
7 comments, 52 min read, LW link

Updating the NAO Simulator

jefftk, Oct 30, 2024, 1:50 PM
11 points
0 comments, 2 min read, LW link
(www.jefftk.com)

Occupational Licensing Roundup #1

Zvi, Oct 30, 2024, 11:00 AM
65 points
11 comments, 11 min read, LW link
(thezvi.wordpress.com)

Three Notions of “Power”

johnswentworth, Oct 30, 2024, 6:10 AM
92 points
44 comments, 4 min read, LW link

Introduction to Choice set Misspecification in Reward Inference

Rahul Chand, Oct 29, 2024, 10:57 PM
1 point
0 comments, 8 min read, LW link

Gothenburg LW/ACX meetup

Stefan, Oct 29, 2024, 8:40 PM
2 points
0 comments, 1 min read, LW link

The Alignment Trap: AI Safety as Path to Power

crispweed, Oct 29, 2024, 3:21 PM
57 points
17 comments, 5 min read, LW link
(upcoder.com)

Housing Roundup #10

Zvi, Oct 29, 2024, 1:50 PM
32 points
2 comments, 32 min read, LW link
(thezvi.wordpress.com)

[Intuitive self-models] 7. Hearing Voices, and Other Hallucinations

Steven Byrnes, Oct 29, 2024, 1:36 PM
51 points
2 comments, 16 min read, LW link

Review: “The Case Against Reality”

David Gross, Oct 29, 2024, 1:13 PM
20 points
9 comments, 5 min read, LW link

A Poem Is All You Need: Jailbreaking ChatGPT, Meta & More

Sharat Jacob Jacob, Oct 29, 2024, 12:41 PM
12 points
0 comments, 9 min read, LW link

Searching for phenomenal consciousness in LLMs: Perceptual reality monitoring and introspective confidence

EuanMcLean, Oct 29, 2024, 12:16 PM
45 points
9 comments, 26 min read, LW link

AI #87: Staying in Character

Zvi, Oct 29, 2024, 7:10 AM
57 points
3 comments, 33 min read, LW link
(thezvi.wordpress.com)

A path to human autonomy

Nathan Helm-Burger, Oct 29, 2024, 3:02 AM
53 points
16 comments, 20 min read, LW link

D&D.Sci Coliseum: Arena of Data Evaluation and Ruleset

aphyer, Oct 29, 2024, 1:21 AM
47 points
13 comments, 6 min read, LW link