Looking for judges for critiques of Alignment Plans

Iknownothing Aug 17, 2023, 10:35 PM
6 points
0 comments · 1 min read · LW link

How is ChatGPT’s behavior changing over time?

worse Aug 17, 2023, 8:54 PM
3 points
0 comments · 1 min read · LW link
(arxiv.org)

Progress links digest, 2023-08-17: Cloud seeding, robotic sculptors, and rogue planets

jasoncrawford Aug 17, 2023, 8:29 PM
15 points
1 comment · 4 min read · LW link
(rootsofprogress.org)

Model of psychosis, take 2

Steven Byrnes Aug 17, 2023, 7:11 PM
34 points
13 comments · 4 min read · LW link

[Linkpost] Robustified ANNs Reveal Wormholes Between Human Category Percepts

Bogdan Ionut Cirstea Aug 17, 2023, 7:10 PM
6 points
2 comments · 1 min read · LW link

Against Almost Every Theory of Impact of Interpretability

Charbel-Raphaël Aug 17, 2023, 6:44 PM
329 points
91 comments · 26 min read · LW link · 2 reviews

Goldilocks and the Three Optimisers

dkl9 Aug 17, 2023, 6:15 PM
−10 points
0 comments · 5 min read · LW link
(dkl9.net)

Announcing Foresight Institute’s AI Safety Grants Program

Allison Duettmann Aug 17, 2023, 5:34 PM
35 points
2 comments · 1 min read · LW link

The Negentropy Cliff

mephistopheles Aug 17, 2023, 5:08 PM
6 points
10 comments · 1 min read · LW link

“AI Wellbeing” and the Ongoing Debate on Phenomenal Consciousness

FlorianH Aug 17, 2023, 3:47 PM
10 points
6 comments · 7 min read · LW link

AI #25: Inflection Point

Zvi Aug 17, 2023, 2:40 PM
59 points
9 comments · 36 min read · LW link
(thezvi.wordpress.com)

[Question] Why might General Intelligences have long term goals?

yrimon Aug 17, 2023, 2:10 PM
3 points
17 comments · 1 min read · LW link

Understanding Counterbalanced Subtractions for Better Activation Additions

ojorgensen Aug 17, 2023, 1:53 PM
21 points
0 comments · 14 min read · LW link

Reflections on “Making the Atomic Bomb”

boazbarak Aug 17, 2023, 2:48 AM
51 points
7 comments · 8 min read · LW link

Autonomous replication and adaptation: an attempt at a concrete danger threshold

Hjalmar_Wijk Aug 17, 2023, 1:31 AM
45 points
0 comments · 13 min read · LW link

[Question] (Thought experiment) If you had to choose, which would you prefer?

kuira Aug 17, 2023, 12:57 AM
9 points
2 comments · 1 min read · LW link

Some rules for life (v.0,0)

Neil Aug 17, 2023, 12:43 AM
43 points
13 comments · 12 min read · LW link
(neilwarren.substack.com)

When AI critique works even with misaligned models

Fabien Roger Aug 17, 2023, 12:12 AM
23 points
0 comments · 2 min read · LW link

Book Launch: “The Carving of Reality,” Best of LessWrong vol. III

Raemon Aug 16, 2023, 11:52 PM
131 points
22 comments · 5 min read · LW link

One example of how LLM propaganda attacks can hack the brain

trevor Aug 16, 2023, 9:41 PM
27 points
8 comments · 4 min read · LW link

If we had known the atmosphere would ignite

Jeffs Aug 16, 2023, 8:28 PM
59 points
63 comments · 2 min read · LW link

Stampy’s AI Safety Info—New Distillations #4 [July 2023]

markov Aug 16, 2023, 7:03 PM
22 points
10 comments · 1 min read · LW link
(aisafety.info)

A Proof of Löb’s Theorem using Computability Theory

jessicata Aug 16, 2023, 6:57 PM
76 points
0 comments · 17 min read · LW link
(unstableontology.com)

Summary of and Thoughts on the Hotz/Yudkowsky Debate

Zvi Aug 16, 2023, 4:50 PM
106 points
47 comments · 9 min read · LW link
(thezvi.wordpress.com)

Red Pill vs Blue Pill, Bayes style

ErickBall Aug 16, 2023, 3:23 PM
28 points
33 comments · 1 min read · LW link

What does it mean to “trust science”?

jasoncrawford Aug 16, 2023, 2:56 PM
34 points
9 comments · 1 min read · LW link
(rootsofprogress.org)

Jason Crawford / The Roots of Progress in Bangalore, August 21 to September 8

jasoncrawford Aug 16, 2023, 1:36 PM
13 points
1 comment · 1 min read · LW link
(rootsofprogress.org)

Gaining knowledge at a price

DavidMadsen Aug 16, 2023, 10:21 AM
−4 points
5 comments · 1 min read · LW link

Understanding and visualizing sycophancy datasets

Nina Panickssery Aug 16, 2023, 5:34 AM
47 points
0 comments · 6 min read · LW link

George Hotz vs Eliezer Yudkowsky AI Safety Debate—link and brief discussion

Gerald Monroe Aug 16, 2023, 4:31 AM
11 points
26 comments · 2 min read · LW link
(www.youtube.com)

[Question] How to take advantage of the market’s irrationality regarding AGI?

GeneSmith Aug 16, 2023, 3:30 AM
24 points
7 comments · 2 min read · LW link

Infinite Ethics: Infinite Problems

Bentham's Bulldog Aug 16, 2023, 2:44 AM
−2 points
25 comments · 23 min read · LW link

Private Biostasis & Cryonics Social

Mati_Roy Aug 16, 2023, 2:34 AM
11 points
0 comments · 1 min read · LW link

Some thoughts on George Hotz vs Eliezer Yudkowsky

TristanTrim Aug 15, 2023, 11:33 PM
10 points
3 comments · 2 min read · LW link

Understanding the Information Flow inside Large Language Models

Aug 15, 2023, 9:13 PM
19 points
0 comments · 17 min read · LW link

[Question] Any research in “probe-tuning” of LLMs?

Roman Leventov Aug 15, 2023, 9:01 PM
20 points
3 comments · 1 min read · LW link

Can AI Transform the Electorate into a Citizen’s Assembly

RoscoHunter Aug 15, 2023, 5:52 PM
−3 points
5 comments · 3 min read · LW link

Ten Thousand Years of Solitude

agp Aug 15, 2023, 5:45 PM
137 points
19 comments · 4 min read · LW link
(www.discovermagazine.com)

AISN #19: US-China Competition on AI Chips, Measuring Language Agent Developments, Economic Analysis of Language Model Propaganda, and White House AI Cyber Challenge

Dan H Aug 15, 2023, 4:10 PM
21 points
0 comments · 5 min read · LW link
(newsletter.safe.ai)

[Question] What is the most effective anti-tyranny charity?

lc Aug 15, 2023, 3:26 PM
20 points
10 comments · 1 min read · LW link

My checklist for publishing a blog post

Steven Byrnes Aug 15, 2023, 3:04 PM
87 points
6 comments · 3 min read · LW link

The Dunbar Playbook: A CRM system for your friends

Severin T. Seehrich Aug 15, 2023, 8:44 AM
32 points
16 comments · 5 min read · LW link
(amoretlicentia.substack.com)

Optical Illusions are Out of Distribution Errors

vitaliya Aug 15, 2023, 2:23 AM
30 points
8 comments · 2 min read · LW link

A short calculation about a Twitter poll

Ege Erdil Aug 14, 2023, 7:48 PM
64 points
64 comments · 11 min read · LW link

Decomposing independent generalizations in neural networks via Hessian analysis

Aug 14, 2023, 5:04 PM
84 points
4 comments · 1 min read · LW link

Memetic Judo #2: Incorporal Switches and Levers Compendium

Max TK Aug 14, 2023, 4:53 PM
19 points
6 comments · 17 min read · LW link

Existentially relevant thought experiment: To kill or not to kill, a sniper, a man and a button.

AlexFromSafeTransition Aug 14, 2023, 10:53 AM
−18 points
6 comments · 4 min read · LW link

Stepping down as moderator on LW

Kaj_Sotala Aug 14, 2023, 10:46 AM
82 points
1 comment · 1 min read · LW link

Announcing Manifest 2023 (Sep 22-24 in Berkeley)

Aug 14, 2023, 5:13 AM
31 points
0 comments · 2 min read · LW link

Coherence Therapy with LLMs—quick demo

Chris Lakin Aug 14, 2023, 3:34 AM
19 points
11 comments · 1 min read · LW link