How does a toy 2 digit subtraction transformer predict the difference?

Evan Anders · Dec 22, 2023, 9:17 PM
12 points
0 comments · 10 min read · LW link
(evanhanders.blog)

Thoughts on Max Tegmark’s AI verification

Johannes C. Mayer · Dec 22, 2023, 8:38 PM
10 points
0 comments · 3 min read · LW link

Idealized Agents Are Approximate Causal Mirrors (+ Radical Optimism on Agent Foundations)

Thane Ruthenis · Dec 22, 2023, 8:19 PM
75 points
14 comments · 6 min read · LW link

AI safety advocates should consider providing gentle pushback following the events at OpenAI

civilsociety · Dec 22, 2023, 6:55 PM
16 points
5 comments · 3 min read · LW link

“Destroy humanity” as an immediate subgoal

Seth Ahrenbach · Dec 22, 2023, 6:52 PM
3 points
13 comments · 3 min read · LW link

Synthetic Restrictions

nano_brasca · Dec 22, 2023, 6:50 PM
10 points
0 comments · 4 min read · LW link

Review Report of Davidson on Takeoff Speeds (2023)

Trent Kannegieter · Dec 22, 2023, 6:48 PM
37 points
11 comments · 38 min read · LW link

The problems with the concept of an infohazard as used by the LW community [Linkpost]

Noosphere89 · Dec 22, 2023, 4:13 PM
75 points
43 comments · 3 min read · LW link
(www.beren.io)

Employee Incentives Make AGI Lab Pauses More Costly

Nikola Jurkovic · Dec 22, 2023, 5:04 AM
28 points
12 comments · 3 min read · LW link

The LessWrong 2022 Review: Review Phase

RobertM · Dec 22, 2023, 3:23 AM
58 points
7 comments · 2 min read · LW link

The absence of self-rejection is self-acceptance

Chipmonk · Dec 21, 2023, 9:54 PM
24 points
1 comment · 1 min read · LW link
(chipmonk.substack.com)

A Decision Theory Can Be Rational or Computable, but Not Both

StrivingForLegibility · Dec 21, 2023, 9:02 PM
9 points
4 comments · 1 min read · LW link

Most People Don’t Realize We Have No Idea How Our AIs Work

Thane Ruthenis · Dec 21, 2023, 8:02 PM
159 points
42 comments · 1 min read · LW link

Pseudonymity and Accusations

jefftk · Dec 21, 2023, 7:20 PM
52 points
20 comments · 3 min read · LW link
(www.jefftk.com)

Attention on AI X-Risk Likely Hasn’t Distracted from Current Harms from AI

Erich_Grunewald · Dec 21, 2023, 5:24 PM
26 points
2 comments · 17 min read · LW link
(www.erichgrunewald.com)

“Alignment” is one of six words of the year in the Harvard Gazette

Nikola Jurkovic · Dec 21, 2023, 3:54 PM
14 points
1 comment · 1 min read · LW link
(news.harvard.edu)

AI #43: Functional Discoveries

Zvi · Dec 21, 2023, 3:50 PM
52 points
26 comments · 49 min read · LW link
(thezvi.wordpress.com)

Rating my AI Predictions

Robert_AIZI · Dec 21, 2023, 2:07 PM
22 points
5 comments · 2 min read · LW link
(aizi.substack.com)

AI Safety Chatbot

Dec 21, 2023, 2:06 PM
61 points
11 comments · 4 min read · LW link

On OpenAI’s Preparedness Framework

Zvi · Dec 21, 2023, 2:00 PM
51 points
4 comments · 21 min read · LW link
(thezvi.wordpress.com)

Prediction Markets aren’t Magic

SimonM · Dec 21, 2023, 12:54 PM
90 points
29 comments · 3 min read · LW link

[Question] Why is capnometry biofeedback not more widely known?

riceissa · Dec 21, 2023, 2:42 AM
20 points
22 comments · 4 min read · LW link

My best guess at the important tricks for training 1L SAEs

Arthur Conmy · Dec 21, 2023, 1:59 AM
37 points
4 comments · 3 min read · LW link

Seattle Winter Solstice

a7x · Dec 20, 2023, 8:30 PM
6 points
1 comment · 1 min read · LW link

How Would an Utopia-Maximizer Look Like?

Thane Ruthenis · Dec 20, 2023, 8:01 PM
32 points
23 comments · 10 min read · LW link

Succession

Richard_Ngo · Dec 20, 2023, 7:25 PM
159 points
48 comments · 11 min read · LW link
(www.narrativeark.xyz)

Metaculus Introduces Multiple Choice Questions

ChristianWilliams · Dec 20, 2023, 7:00 PM
4 points
0 comments · LW link
(www.metaculus.com)

Brighter Than Today Versions

jefftk · Dec 20, 2023, 6:20 PM
16 points
2 comments · 2 min read · LW link
(www.jefftk.com)

Gaia Network: a practical, incremental pathway to Open Agency Architecture

Dec 20, 2023, 5:11 PM
22 points
8 comments · 16 min read · LW link

On the future of language models

owencb · Dec 20, 2023, 4:58 PM
105 points
17 comments · LW link

[Valence series] Appendix A: Hedonic tone / (dis)pleasure / (dis)liking

Steven Byrnes · Dec 20, 2023, 3:54 PM
18 points
0 comments · 13 min read · LW link

Matrix completion prize results

paulfchristiano · Dec 20, 2023, 3:40 PM
41 points
0 comments · 2 min read · LW link
(www.alignment.org)

[Question] What’s the minimal additive constant for Kolmogorov Complexity that a programming language can achieve?

Noosphere89 · Dec 20, 2023, 3:36 PM
11 points
15 comments · 1 min read · LW link

Legalize butanol?

bhauth · Dec 20, 2023, 2:24 PM
39 points
20 comments · 5 min read · LW link
(www.bhauth.com)

A short dialogue on comparability of values

cousin_it · Dec 20, 2023, 2:08 PM
27 points
7 comments · 1 min read · LW link

Inside View, Outside View… And Opposing View

chaosmage · Dec 20, 2023, 12:35 PM
21 points
1 comment · 5 min read · LW link

Heuristics for preventing major life mistakes

SK2 · Dec 20, 2023, 8:01 AM
28 points
2 comments · 3 min read · LW link

What should be reified?

herschel · Dec 20, 2023, 4:52 AM
4 points
2 comments · 2 min read · LW link
(brothernin.substack.com)

(In)appropriate (De)reification

herschel · Dec 20, 2023, 4:51 AM
10 points
1 comment · 4 min read · LW link
(brothernin.substack.com)

Escaping Skeuomorphism

Stuart Johnson · Dec 20, 2023, 3:51 AM
28 points
0 comments · 8 min read · LW link

Ronny and Nate discuss what sorts of minds humanity is likely to find by Machine Learning

Dec 19, 2023, 11:39 PM
42 points
30 comments · 25 min read · LW link

[Question] What are the best Siderea posts?

mike_hawke · Dec 19, 2023, 11:07 PM
17 points
2 comments · 1 min read · LW link

Meaning & Agency

abramdemski · Dec 19, 2023, 10:27 PM
91 points
17 comments · 14 min read · LW link

s/acc: Safe Accelerationism Manifesto

lorepieri · Dec 19, 2023, 10:19 PM
−4 points
5 comments · 2 min read · LW link
(lorenzopieri.com)

Don’t Share Information Exfohazardous on Others’ AI-Risk Models

Thane Ruthenis · Dec 19, 2023, 8:09 PM
68 points
11 comments · 1 min read · LW link

Paper: Tell, Don’t Show- Declarative facts influence how LLMs generalize

Dec 19, 2023, 7:14 PM
45 points
4 comments · 6 min read · LW link
(arxiv.org)

Interview: Applications w/ Alice Rigg

jacobhaimes · Dec 19, 2023, 7:03 PM
12 points
0 comments · 1 min read · LW link
(into-ai-safety.github.io)

How does a toy 2 digit subtraction transformer predict the sign of the output?

Evan Anders · Dec 19, 2023, 6:56 PM
14 points
0 comments · 8 min read · LW link
(evanhanders.blog)

Incremental AI Risks from Proxy-Simulations

kmenou · Dec 19, 2023, 6:56 PM
2 points
0 comments · 1 min read · LW link
(individual.utoronto.ca)

Goal-Completeness is like Turing-Completeness for AGI

Liron · Dec 19, 2023, 6:12 PM
51 points
26 comments · 3 min read · LW link