the case for CoT unfaithfulness is overstated

nostalgebraist · Sep 29, 2024, 10:07 PM
263 points
43 comments · 11 min read · LW link

Pomodoro Method Randomized Self Experiment

niplav · Sep 29, 2024, 9:55 PM
14 points
2 comments · 1 min read · LW link

Toy Models of Superposition: Simplified by Hand

Axel Sorensen · Sep 29, 2024, 9:19 PM
9 points
3 comments · 8 min read · LW link

LLMs are likely not conscious

research_prime_space · Sep 29, 2024, 8:57 PM
6 points
9 comments · 1 min read · LW link

A Policy Proposal

phdead · Sep 29, 2024, 8:45 PM
10 points
4 comments · 4 min read · LW link

Do Sparse Autoencoders (SAEs) transfer across base and finetuned language models?

Sep 29, 2024, 7:37 PM
26 points
8 comments · 25 min read · LW link

Models of life

Abhishaike Mahajan · Sep 29, 2024, 7:24 PM
8 points
0 comments · 16 min read · LW link
(www.asimov.press)

Interpreting the effects of Jailbreak Prompts in LLMs

Harsh Raj · Sep 29, 2024, 7:01 PM
8 points
0 comments · 5 min read · LW link

New Capabilities, New Risks? - Evaluating Agentic General Assistants using Elements of GAIA & METR Frameworks

Tej Lander · Sep 29, 2024, 6:58 PM
5 points
0 comments · 29 min read · LW link

Developmental Stages in Multi-Problem Grokking

James Sullivan · Sep 29, 2024, 6:58 PM
4 points
0 comments · 6 min read · LW link

A Psychoanalytic Explanation of Sam Altman’s Irrational Actions

Gabe · Sep 29, 2024, 6:58 PM
1 point
3 comments · 3 min read · LW link

Building Safer AI from the Ground Up: Steering Model Behavior via Pre-Training Data Curation

Antonio Clarke · Sep 29, 2024, 6:48 PM
6 points
0 comments · 23 min read · LW link

Cryonics is free

Mati_Roy · Sep 29, 2024, 5:58 PM
198 points
46 comments · 2 min read · LW link

Runner’s High On Demand: A Story of Luck & Persistence

Shoshannah Tekofsky · Sep 29, 2024, 5:15 PM
14 points
6 comments · 5 min read · LW link
(shoshanigans.substack.com)

You can, in fact, bamboozle an unaligned AI into sparing your life

David Matolcsi · Sep 29, 2024, 4:59 PM
113 points
173 comments · 27 min read · LW link

Base LLMs refuse too

Sep 29, 2024, 4:04 PM
60 points
20 comments · 10 min read · LW link

My Methodological Turn

adamShimi · Sep 29, 2024, 3:01 PM
29 points
0 comments · 1 min read · LW link
(formethods.substack.com)

Linkpost: Hypocrisy standoff

Chris_Leong · Sep 29, 2024, 2:27 PM
5 points
1 comment · 1 min read · LW link
(x.com)

[Question] Any real toeholds for making practical decisions regarding AI safety?

lemonhope · Sep 29, 2024, 12:03 PM
27 points
6 comments · 1 min read · LW link

Review: Dr Stone

ProgramCrafter · Sep 29, 2024, 10:35 AM
18 points
9 comments · 4 min read · LW link

AXRP Episode 36 - Adam Shai and Paul Riechers on Computational Mechanics

DanielFilan · Sep 29, 2024, 5:50 AM
25 points
0 comments · 55 min read · LW link

DunCon @Lighthaven

Duncan Sabien (Inactive) · Sep 29, 2024, 4:56 AM
45 points
2 comments · 1 min read · LW link

Exploring Shard-like Behavior: Empirical Insights into Contextual Decision-Making in RL Agents

Alejandro Aristizabal · Sep 29, 2024, 12:32 AM
6 points
0 comments · 15 min read · LW link

Jailbreaking language models with user roleplay

loops · Sep 28, 2024, 11:43 PM
8 points
0 comments · 3 min read · LW link
(iter.ca)

“Slow” takeoff is a terrible term for “maybe even faster takeoff, actually”

Raemon · Sep 28, 2024, 11:38 PM
218 points
69 comments · 1 min read · LW link

Contextual Constitutional AI

aksh-n · Sep 28, 2024, 11:24 PM
14 points
2 comments · 12 min read · LW link

Explore More: A Bag of Tricks to Keep Your Life on the Rails

Shoshannah Tekofsky · Sep 28, 2024, 9:38 PM
237 points
19 comments · 11 min read · LW link
(shoshanigans.substack.com)

2024 Petrov Day Retrospective

Sep 28, 2024, 9:30 PM
93 points
25 comments · 10 min read · LW link

[Question] Any Trump Supporters Want to Dialogue?

k64 · Sep 28, 2024, 7:41 PM
15 points
86 comments · 1 min read · LW link

Evaluating LLaMA 3 for political sycophancy

alma.liezenga · Sep 28, 2024, 7:02 PM
2 points
2 comments · 6 min read · LW link

Two new datasets for evaluating political sycophancy in LLMs

alma.liezenga · Sep 28, 2024, 6:29 PM
9 points
0 comments · 9 min read · LW link

COT Scaling implies slower takeoff speeds

Logan Zoellner · Sep 28, 2024, 4:20 PM
36 points
56 comments · 1 min read · LW link

Thoughts on Evo-Bio Math and Mesa-Optimization: Maybe We Need To Think Harder About “Relative” Fitness?

Lorec · Sep 28, 2024, 2:07 PM
6 points
6 comments · 1 min read · LW link

Steering LLMs’ Behavior with Concept Activation Vectors

Ruixuan Huang · Sep 28, 2024, 9:53 AM
8 points
0 comments · 10 min read · LW link

An Interactive Shapley Value Explainer

James Stephen Brown · Sep 28, 2024, 5:01 AM
42 points
9 comments · 1 min read · LW link
(nonzerosum.games)

[Question] Implications of China’s recession on AGI development?

Eric Neyman · Sep 28, 2024, 1:12 AM
41 points
3 comments · 1 min read · LW link

The Compute Conundrum: AI Governance in a Shifting Geopolitical Era

octavo · Sep 28, 2024, 1:05 AM
−3 points
1 comment · 17 min read · LW link

‘Chat with impactful research & evaluations’ (Unjournal NotebookLMs)

david reinstein · Sep 28, 2024, 12:32 AM
6 points
0 comments · 2 min read · LW link

Eye contact is effortless when you’re no longer emotionally blocked on it

Chris Lakin · Sep 27, 2024, 9:47 PM
37 points
24 comments · 4 min read · LW link

Where is the Learn Everything System?

Shoshannah Tekofsky · Sep 27, 2024, 9:30 PM
15 points
8 comments · 4 min read · LW link
(thinkfeelplay.substack.com)

An “Observatory” For a Shy Super AI?

Sherrinford · Sep 27, 2024, 9:22 PM
5 points
0 comments · 1 min read · LW link
(robreid.substack.com)

[Question] Searching for Impossibility Results or No-Go Theorems for provable safety.

Maelstrom · Sep 27, 2024, 8:12 PM
2 points
1 comment · 1 min read · LW link

What is Randomness?

martinkunev · Sep 27, 2024, 5:49 PM
11 points
2 comments · 10 min read · LW link

The Geometry of Feelings and Nonsense in Large Language Models

Sep 27, 2024, 5:49 PM
61 points
10 comments · 4 min read · LW link

Avoiding jailbreaks by discouraging their representation in activation space

Guido Bergman · Sep 27, 2024, 5:49 PM
7 points
2 comments · 9 min read · LW link

[Question] Why is o1 so deceptive?

abramdemski · Sep 27, 2024, 5:27 PM
183 points
24 comments · 3 min read · LW link

The Offense-Defense Balance of Gene Drives

Maxwell Tabarrok · Sep 27, 2024, 4:47 PM
23 points
1 comment · 4 min read · LW link
(www.maximum-progress.com)

Book Review: On the Edge: The Future

Zvi · Sep 27, 2024, 2:00 PM
61 points
1 comment · 49 min read · LW link
(thezvi.wordpress.com)

[Question] Is cybercrime really costing trillions per year?

Fabien Roger · Sep 27, 2024, 8:44 AM
63 points
28 comments · 1 min read · LW link

Australian AI Safety Forum 2024

Sep 27, 2024, 12:40 AM
42 points
0 comments · 2 min read · LW link