MATS AI Safety Strategy Curriculum v2

7 Oct 2024 22:44 UTC
43 points
6 comments · 13 min read · LW link

2025 Color Trends

sarahconstantin · 7 Oct 2024 21:20 UTC
40 points
7 comments · 6 min read · LW link
(sarahconstantin.substack.com)

Clarifying Alignment Fundamentals Through the Lens of Ontology

Ben Ihrig · 7 Oct 2024 20:57 UTC
12 points
4 comments · 24 min read · LW link

Ethics on Cosmic Scale, Outer Space Treaty, Directed Panspermia, Forwards-Contamination, Technology Assessment, Planetary Protection, and Fermi’s Paradox

MrFantastic · 7 Oct 2024 20:56 UTC
−12 points
0 comments · 1 min read · LW link

Domain-specific SAEs

jacob_drori · 7 Oct 2024 20:15 UTC
28 points
2 comments · 5 min read · LW link

Metaculus Is Open Source

ChristianWilliams · 7 Oct 2024 19:55 UTC
13 points
0 comments · 1 min read · LW link
(www.metaculus.com)

Research update: Towards a Law of Iterated Expectations for Heuristic Estimators

Eric Neyman · 7 Oct 2024 19:29 UTC
87 points
2 comments · 22 min read · LW link

AI Model Registries: A Foundational Tool for AI Governance

7 Oct 2024 19:27 UTC
20 points
1 comment · 4 min read · LW link
(www.convergenceanalysis.org)

Evaluating the truth of statements in a world of ambiguous language.

Hastings · 7 Oct 2024 18:08 UTC
48 points
19 comments · 2 min read · LW link

Advice for journalists

Nathan Young · 7 Oct 2024 16:46 UTC
101 points
53 comments · 9 min read · LW link
(nathanpmyoung.substack.com)

Time Efficient Resistance Training

romeostevensit · 7 Oct 2024 15:15 UTC
42 points
12 comments · 3 min read · LW link

A Narrow Path: a plan to deal with AI extinction risk

7 Oct 2024 13:02 UTC
80 points
12 comments · 2 min read · LW link
(www.narrowpath.co)

Toy Models of Feature Absorption in SAEs

7 Oct 2024 9:56 UTC
49 points
8 comments · 10 min read · LW link

An argument that consequentialism is incomplete

cousin_it · 7 Oct 2024 9:45 UTC
35 points
27 comments · 1 min read · LW link

An X-Ray is Worth 15 Features: Sparse Autoencoders for Interpretable Radiology Report Generation

7 Oct 2024 8:53 UTC
40 points
1 comment · 5 min read · LW link
(arxiv.org)

Compelling Villains and Coherent Values

Cole Wyeth · 6 Oct 2024 19:53 UTC
42 points
4 comments · 4 min read · LW link

To Be Born in a Bag

Niko_McCarty · 6 Oct 2024 17:21 UTC
19 points
1 comment · 16 min read · LW link
(www.asimov.press)

Whimsical Thoughts on an AI Notepad: Exploring Non-Invasive Neural Integration via Viral and Stem Cell Pathways

Pug stanky · 6 Oct 2024 16:37 UTC
1 point
2 comments · 4 min read · LW link

Why I’m not a Bayesian

Richard_Ngo · 6 Oct 2024 15:22 UTC
221 points
104 comments · 10 min read · LW link
(www.mindthefuture.info)

European Progress Conference

Martin Sustrik · 6 Oct 2024 11:10 UTC
27 points
11 comments · 3 min read · LW link
(250bpm.substack.com)

Open Thread Fall 2024

habryka · 5 Oct 2024 22:28 UTC
44 points
194 comments · 1 min read · LW link

[Question] Seeking AI Alignment Tutor/Advisor: $100–150/hr

MrThink · 5 Oct 2024 21:28 UTC
28 points
3 comments · 2 min read · LW link

Interpretability of SAE Features Representing Check in ChessGPT

Jonathan Kutasov · 5 Oct 2024 20:43 UTC
27 points
2 comments · 8 min read · LW link

2024 Election Forecasting Contest

mike20731 · 5 Oct 2024 20:43 UTC
4 points
0 comments · 1 min read · LW link
(www.mikesblog.net)

5 ways to improve CoT faithfulness

Caleb Biddulph · 5 Oct 2024 20:17 UTC
46 points
40 comments · 6 min read · LW link

Consciousness As Recursive Reflections

Gunnar_Zarncke · 5 Oct 2024 20:00 UTC
7 points
2 comments · 1 min read · LW link
(www.astralcodexten.com)

Musings on Text Data Wall (Oct 2024)

Vladimir_Nesov · 5 Oct 2024 19:00 UTC
41 points
2 comments · 5 min read · LW link

Apply to the Cooperative AI PhD Fellowship by October 14th!

Lewis Hammond · 5 Oct 2024 12:41 UTC
23 points
0 comments · 1 min read · LW link

AISafety.info: What is the “natural abstractions hypothesis”?

Algon · 5 Oct 2024 12:31 UTC
38 points
2 comments · 3 min read · LW link
(aisafety.info)

ARENA4.0 Capstone: Hyperparameter tuning for MELBO + replication on Llama-3.2-1b-Instruct

5 Oct 2024 11:30 UTC
34 points
2 comments · 8 min read · LW link

Exploring SAE features in LLMs with definition trees and token lists

mwatkins · 4 Oct 2024 22:15 UTC
46 points
5 comments · 6 min read · LW link

AXRP Episode 37 - Jaime Sevilla on Forecasting AI

DanielFilan · 4 Oct 2024 21:00 UTC
21 points
3 comments · 56 min read · LW link

[Question] Seeking Solutions for Aggregating Classifier Outputs

Saeid Ghafouri · 4 Oct 2024 17:39 UTC
−1 points
0 comments · 1 min read · LW link

Amoeba roles in tech

Sindhu Shivaprasad · 4 Oct 2024 17:25 UTC
12 points
0 comments · 4 min read · LW link

LASR Labs Spring 2025 applications are open!

4 Oct 2024 13:44 UTC
38 points
0 comments · 4 min read · LW link

(Maybe) A Bag of Heuristics is All There Is & A Bag of Heuristics is All You Need

Sodium · 3 Oct 2024 19:11 UTC
35 points
17 comments · 17 min read · LW link

Does natural selection favor AIs over humans?

cdkg · 3 Oct 2024 18:47 UTC
20 points
1 comment · 1 min read · LW link
(link.springer.com)

What Hayek Taught Us About Nature

Ground Truth Data · 3 Oct 2024 18:20 UTC
−1 points
6 comments · 2 min read · LW link

Biasing VLM Response with Visual Stimuli

Jaehyuk Lim · 3 Oct 2024 18:04 UTC
5 points
0 comments · 8 min read · LW link

AI #84: Better Than a Podcast

Zvi · 3 Oct 2024 15:00 UTC
56 points
7 comments · 52 min read · LW link
(thezvi.wordpress.com)

[Question] If I have some money, whom should I donate it to in order to reduce expected P(doom) the most?

KvmanThinking · 3 Oct 2024 11:31 UTC
35 points
37 comments · 1 min read · LW link

Shutting down all competing AI projects might not buy a lot of time due to Internal Time Pressure

ThomasCederborg · 3 Oct 2024 0:01 UTC
12 points
7 comments · 12 min read · LW link

“25 Lessons from 25 Years of Marriage” by honorary rationalist Ferrett Steinmetz

CronoDAS · 2 Oct 2024 22:42 UTC
24 points
2 comments · 1 min read · LW link
(theferrett.substack.com)

MIT FutureTech are hiring for a Head of Operations role

peterslattery · 2 Oct 2024 17:11 UTC
8 points
0 comments · 4 min read · LW link

Can AI Quantity beat AI Quality?

Gianluca Calcagni · 2 Oct 2024 15:21 UTC
2 points
0 comments · 5 min read · LW link

[Intuitive self-models] 3. The Homunculus

Steven Byrnes · 2 Oct 2024 15:20 UTC
78 points
39 comments · 25 min read · LW link

AI Safety University Organizing: Early Takeaways from Thirteen Groups

agucova · 2 Oct 2024 15:14 UTC
26 points
0 comments · 9 min read · LW link

Three main arguments that AI will save humans and one meta-argument

avturchin · 2 Oct 2024 11:39 UTC
9 points
8 comments · 2 min read · LW link

Should we abstain from voting? (In nondeterministic elections)

B Jacobs · 2 Oct 2024 10:07 UTC
5 points
8 comments · 4 min read · LW link
(bobjacobs.substack.com)

AI Safety at the Frontier: Paper Highlights, September ’24

gasteigerjo · 2 Oct 2024 9:49 UTC
13 points
0 comments · 7 min read · LW link
(aisafetyfrontier.substack.com)