How to Give in to Threats (without incentivizing them)

Mikhail Samin · Sep 12, 2024, 3:55 PM
67 points
30 comments · 5 min read · LW link

Another argument against utility-centric alignment paradigms

Fiora Sunshine · Sep 22, 2024, 7:28 AM
67 points
39 comments · 8 min read · LW link

Book Review: On the Edge: The Fundamentals

Zvi · Sep 23, 2024, 1:40 PM
64 points
3 comments · 31 min read · LW link
(thezvi.wordpress.com)

[Question] Is cybercrime really costing trillions per year?

Fabien Roger · Sep 27, 2024, 8:44 AM
63 points
28 comments · 1 min read · LW link

Pay-on-results personal growth: first success

Chipmonk · Sep 14, 2024, 3:39 AM
63 points
8 comments · 4 min read · LW link
(chrislakin.blog)

What is SB 1047 *for*?

Raemon · Sep 5, 2024, 5:39 PM
61 points
8 comments · 3 min read · LW link

The Geometry of Feelings and Nonsense in Large Language Models

Sep 27, 2024, 5:49 PM
61 points
10 comments · 4 min read · LW link

Book Review: On the Edge: The Future

Zvi · Sep 27, 2024, 2:00 PM
61 points
1 comment · 49 min read · LW link
(thezvi.wordpress.com)

Base LLMs refuse too

Sep 29, 2024, 4:04 PM
60 points
20 comments · 10 min read · LW link

On the UBI Paper

Zvi · Sep 3, 2024, 2:50 PM
60 points
6 comments · 19 min read · LW link
(thezvi.wordpress.com)

Pollsters Should Publish Question Translations

jefftk · Sep 8, 2024, 10:10 PM
60 points
3 comments · 2 min read · LW link
(www.jefftk.com)

AI #81: Alpha Proteo

Zvi · Sep 12, 2024, 1:00 PM
59 points
3 comments · 35 min read · LW link
(thezvi.wordpress.com)

Work with me on agent foundations: independent fellowship

Alex_Altair · Sep 21, 2024, 1:59 PM
59 points
5 comments · 4 min read · LW link

How you can help pass important AI legislation with 10 minutes of effort

ThomasW · Sep 14, 2024, 10:10 PM
59 points
2 comments · 2 min read · LW link

Mira Murati leaves OpenAI/ OpenAI to remove non-profit control

Sodium · Sep 25, 2024, 9:15 PM
58 points
4 comments · 2 min read · LW link

Making Eggs Without Ovaries

Sep 22, 2024, 5:44 PM
58 points
3 comments · 16 min read · LW link
(www.asimov.press)

Secret Collusion: Will We Know When to Unplug AI?

Sep 16, 2024, 4:07 PM
57 points
7 comments · 31 min read · LW link

Evidence against Learned Search in a Chess-Playing Neural Network

p.b. · Sep 13, 2024, 11:59 AM
57 points
3 comments · 6 min read · LW link

On the Role of Proto-Languages

adamShimi · Sep 22, 2024, 4:50 PM
54 points
1 comment · 4 min read · LW link
(epistemologicalfascinations.substack.com)

Reformative Hypocrisy, and Paying Close Enough Attention to Selectively Reward It.

Andrew_Critch · Sep 11, 2024, 4:41 AM
53 points
11 comments · 3 min read · LW link

[Question] If I wanted to spend WAY more on AI, what would I spend it on?

Logan Zoellner · 15 Sep 2024 21:24 UTC
53 points
16 comments · 1 min read · LW link

Model evals for dangerous capabilities

Zach Stein-Perlman · 23 Sep 2024 11:00 UTC
51 points
11 comments · 3 min read · LW link

AI and the Technological Richter Scale

Zvi · 4 Sep 2024 14:00 UTC
51 points
9 comments · 13 min read · LW link
(thezvi.wordpress.com)

AI #82: The Governor Ponders

Zvi · 19 Sep 2024 13:30 UTC
50 points
8 comments · 27 min read · LW link
(thezvi.wordpress.com)

The Fragility of Life Hypothesis and the Evolution of Cooperation

KristianRonn · 4 Sep 2024 21:04 UTC
50 points
6 comments · 11 min read · LW link

Book review: Xenosystems

jessicata · 16 Sep 2024 20:17 UTC
50 points
18 comments · 37 min read · LW link
(unstableontology.com)

Applications of Chaos: Saying No (with Hastings Greer)

Elizabeth · 21 Sep 2024 16:30 UTC
50 points
16 comments · 2 min read · LW link
(acesounderglass.com)

Conflating value alignment and intent alignment is causing confusion

Seth Herd · 5 Sep 2024 16:39 UTC
49 points
18 comments · 5 min read · LW link

We Don’t Know Our Own Values, but Reward Bridges The Is-Ought Gap

19 Sep 2024 22:22 UTC
48 points
48 comments · 5 min read · LW link

Interested in Cognitive Bootcamp?

Raemon · 19 Sep 2024 22:12 UTC
48 points
0 comments · 2 min read · LW link

I finally got ChatGPT to sound like me

lsusr · 17 Sep 2024 9:39 UTC
47 points
18 comments · 6 min read · LW link

AI #80: Never Have I Ever

Zvi · 10 Sep 2024 17:50 UTC
46 points
20 comments · 39 min read · LW link
(thezvi.wordpress.com)

MIRI’s September 2024 newsletter

Harlan · 16 Sep 2024 18:15 UTC
46 points
0 comments · 1 min read · LW link
(intelligence.org)

Bounty for Evidence on Some of Palisade Research’s Beliefs

23 Sep 2024 20:01 UTC
46 points
4 comments · 2 min read · LW link

Michael Dickens’ Caffeine Tolerance Research

niplav · 4 Sep 2024 15:41 UTC
46 points
5 comments · 2 min read · LW link
(mdickens.me)

DunCon @Lighthaven

Duncan Sabien (Inactive) · 29 Sep 2024 4:56 UTC
45 points
2 comments · 1 min read · LW link

A Path out of Insufficient Views

Unreal · 24 Sep 2024 20:00 UTC
44 points
65 comments · 9 min read · LW link

How difficult is AI Alignment?

Sammy Martin · 13 Sep 2024 15:47 UTC
44 points
6 comments · 23 min read · LW link

Economics Roundup #3

Zvi · 10 Sep 2024 13:50 UTC
44 points
9 comments · 20 min read · LW link
(thezvi.wordpress.com)

Which LessWrong/Alignment topics would you like to be tutored in? [Poll]

Ruby · 19 Sep 2024 1:35 UTC
43 points
12 comments · 1 min read · LW link

Characterizing stable regions in the residual stream of LLMs

26 Sep 2024 13:44 UTC
42 points
4 comments · 1 min read · LW link
(arxiv.org)

Australian AI Safety Forum 2024

27 Sep 2024 0:40 UTC
42 points
0 comments · 2 min read · LW link

Open Problems in AIXI Agent Foundations

Cole Wyeth · 12 Sep 2024 15:38 UTC
42 points
2 comments · 10 min read · LW link

Formalizing the Informal (event invite)

abramdemski · 10 Sep 2024 19:22 UTC
42 points
0 comments · 1 min read · LW link

An Interactive Shapley Value Explainer

James Stephen Brown · 28 Sep 2024 5:01 UTC
42 points
9 comments · 1 min read · LW link
(nonzerosum.games)

[Question] Implications of China’s recession on AGI development?

Eric Neyman · 28 Sep 2024 1:12 UTC
41 points
3 comments · 1 min read · LW link

Programming Refusal with Conditional Activation Steering

Bruce W. Lee · 11 Sep 2024 20:57 UTC
41 points
0 comments · 11 min read · LW link
(brucewlee.com)

instruction tuning and autoregressive distribution shift

nostalgebraist · 5 Sep 2024 16:53 UTC
40 points
5 comments · 5 min read · LW link

[Linkpost] Play with SAEs on Llama 3

25 Sep 2024 22:35 UTC
40 points
2 comments · 1 min read · LW link

Generative ML in chemistry is bottlenecked by synthesis

Abhishaike Mahajan · 16 Sep 2024 16:31 UTC
38 points
2 comments · 14 min read · LW link
(www.owlposting.com)