What do Marginal Grants at EAIF Look Like? Funding Priorities and Grantmaking Thresholds at the EA Infrastructure Fund

Linch · 12 Oct 2023 21:40 UTC
20 points
0 comments · 1 min read · LW link

unRLHF—Efficiently undoing LLM safeguards

12 Oct 2023 19:58 UTC
117 points
15 comments · 20 min read · LW link

LoRA Fine-tuning Efficiently Undoes Safety Training from Llama 2-Chat 70B

12 Oct 2023 19:58 UTC
148 points
29 comments · 14 min read · LW link

[Question] Looking for reading recommendations: Theories of right/justice that safeguard against having one’s job automated?

bulKlub · 12 Oct 2023 19:40 UTC
−1 points
1 comment · 1 min read · LW link

The International PauseAI Protest: Activism under uncertainty

Joseph Miller · 12 Oct 2023 17:36 UTC
32 points
1 comment · 1 min read · LW link

AI #33: Cool New Interpretability Paper

Zvi · 12 Oct 2023 16:20 UTC
46 points
18 comments · 46 min read · LW link
(thezvi.wordpress.com)

Noticing confusion in physics

Jacob G-W · 12 Oct 2023 15:21 UTC
20 points
27 comments · 2 min read · LW link
(jacobgw.com)

[Question] How to make to-do lists (and to get things done)?

TeaTieAndHat · 12 Oct 2023 14:26 UTC
9 points
13 comments · 2 min read · LW link

Relevance of ‘Harmful Intelligence’ Data in Training Datasets (WebText vs. Pile)

MiguelDev · 12 Oct 2023 12:08 UTC
12 points
0 comments · 9 min read · LW link

Soulmate Fermi Estimate + My A(ltr)u[t]istic Mating Strategy

Jordan Arel · 12 Oct 2023 8:32 UTC
0 points
9 comments · 3 min read · LW link

Evolution Solved Alignment (what sharp left turn?)

jacob_cannell · 12 Oct 2023 4:15 UTC
20 points
89 comments · 4 min read · LW link

The CHOICE

Gabi QUENE · 12 Oct 2023 3:02 UTC
−29 points
2 comments · 3 min read · LW link

Solstice 2023 Roundup

dspeyer · 11 Oct 2023 23:09 UTC
28 points
6 comments · 1 min read · LW link

Understanding LLMs: Some basic observations about words, syntax, and discourse [w/ a conjecture about grokking]

Bill Benzon · 11 Oct 2023 19:13 UTC
5 points
0 comments · 5 min read · LW link

[Linkpost] Generalization in diffusion models arises from geometry-adaptive harmonic representation

Bogdan Ionut Cirstea · 11 Oct 2023 17:48 UTC
4 points
3 comments · 1 min read · LW link

What I’ve been reading, October 2023: The stirrup in Europe, 19th-century art deco, and more

jasoncrawford · 11 Oct 2023 16:11 UTC
18 points
2 comments · 11 min read · LW link
(rootsofprogress.org)

EA Madrid social

Pablo Villalobos · 11 Oct 2023 15:34 UTC
6 points
0 comments · 1 min read · LW link

Attributing to interactions with GCPD and GWPD

jenny · 11 Oct 2023 15:06 UTC
20 points
0 comments · 6 min read · LW link

You’re Measuring Model Complexity Wrong

11 Oct 2023 11:46 UTC
82 points
15 comments · 13 min read · LW link

Update on the UK AI Taskforce & upcoming AI Safety Summit

Elliot_Mckernon · 11 Oct 2023 11:37 UTC
82 points
2 comments · 4 min read · LW link

An explanation for every token: using an LLM to sample another LLM

Max H · 11 Oct 2023 0:53 UTC
34 points
4 comments · 11 min read · LW link

[Question] Examples of Low Status Fun

niplav · 10 Oct 2023 23:19 UTC
18 points
17 comments · 1 min read · LW link

A New Model for Compute Center Verification

Damin Curtis · 10 Oct 2023 19:22 UTC
8 points
0 comments · 5 min read · LW link

Announcing MIRI’s new CEO and leadership team

Gretta Duleba · 10 Oct 2023 19:22 UTC
220 points
52 comments · 3 min read · LW link

18 Heterodox lenses to look the world through

Shaurya Gupta · 10 Oct 2023 18:33 UTC
−1 points
2 comments · 5 min read · LW link

Documenting Journey Into AI Safety

jacobhaimes · 10 Oct 2023 18:30 UTC
17 points
4 comments · 6 min read · LW link

Looking for AI Art Collaborators!

beatrice@foresight.org · 10 Oct 2023 18:24 UTC
1 point
0 comments · 1 min read · LW link

Childhood Roundup #3

Zvi · 10 Oct 2023 14:30 UTC
48 points
3 comments · 30 min read · LW link
(thezvi.wordpress.com)

My simple model for Alignment vs Capability

ryan_b · 10 Oct 2023 12:07 UTC
7 points
0 comments · 7 min read · LW link

Next year in Jerusalem: The brilliant ideas and radiant legacy of Miriam Lipschutz Yevick [in relation to current AI debates]

Bill Benzon · 10 Oct 2023 9:06 UTC
1 point
0 comments · 1 min read · LW link
(3quarksdaily.com)

I’m a Former Israeli Officer. AMA

Yovel Rom · 10 Oct 2023 8:33 UTC
78 points
69 comments · 1 min read · LW link

Become a PIBBSS Research Affiliate

10 Oct 2023 7:41 UTC
24 points
6 comments · 6 min read · LW link

My 1st month at a “neurodivergent gifted school” called Minerva University

exanova · 10 Oct 2023 3:34 UTC
4 points
1 comment · 1 min read · LW link
(inawe.substack.com)

Epistemic Motif of Abstract-Concrete Cycles & Domain Expansion

Dalcy · 10 Oct 2023 3:28 UTC
23 points
2 comments · 3 min read · LW link

Simple Terminal Colors

jefftk · 10 Oct 2023 0:40 UTC
11 points
1 comment · 1 min read · LW link
(www.jefftk.com)

The Handbook of Rationality (2021, MIT Press) is now open access

romeostevensit · 10 Oct 2023 0:30 UTC
48 points
4 comments · 1 min read · LW link

Non-superintelligent paperclip maximizers are normal

jessicata · 10 Oct 2023 0:29 UTC
66 points
4 comments · 9 min read · LW link
(unstableontology.com)

The Witching Hour

Richard_Ngo · 10 Oct 2023 0:19 UTC
110 points
0 comments · 10 min read · LW link
(www.narrativeark.xyz)

One: a story

Richard_Ngo · 10 Oct 2023 0:18 UTC
29 points
0 comments · 4 min read · LW link
(www.narrativeark.xyz)

Truthseeking when your disagreements lie in moral philosophy

10 Oct 2023 0:00 UTC
98 points
4 comments · 4 min read · LW link
(acesounderglass.com)

NYT on the Manifest forecasting conference

Austin Chen · 9 Oct 2023 21:40 UTC
45 points
14 comments · 1 min read · LW link
(www.nytimes.com)

Forecasting and prediction markets

CarlJ · 9 Oct 2023 20:43 UTC
3 points
0 comments · 1 min read · LW link

Comparing Two Forecasters in an Ideal World

nikos · 9 Oct 2023 19:52 UTC
5 points
0 comments · 6 min read · LW link

The case for aftermarket blind spot mirrors

Brendan Long · 9 Oct 2023 19:30 UTC
57 points
14 comments · 2 min read · LW link
(www.brendanlong.com)

New contractor role: Web security task force contractor for AI safety announcements

9 Oct 2023 18:36 UTC
11 points
0 comments · 2 min read · LW link
(survivalandflourishing.com)

[Question] Anyone working on D. Amodei’s Bartlett show transcript?

Leopard · 9 Oct 2023 18:17 UTC
10 points
0 comments · 1 min read · LW link

AGI Alignment is isomorphic to Unconditional Love

Raghuvar Nadig · 9 Oct 2023 15:58 UTC
−11 points
0 comments · 11 min read · LW link

Knowledge Base 3: Shopping advisor and other uses of knowledge base about products

iwis · 9 Oct 2023 11:53 UTC
0 points
0 comments · 4 min read · LW link

Knowledge Base 2: The structure and the method of building

iwis · 9 Oct 2023 11:53 UTC
2 points
4 comments · 8 min read · LW link

We don’t understand what happened with culture enough

Jan_Kulveit · 9 Oct 2023 9:54 UTC
86 points
21 comments · 6 min read · LW link