Comparing representation vectors between llama 2 base and chat

Nina Panickssery · Oct 28, 2023, 10:54 PM
36 points
5 comments · 2 min read · LW link

Vaniver’s thoughts on Anthropic’s RSP

Vaniver · Oct 28, 2023, 9:06 PM
46 points
4 comments · 3 min read · LW link

Book Review: Orality and Literacy: The Technologizing of the Word

Fergus Fettes · Oct 28, 2023, 8:12 PM
13 points
0 comments · 16 min read · LW link

Regrant up to $600,000 to AI safety projects with GiveWiki

Dawn Drescher · Oct 28, 2023, 7:56 PM
33 points
1 comment · LW link

Shane Legg interview on alignment

Seth Herd · Oct 28, 2023, 7:28 PM
66 points
20 comments · 2 min read · LW link
(www.youtube.com)

AI Existential Safety Fellowships

mmfli · Oct 28, 2023, 6:07 PM
5 points
0 comments · 1 min read · LW link

AI Safety Hub Serbia Official Opening

Oct 28, 2023, 5:03 PM
55 points
0 comments · 3 min read · LW link
(forum.effectivealtruism.org)

Managing AI Risks in an Era of Rapid Progress

Algon · Oct 28, 2023, 3:48 PM
36 points
5 comments · 11 min read · LW link
(managing-ai-risks.com)

[Question] ELI5 Why isn’t alignment *easier* as models get stronger?

Logan Zoellner · Oct 28, 2023, 2:34 PM
3 points
9 comments · 1 min read · LW link

Truthseeking, EA, Simulacra levels, and other stuff

Oct 27, 2023, 11:56 PM
45 points
12 comments · 9 min read · LW link

[Question] Do you believe “E=mc^2” is a correct and/or useful equation, and, whether yes or no, precisely what are your reasons for holding this belief (with such a degree of confidence)?

l8c · Oct 27, 2023, 10:46 PM
10 points
14 comments · 1 min read · LW link

Value systematization: how values become coherent (and misaligned)

Richard_Ngo · Oct 27, 2023, 7:06 PM
103 points
49 comments · 13 min read · LW link

Techno-humanism is techno-optimism for the 21st century

Richard_Ngo · Oct 27, 2023, 6:37 PM
88 points
5 comments · 14 min read · LW link
(www.mindthefuture.info)

Sanctuary for Humans

Nikola Jurkovic · Oct 27, 2023, 6:08 PM
22 points
9 comments · 1 min read · LW link

Wireheading and misalignment by composition on NetHack

pierlucadoro · Oct 27, 2023, 5:43 PM
34 points
4 comments · 4 min read · LW link

We’re Not Ready: thoughts on “pausing” and responsible scaling policies

HoldenKarnofsky · Oct 27, 2023, 3:19 PM
200 points
33 comments · 8 min read · LW link

Aspiration-based Q-Learning

Oct 27, 2023, 2:42 PM
38 points
5 comments · 11 min read · LW link

Linkpost: Rishi Sunak’s Speech on AI (26th October)

bideup · Oct 27, 2023, 11:57 AM
85 points
8 comments · 7 min read · LW link
(www.gov.uk)

ASPR & WARP: Rationality Camps for Teens in Taiwan and Oxford

Anna Gajdova · Oct 27, 2023, 8:40 AM
18 points
0 comments · 1 min read · LW link

[Question] To what extent is the UK Government’s recent AI Safety push entirely due to Rishi Sunak?

Stephen Fowler · Oct 27, 2023, 3:29 AM
23 points
4 comments · 1 min read · LW link

Bayesian Punishment

Rob Lucas · Oct 27, 2023, 3:24 AM
1 point
1 comment · 6 min read · LW link

Online Dialogues Party — Sunday 5th November

Ben Pace · Oct 27, 2023, 2:41 AM
28 points
1 comment · 1 min read · LW link

OpenAI’s new Preparedness team is hiring

leopold · Oct 26, 2023, 8:42 PM
60 points
2 comments · 1 min read · LW link

Fake Deeply

Zack_M_Davis · Oct 26, 2023, 7:55 PM
33 points
7 comments · 1 min read · LW link
(unremediatedgender.space)

Symbol/Referent Confusions in Language Model Alignment Experiments

johnswentworth · Oct 26, 2023, 7:49 PM
116 points
50 comments · 6 min read · LW link · 1 review

Unsupervised Methods for Concept Discovery in AlphaZero

aog · Oct 26, 2023, 7:05 PM
9 points
0 comments · 1 min read · LW link
(arxiv.org)

[Question] Nonlinear limitations of ReLUs

magfrump · Oct 26, 2023, 6:51 PM
13 points
1 comment · 1 min read · LW link

AI Alignment Problem: Requirement not optional (A Critical Analysis through Mass Effect Trilogy)

TAWSIF AHMED · Oct 26, 2023, 6:02 PM
−9 points
0 comments · 4 min read · LW link

[Thought Experiment] Tomorrow’s Echo—The future of synthetic companionship.

Vimal Naran · Oct 26, 2023, 5:54 PM
−7 points
2 comments · 2 min read · LW link

Disagreements over the prioritization of existential risk from AI

Olivier Coutu · Oct 26, 2023, 5:54 PM
10 points
0 comments · 6 min read · LW link

[Question] What if AGI had its own universe to maybe wreck?

mseale · Oct 26, 2023, 5:49 PM
−1 points
2 comments · 1 min read · LW link

Changing Contra Dialects

jefftk · Oct 26, 2023, 5:30 PM
25 points
2 comments · 1 min read · LW link
(www.jefftk.com)

5 psychological reasons for dismissing x-risks from AGI

Igor Ivanov · Oct 26, 2023, 5:21 PM
24 points
6 comments · 4 min read · LW link

5. Risks from preventing legitimate value change (value collapse)

Nora_Ammann · Oct 26, 2023, 2:38 PM
13 points
1 comment · 9 min read · LW link

4. Risks from causing illegitimate value change (performative predictors)

Nora_Ammann · Oct 26, 2023, 2:38 PM
8 points
3 comments · 5 min read · LW link

3. Premise three & Conclusion: AI systems can affect value change trajectories & the Value Change Problem

Nora_Ammann · Oct 26, 2023, 2:38 PM
28 points
4 comments · 4 min read · LW link

2. Premise two: Some cases of value change are (il)legitimate

Nora_Ammann · Oct 26, 2023, 2:36 PM
24 points
7 comments · 6 min read · LW link

1. Premise one: Values are malleable

Nora_Ammann · Oct 26, 2023, 2:36 PM
21 points
1 comment · 15 min read · LW link

0. The Value Change Problem: introduction, overview and motivations

Nora_Ammann · Oct 26, 2023, 2:36 PM
32 points
0 comments · 5 min read · LW link

EPUBs of MIRI Blog Archives and selected LW Sequences

mesaoptimizer · Oct 26, 2023, 2:17 PM
44 points
5 comments · 1 min read · LW link
(git.sr.ht)

UK Government publishes “Frontier AI: capabilities and risks” Discussion Paper

A.H. · Oct 26, 2023, 1:55 PM
5 points
0 comments · 2 min read · LW link
(www.gov.uk)

AI #35: Responsible Scaling Policies

Zvi · Oct 26, 2023, 1:30 PM
66 points
10 comments · 55 min read · LW link
(thezvi.wordpress.com)

RA Bounty: Looking for feedback on screenplay about AI Risk

Writer · Oct 26, 2023, 1:23 PM
32 points
6 comments · 1 min read · LW link

Sensor Exposure can Compromise the Human Brain in the 2020s

trevor · Oct 26, 2023, 3:31 AM
17 points
6 comments · 10 min read · LW link

Notes on “How do we become confident in the safety of a machine learning system?”

RohanS · Oct 26, 2023, 3:13 AM
4 points
0 comments · 13 min read · LW link

Apply to the Constellation Visiting Researcher Program and Astra Fellowship, in Berkeley this Winter

Nate Thomas · Oct 26, 2023, 3:07 AM
42 points
10 comments · 1 min read · LW link

CHAI internship applications are open (due Nov 13)

Erik Jenner · Oct 26, 2023, 12:53 AM
34 points
0 comments · 3 min read · LW link

Architects of Our Own Demise: We Should Stop Developing AI Carelessly

Roko · Oct 26, 2023, 12:36 AM
170 points
75 comments · 3 min read · LW link

EA Infrastructure Fund: June 2023 grant recommendations

Linch · Oct 26, 2023, 12:35 AM
21 points
0 comments · LW link

Responsible Scaling Policies Are Risk Management Done Wrong

simeon_c · Oct 25, 2023, 11:46 PM
123 points
35 comments · 22 min read · LW link · 1 review
(www.navigatingrisks.ai)