A discussion of normative ethics

9 Jan 2024 23:29 UTC
10 points
6 comments 25 min read LW link

On the Contrary, Steelmanning Is Normal; ITT-Passing Is Niche

Zack_M_Davis 9 Jan 2024 23:12 UTC
39 points
31 comments 4 min read LW link

[Question] What’s the protocol for if a novice has ML ideas that are unlikely to work, but might improve capabilities if they do work?

drocta 9 Jan 2024 22:51 UTC
6 points
2 comments 2 min read LW link

Goodbye, Shoggoth: The Stage, its Animatronics, & the Puppeteer – a New Metaphor

RogerDearnaley 9 Jan 2024 20:42 UTC
46 points
8 comments 36 min read LW link

Bent or Blunt Hoods?

jefftk 9 Jan 2024 20:10 UTC
23 points
0 comments 1 min read LW link
(www.jefftk.com)

2024 ACX Predictions: Blind/Buy/Sell/Hold

Zvi 9 Jan 2024 19:30 UTC
33 points
2 comments 31 min read LW link
(thezvi.wordpress.com)

Announcing the Double Crux Bot

9 Jan 2024 18:54 UTC
44 points
6 comments 3 min read LW link

Does AI risk “other” the AIs?

Joe Carlsmith 9 Jan 2024 17:51 UTC
59 points
3 comments 8 min read LW link

AI demands unprecedented reliability

Jono 9 Jan 2024 16:30 UTC
22 points
5 comments 2 min read LW link

Uncertainty in all its flavours

Cleo Nardo 9 Jan 2024 16:21 UTC
25 points
6 comments 35 min read LW link

Compensating for Life Biases

Jonathan Moregård 9 Jan 2024 14:39 UTC
24 points
6 comments 3 min read LW link
(honestliving.substack.com)

Can Morality Be Quantified?

Julius 9 Jan 2024 6:35 UTC
3 points
0 comments 5 min read LW link

Learning Math in Time for Alignment

NicholasKross 9 Jan 2024 1:02 UTC
32 points
3 comments 3 min read LW link

Brief Thoughts on Justifications for Paternalism

Srdjan Miletic 9 Jan 2024 0:36 UTC
4 points
0 comments 4 min read LW link
(dissent.blog)

Hiring decisions are not suitable for prediction markets

SimonM 8 Jan 2024 21:11 UTC
12 points
6 comments 1 min read LW link

Better Anomia

jefftk 8 Jan 2024 18:40 UTC
8 points
0 comments 1 min read LW link
(www.jefftk.com)

A starter guide for evals

8 Jan 2024 18:24 UTC
44 points
2 comments 12 min read LW link
(www.apolloresearch.ai)

Is it justifiable for non-experts to have strong opinions about Gaza?

8 Jan 2024 17:31 UTC
23 points
12 comments 30 min read LW link

Project ideas: Backup plans & Cooperative AI

Lukas Finnveden 8 Jan 2024 17:19 UTC
18 points
0 comments 1 min read LW link
(lukasfinnveden.substack.com)

Hackathon and Staying Up-to-Date in AI

jacobhaimes 8 Jan 2024 17:10 UTC
11 points
0 comments 1 min read LW link
(into-ai-safety.github.io)

When “yang” goes wrong

Joe Carlsmith 8 Jan 2024 16:35 UTC
72 points
6 comments 13 min read LW link

Task vectors & analogy making in LLMs

Sergii 8 Jan 2024 15:17 UTC
8 points
1 comment 4 min read LW link
(grgv.xyz)

[Question] How to find translations of a book?

Viliam 8 Jan 2024 14:57 UTC
9 points
8 comments 1 min read LW link

[Question] Why aren’t Yudkowsky & Bostrom getting more attention now?

JoshuaFox 8 Jan 2024 14:42 UTC
14 points
8 comments 1 min read LW link

2023 Prediction Evaluations

Zvi 8 Jan 2024 14:40 UTC
46 points
0 comments 28 min read LW link
(thezvi.wordpress.com)

There is no sharp boundary between deontology and consequentialism

quetzal_rainbow 8 Jan 2024 11:01 UTC
8 points
2 comments 1 min read LW link

Reflections on my first year of AI safety research

Jay Bailey 8 Jan 2024 7:49 UTC
52 points
3 comments 1 min read LW link

Why There Is Hope For An Alignment Solution

Darklight 8 Jan 2024 6:58 UTC
9 points
0 comments 12 min read LW link

Sledding Among Hazards

jefftk 8 Jan 2024 3:30 UTC
19 points
5 comments 1 min read LW link
(www.jefftk.com)

Utility is relative

CrimsonChin 8 Jan 2024 2:31 UTC
2 points
4 comments 2 min read LW link

A model of research skill

L Rudolf L 8 Jan 2024 0:13 UTC
49 points
6 comments 12 min read LW link
(www.strataoftheworld.com)

We shouldn’t fear superintelligence because it already exists

Spencer Chubb 7 Jan 2024 17:59 UTC
−22 points
14 comments 1 min read LW link

(Partial) failure in replicating deceptive alignment experiment

claudia.biancotti 7 Jan 2024 17:56 UTC
1 point
0 comments 1 min read LW link

Project ideas: Sentience and rights of digital minds

Lukas Finnveden 7 Jan 2024 17:34 UTC
20 points
0 comments 1 min read LW link
(lukasfinnveden.substack.com)

Benchmark Study #4: AI2 Reasoning Challenge (Task(s), MCQ)

Bruce W. Lee 7 Jan 2024 17:13 UTC
6 points
0 comments 5 min read LW link

Deceptive AI ≠ Deceptively-aligned AI

Steven Byrnes 7 Jan 2024 16:55 UTC
97 points
19 comments 6 min read LW link

Bayesians Commit the Gambler’s Fallacy

Kevin Dorst 7 Jan 2024 12:54 UTC
46 points
28 comments 8 min read LW link
(kevindorst.substack.com)

Towards AI Safety Infrastructure: Talk & Outline

Paul Bricman 7 Jan 2024 9:31 UTC
10 points
0 comments 2 min read LW link
(www.youtube.com)

Benchmark Study #3: HellaSwag (Task, MCQ)

Bruce W. Lee 7 Jan 2024 4:59 UTC
2 points
4 comments 6 min read LW link
(arxiv.org)

Defending against hypothetical moon life during Apollo 11

eukaryote 7 Jan 2024 4:49 UTC
57 points
9 comments 32 min read LW link
(eukaryotewritesblog.com)

The Sequences on YouTube

Neil 7 Jan 2024 1:44 UTC
26 points
9 comments 2 min read LW link

AI Risk and the US Presidential Candidates

Zane 6 Jan 2024 20:18 UTC
41 points
22 comments 6 min read LW link

A Challenge to Effective Altruism’s Premises

False Name 6 Jan 2024 18:46 UTC
−26 points
3 comments 3 min read LW link

Lack of Spider-Man is evidence against the simulation hypothesis

RamblinDash 6 Jan 2024 18:17 UTC
6 points
22 comments 1 min read LW link

A Land Tax For Britain

A.H. 6 Jan 2024 15:52 UTC
6 points
9 comments 4 min read LW link

Book review: Trick or treatment (2008)

Fleece Minutia 6 Jan 2024 15:40 UTC
1 point
0 comments 2 min read LW link

Are we inside a black hole?

Jay 6 Jan 2024 13:30 UTC
2 points
5 comments 1 min read LW link

Survey of 2,778 AI authors: six parts in pictures

KatjaGrace 6 Jan 2024 4:43 UTC
80 points
1 comment 2 min read LW link

Benchmark Study #2: TruthfulQA (Task, MCQ)

Bruce W. Lee 6 Jan 2024 2:39 UTC
11 points
2 comments 4 min read LW link
(arxiv.org)

Project ideas: Epistemics

Lukas Finnveden 5 Jan 2024 23:41 UTC
41 points
4 comments 1 min read LW link
(lukasfinnveden.substack.com)