[Question] Should we openly talk about explicit use cases for AutoGPT?

ChristianKl · Apr 20, 2023, 11:44 PM
20 points
4 comments · 1 min read · LW link

United We Align: Harnessing Collective Human Intelligence for AI Alignment Progress

Shoshannah Tekofsky · Apr 20, 2023, 11:19 PM
41 points
13 comments · 25 min read · LW link

[Question] Where to start with statistics if I want to measure things?

matto · Apr 20, 2023, 10:40 PM
21 points
7 comments · 1 min read · LW link

Upskilling, bridge-building, research on security/cryptography and AI safety

Allison Duettmann · Apr 20, 2023, 10:32 PM
14 points
0 comments · 4 min read · LW link

Behavioural statistics for a maze-solving agent

Apr 20, 2023, 10:26 PM
46 points
11 comments · 10 min read · LW link

An introduction to language model interpretability

Alexandre Variengien · Apr 20, 2023, 10:22 PM
14 points
0 comments · 9 min read · LW link

The Case for Brain-Only Preservation

Mati_Roy · Apr 20, 2023, 10:01 PM
21 points
7 comments · 1 min read · LW link
(biostasis.substack.com)

[Question] Practical ways to actualize our beliefs into concrete bets over a longer time horizon?

M. Y. Zuo · Apr 20, 2023, 9:21 PM
4 points
2 comments · 1 min read · LW link

LW moderation: my current thoughts and questions, 2023-04-12

Ruby · Apr 20, 2023, 9:02 PM
53 points
30 comments · 10 min read · LW link

Proposal: Using Monte Carlo tree search instead of RLHF for alignment research

Christopher King · Apr 20, 2023, 7:57 PM
2 points
7 comments · 3 min read · LW link

DeepMind and Google Brain are merging [Linkpost]

Orpheus16 · Apr 20, 2023, 6:47 PM
55 points
5 comments · 1 min read · LW link
(www.deepmind.com)

Ideas for studies on AGI risk

dr_s · Apr 20, 2023, 6:17 PM
5 points
1 comment · 11 min read · LW link

Study 1b: This One Weird Trick does NOT cause incorrectness cascades

Robert_AIZI · Apr 20, 2023, 6:10 PM
5 points
0 comments · 6 min read · LW link
(aizi.substack.com)

An open letter to SERI MATS program organisers

Roman Leventov · Apr 20, 2023, 4:34 PM
26 points
26 comments · 4 min read · LW link

Deception Strategies

Thoth Hermes · Apr 20, 2023, 3:59 PM
−7 points
2 comments · 5 min read · LW link
(thothhermes.substack.com)

Paperclip Club (AI Safety Meetup)

LThorburn · Apr 20, 2023, 3:55 PM
1 point
0 comments · 1 min read · LW link

AI #8: People Can Do Reasonable Things

Zvi · Apr 20, 2023, 3:50 PM
100 points
16 comments · 55 min read · LW link
(thezvi.wordpress.com)

OpenAI could help X-risk by wagering itself

VojtaKovarik · Apr 20, 2023, 2:51 PM
31 points
16 comments · 1 min read · LW link

Japan AI Alignment Conference Postmortem

Apr 20, 2023, 10:58 AM
71 points
8 comments · 8 min read · LW link

Stability AI releases StableLM, an open-source ChatGPT counterpart

Ozyrus · Apr 20, 2023, 6:04 AM
11 points
3 comments · 1 min read · LW link
(github.com)

The Quantum Wave Function is Related to a Philosophy Concept

Richard Aragon · Apr 20, 2023, 3:16 AM
−11 points
3 comments · 6 min read · LW link

A poem written by a fancy autocomplete

Christopher King · Apr 20, 2023, 2:31 AM
1 point
0 comments · 1 min read · LW link

List of commonly used benchmarks for LLMs

Diziet · Apr 20, 2023, 2:25 AM
8 points
0 comments · 1 min read · LW link

A test of your rationality skills

Max H · Apr 20, 2023, 1:19 AM
11 points
11 comments · 4 min read · LW link

Language Models are a Potentially Safe Path to Human-Level AGI

Nadav Brandes · Apr 20, 2023, 12:40 AM
28 points
7 comments · 8 min read · LW link · 1 review

Alien Axiology

snerx · Apr 20, 2023, 12:27 AM
3 points
2 comments · 5 min read · LW link

Responsible Deployment in 20XX

Carson · Apr 20, 2023, 12:24 AM
4 points
0 comments · 4 min read · LW link

[Question] How do I get all recent lesswrong posts that doesn’t have AI tag?

Duck Duck · Apr 19, 2023, 11:39 PM
5 points
2 comments · 1 min read · LW link

Stop trying to have “interesting” friends

eq · Apr 19, 2023, 11:39 PM
42 points
15 comments · 6 min read · LW link

[Question] Is there any literature on using socialization for AI alignment?

Nathan1123 · Apr 19, 2023, 10:16 PM
10 points
9 comments · 2 min read · LW link

I Believe I Know Why AI Models Hallucinate

Richard Aragon · Apr 19, 2023, 9:07 PM
−10 points
6 comments · 7 min read · LW link
(turingssolutions.com)

What if we Align the AI and nobody cares?

Logan Zoellner · Apr 19, 2023, 8:40 PM
−5 points
23 comments · 2 min read · LW link

Orthogonal: A new agent foundations alignment organization

Tamsin Leake · Apr 19, 2023, 8:17 PM
217 points
4 comments · 1 min read · LW link
(orxl.org)

How to express this system for ethically aligned AGI as a Mathematical formula?

Oliver Siegel · Apr 19, 2023, 8:13 PM
−1 points
0 comments · 1 min read · LW link

How could you possibly choose what an AI wants?

So8res · Apr 19, 2023, 5:08 PM
108 points
19 comments · 1 min read · LW link

[Question] Does object permanence of simulacrum affect LLMs’ reasoning?

ProgramCrafter · Apr 19, 2023, 4:28 PM
1 point
1 comment · 1 min read · LW link

Davidad’s Bold Plan for Alignment: An In-Depth Explanation

Apr 19, 2023, 4:09 PM
168 points
40 comments · 21 min read · LW link · 2 reviews

GWWC Reporting Attrition Visualization

jefftk · Apr 19, 2023, 3:40 PM
16 points
0 comments · 1 min read · LW link
(www.jefftk.com)

Keep humans in the loop

Apr 19, 2023, 3:34 PM
23 points
1 comment · 10 min read · LW link

Approximation is expensive, but the lunch is cheap

Apr 19, 2023, 2:19 PM
70 points
3 comments · 16 min read · LW link

Legitimising AI Red-Teaming by Public

VojtaKovarik · Apr 19, 2023, 2:05 PM
10 points
7 comments · 3 min read · LW link

More on Twitter and Algorithms

Zvi · Apr 19, 2023, 12:40 PM
37 points
7 comments · 13 min read · LW link
(thezvi.wordpress.com)

[Crosspost] Organizing a debate with experts and MPs to raise AI xrisk awareness: a possible blueprint

otto.barten · Apr 19, 2023, 11:45 AM
8 points
0 comments · 4 min read · LW link
(forum.effectivealtruism.org)

The key to understanding the ultimate nature of reality is: Time. The key to understanding Time is: Evolution.

Dr_What · Apr 19, 2023, 10:05 AM
−10 points
0 comments · 3 min read · LW link

Open Brains

George3d6 · Apr 19, 2023, 7:35 AM
7 points
0 comments · 6 min read · LW link
(cerebralab.com)

The Learning-Theoretic Agenda: Status 2023

Vanessa Kosoy · Apr 19, 2023, 5:21 AM
144 points
21 comments · 56 min read · LW link · 3 reviews

Paying the corrigibility tax

Max H · Apr 19, 2023, 1:57 AM
14 points
1 comment · 13 min read · LW link

Notes on Teaching in Prison

jsd · Apr 19, 2023, 1:53 AM
290 points
13 comments · 12 min read · LW link

Consciousness as recurrence, potential for enforcing alignment?

Foyle · Apr 18, 2023, 11:05 PM
−2 points
6 comments · 1 min read · LW link

Encouraging New Users To Bet On Their Beliefs

YafahEdelman · Apr 18, 2023, 10:10 PM
49 points
8 comments · 2 min read · LW link