When did humans become self-aware?

Derek M. Jones · Apr 23, 2023, 10:36 PM
6 points
2 comments · 1 min read · LW link
(vectors.substack.com)

[Question] Are there AI policies that are robustly net-positive even when considering different AI scenarios?

Noosphere89 · Apr 23, 2023, 9:46 PM
11 points
1 comment · 1 min read · LW link

Getting Started With Naturalism

LoganStrohl · Apr 23, 2023, 9:02 PM
69 points
4 comments · 11 min read · LW link · 1 review

[Question] Why do we care about agency for alignment?

Chris_Leong · Apr 23, 2023, 6:10 PM
22 points
19 comments · 1 min read · LW link

Taming the Fire of Intelligence

Peter Kuhn · Apr 23, 2023, 5:41 PM
0 points
7 comments · 5 min read · LW link

Preventing AI Misuse: State of the Art Research and its Flaws

Madhav Malhotra · Apr 23, 2023, 5:37 PM
15 points
0 comments · 11 min read · LW link
(forum.effectivealtruism.org)

[Question] Could transformer network models learn motor planning like they can learn language and image generation?

mu_(negative) · Apr 23, 2023, 5:24 PM
2 points
4 comments · 1 min read · LW link

Could a superintelligence deduce general relativity from a falling apple? An investigation

titotal · Apr 23, 2023, 12:49 PM
148 points
39 comments · 9 min read · LW link

Endo-, Dia-, Para-, and Ecto-systemic novelty

TsviBT · Apr 23, 2023, 12:25 PM
17 points
3 comments · 5 min read · LW link

An Intro to Anthropic Reasoning using the ‘Boy or Girl Paradox’ as a toy example

TobyC · Apr 23, 2023, 10:20 AM
31 points
28 comments · 19 min read · LW link

[Question] Semantics, Syntax and Pragmatics of the Mind?

Ben Amitay · Apr 23, 2023, 6:13 AM
2 points
0 comments · 1 min read · LW link

A great talk for AI noobs (according to an AI noob)

dov · Apr 23, 2023, 5:34 AM
10 points
1 comment · 1 min read · LW link
(forum.effectivealtruism.org)

Bits of NEFFA

jefftk · Apr 23, 2023, 2:20 AM
5 points
0 comments · 1 min read · LW link
(www.jefftk.com)

“Rate limiting” as a mod tool

Raemon · Apr 23, 2023, 12:42 AM
48 points
36 comments · 4 min read · LW link

What should we censor from training data?

wassname · Apr 22, 2023, 11:33 PM
16 points
4 comments · 1 min read · LW link

Architecture-aware optimisation: train ImageNet and more without hyperparameters

Chris Mingard · Apr 22, 2023, 9:50 PM
6 points
2 comments · 2 min read · LW link

OpenAI’s GPT-4 Safety Goals

PeterMcCluskey · Apr 22, 2023, 7:11 PM
3 points
3 comments · 4 min read · LW link
(bayesianinvestor.com)

Introducing the Nuts and Bolts Of Naturalism

LoganStrohl · Apr 22, 2023, 6:31 PM
77 points
2 comments · 3 min read · LW link

We Need To Know About Continual Learning

michael_mjd · Apr 22, 2023, 5:08 PM
30 points
14 comments · 4 min read · LW link

The Security Mindset, S-Risk and Publishing Prosaic Alignment Research

lukemarks · Apr 22, 2023, 2:36 PM
40 points
7 comments · 5 min read · LW link

[Question] How did LW update p(doom) after LLMs blew up?

FinalFormal2 · Apr 22, 2023, 2:21 PM
24 points
29 comments · 1 min read · LW link

The Cruel Trade-Off Between AI Misuse and AI X-risk Concerns

simeon_c · Apr 22, 2023, 1:49 PM
24 points
1 comment · 2 min read · LW link

five ways to say “Almost Always” and actually mean it

Yudhister Kumar · Apr 22, 2023, 10:38 AM
17 points
3 comments · 2 min read · LW link
(www.ykumar.org)

P(doom|superintelligence) or coin tosses and dice throws of human values (and other related Ps).

Muyyd · Apr 22, 2023, 10:06 AM
−7 points
0 comments · 4 min read · LW link

[Question] Is it allowed to post job postings here? I am looking for a new PhD student to work on AI Interpretability. Can I advertise my position?

Tiberius · Apr 22, 2023, 1:22 AM
5 points
4 comments · 1 min read · LW link

LessWrong moderation messaging container

Raemon · Apr 22, 2023, 1:19 AM
21 points
13 comments · 1 min read · LW link

Neural network polytopes (Colab notebook)

Zach Furman · Apr 21, 2023, 10:42 PM
11 points
0 comments · 1 min read · LW link
(colab.research.google.com)

Readability is mostly a waste of characters

vlad.proex · Apr 21, 2023, 10:05 PM
21 points
7 comments · 3 min read · LW link

The Relationship between RLHF and AI Psychology: Debunking the Shoggoth Argument

FinalFormal2 · Apr 21, 2023, 10:05 PM
−11 points
8 comments · 2 min read · LW link

Thinking about maximization and corrigibility

James Payor · Apr 21, 2023, 9:22 PM
63 points
4 comments · 5 min read · LW link

Would we even want AI to solve all our problems?

So8res · Apr 21, 2023, 6:04 PM
98 points
15 comments · 2 min read · LW link

The Commission for Stopping Further Improvements: A letter of note from Isambard K. Brunel

jasoncrawford · Apr 21, 2023, 5:42 PM
39 points
0 comments · 4 min read · LW link
(rootsofprogress.org)

Should we publish mechanistic interpretability research?

Apr 21, 2023, 4:19 PM
106 points
40 comments · 13 min read · LW link

500 Million, But Not A Single One More—The Animation

Writer · Apr 21, 2023, 3:48 PM
47 points
0 comments · LW link

Talking publicly about AI risk

Jan_Kulveit · Apr 21, 2023, 11:28 AM
180 points
9 comments · 6 min read · LW link

Notes on “the hot mess theory of AI misalignment”

JakubK · Apr 21, 2023, 10:07 AM
16 points
0 comments · 5 min read · LW link
(sohl-dickstein.github.io)

Requisite Variety

Stephen Fowler · Apr 21, 2023, 8:07 AM
6 points
0 comments · 5 min read · LW link

The Agency Overhang

Jeffrey Ladish · Apr 21, 2023, 7:47 AM
85 points
6 comments · 6 min read · LW link

[Question] What would “The Medical Model Is Wrong” look like?

Elo · Apr 21, 2023, 1:46 AM
8 points
7 comments · 2 min read · LW link

Gas and Water

jefftk · Apr 21, 2023, 1:30 AM
17 points
9 comments · 1 min read · LW link
(www.jefftk.com)

[Question] Did the fonts change?

the gears to ascension · Apr 21, 2023, 12:40 AM
2 points
1 comment · 1 min read · LW link

[Question] Should we openly talk about explicit use cases for AutoGPT?

ChristianKl · Apr 20, 2023, 11:44 PM
20 points
4 comments · 1 min read · LW link

United We Align: Harnessing Collective Human Intelligence for AI Alignment Progress

Shoshannah Tekofsky · Apr 20, 2023, 11:19 PM
41 points
13 comments · 25 min read · LW link

[Question] Where to start with statistics if I want to measure things?

matto · Apr 20, 2023, 10:40 PM
21 points
7 comments · 1 min read · LW link

Upskilling, bridge-building, research on security/cryptography and AI safety

Allison Duettmann · Apr 20, 2023, 10:32 PM
14 points
0 comments · 4 min read · LW link

Behavioural statistics for a maze-solving agent

Apr 20, 2023, 10:26 PM
46 points
11 comments · 10 min read · LW link

An introduction to language model interpretability

Alexandre Variengien · Apr 20, 2023, 10:22 PM
14 points
0 comments · 9 min read · LW link

The Case for Brain-Only Preservation

Mati_Roy · Apr 20, 2023, 10:01 PM
21 points
7 comments · 1 min read · LW link
(biostasis.substack.com)

[Question] Practical ways to actualize our beliefs into concrete bets over a longer time horizon?

M. Y. Zuo · Apr 20, 2023, 9:21 PM
4 points
2 comments · 1 min read · LW link

LW moderation: my current thoughts and questions, 2023-04-12

Ruby · Apr 20, 2023, 9:02 PM
53 points
30 comments · 10 min read · LW link