Gell-Mann checks

Croissanthology
Sep 26, 2024, 10:45 PM
23 points
7 comments
3 min read
LW link

[Question] Doing Nothing Utility Function

k64
Sep 26, 2024, 10:05 PM
9 points
9 comments
1 min read
LW link

Stanislav Petrov Quarterly Performance Review

Ricki Heicklen
Sep 26, 2024, 9:20 PM
147 points
3 comments
5 min read
LW link
(bayesshammai.substack.com)

Self location for LLMs by LLMs: Self-Assessment Checklist.

Canaletto
Sep 26, 2024, 7:57 PM
11 points
0 comments
5 min read
LW link

Four Levels of Voting Methods

hive
Sep 26, 2024, 6:15 PM
17 points
3 comments
9 min read
LW link
(hiveism.substack.com)

Characterizing stable regions in the residual stream of LLMs

Sep 26, 2024, 1:44 PM
42 points
4 comments
1 min read
LW link
(arxiv.org)

Chevy Bolt Review

jefftk
Sep 26, 2024, 1:40 PM
13 points
2 comments
1 min read
LW link
(www.jefftk.com)

AI #83: The Mask Comes Off

Zvi
Sep 26, 2024, 12:00 PM
82 points
20 comments
36 min read
LW link
(thezvi.wordpress.com)

The Existential Dread of Being a Powerful AI System

testingthewaters
Sep 26, 2024, 10:56 AM
6 points
1 comment
2 min read
LW link

[Question] What prevents SB-1047 from triggering on deep fake porn/voice cloning fraud?

ChristianKl
Sep 26, 2024, 9:17 AM
27 points
21 comments
1 min read
LW link

[Completed] The 2024 Petrov Day Scenario

Sep 26, 2024, 8:08 AM
136 points
114 comments
5 min read
LW link

Source Control for Prototyping and Analysis

jefftk
Sep 26, 2024, 1:50 AM
12 points
0 comments
1 min read
LW link
(www.jefftk.com)

[Linkpost] Play with SAEs on Llama 3

Sep 25, 2024, 10:35 PM
40 points
2 comments
1 min read
LW link

Mira Murati leaves OpenAI/ OpenAI to remove non-profit control

Sodium
Sep 25, 2024, 9:15 PM
58 points
4 comments
2 min read
LW link

Comparing Forecasting Track Records for AI Benchmarking and Beyond

ChristianWilliams
Sep 25, 2024, 9:01 PM
11 points
0 comments
LW link
(www.metaculus.com)

Extending the Off-Switch Game: Toward a Robust Framework for AI Corrigibility

OwenChen
Sep 25, 2024, 8:38 PM
3 points
0 comments
4 min read
LW link

Evaluating Synthetic Activations composed of SAE Latents in GPT-2

Sep 25, 2024, 8:37 PM
29 points
0 comments
3 min read
LW link
(arxiv.org)

Climate Change And Global Warming

Zero Contradictions
Sep 25, 2024, 7:13 PM
−7 points
0 comments
1 min read
LW link
(zerocontradictions.net)

How to prevent collusion when using untrusted models to monitor each other

Buck
Sep 25, 2024, 6:58 PM
89 points
11 comments
22 min read
LW link

Alignment by default: the simulation hypothesis

gb
Sep 25, 2024, 4:26 PM
21 points
40 comments
1 min read
LW link

A Dialogue on Deceptive Alignment Risks

Rauno Arike
Sep 25, 2024, 4:10 PM
11 points
0 comments
18 min read
LW link

[Paper] Hidden in Plain Text: Emergence and Mitigation of Steganographic Collusion in LLMs

Sep 25, 2024, 2:52 PM
37 points
2 comments
4 min read
LW link
(arxiv.org)

AIS Hungary Operations Officer role, Deadline: 2024 October 6th

gergogaspar
Sep 25, 2024, 1:54 PM
1 point
0 comments
1 min read
LW link

[Intuitive self-models] 2. Conscious Awareness

Steven Byrnes
Sep 25, 2024, 1:29 PM
82 points
61 comments
16 min read
LW link

Book Review: On the Edge: The Business

Zvi
Sep 25, 2024, 12:20 PM
38 points
0 comments
36 min read
LW link
(thezvi.wordpress.com)

Join the $10K AutoHack 2024 Tournament

Paul Bricman
Sep 25, 2024, 11:54 AM
5 points
0 comments
1 min read
LW link
(noemaresearch.com)

[Paper] A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders

Sep 25, 2024, 9:31 AM
73 points
16 comments
3 min read
LW link
(arxiv.org)

[Question] Non-human centric view of existence

ZY
Sep 25, 2024, 5:47 AM
−3 points
14 comments
1 min read
LW link

How to Live Well: My Philosophy of Life

Philosofer123
Sep 25, 2024, 1:13 AM
−8 points
0 comments
1 min read
LW link
(philosofer123.wordpress.com)

An open response to Wittkotter and Yampolskiy

Donald Hobson
Sep 24, 2024, 10:27 PM
8 points
0 comments
4 min read
LW link

A Path out of Insufficient Views

Unreal
Sep 24, 2024, 8:00 PM
44 points
65 comments
9 min read
LW link

How to give effectively to US Dems

Hauke Hillebrandt
Sep 24, 2024, 2:38 PM
2 points
0 comments
LW link
(www.slowboring.com)

[Question] How do you follow AI (safety) news?

PeterH
Sep 24, 2024, 1:58 PM
4 points
2 comments
1 min read
LW link

Instruction Following without Instruction Tuning

Bogdan Ionut Cirstea
Sep 24, 2024, 1:49 PM
17 points
0 comments
1 min read
LW link
(arxiv.org)

Book Review: On the Edge: The Gamblers

Zvi
Sep 24, 2024, 11:50 AM
35 points
1 comment
89 min read
LW link
(thezvi.wordpress.com)

Editing at the Take Level

jefftk
Sep 24, 2024, 11:30 AM
12 points
1 comment
1 min read
LW link
(www.jefftk.com)

Using LLM’s for AI Foundation research and the Simple Solution assumption

Donald Hobson
Sep 24, 2024, 11:00 AM
5 points
0 comments
2 min read
LW link

When to join a respectability cascade

B Jacobs
Sep 24, 2024, 7:54 AM
10 points
1 comment
2 min read
LW link
(bobjacobs.substack.com)

Sampling Effects on Strategic Behavior in Supervised Learning Models

Phil Bland
Sep 24, 2024, 7:44 AM
1 point
0 comments
6 min read
LW link

In Praise of the Beatitudes

robotelvis
Sep 24, 2024, 5:08 AM
9 points
7 comments
3 min read
LW link
(messyprogress.substack.com)

[Question] What are the best arguments for/against AIs being “slightly ‘nice’”?

Raemon
Sep 24, 2024, 2:00 AM
99 points
61 comments
31 min read
LW link

Struggling like a Shadowmoth

Raemon
Sep 24, 2024, 12:47 AM
184 points
38 comments
7 min read
LW link

Bounty for Evidence on Some of Palisade Research’s Beliefs

Sep 23, 2024, 8:01 PM
46 points
4 comments
2 min read
LW link

Predicting Influenza Abundance in Wastewater Metagenomic Sequencing Data

jefftk
Sep 23, 2024, 5:25 PM
27 points
0 comments
4 min read
LW link
(naobservatory.org)

A primer on ML in antibody engineering

Abhishaike Mahajan
Sep 23, 2024, 5:03 PM
11 points
0 comments
25 min read
LW link
(www.owlposting.com)

[Question] On the subject of in-house large language models versus implementing frontier models

Annapurna
Sep 23, 2024, 3:00 PM
7 points
1 comment
1 min read
LW link

A basic systems architecture for AI agents that do autonomous research

Buck
Sep 23, 2024, 1:58 PM
189 points
16 comments
8 min read
LW link

Book Review: On the Edge: The Fundamentals

Zvi
Sep 23, 2024, 1:40 PM
64 points
3 comments
31 min read
LW link
(thezvi.wordpress.com)

Switching to a 4GB SD

jefftk
Sep 23, 2024, 11:20 AM
11 points
1 comment
1 min read
LW link
(www.jefftk.com)

Model evals for dangerous capabilities

Zach Stein-Perlman
Sep 23, 2024, 11:00 AM
51 points
11 comments
3 min read
LW link