Not all biases are equal—a study of sycophancy and bias in fine-tuned LLMs

jakub_krys
Nov 11, 2024, 11:11 PM
8 points
0 comments
7 min read
LW link

AI Craftsmanship

abramdemski
Nov 11, 2024, 10:17 PM
66 points
7 comments
4 min read
LW link

Electric Grid Cyberattack: An AI-Informed Threat Model

moonlightmaze
Nov 11, 2024, 9:34 PM
22 points
0 comments
29 min read
LW link

o1 is a bad idea

abramdemski
Nov 11, 2024, 9:20 PM
161 points
39 comments
2 min read
LW link

Inferential Game: The Foraging (Ex-)Bandit

abstractapplic
Nov 11, 2024, 4:59 PM
27 points
4 comments
1 min read
LW link

The Evals Gap

Marius Hobbhahn
Nov 11, 2024, 4:42 PM
55 points
7 comments
7 min read
LW link
(www.apolloresearch.ai)

Summary: “Imagining and building wise machines: The centrality of AI metacognition” by Johnson, Karimi, Bengio, et al.

Chris_Leong
Nov 11, 2024, 4:13 PM
29 points
8 comments
8 min read
LW link
(arxiv.org)

The Online Sports Gambling Experiment Has Failed

Zvi
Nov 11, 2024, 2:30 PM
285 points
59 comments
11 min read
LW link
(thezvi.wordpress.com)

How I Learned That You Should Push Children Into Ponds

omnizoid
Nov 11, 2024, 2:20 PM
−3 points
3 comments
4 min read
LW link

The new ruling philosophy regarding AI

Mitchell_Porter
Nov 11, 2024, 1:28 PM
29 points
0 comments
5 min read
LW link

What Ketamine Therapy Is Like

Sable
Nov 11, 2024, 11:09 AM
47 points
8 comments
6 min read
LW link
(affablyevil.substack.com)

Spherical cow

dkl9
Nov 11, 2024, 3:10 AM
7 points
0 comments
1 min read
LW link
(dkl9.net)

[Question] how to truly feel my beliefs?

KvmanThinking
Nov 11, 2024, 12:04 AM
6 points
6 comments
1 min read
LW link

Bay Winter Solstice 2024: song leading auditions

tcheasdfjkl
Nov 10, 2024, 11:59 PM
28 points
0 comments
1 min read
LW link

[Question] A Coordination Cookbook?

azergante
Nov 10, 2024, 11:20 PM
2 points
0 comments
1 min read
LW link

Towards a Clever Hans Test: Unmasking Sentience Biases in Chatbot Interactions

glykokalyx
Nov 10, 2024, 10:34 PM
4 points
0 comments
1 min read
LW link

Urbit New England Meetup

Conquerer Cohen
Nov 10, 2024, 5:56 PM
−4 points
0 comments
1 min read
LW link

Personal AI Planning

jefftk
Nov 10, 2024, 2:00 PM
68 points
11 comments
2 min read
LW link
(www.jefftk.com)

AI alignment via civilizational cognitive updates

AtillaYasar
Nov 10, 2024, 9:33 AM
1 point
10 comments
6 min read
LW link

[Question] How should vegans think about Methionine needs?

ChristianKl
Nov 10, 2024, 9:28 AM
32 points
3 comments
1 min read
LW link

Is P(Doom) Meaningful? Bayesian vs. Popperian Epistemology Debate

Liron
Nov 9, 2024, 11:39 PM
5 points
0 comments
124 min read
LW link
(www.youtube.com)

Bellevue Library Meetup—Nov 23

Cedar
Nov 9, 2024, 11:05 PM
5 points
3 comments
1 min read
LW link

LifeKeeper Diaries: Exploring Misaligned AI Through Interactive Fiction

Nov 9, 2024, 8:58 PM
15 points
5 comments
2 min read
LW link

[Question] Poll: what’s your impression of altruism?

David Gross
Nov 9, 2024, 8:28 PM
2 points
4 comments
1 min read
LW link

Chaos Theory in Ecology

Elizabeth
Nov 9, 2024, 5:50 PM
15 points
4 comments
20 min read
LW link
(acesounderglass.com)

Some Comments on Recent AI Safety Developments

testingthewaters
Nov 9, 2024, 4:44 PM
4 points
0 comments
8 min read
LW link

Formalize the Hashiness Model of AGI Uncontainability

Remmelt
Nov 9, 2024, 4:10 PM
3 points
0 comments
LW link
(docs.google.com)

Agenda Manipulation

Pazzaz
Nov 9, 2024, 2:13 PM
2 points
0 comments
3 min read
LW link

Force Sequential Output with SCP?

jefftk
Nov 9, 2024, 12:40 PM
9 points
4 comments
1 min read
LW link
(www.jefftk.com)

Anthropic teams up with Palantir and AWS to sell AI to defense customers

Matrice Jacobine
Nov 9, 2024, 11:50 AM
9 points
0 comments
2 min read
LW link
(techcrunch.com)

GPT-4o Can In Some Cases Solve Moderately Complicated Captchas

dirk
Nov 9, 2024, 4:04 AM
12 points
2 comments
1 min read
LW link

Stone Age Herbalist’s notes on ant warfare and slavery

trevor
Nov 9, 2024, 2:40 AM
32 points
0 comments
3 min read
LW link
(x.com)

LLMs Look Increasingly Like General Reasoners

eggsyntax
Nov 8, 2024, 11:47 PM
94 points
45 comments
3 min read
LW link

overengineered air filter shelving

bhauth
Nov 8, 2024, 10:04 PM
26 points
2 comments
5 min read
LW link
(bhauth.com)

Bigger Livers?

sarahconstantin
Nov 8, 2024, 9:50 PM
98 points
17 comments
6 min read
LW link
(sarahconstantin.substack.com)

New UChicago Rationality Group

Noah Birnbaum
Nov 8, 2024, 9:20 PM
9 points
0 comments
1 min read
LW link

Active Recall and Spaced Repetition are Different Things

Saul Munn
Nov 8, 2024, 8:14 PM
49 points
2 comments
3 min read
LW link
(www.brasstacks.blog)

What AI safety researchers can learn from Mahatma Gandhi

Lysandre Terrisse
Nov 8, 2024, 7:49 PM
−6 points
0 comments
3 min read
LW link

The King and the Golem—The Animation

Writer
Nov 8, 2024, 6:23 PM
70 points
0 comments
1 min read
LW link

Boring & straightforward trauma explanation

lemonhope
Nov 8, 2024, 9:45 AM
24 points
7 comments
2 min read
LW link

Curriculum of Ascension

andrew sauer
Nov 7, 2024, 11:54 PM
13 points
0 comments
18 min read
LW link

Analyzing how SAE features evolve across a forward pass

Nov 7, 2024, 10:07 PM
47 points
0 comments
1 min read
LW link
(arxiv.org)

Markets Are Information—Beating the Sportsbooks at Their Own Game

JJXW
Nov 7, 2024, 8:58 PM
9 points
1 comment
2 min read
LW link
(thehobbyist.substack.com)

Signaling with Small Orange Diamonds

jefftk
Nov 7, 2024, 8:20 PM
40 points
1 comment
1 min read
LW link
(www.jefftk.com)

Fundamental Uncertainty: Chapter 9 - How do we live with uncertainty?

Gordon Seidoh Worley
Nov 7, 2024, 6:15 PM
11 points
2 comments
15 min read
LW link

AI #89: Trump Card

Zvi
Nov 7, 2024, 4:30 PM
42 points
12 comments
42 min read
LW link
(thezvi.wordpress.com)

Quantum Immortality: A Perspective if AI Doomers are Probably Right

Nov 7, 2024, 4:06 PM
11 points
55 comments
14 min read
LW link

On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback

Nov 7, 2024, 3:39 PM
51 points
7 comments
11 min read
LW link

In the Name of All That Needs Saving

pleiotroth
Nov 7, 2024, 3:26 PM
18 points
3 comments
22 min read
LW link

Agency overhang as a proxy for Sharp left turn

Nov 7, 2024, 12:14 PM
6 points
0 comments
5 min read
LW link