Refusal in LLMs is mediated by a single direction

27 Apr 2024 11:13 UTC
8 points
0 comments 9 min read LW link

WSJ: Thinking doesn’t have to feel so hard

trevor 27 Apr 2024 10:14 UTC
8 points
0 comments 3 min read LW link
(www.wsj.com)

[Question] Plausibility of Getting Early Warning Shots because AIs can’t coordinate?

hmys 27 Apr 2024 8:02 UTC
5 points
0 comments 1 min read LW link

AI Safety Sphere

Myles H 27 Apr 2024 1:49 UTC
−3 points
0 comments 3 min read LW link

Exploring the Esoteric Pathways to AI Sentience (Part One)

jeffreycaruso 27 Apr 2024 1:02 UTC
−11 points
2 comments 2 min read LW link

Superposition is not “just” neuron polysemanticity

LawrenceC 26 Apr 2024 23:22 UTC
25 points
0 comments 13 min read LW link

D&D.Sci Long War: Defender of Data-mocracy

aphyer 26 Apr 2024 22:30 UTC
34 points
1 comment 3 min read LW link

On Not Pulling The Ladder Up Behind You

Screwtape 26 Apr 2024 21:58 UTC
55 points
3 comments 9 min read LW link

We are headed into an extreme compute overhang

devrandom 26 Apr 2024 21:38 UTC
22 points
8 comments 2 min read LW link

[Concept Dependency] Edge Regular Lattice Graph

Johannes C. Mayer 26 Apr 2024 21:14 UTC
5 points
0 comments 1 min read LW link

[Concept Dependency] Concept Dependency Posts

Johannes C. Mayer 26 Apr 2024 20:57 UTC
8 points
2 comments 2 min read LW link

Argumentation and College Admissions

Michael Michalchik 26 Apr 2024 20:52 UTC
1 point
0 comments 2 min read LW link

[Question] Wouldn’t weak AI agents provide warning?

Mandatory Topic 26 Apr 2024 19:34 UTC
5 points
0 comments 1 min read LW link

Duct Tape security

Isaac King 26 Apr 2024 18:57 UTC
64 points
7 comments 5 min read LW link

Fundamental Uncertainty: Chapter 8 - When does fundamental uncertainty matter?

Gordon Seidoh Worley 26 Apr 2024 18:10 UTC
9 points
2 comments 32 min read LW link

Scaling of AI training runs will slow down after GPT-5

Maxime Riché 26 Apr 2024 16:05 UTC
32 points
5 comments 3 min read LW link

Spatial attention as a “tell” for empathetic simulation?

Steven Byrnes 26 Apr 2024 15:10 UTC
50 points
7 comments 8 min read LW link

Arch-anarchy

Peter lawless 26 Apr 2024 15:05 UTC
−1 points
1 comment 25 min read LW link

Breadboarding a Whistle Synth

jefftk 26 Apr 2024 15:00 UTC
9 points
2 comments 2 min read LW link
(www.jefftk.com)

An Introduction to AI Sandbagging

26 Apr 2024 13:40 UTC
28 points
0 comments 8 min read LW link