Zvi

Karma: 54,454

New Statement Calls For Not Building Superintelligence For Now

Zvi24 Oct 2025 17:40 UTC

60 points

3 comments7 min readLW link

(thezvi.wordpress.com)

AI #139: The Overreach Machines

Zvi23 Oct 2025 15:30 UTC

31 points

4 comments52 min readLW link

(thezvi.wordpress.com)

On Dwarkesh Patel’s Podcast With Andrej Karpathy

Zvi21 Oct 2025 16:00 UTC

30 points

6 comments31 min readLW link

(thezvi.wordpress.com)

Zvi 21 Oct 2025 13:04 UTC
2 points
0
in reply to: PeterMcCluskey’s comment on: Bubble, Bubble, Toil and Trouble
I wouldn’t obviously even put AMD on the list given that they’re up on rather big single stock news, but yes, good note, there is that.

Bubble, Bubble, Toil and Trouble

Zvi20 Oct 2025 13:22 UTC

74 points

7 comments15 min readLW link

(thezvi.wordpress.com)

AI #138 Part 2: Watch Out For Documents

Zvi17 Oct 2025 11:50 UTC

39 points

8 comments45 min readLW link

(thezvi.wordpress.com)

AI #138 Part 1: The People Demand Erotic Sycophants

Zvi16 Oct 2025 15:41 UTC

24 points

7 comments46 min readLW link

(thezvi.wordpress.com)

Monthly Roundup #35: October 2025

Zvi15 Oct 2025 19:50 UTC

23 points

1 comment49 min readLW link

(thezvi.wordpress.com)

Trade Escalation, Supply Chain Vulnerabilities and Rare Earth Metals

Zvi14 Oct 2025 15:30 UTC

29 points

0 comments9 min readLW link

(thezvi.wordpress.com)

OpenAI #15: More on OpenAI’s Paranoid Lawfare Against Advocates of SB 53

Zvi13 Oct 2025 15:00 UTC

104 points

2 comments23 min readLW link

(thezvi.wordpress.com)

2025 State of AI Report and Predictions

Zvi10 Oct 2025 17:30 UTC

27 points

4 comments9 min readLW link

(thezvi.wordpress.com)

Zvi 9 Oct 2025 21:34 UTC
5 points
0
on: Realistic Reward Hacking Induces Different and Deeper Misalignment
Would a reasonable way to summarize this be that if you train on pretend reward hacking you get emergent misalignment that takes the form of pretending (playacting) misbehaving and being evil, whereas if you here train on realistic reward hacking examples it starts realistically (and in some ways strategically) misbehaving and doing other forms of essentially reward hacking instead?