We surely now:
Small typo
Wouldn’t there be even cheaper ways to satisfy preferences about living humans? A fake, cheap version that satisfies the preference would probably be possible, in the same way that a preference for a pet can be satisfied by a plush toy. Wanting humans or uploads, yet being unable to satisfy that desire with something fake, doesn’t seem like how many of our actual desires work.
You can keep talking. You can repeat the proper analysis for your vaccine, talk about your own behavior, and talk about why other people analyzing your behavior is either good or bad. You don’t have to concede the public square to someone else because you’re concerned they will misinterpret things; in fact, these examples seem like situations where you can and should talk your way out of them.
Isn’t the theory that consultants add value by saying true, obvious things? If you realize you’re surrounded by sycophants, you might need someone who you’re sure won’t just tell you that you’re amazing (unless the consultant is also a yes-man and dooms you even harder).
Thanks for writing this. I’m not sure I’d call your beliefs moderate, since they involve extracting useful labor from misaligned AI by making deals with them, sometimes for pieces of the observable universe or by verifying with future tech.
On the point of “talking to AI companies”, I think this would be a healthy part of any attempted change, although I see that PauseAI and other orgs tend to talk to AI companies in a way that seems designed to make them feel bad, by directly stating that what they are doing is wrong. Maybe the line here is “you make sure that what you say will still result in you getting invited to conferences”, which is reasonable, but I don’t think that “talking to AI companies” gets at the difference between you and other forms of activism.
I think you’re pretty severely mistaken about bullshit jobs. You said
At the start of this post we mentioned “bullshit jobs” as a major piece of evidence that standard “theory of the firm” models of organization size don’t really seem to capture reality. What does the dominance-status model have to say about bullshit jobs?
But there are many counterexamples suggesting this isn’t a real concept. See here for many of them: https://www.thediff.co/archive/bullshit-jobs-is-a-terrible-curiosity-killing-concept/
How would a military that is increasingly run by AI factor into these scenarios? It seems most similar to organizational safety, à la Google building software with SWEs, but the disanalogy might be that the AI is explicitly supposed to take over some part of the world, and maybe it interpreted a command incorrectly. Or does this article only consider the AI taking over because it wanted to take over?
Huh, did you experience any side effects?
I think discernment is not essential to entertainment. If people really want to learn what a slightly off piano sounds like and also pay for expert piano tuning, then that’s fine, but I don’t think people should be looked down upon for not having that level of discernment.
How would the agent represent non-coherent others? Humans, for example, don’t have entirely coherent goals, and in cases where the agent learns that it can satisfy one goal or another, how would it select which to pursue? Take a human attempting to lose weight, with goals both to eat to satisfaction and to not eat. Would the agent give the human food or withhold it?
One thing I find weird is that most of these objects of payment are correlated: the best-paying jobs also have the best peers, the most autonomy, and the most fun. Low-paid jobs were mostly drudgery along all axes, in my experience.
Thanks for the summary. Why should this be true?
The fact that sympathy for hedonic utilitarianism is strongly correlated with intelligence is a somewhat worrying datapoint in favor of the plausibility of squiggle-maximizers.
Embracing positive sensory experience at higher levels of human intelligence implies a linearity that I don’t think holds among other animals. Are chimps more hedonic-utilitarian than ants, and ants than bacteria? The human intelligence range is too narrow for this to be evidence of what something much smarter would do.
Thank you for writing this. My girlfriend and I would like kids, but I generally try not to bring AI up around her. She got very anxious while listening to an 80k hours podcast on AI and it seemed generally bad for her. I don’t think any of my work will end up making an impact on AI, so I think basically the CS Lewis quote applies. Even if you know the game you’re playing is likely to end, there isn’t anything to do since there are no valid moves if the new game actually starts.
I did want to ask, how did you think about putting your children in school? Did you send them to a public school?
What does impossible mean in the context of clock neurons?
impossible in the first few moves.
What causes them to be unable to fire?
Q. What is generalization really for? What does it offer you?
Based on the vibe of the post, it seems like you’re trying to point at the concept of “being able to do many things”. I’d say generalization isn’t ‘for’ anything; it’s a concept. For an agent, generalization is a way of achieving an outcome from limited past experience, without wasting resources rediscovering strategies from scratch. I can’t really tell from what you said what I’m supposed to answer to “What does it offer you?”. Generalization offers me the ability to recognize bad chess moves in new positions I haven’t seen, or the ability to take over the universe based on limited knowledge of physics. I don’t know where you’re trying to limit the word.
I would add “finish it”. Many projects don’t actually get finished: you skip the last working set, and so on.
I think this is in tension with the idea that green can be the conservation he was talking about, where spirituality is the idea of facing the other. That means cutting down a redwood cuts away the other, and the awe you would feel, in favor of your own power. Green isn’t a Buddhist view of the world; it’s the idea that there is a boundary between you and the other, and the other is worth regarding.
This seems like an odd choice to me. Could you share the prompt for the conversations checking for violations? I think it’s worth making sure that Claude doesn’t have a non-neutral understanding of its own constitution in places where other models might disagree.