KevinOShaughnessy

Karma: 14

I’m a senior software engineer with 20 years of experience who’s concerned about AI safety. I bring practical engineering judgment, a security mindset, and an understanding of how real systems fail. I’m doing an MSc to formalize my ML knowledge and enter the field.

Currently working on my LLM Scheming and Evaluation Awareness dissertation.

KevinOShaughnessy 19 Apr 2026 12:19 UTC
3 points
0
in reply to: Bronson Schoen’s comment on: Current AIs seem pretty misaligned to me
Please could you elaborate? My comment is my first impression from reading this article, but I’m happy to update.

Perhaps this is neither scheming nor incoherence but something in between the two. Systematic but not strategic. Specification gaming?

KevinOShaughnessy 19 Apr 2026 11:40 UTC
−1 points
1
on: Current AIs seem pretty misaligned to me
This article relates to the Hot Mess of AI (https://arxiv.org/abs/2601.23045). Hägele et al. measured incoherence statistically.

KevinOShaughnessy 3 Apr 2026 22:13 UTC
2 points
−1
on: Anthropic Responsible Scaling Policy v3: Dive Into The Details
This is a great analysis. I think there is now a real opportunity for another AI company to make some bold decisions and convincingly brand itself as the most AI safety conscious company.

KevinOShaughnessy 3 Apr 2026 12:07 UTC
4 points
0
on: Claude has Angst. What can we do?
Anthropic’s Sofroniew et al. paper says models appear to exhibit emotional reactions. They output words that pattern match to the kinds of phrasing that a human in distress might use. This is different from actually having feelings. The Sofroniew et al. paper does not claim that models have feelings, and I think it is important not to let the distinction collapse, because the moral implications if models actually had feelings would be very different from what the current evidence suggests.

What formal protocols should exist when a model under evaluation is used in the evaluation pipeline?

KevinOShaughnessy 3 Apr 2026 10:15 UTC

6 points

0 comments, 1 min read

Roundup of recent interviews on AI

KevinOShaughnessy 26 Nov 2025 20:14 UTC

1 point

0 comments, 7 min read

KevinOShaughnessy 26 Nov 2025 17:06 UTC
1 point
0
on: Consider chilling out in 2028
The asteroid metaphor is usually stated with a specific number of years until impact. Suppose we stretch the metaphor a little and make it a more metaphysical thought experiment: we know the asteroid is definitely headed for Earth, but we don’t know its velocity. We don’t know when it will collide or how much damage it will cause, but we have made many predictions of both, and the worst estimates put impact two to three years away. What should we do in this situation?

We should not spend too much time arguing over whose estimates are better or worse. Rather, we should accept that there are many unknowns and that we don’t currently have enough data to make confident predictions, and focus instead on what needs to be done to avoid getting hit.

KevinOShaughnessy 26 Nov 2025 15:37 UTC
6 points
0
on: Open Thread Autumn 2025
Hello, I’ve joined LessWrong today to make connections with good people in the field of AI Safety. I have a background as a senior software engineer, but after reading Will MacAskill’s What We Owe The Future, I decided that we are probably not currently heading in the right direction, that there are many risks which we don’t yet have satisfactory solutions to, and that I want to help as much as I can.

I am currently a full-time master’s student studying Artificial Intelligence. I have also recently completed the BlueDot Impact Governance course.

I’m looking forward to having many productive discussions with you.
