Previously worked as a startup founding engineer (does that mean anything?). Now, doing my best to think rigorously and contribute to technical safety.
My personal website.
I am constantly in analysis paralysis. I waste far too much time trying to prematurely optimize how I am going to learn something before actually learning any piece of the material. To remedy this, I tell myself that I will naturally optimize the learning process as I progress. I think Andrej Karpathy explained this nicely in a podcast.
Recently, I have been thinking about this a lot because I am on a full-time sabbatical learning technical alignment research. There is so much to do, and a general feeling of ignorance in every direction, which makes the process ripe for constant optimization. I would love to have an answer, but at this point I try to operate mostly on the faith that I will optimize naturally, and that the more I force premature optimization, the more time I waste. At some point, I have to just Do The Work.
Tensor networks might actually be a viable alternative to typical NNs! However, if their scaling is much worse (say, 50% worse), then I highly doubt they’ll be deployed as frontier models.
But suppose we can solve ambitious mech interp with tensor networks (debatable, but I lean yes); then there are two regimes (a toy sketch of the tensor-network substitution follows the list):
1. Low Reliability
2. High Reliability
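To make the “alternative to typical NNs” idea concrete, here is a minimal sketch of the most common tensor-network substitution: factoring a dense weight matrix into a two-core matrix-product operator (tensor train). This is my own illustration, not anything from the original post; the shapes and the bond rank r are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dense layer being replaced: a 256 x 256 weight matrix (65,536 parameters).
d_in, d_out, r = (16, 16), (16, 16), 8  # factor shapes and bond rank (assumed)

# Two-core matrix-product operator (tensor train). The full 256 x 256 matrix
# is never materialized; we store only the two small cores.
core1 = rng.normal(size=(d_in[0], d_out[0], r)) / np.sqrt(d_in[0] * r)
core2 = rng.normal(size=(r, d_in[1], d_out[1])) / np.sqrt(d_in[1])

def mpo_apply(x: np.ndarray) -> np.ndarray:
    """Apply the factorized layer to a batch x of shape (batch, 256)."""
    x = x.reshape(-1, d_in[0], d_in[1])        # (b, 16, 16)
    y = np.einsum('bij,iar->bajr', x, core1)   # contract the first input mode
    y = np.einsum('bajr,rjc->bac', y, core2)   # contract second mode and bond
    return y.reshape(-1, d_out[0] * d_out[1])  # (b, 256)

x = rng.normal(size=(4, 256))
print(mpo_apply(x).shape)                  # (4, 256)
print(256 * 256, core1.size + core2.size)  # 65536 dense vs. 4096 factorized
```

The bond rank r is the knob: small r gives few parameters and cores small enough to inspect directly, while large r recovers dense expressivity. Whether performance at fixed compute survives small r is exactly the scaling question above.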
Excited by the ambitious effort. Re: the impact argument above, I understand your logic to be:

To completely replace NNs in frontier applications, the scaling of TNs needs to be on par. Therefore, if we needed to replace NNs for TNs to be useful, we should test the scaling laws first. However, in a world where scaling is worse, TNs can still be useful by enabling “ambitious mech interp,” which would result in a High Reliability model. These two regimes aren’t mutually exclusive.
Am I following the argument correctly?
Thank you for the nudge. I hadn’t realized that I have been thinking solely about value to the collective rather than to an individual. This is most likely because I have heard this topic come up in collective-benefit discourse, e.g., “These are the benefits of AI to humanity:”.
I don’t think I have done the requisite work to understand the arguments, but by default I do not understand the appeal of uploaded minds. Are there utility arguments around being able to access a highly productive individual’s mind forever? Or is it just a more emotional appeal to living forever?
As an exercise for myself, my simplified understanding of your argument is as follows:
That a model cannot distinguish between evaluation and deployment (i.e., it is not evaluation-aware) does not preclude it from having scheming abilities. Therefore, as model capabilities improve, we should be skeptical of evaluations as a green light for safety cases.

After writing this, it’s hard for me to imagine a world in which it’s possible to build a complete safety case for a model before deployment. I think I am reiterating simple conclusions here, but as I’m learning, I want to double-check. Am I missing anything?