I have seen a softer version of this in software engineering. Even though using AI would not count as “cheating” in software engineering, many engineers still actively downplay the degree to which they rely on models for their work. I have also noticed a general reluctance to acknowledge when models perform well and an eagerness to highlight when they perform poorly. I think this is partially driven by job insecurity (if you acknowledge that a model is better at coding than you are, why should a company keep you on the payroll?) and partially by feelings of inferiority (people will see me as a fraud, etc.). While such behaviors and feelings are understandable, I do think they are fundamentally disempowering and come from a place of weakness and insecurity.
Ephraiem Sarabamoun
Hey Eliezer, I am a big fan of your work. I did have one idea about this and would be interested in hearing your thoughts. I liked your example about the Maginot Line: if one German division crosses into France, you do not have time to build a new, better Maginot Line before the next division arrives. This is both true and pretty funny. Here, the reason you cannot rely on the “gradualness of the problem” as a source of hope is that the intervention (building the Maginot Line) takes orders of magnitude more time to implement than the developing problem (German divisions moving into France). For AI takeoff, though, perhaps the unfolding problem takes long enough to develop that interventions (a worldwide ban on training frontier models, for example) actually can work mid-disaster. I would be interested in hearing your thoughts on this. Thank you!
Hey there! Thank you very much for your comment. Yep, I do think all the links you included are highly relevant to this post.
Hey Christian, thank you for your thoughtful comment. Yes, I also remember hearing about such studies, and you are correct that they would imply engineers are more bullish on the usefulness of models than is merited, which is difficult to reconcile with the general tone of my comment (that engineers around me seem to underplay how much they themselves use models and to downplay the models’ capabilities).
I can think of a few possibilities, though, that might help explain this:
1. Maybe the effects that I have observed are not representative of the field as a whole. I have noticed that junior developers and startup founders are generally much more open about their LLM usage and much more bullish on model capabilities than senior developers at large companies.
2. It is possible this is a social effect, i.e., engineers today do think that LLMs are capable but downplay this fact in public. If, for example, the studies you cite are based on anonymous feedback, then the discrepancy would be consistent with this effect: engineers reply anonymously that they are more productive with models, but publicly they downplay it.
3. Models have gotten a lot better. I would be interested to see if those studies would yield the same results today as they did then. I personally would agree that a year ago models were probably a net negative on productivity, especially for senior developers (they would introduce subtle bugs that are a pain to debug). Today, though, I think they are certainly a net productivity gain, and it would be interesting to see whether developers today are undervaluing the productivity gains from models.
These are just a few thoughts I had. I would be interested in hearing what you think about this. Ultimately this is only my experience; I would be interested to know what others’ experiences are.