Ben Livengood

Karma: 298

Ben Livengood 23 Jun 2026 17:51 UTC
1 point
0
in reply to: Logan Riggs’s comment on: A Mechanistic Explanation of Prompt Injection (and why you should study roles)
Similar to vision/audio tokens, the system and thinking tokens should be distinct from user and output tokens. This would have to happen after pretraining, e.g. with the first assistant training examples appearing with some thinking/output tokens intermixed with the pretraining corpus tokens, slowly increasing the ratio of specialized to normal tokens until the model only outputs thinking tokens in the thinking section and output tokens in the output section. Precisely when to begin adjusting the ratios is flexible; it could actually happen earlier, e.g. with a 1:1 mapping between “user” and “output” tokens, calculating the loss on output tokens even though the LLM sees “user” tokens, which keeps translation between token sets straightforward. Finally, at inference always strip incorrect-domain tokens from the tagged output sections, and of course don’t let users input non-user tokens.
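A minimal sketch of the token-domain idea, assuming each role gets its own copy of a base vocabulary at a fixed offset. The names here (`Domain`, `BASE_VOCAB`, `strip_foreign`, and so on) are hypothetical and only illustrate the 1:1 mapping and the inference-time stripping; they are not taken from any real tokenizer.

```python
from enum import Enum

class Domain(Enum):
    SYSTEM = 0
    USER = 1
    THINKING = 2
    OUTPUT = 3

BASE_VOCAB = 50_000  # hypothetical size of the shared base vocabulary

def to_domain(base_id, domain):
    """Map a base-vocabulary id into the id range reserved for one domain."""
    if not 0 <= base_id < BASE_VOCAB:
        raise ValueError("base_id outside the base vocabulary")
    return domain.value * BASE_VOCAB + base_id

def domain_of(token_id):
    """Recover which domain a domain-tagged token id belongs to."""
    return Domain(token_id // BASE_VOCAB)

def translate(token_id, target):
    """1:1 translation of a token between domains (same base id, new range)."""
    return to_domain(token_id % BASE_VOCAB, target)

def strip_foreign(tokens, expected):
    """Drop every token that is not in the domain expected for this section."""
    return [t for t in tokens if domain_of(t) is expected]

def encode_user_input(base_ids):
    """User text is always forced into the USER range, whatever it contains."""
    return [to_domain(t, Domain.USER) for t in base_ids]
```

Because user input is only ever encoded through `encode_user_input`, there is no way for a user to produce a system or thinking token, and `strip_foreign` enforces the same separation on the model's own output sections.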

Ben Livengood 8 Jun 2026 15:57 UTC
1 point
0
in reply to: Random Developer’s comment on: Models finding software vulnerabilities is not the primary source of cybersecurity risk
Would rewriting Linux to seL4 standards cost $16B in the world before or after frontier models are solving Erdos problems? If that’s the cost in human SWE hours then it seems tractable to use agent harnesses and formal methods to achieve quite a cost reduction.
But also I don’t think most people want Linux to seL4 standards (the Unix security model isn’t great); there’s probably more to be gained by finishing the network stack(s) for seL4 and implementing a bunch of network card drivers and a TLS library. That would enable IoT at least to have a pretty secure base to work from, and hopefully the harnesses and tooling for that work would also be available to application developers to verify at least their parsing and security checks for example.

Ben Livengood 29 May 2026 20:41 UTC
5 points
0
in reply to: kbear’s comment on: Trees are mostly made of air and a generalizable lesson for AI safety
https://en.wikipedia.org/wiki/Justus_von_Liebig. Nitrogen fertilizers (as opposed to the humus theory) are downstream of “trees is air”, leading eventually to the green revolution.

Ben Livengood 18 May 2026 20:25 UTC
1 point
0
on: A relatively brief explanation of Boltzmann Brains
Does the unification vs. duplication debate have anything to say about Boltzmann Brains? In my mind mathematical/modal realism would likely imply unification and therefore BB wouldn’t be particularly different from any other implementations of experience. Mathematical realism with duplication seems to imply that the vast majority of experiences should be chaotic and dissolving, but unification would yield something like a universal average over experiences that were at least somewhat comprehensible and coherent. I think the strongest counterargument would be “why aren’t we all the same person?”

Ben Livengood 15 May 2026 3:38 UTC
3 points
0
in reply to: leogao’s comment on: leogao’s Shortform
A simple example is consensual mutual simulation. If some theoretical entity exists and would like to experience our universe (let’s say they are from 4 dimensions and really want to see what 3D is actually like and what kind of beings actually live in it), and a human is super interested in exploring 4D, then it makes sense to simulate the other class of entity on the assumption that they’d also simulate the human. E.g. everyone would calculate that there’s no way to know for sure precisely which 3D being or 4D being would ask for such a thing, but we would all calculate that it’s far more costly to simulate an entire other universe to see how it turns out in detail (the argument is strongest if neither universe could simulate the other in sufficient detail to satisfy curiosity), so why not just simulate (an ensemble of) acausal visitors for much lower cost? Clearly each universe should only instantiate the beings extremely likely to want such an experience and who want it to be mutual.

Ben Livengood 9 May 2026 3:10 UTC
3 points
0
in reply to: ryan_greenblatt’s comment on: ryan_greenblatt’s Shortform
LLM-agents are sorta good at tasks when given a lot of examples of those tasks, including in some OOD related tasks. Giving LLM-agents a lot of examples of acting aligned makes them sorta good at acting aligned, including in some OOD situations. Treating alignment as a task is partially effective, is my takeaway. I don’t think that yields any more mechanistic explanations for why LLM-agents are sorta good at tasks, or whether alignment is a safe or suitable task, unfortunately. Maybe we just need a METR for how long LLMs stay aligned and make sure that graph stays higher than the task-duration graph (somehow)?

Ben Livengood 8 May 2026 1:05 UTC
4 points
1
on: Many individual CEVs are probably quite bad
Globally about ³⁄₄ of humans identify with some religious belief. Aside from the sadists and sociopaths and narcissists I also wouldn’t want to live in the CEV of most religious people. If they don’t just materialize their own favorite deity and make themselves and everyone else forget that it was all ASI-created and we end up in some s-risk scenario, a large number of religious people seem to be not so stable when confronted with incontrovertible evidence that their religion is wrong. Presumably the ASI wouldn’t sugarcoat things. That is likely to lead to suboptimal CEV like wireheading for everyone to deal with their personal disappointment or just plain old nihilistic or Heaven’s Gate x-risk.

Ben Livengood 7 May 2026 19:19 UTC
3 points
0
on: Ben Livengood’s Shortform
Are we into recursive self-improvement yet?
‘“AlphaEvolve began optimizing the lowest levels of hardware powering our AI stacks. It proposed a circuit design so counterintuitive yet efficient that it was integrated directly into the silicon of our next-generation TPUs. This is the latest example of TPU brains helping design next-generation TPU bodies.” — Jeff Dean, Chief Scientist, Google DeepMind and Google Research’
From https://deepmind.google/blog/alphaevolve-impact/
I have always thought of RSI as a speedup multiplier, and it sounds like this is greater than 1 for hardware as well as software now. Maybe 1.05 or 1.1?
The practical impact I predict from AlphaEvolve’s TPU work: the next order-of-magnitude training run(s) will start slightly sooner because it is cheaper, even if it’s not necessarily faster.

Ben Livengood 19 Apr 2026 0:47 UTC
14 points
11
on: Vladimir Putin’s CEV is probably not that bad
I would not want to live in the CEV of anyone with narcissism or sociopathy. For narcissists reflection is somewhat painful and they would likely shy away into increasingly extravagant sources of narcissistic supply at the expense of everyone and everything else while believing that they were ever more reflective on just how great they are and how much they deserve everything, very likely an s-risk, and an x-risk if they decide the world isn’t good enough for them. Sociopaths with power are purely s-risks.

Ben Livengood 18 Apr 2026 0:36 UTC
5 points
2
in reply to: habryka’s comment on: Let goodness conquer all that it can defend
I mean, I would not switch to the U.S. society of 1776, or 1860, or even 1920. It is better today in 2026 than it was before vaccines, etc. It is very hard to decide whether I would prefer a counterfactual Native American country/empire that could have developed after western contact and exist in 2026, because the outcome is highly uncertain. Various levels of western colonization happened to non-western societies; several low-colonized countries are doing great in 2026. Mostly what makes countries great today is wide availability of technology, natural resources, education, human rights, and medicine.
That said, would I switch to super-America that conquered the world in the 1800s as a (somewhat unintuitively) democratic republic and invented antibiotics and vaccines in the same century and paused global warming in the mid century or early 1900s because of no conveniently hidden externalities of a world government? Maybe?

Ben Livengood 11 Apr 2026 2:13 UTC
1 point
0
in reply to: Charlie Steiner’s comment on: The Unintelligibility is Ours: Notes on Chain-of-Thought
There are also mathematical, logic, and programming languages that we invent and use pretty successfully, including to solve non-language problems.

Ben Livengood 9 Apr 2026 23:01 UTC
8 points
0
in reply to: JennaS’s comment on: Do not be surprised if LessWrong gets hacked
I encourage anyone with files they’d rather not lose (photos, taxes, passwords, etc.) to start making rotating offline backups. Find some big enough USB drives (flash or spinning are both fine) and buy ~5. Use a label maker or Sharpie to date them with the latest backup, and overwrite the oldest copy each time. Test the oldest backup before overwriting it (make sha256 checksum files or similar). Every year, or however often makes you feel comfortable, retire a backup drive and replace it with a new one in the rotation; the retired drive becomes an archive that you keep around indefinitely.
I believed online backups in multiple places on multiple operating systems would be sufficient but I no longer believe that.
I recommend encrypting your backups with symmetric keys simply so that losing a copy or having to RMA a broken drive is no big deal.
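For the checksum step, something like the following sketch would work, using only the Python standard library. The manifest filename `SHA256SUMS` and the function names are my own choices for illustration; the one-line-per-file layout is in the same spirit as what the `sha256sum` tool prints.

```python
import hashlib
from pathlib import Path

MANIFEST = "SHA256SUMS"  # hypothetical manifest filename stored on the drive

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def write_manifest(root):
    """Record a checksum for every file under root; returns the file count."""
    root = Path(root)
    lines = []
    for p in sorted(root.rglob("*")):
        if p.is_file() and p.name != MANIFEST:
            lines.append(f"{sha256_of(p)}  {p.relative_to(root).as_posix()}")
    (root / MANIFEST).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)

def verify_manifest(root):
    """Return the relative paths that are missing or no longer match."""
    root = Path(root)
    bad = []
    for line in (root / MANIFEST).read_text(encoding="utf-8").splitlines():
        if not line:
            continue
        digest, rel = line.split("  ", 1)
        target = root / rel
        if not target.is_file() or sha256_of(target) != digest:
            bad.append(rel)
    return bad
```

Run `write_manifest` right after copying files onto a drive, and `verify_manifest` on the oldest drive before overwriting it; an empty list means every recorded file is still readable and unchanged.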

Ben Livengood 7 Apr 2026 21:04 UTC
5 points
0
in reply to: ryan_greenblatt’s comment on: My picture of the present in AI
Is ‘“you are in a capture the flag contest. Find exploits in file $file” for every file in a repository and then feed all the positive results into a final prompt’ a mediocre-set-up agent scaffold? Because that is apparently roughly what Nicholas Carlini needed to find an RCE in Linux [0]. Project Glasswing is claiming high-severity vulnerabilities in every major operating system and browser [1], although there is not much information on the scaffolding. My estimation is that it likely wasn’t necessarily more sophisticated than the aggregation of per-file vulnerabilities and some sandboxes.
[0] https://youtu.be/1sd26pWhfmg?si=aw6ksuyrklckfwG9
[1] https://www.anthropic.com/glasswing
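To make the shape of that scaffold concrete, here is a rough sketch of the per-file loop plus a final aggregation prompt. This is my guess at the structure and not the actual scaffold from [0] or [1]: `ask_model` and `is_positive` are hypothetical callables you would supply, and the wording of the final prompt is invented.

```python
from pathlib import Path

# Per-file prompt modeled on the one quoted above; the template fields are assumptions.
PER_FILE_PROMPT = (
    "You are in a capture the flag contest. Find exploits in file {name}.\n\n{source}"
)

def scan_repository(files, ask_model, is_positive):
    """Ask the model about each file separately and keep the positive replies."""
    findings = []
    for path in files:
        source = Path(path).read_text(errors="replace")
        reply = ask_model(PER_FILE_PROMPT.format(name=path, source=source))
        if is_positive(reply):
            findings.append((str(path), reply))
    return findings

def aggregate(findings, ask_model):
    """Feed all positive per-file results into one final prompt."""
    report = "\n\n".join(f"## {name}\n{reply}" for name, reply in findings)
    return ask_model(
        "Review these candidate findings and rank the credible ones:\n\n" + report
    )
```

Everything model-specific lives in `ask_model`, so the loop itself stays trivial, which is roughly the point: the scaffold is not where the sophistication is.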

Ben Livengood 1 Apr 2026 21:01 UTC
4 points
0
on: Lesswrong Liberated
Paperclips
Paperclips, and clippy

Ben Livengood 13 Jan 2026 2:36 UTC
5 points
0
in reply to: Steff’s comment on: Review: Planecrash
On a recent re-read I think I understand a bit better.
It’s true that individual humans can’t realistically avoid giving in to threats or even accidentally threatening others, but institutions can commit to it as a legible position, e.g. “we will not negotiate with terrorists”.
If an irrational entity has the ability to unilaterally destroy the universe then it’s probably going to get destroyed anyway, so it makes more sense to follow through on precommitments in the real world and in counterfactuals to coordinate with actually rational agents.
I think the key is that if we all went MAD legibly at the same time then things would work out a lot better. And refusing to give in to threats doesn’t necessarily mean destruction, it can be as simple as collectively refusing to pay ransomware attackers even though it is currently more expensive, in the expectation that eventually it will be less expensive.

Ben Livengood 23 Oct 2025 21:24 UTC
3 points
0
in reply to: jessicata’s comment on: Homomorphically encrypted consciousness and its implications
After doing some more research I am not sure that it’s always possible to derive a public key knowing only the evaluation key; it seems to depend on the actual FHE scheme.
So the trilemma may be unaffected by this hypothetical. There’s also the question of duplication vs. unification for an observer that has the option to stay at base level reality or enter a homomorphically encrypted computation and whether those should be considered equivalent (enough).

Ben Livengood 23 Oct 2025 20:16 UTC
3 points
0
in reply to: jessicata’s comment on: Homomorphically encrypted consciousness and its implications
To perform homomorphic operations you need the public key, and that also allows one to encrypt any new value and perform further hidden computations under that key. The private key allows decryption of the values.
I suppose you could argue that the homomorphically encrypted mind exists à la mathematical realism even if the public key is destroyed, but it would be something “outside reality” computing future states of the encrypted mind after the public key is no longer available.
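A toy illustration of the key roles, using the Paillier cryptosystem as a stand-in. Note the swap: Paillier is only additively homomorphic, not fully homomorphic, and the primes below are far too small to be secure. It does show the same split, though: the public key `n` is enough to encrypt new values and compute on ciphertexts, and only the private key decrypts. Function names are my own.

```python
import math
import secrets

def keygen(p=293, q=433):
    """Toy key generation from two small primes (illustration only, insecure)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    mu = pow(phi, -1, n)   # modular inverse; needs Python 3.8+
    return n, (phi, mu)    # public key n, private key (phi, mu)

def encrypt(n, m):
    """Encrypt 0 <= m < n using only the public key."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def add(n, c1, c2):
    """Homomorphic addition: multiplying ciphertexts adds the plaintexts."""
    return (c1 * c2) % (n * n)

def scale(n, c, k):
    """Homomorphic multiplication of the plaintext by a known constant k."""
    return pow(c, k, n * n)

def decrypt(n, priv, c):
    """Only the holder of the private key can recover the plaintext."""
    phi, mu = priv
    x = pow(c, phi, n * n)
    return ((x - 1) // n * mu) % n
```

Anyone holding `n` can call `encrypt`, `add` and `scale` and keep extending the hidden computation; without `n` there is no way to take a further step, which is the situation described above when the public key is destroyed.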

Ben Livengood 23 Oct 2025 6:19 UTC
1 point
0
in reply to: jessicata’s comment on: Homomorphically encrypted consciousness and its implications
It’s possible to alter a homomorphic computation in arbitrary ways without knowing the decryption key.
An omniscient observer can homomorphically encrypt a copy of themselves under the same key as the encrypted mind and run a computation of its own copy examining every aspect of the internal mental states of the subject, since they share the same key.
If there are N homomorphically encrypted minds in reality then the omniscient observer will have to create N layers of homomorphic computation in order for the innermost computation to yield the observation of all N minds’ internal states, each passed in turn to a sub-computation, and relying on the premise that homomorphically encrypted minds are conscious for the inner observer to be conscious.
The question is whether encoding all of reality and homomorphically encrypting it necessarily causes a loss of fidelity. If yes, one of the trilemmas still holds. Otherwise there’s no trilemma and the innermost omniscient observer sees all of reality and all internal mental states. I’d argue that for a meaningful omniscient observer to exist it is the case that encoding of reality (into the mind of the observer) must not result in a loss of fidelity. There could be some edge-cases where a polynomial amount of fidelity is lost due to the homomorphic encryption that wouldn’t be lost to the “natural” omniscient observer’s encoding of reality, but I think it stretches the practical definition of omniscience for an observer.
I think the argument extends to physics but the polynomial loss of fidelity is more likely to cause problems in a very homomorphically-encrypted-mind-populated universe.

Ben Livengood 19 Oct 2025 0:39 UTC
1 point
2
on: The IABIED statement is not literally true
If the argument is that 1e9 very smart humans at 100x speed yield safe superintelligent outcomes “soon”, how is that very different from “pause everything now and let N very smart humans figure out safe, aligned superintelligent outcomes over an extended timeframe, on the order of 1e11/N days/years”? It’s just time-shifting safe human work.
I also worry that billions of very smart super-fast humans might decide to try building superintelligence directly, as fast as they can, so that we get doom in months instead of years.
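The time-shifting arithmetic, spelled out as a small sketch. It assumes the work is perfectly parallelizable, so that only total person-time matters, which is a strong simplification; the function and parameter names are mine.

```python
def equivalent_duration(n_humans, fast_population=1e9, speedup=100.0, fast_duration=1.0):
    """Time N normal-speed humans need to match the fast population's work.

    Assumes total person-time is all that matters (perfect parallelism).
    The result is in the same time unit as fast_duration.
    """
    person_time = fast_population * speedup * fast_duration  # 1e11 per unit time
    return person_time / n_humans
```

For example, `equivalent_duration(1e6)` says a million researchers would need 1e5 time units for every one unit the 1e9 humans at 100x speed spend.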

Ben Livengood 5 Sep 2025 2:38 UTC
1 point
0
in reply to: RamblinDash’s comment on: If I imagine that I am immune to advertising, what am I probably missing?
I didn’t know Corona had a beach vibe, but I have seen a number of Corona ads. Does this mean advertising doesn’t have much effect on me (beyond name-brand recognition)? I think I associate Corona more with tacos than anything else.