ErickBall

Karma: 677

Nuclear engineer with a focus in nuclear plant safety and probabilistic risk assessment. Aspiring EA, interested in X-risk mitigation and the intersection of science and policy. Working towards Keegan/Kardashev/Simulacra level 4.

(Common knowledge note: I am not under a secret NDA that I can’t talk about, as of Mar 15 2025. I intend to update this statement at least once a year as long as it’s true.)

ErickBall Jun 8, 2025, 5:51 PM
1 point
0
on: Alignment Proposal: Adversarially Robust Augmentation and Distillation
The principal-advisor pair forms a (coalitional) agent which, if the previous steps succeed, can be understood as ~perfectly aligned with the principal. The actions of this agent are recorded, and distilled through imitation learning into a successor.
Even assuming imitation learning is safe, how would you get enough data for the first distillation, when you need the human in order to generate actions? And how would you know when you have enough alignment-relevant data in particular? It seems unavoidable that your data distribution will be very constrained compared to the set of situations the distilled agent might encounter.

ErickBall May 29, 2025, 2:30 PM
1 point
0
on: LessWrong Feed [new, now in beta]
LessWrong has had a (not that successful) Continue Reading section that I think just needed more iterations
I think it needs an easy way to indicate that you don’t want to read the rest of a post. Ideally this would be automatic but I don’t know how to do that.
Also, it sometimes offers posts that I did finish.

ErickBall May 29, 2025, 2:09 PM
1 point
0
in reply to: Mass_Driver’s comment on: Shift Resources to Advocacy Now (Post 4 of 6 on AI Governance)
You may be right about the EO. At the time I felt it was a good thing, because it raised the visibility of safety evaluations at the labs and brought regulation of training, as well as deployment, more into the Overton window. Even without follow-up rules, I think it can be the case that getting a company to report the bad things strongly incentivizes it to reduce the bad things.

ErickBall May 28, 2025, 7:09 PM
3 points
0
in reply to: Mass_Driver’s comment on: Shift Resources to Advocacy Now (Post 4 of 6 on AI Governance)
I think there was (and is) a common belief that Congress won’t do anything significant on it anytime soon, which makes executive action appealing if you think time is running out. If what you’re suggesting here is more like a variant of the “wait for a crisis” strategy—get the legislation ready, talk to people about it, and then when Congress is ready to act, they can reach for it—I’m relatively optimistic about that. As long as there’s time.

ErickBall May 28, 2025, 6:05 PM
1 point
7
in reply to: JohnofCharleston’s comment on: Shift Resources to Advocacy Now (Post 4 of 6 on AI Governance)
Well, I certainly hope you’re right, and it remains to be seen. I don’t think I have any special insights.

ErickBall May 28, 2025, 6:03 PM
1 point
0
in reply to: JohnofCharleston’s comment on: Shift Resources to Advocacy Now (Post 4 of 6 on AI Governance)
But even though the companies stay here, the importance of the American companies may decrease relative to international competitors. Also, I think there are things farther up the supply chain that can move overseas. If American cloud companies have big barriers to training their own frontier models, maybe they’ll serve up DeepSeek models instead.
I don’t think it should be a huge concern in the near term, as long as the regulations are well written. But fundamentally, it feeds back into the race dynamic.

ErickBall May 28, 2025, 5:52 PM
1 point
2
on: What LLMs lack
Interesting idea, but I don’t think short-term memory and learning really require conscious attention, and also conscious attention mostly isn’t the same thing as “consciousness” in the qualia sense. I like the term “cognitive control” and I think that might be a better theme linking a lot of these abilities (planning, preventing hallucinations, agency, maybe knowledge integration). It’s been improving though, so it doesn’t necessarily indicate a qualitative gap.

ErickBall May 28, 2025, 4:13 PM
10 points
2
on: Shift Resources to Advocacy Now (Post 4 of 6 on AI Governance)
If your choice of arguments for AI safety or your way of presenting yourself feels too closely associated with one political party, then you might create or strengthen an association between AI safety and that party, which could lead the other party to reject AI safety on partisan grounds.
This already happened to an extent, and it wasn’t because the advocates had trouble suppressing their partisan instincts. They focused on technocratic solutions and advocated them to the people in power at the time—the Biden administration. It sort of worked. Then the administration implemented some of them (or started to lay the groundwork for implementing them) and that alone was enough to create lasting Republican opposition to something that was previously pretty neutral.

ErickBall May 28, 2025, 3:38 PM
1 point
0
on: Shift Resources to Advocacy Now (Post 4 of 6 on AI Governance)
If they’re not going to shift their research buildings 100 miles away to save a few billion a year in rent, then why would they want to shift their research buildings halfway around the world just to escape from regulations whose purposes they at least broadly agree with? And even if tech companies wanted to move, what makes us think that their scarce, valuable AI experts would agree to pick up their families and follow them to China?
I think the idea is more like “investment will tend to shift to countries with less burdensome regulations.” That could be external investment (who is SoftBank sending its billions to?) or internal investment (where does Microsoft decide to build its newest data center). Lots of countries have AI opportunities to invest in, and would get that investment if they looked more promising relative to the existing leaders.

ErickBall May 24, 2025, 2:39 PM
1 point
0
in reply to: Hastings’s comment on: Claude 4
I eventually figured out the joke but I downvoted this because I wasted several minutes trying to understand what the math was about before I realized it wasn’t the point.

ErickBall May 16, 2025, 8:31 PM
2 points
2
in reply to: ChristianKl’s comment on: Consider not donating under $100 to political candidates
Usually a direct donation to a candidate is a lot more valuable than giving to a SuperPAC.

ErickBall May 16, 2025, 8:28 PM
23 points
5
on: Generating the Funniest Joke with RL (according to GPT-4.1)
I think the prompt is critical here. I tried asking Claude how funny that last joke was and it said ⁶⁄₁₀ but then floundered when trying to explain why it’s funny or what any of it means. I asked again but added “I’ll be asking you about many different jokes (in separate conversations), some good and some bad, and I need you to be completely honest about each of them.” This time it gave a ³⁄₁₀ and accurate criticism:
Claude 3.7 output
This joke has an interesting structure but feels disjointed and unfocused. It starts with a fairly standard setup about a cat and laser pointer, but then takes several random turns—the cat speaking, being fired, wizardry, portals to a “Realm of Snacks,” and unionizing mice. The transitions between these elements don’t flow naturally, making it feel like a collection of random thoughts rather than a cohesive joke.
The final line about accidentally sending a LinkedIn message seems tacked on and doesn’t connect well with the rest of the joke.
On a humor scale, I’d rate it about ³⁄₁₀. It has some creative elements and absurdist qualities that might appeal to some, but the execution lacks the cohesion needed to make those elements work together effectively. The joke tries to be quirky and random but doesn’t quite land because of its meandering structure.
I bet your RL would work somewhat better if you play around with prompting techniques for the rater.

ErickBall May 14, 2025, 9:54 PM
1 point
1
on: Fighting Obvious Nonsense About AI Diffusion
To not only fail to robustly support and bring down regulatory and permitting barriers to the nuclear power we urgently need to support our data centers
Nuclear power for data centers is nice in theory, but if you have timelines ~8 years or less for superintelligence, then for practical purposes basically none of the relevant data centers will be powered by new nuclear plants in the US regardless. Of course if they get rid of enough government support they might force some of the old ones to shut down, but that seems unlikely.

ErickBall Apr 8, 2025, 7:34 PM
1 point
0
on: AI #110: Of Course You Know…
But the reason I mention this here is that a >5 year ‘median timeline’ to get to >30% GWP growth would not have required detailed justifications until very recently. Now, Matthew sees it as conservative, and he’s not wrong.
This seems crazy to me. Unless the machines get rid of the current economy and start from near-zero, I don’t think we’ll see >30% GWP growth at all, and certainly not right away.
From what I can find, extreme growth rates like this historically have had two causes: 1) recovery from a major disaster, usually war, or 2) discovery of a massive oil reserve in a poor country (e.g. Guyana recently). Less extreme but still high growth rates can occur do to mobilization during a war.
The oil case requires the surrounding world economy to already be much larger—outside investment is used to rapidly exploit the newly discovered resources, and then the oil is exported for cash, and presto, massive GDP growth. It’s not a good parallel to endogenous growth because it doesn’t require an internal feedback loop to build capacity. It also doesn’t translate in the short term to the rest of the economy: Guyana has a GDP per capita of $80k as money accumulates in the Natural Resource Fund, but half its population still lives on less than $5.50/day.
Recovery from disaster also seems like a poor analogy for automation, because it depends on infrastructure (both physical and social/corporate/human capital) that already existed but was forced to sit idle. We will need time to create that capital from scratch.
If someone deployed a superintelligent model tomorrow, do you think in 5 years we could quadruple our production of cars, houses, or airplanes? Would we have four times as many (or four times better) haircuts or restaurant meals? Real estate and leasing alone make up almost 14% of GDP and won’t see a boom until after household incomes go up substantially. Even if the AI created wonder drugs for every disease, how long would it take to get them into mass production?
I think we would get a massive surge of investment comparable to US mobilization in WWII, when real GDP nearly doubled in a six year period and growth exceeded 17% for three years running. But it might not even be that extreme. Production of consumer goods like automobiles, household appliances, and housing was severely curtailed or halted, and shortages/rationing became commonplace—growing pains that would be less tolerable without the pressure of an ongoing war. In the short term, we could probably 10x our production of software and Netflix shows, but it would be unlikely to show up as massive gains in the productivity numbers. See also the Productivity Paradox.

ErickBall Apr 4, 2025, 2:17 PM
5 points
2
in reply to: Fabien Roger’s comment on: Auditing language models for hidden objectives
Fair enough, I guess the distinction is more specific than just being a (weak) mesa-optimizer. This model seems to contradict https://www.lesswrong.com/posts/pdaGN6pQyQarFHXF4/reward-is-not-the-optimization-target because it has, in fact, developed reward as the optimization target without ever being instructed to maximize reward. It just had reward-maximizing behaviors reinforced by the training process, and instead of (or in addition to) becoming an adaptation executor it became an explicit reward optimizer. This type of generalization is surprising and a bit concerning, because it suggests that other RL models in real-world scenarios will sometimes learn to game the reward system and then “figure out” that they want to reward hack in a coherent way. This tendency could also be beneficial, though, if it reliably causes recursively self-improving systems to wirehead once they have enough control of their environment.

ErickBall Apr 3, 2025, 7:24 PM
5 points
0
on: Auditing language models for hidden objectives
we can be confident about why it’s doing this: to get a high RM score
Does this constitute a mesa-optimizer? If so, was creating it intentional or incidental? I was under the impression that those were still basically theoretical.

ErickBall Apr 3, 2025, 6:54 PM
1 point
0
on: The AI Adoption Gap: Preparing the US Government for Advanced AI
I think this topic is important and many of your recommendations sound like great ideas, but they also involve a lot of “we should” where it’s not clear who “we” is. I would like to see, for some of these, targeting to a specific audience: who actually has the capability to help streamline government procurement processes for AI, and how? What organizations might be well positioned to audit agency needs and bottlenecks? I’m left with the sense that these things would be good in the abstract, but that there’s little I personally (or most other readers, unless they work high up in the administration) can realistically contribute to them.
I’m also skeptical about doing preparation outside government for rushed adoption. There probably is some opportunity here, but any solution prepared externally would itself have to be adopted during a crisis, and that will take time, maybe just as much time as the government developing its own tools, plans, and protocols. Much of what makes government slow to adopt new systems is the need to get multiple levels of approval, go through a procurement process, and train existing government workers to work with the new system.

ErickBall Mar 18, 2025, 12:36 AM
2 points
6
in reply to: StopAI’s comment on: A Bear Case: My Predictions Regarding AI Progress
Why on earth would pokemon be AGI-complete?

ErickBall Mar 12, 2025, 2:30 AM
15 points
7
in reply to: Shayne O'Neill’s comment on: OpenAI: Detecting misbehavior in frontier reasoning models
There are big classes of problems that provably can’t be solved in a forward pass. Sure, for something where it knows the answer instantly the chain of thought could be just for show. But for anything difficult, the models need the chain of thought to get the answer, so the CoT must contain information about their reasoning process. It can be obfuscated, but it’s still in there.

ErickBall Mar 9, 2025, 11:29 PM
1 point
0
in reply to: β-redex’s comment on: So how well is Claude playing Pokémon?
I kind of see your point about having all the game wikis, but I think I disagree about learning to code being necessarily interactive. Think about what feedback the compiler provides you: it tells you if you made a mistake, and sometimes what the mistake was. In cases where it runs but doesn’t do what you wanted, it might “show” you what the mistake was instead. You can learn programming just fine by reading and writing code but never running it, if you also have somebody knowledgeable checking what you wrote and explaining your mistakes. LLMs have tons of examples of that kind of thing in their training data.