I’ve recently started to read a textbook by Hilbert. Consider this a rookie’s attempt at formality, where a short paragraph of normal sentences would suffice to express the same idea. Feel free to mutate or mutilate it.
Assumptions
If someone has a Giant Anteater, it can be used to find flaws in OpenSSL. | Empirically demonstrated
If you can build a Giant Anteater, so can others. | Assumption of a shared capability front
If a Giant Anteater can find flaws in OpenSSL, it can find flaws in most other OSS of equal or lower quality. | Assumption of a generalized capability
OpenSSL is of high quality. | Assumption of a relevant instance
If enough people possess the ability to attack various high-quality OSS, many relevant systems will be targeted and compromised. | Assumption of the presence of malevolent intent in a subset of any large enough set of people
Derivation
Given: You can build a Giant Anteater.
Others have a Giant Anteater. | apply 2
Others can find flaws in OpenSSL. | apply 1
Others can find flaws in OSS of equal or lower quality than OpenSSL. | apply 3
Others can find flaws in high-quality OSS. | apply 4
Either many relevant systems are already targeted and compromised or they will be as soon as enough people catch up. | apply 5
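To check that the chain of modus ponens actually goes through, here is a sketch of the derivation in Lean. All predicate names are mine, "building implies having" is made explicit as its own axiom, and assumptions 3 and 4 are merged into a single generalization step:

```lean
-- Hypothetical formalization of the assumptions above.
axiom Person : Type
axiom you : Person
axiom others : Person
axiom canBuild : Person → Prop
axiom hasAnteater : Person → Prop
axiom findsFlawsOpenSSL : Person → Prop
axiom findsFlawsHighQualityOSS : Person → Prop

-- Assumption 2: shared capability front.
axiom sharedFront : canBuild you → canBuild others
-- Implicit step: building one implies having one.
axiom buildImpliesHas : ∀ p, canBuild p → hasAnteater p
-- Assumption 1: empirically demonstrated.
axiom anteaterFindsFlaws : ∀ p, hasAnteater p → findsFlawsOpenSSL p
-- Assumptions 3 and 4 combined: generalization to high-quality OSS.
axiom generalizes : ∀ p, findsFlawsOpenSSL p → findsFlawsHighQualityOSS p

-- The derivation: from "you can build one" to "others can attack high-quality OSS".
theorem others_can_attack (h : canBuild you) : findsFlawsHighQualityOSS others :=
  generalizes others (anteaterFindsFlaws others (buildImpliesHas others (sharedFront h)))
```

Assumption 5 then turns the conclusion into the disjunction in the last derivation step.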
We do this to secure the software infrastructure of human civilization before strong AI systems become ubiquitous. Prosaically, we want to make sure we don’t get hacked into oblivion the moment they come online.
Given your existence proof (the Giant Anteater) and its implication (malevolent actors will acquire similar capabilities and apply them soon), your system and its successors seem to require rapid and widespread application, which in turn may require scaling up the underlying processes, possibly by distributing them to other trustworthy actors with benevolent intent and sufficient resources.
“The Algorithm” is in the hands of very few actors. This is the prime gear where “Evil People have figured it out, and hold The Power” isn’t a fantasy. There would be many obvious improvements if it were in adult hands.
I think there might be a confusion between optimizing for an instrumental vs. an upper-level goal. Is maintaining good epistemics more relevant than working on the right topic? To me the rigor of an inquiry seems secondary to choosing the right subject.
I also had to look it up and got interested in testing whether or how it could apply.
Here’s an explanation of Bulverism that suggests a concrete logical form of the fallacy:
Person 1 makes argument X.
Person 2 assumes person 1 must be wrong because of their Y (e.g. suspected motives, social identity, or other characteristic associated with their identity).
Therefore, argument X is flawed or not true.
Here’s a possible assignment for X and Y that tries to remain rather general:
X = Doom is plausible because …
Y = Trauma / Fear / Fixation
Why would that be a fallacy? Whether an argument is true or false depends on the structure and content of the argument, not on the source of the argument (genetic fallacy), and not on a property of the source that gets equated with being wrong (circular reasoning). Whether an argument for doom is true does not depend on who is arguing for it, and being traumatized does not automatically imply being wrong.
Here’s another possible assignment for X and Y that tries to be more concrete. To be able to do so, “Person 1” is also replaced by more than one person, now called “Group 1”:
X (from AI 2027) = A takeover by an unaligned superintelligence by 2030 is plausible because …
Y (from the post) = “lots of very smart people have preverbal trauma” and “embed that pain such that it colors what reality even looks like at a fundamental level”, so “there’s something like a traumatized infant inside such people” and “its only way of “crying” is to paint the subjective experience of world in the horror it experiences, and to use the built-up mental edifice it has access to in order to try to convey to others what its horror is like”.
From looking at this, I think the post suggests a slightly stronger logical form that extends step 3:
Group 1 makes argument X.
Person 2 assumes group 1 must be wrong because of their Y (e.g. suspected motives, social identity, or other characteristic associated with their identity).
Therefore, argument X is flawed or not true AND group 1 can’t evaluate its truth value because of their Y.
From this, I think one can see that not only does Bulverism make the model a bit suspicious; two additional aspects come into play:
If Group 1 is the LessWrong community, then there are also people outside of it who predict that there’s an existential risk from AI and that timelines might be short. How can argument X from these people become wrong by Group 1 entering the stage, and would it still be true if Group 1 were doing something else?
I think it’s fair to say that step 3 introduces an aspect that’s adjacent to gaslighting, i.e. manipulating someone into questioning their perception of reality. Even if it’s done in a well-meaning way, since some people’s perception of reality is indeed flawed and they might profit from becoming aware of it, the way it is woven into the argument doesn’t seem that benign anymore. I suppose that might be the source of some people getting annoyed by the post.
“it’s psychologically appealing to have a hypothesis that means you don’t have to do any mundane work”
I don’t doubt that something like inverse bike-shedding can be a driving force for some individuals to focus on the field of AI safety. I highly doubt that it explains why the field and its risk predictions exist in the first place, or that their validity should be questioned on such grounds, but this seems to happen in the article if I’m not entirely misreading it. From my point of view, there is already an overemphasis on psychological factors in the broader debate, and it would be desirable to get back to the object level, be it with theoretical or empirical research, both of which have their value. This latter aspect seems to lead to a partial agreement here, even though there’s more than one path to arrive at it.
Point addressed with unnecessarily polemic tone:
“Suppose that what’s going on is, lots of very smart people have preverbal trauma.”
“consider the possibility that the person in question might not be perceiving the real problem objectively because their inner little one might be using it as a microphone and optimizing what’s “said” for effect, not for truth.”
It is alright to consider it. I find it implausible that a wide range of accomplished researchers lay out arguments, collect data, interpret what has and hasn’t been observed, and come to the conclusion that our current trajectory of AI development poses a significant amount of existential risk, potentially on short timelines, all because a majority of them have a childhood trauma that blurs their epistemology on this particular issue but not on others where success criteria could already be observed.
I’m close to getting a postverbal trauma from having to observe all the mental gymnastics around the question of whether building a superintelligence without having reliable methods to shape its behavior is actually dangerous. Yes, it is. No, that fact does not depend on whether Hinton, Bengio, Russell, Omohundro, Bostrom, Yudkowsky, et al. were held as a baby.
Further context about the “recent advancements in the AI sector have resolved this issue” paragraph:
Contained in a16z letter to UK parliament: https://committees.parliament.uk/writtenevidence/127070/pdf/
Contained in a16z letter to Biden, signed by Andreessen, Horowitz, LeCun, Carmack et al.: https://x.com/a16z/status/1720524920596128012
Carmack claiming not to have proofread it, both Carmack and Casado admitting the claim is false: https://x.com/GarrisonLovely/status/1799139346651775361
In case anyone got worried, OpenAI’s blog post Introducing Superalignment on July 5, 2023 contained two links for recruiting, one still working and the other not. From this we can deduce that superalignment has been reduced to an engineering problem, and therefore scientists like Ilya and Jan were able to move on to new challenges, such as spending the last normal summer in a nice location with close friends and family.
“Please apply for our research engineer and research scientist positions.”
edit: This comment was intended as a joke, an absurd inference that rhymes with how absurdly these companies treat AI safety.
I assume they can’t make a statement and that their choice of next occupation will be the clearest signal they can and will send out to the public.
He has a stance towards risk that is a necessary condition for becoming the CEO of a company like OpenAI, but not one that gives you a high probability of building a safe ASI:
https://blog.samaltman.com/what-i-wish-someone-had-told-me
“Inaction is a particularly insidious type of risk.”
https://blog.samaltman.com/how-to-be-successful
“Most people overestimate risk and underestimate reward.”
https://blog.samaltman.com/upside-risk
“Instead of downside risk [2], more investors should think about upside risk—not getting to invest in the company that will provide the return everyone is looking for.”
If everyone has his own asteroid impact, earth will not be displaced because the impulse vectors will cancel each other out on average*. This is important because it will keep the trajectory equilibrium of earth, which we have known for ages from animals jumping up and down all the time around the globe in their games of survival. If only a few central players get asteroid impacts, it’s actually less safe! Safety advocates might actually cause the very outcomes that they fear!
*I have a degree in quantum physics and can derive everything from my model of the universe. This includes moral and political imperatives that physics dictates and that most physicists thus advocate for.
We are decades if not centuries away from developing true asteroid impacts.
Given all the potential benefits there is no way we are not going to redirect asteroids to earth. Everybody will have an abundance of rare elements.
xlr8
Some context from Paul Christiano’s work on RLHF and a later reflection on it:
Christiano et al.: Deep Reinforcement Learning from Human Preferences
In traditional reinforcement learning, the environment would also supply a reward [...] and the agent’s goal would be to maximize the discounted sum of rewards. Instead of assuming that the environment produces a reward signal, we assume that there is a human overseer who can express preferences between trajectory segments. [...] Informally, the goal of the agent is to produce trajectories which are preferred by the human, while making as few queries as possible to the human. [...] After using r̂ to compute rewards, we are left with a traditional reinforcement learning problem.
Christiano: Thoughts on the impact of RLHF research
The simplest plausible strategies for alignment involve humans (maybe with the assistance of AI systems) evaluating a model’s actions based on how much we expect to like their consequences, and then training the models to produce highly-evaluated actions. [...] Simple versions of this approach are expected to run into difficulties, and potentially to be totally unworkable, because:
Evaluating consequences is hard.
A treacherous turn can cause trouble too quickly to detect or correct even if you are able to do so, and it’s challenging to evaluate treacherous turn probability at training time.
[...] I don’t think that improving or studying RLHF is automatically “alignment” or necessarily net positive.
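The core mechanism in the first quote can be illustrated with a minimal, self-contained sketch: fitting a reward model from pairwise human preferences via the Bradley-Terry / logistic likelihood. Everything here is synthetic and simplified (a linear reward model over made-up segment features, a simulated overseer), not the paper's actual setup:

```python
import numpy as np

# Sketch of preference-based reward learning: fit a linear reward model
# r̂(s) = w · φ(s) from pairwise preferences over trajectory segments.
rng = np.random.default_rng(0)
dim = 4                                # feature dimension of a segment (illustrative)
w_true = rng.normal(size=dim)          # stands in for the human overseer's preferences

# Synthetic trajectory segments, summarized as feature vectors φ(s).
segments = rng.normal(size=(200, dim))

# The simulated "overseer" prefers the segment with higher true reward.
pairs = rng.integers(0, len(segments), size=(500, 2))
prefs = segments[pairs[:, 0]] @ w_true > segments[pairs[:, 1]] @ w_true

# Fit w by gradient ascent on the Bradley-Terry log-likelihood:
# P(a preferred over b) = sigmoid(r̂(a) - r̂(b)).
w = np.zeros(dim)
for _ in range(500):
    diff = segments[pairs[:, 0]] - segments[pairs[:, 1]]   # φ(a) - φ(b)
    p = 1.0 / (1.0 + np.exp(-(diff @ w)))                  # predicted P(a ≻ b)
    w += 0.5 * (diff.T @ (prefs - p)) / len(pairs)         # log-likelihood gradient

# The fitted r̂ ranks segments similarly to the true reward; "we are left
# with a traditional reinforcement learning problem" using r̂ as the reward.
agreement = np.mean((segments @ w > 0) == (segments @ w_true > 0))
```

The second quote's worries then apply downstream of this step: the learned r̂ is only as good as the human evaluations that produced the preference labels.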
Edit: Another relevant section in an interview of Paul Christiano by Dwarkesh Patel:
Replacing “must” with “may” is a potential solution to the issues discussed here. I think analogies are misleading when they are used as a means of proof, i.e. convincing yourself or others of the truth of some proposition, but they can be extremely useful when they are used as a means of exploration, i.e. discovering new propositions worthy of investigation. Taken seriously, this means that if you find something of interest with an analogy, it should not mark the end of a thought process or conversation, but the beginning of a validation process: Is there just a superficial connection between the compared phenomena, or actually a deep one? Does it point to a useful model or abstraction?
Example: I think the analogy that trying to align an AI is like trying to steer a rocket towards any target at all shouldn’t be used to convince people that without proper alignment methods mankind is screwed. Who knows if directing a physical object in a geometrical space has much to do with directing a cognitive process in some unknown combinatorial space? Alternatively, the analogy could instead be used as a pointer towards a general class of control problems that come with specific assumptions, which may or may not hold for future AI systems. If we think that the assumptions hold, we may be able to learn a lot from existing instances of control problems like rockets and acrobots about future instances like advanced AIs. If we think that the assumptions don’t hold, we may learn something by identifying the least plausible assumption and trying to formulate an alternative abstraction that doesn’t depend on it, opening another path towards collecting empirical data points of existing instances.
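Read as a pointer to a class of control problems rather than as a proof, the steering analogy can be made concrete with a toy instance. A sketch, with all constants illustrative: a 1-D point driven toward a target by a proportional-derivative controller, the simplest member of the class that rockets and acrobots also belong to:

```python
# Toy instance of the "steering toward a target" control-problem class:
# a 1-D point with velocity, driven by a proportional-derivative controller.
def simulate(target: float, steps: int = 200, dt: float = 0.1,
             kp: float = 1.0, kd: float = 1.5) -> float:
    """Return the final position after PD control toward `target`."""
    pos, vel = 0.0, 0.0
    for _ in range(steps):
        accel = kp * (target - pos) - kd * vel   # control law
        vel += accel * dt                        # semi-implicit Euler step
        pos += vel * dt
    return pos
```

Whether anything transfers from such geometrical control problems to directing a cognitive process is exactly the assumption the validation process would have to examine.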
For collaboration on job-like tasks that assumption might hold. For companionship and playful interactions I think the visual domain, possibly in VR/AR, will be found to be relevant and kept. Given our psychological priors, I also think that for many people it may feel like a qualitative change in what kind of entity we are interacting with: from lifeless machine, through uncanny human imitation, to believable personality on another substrate.
Empirical data point: In my experience, talking to Inflection’s Pi on the phone covers the low-latency integration of “AI is capable of holding a conversation over text, transcribing speech to text, and synthesizing natural-sounding speech” well enough to pass some bar of “feels authentically human” for me, until you try to test its limits. I imagine that subjective experience is more likely to appear if you don’t have background knowledge about LLMs / DL. Its main problems are 1) keeping track of context in a plausibly human-like way (e.g. a game of guessing the capital cities of European countries leads to repetitive questions about the same few countries, even when it is asked in various ways to take care) and 2) inconsistent rejection of talking about certain things depending on previous text (e.g. retelling dark jokes by real comedians).
I share your expectation that adding photorealistic video generation to it can plausibly lead to another “cultural moment”, though it might depend on whether such avatars find similarly rapid adoption as ChatGPT did, or whether they are phased in more gradually. (I’ve no overview of the entire space and stumbled over Inflection’s product by chance after a random podcast listening. If there are similar ones out there already, I’d love to know.)
What are major indicators for their lead? Is this view partly based on project glasswing and the published examples of vulnerabilities that Mythos Preview has found?