Kurt H. Pieper

Karma: 24

Kurt H. Pieper 23 Jun 2026 4:23 UTC
1 point
0
in reply to: MichaelDickens’s comment on: MichaelDickens’s Shortform
- Naively, I would say every impressive capability you observe without a sharp left turn (SLT) is weak evidence against it. At this point, both hypotheses explain the world about equally well, whereas the SLT hypothesis is more complex.
  - Arguably, eval awareness is actually a pretty mundane LLM ability, because putting text into a milieu is something LLMs are superhuman at already. I.e., “ah, clearly a Wikipedia article / paper / Reddit post” entails a capability like “ah, clearly something made by these METR people.”
- I’d say a sharp left turn is more likely to happen through something like a meta-skill, and less likely to happen by some sort of scheming.
  - I think the plot of intelligence vs. years since LUCA is still one of the strongest arguments here, and something like language or whatever else happened to the hominids is strong evidence for the existence of such a meta-skill.
  - Current efforts put more selection pressure towards “produce impressive-looking results” vs. “appear aligned to humans” than was assumed in optimistic “AI in a box” scenarios.^[1] I.e. for lying to emerge, there must be a strong cost for speaking the truth in the first place.
1. ^
  “A careful evaluation of seed AI in a sandbox environment, showing that it is behaving cooperatively and showing good judgment. After some further adjustments, the test results are as good as they could be. It is a green light for the final step . . .”
  From page 118 of “Superintelligence”. I don’t think e.g. “test results are as good as they could be” describes the METR Frontier Report, and yet we’ve let the models in question out of their box already. (In general I think the whole page still reads as very prescient.)

Kurt H. Pieper 22 Jun 2026 10:29 UTC
2 points
0
in reply to: Buck’s comment on: Buck’s Shortform
I’m not sure I understand the difference. In my model, the paperclip minimizer would suffer from the same pathology as Asimov’s First Law, i.e. it would, even in the best case, have to gain power to prevent anyone from making paperclips. Possibly it’s worse than Asimov’s approach, since killing all humans is probably easier here than building a super-nanny-state.
Even worse, in the limit it would have to transform all matter such that a paperclip cannot form by chance.

Kurt H. Pieper 19 May 2026 12:26 UTC
1 point
0
in reply to: loops’s comment on: Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations
Why wouldn’t that just be something an early-layer attention head in the AR learns?

Kurt H. Pieper 3 May 2026 18:14 UTC
2 points
0
on: Sanity-checking “Incompressible Knowledge Probes”
Possibly the method will underestimate parameter count as time goes on. I don’t expect it to be economically valuable to pretrain on the very long tails of knowledge, as opposed to letting more bits flow in from synthetic data / RLVR. Though I’m surprised this hasn’t already happened.

Kurt H. Pieper 11 Apr 2026 11:19 UTC
16 points
9
in reply to: DW11’s comment on: Dario probably doesn’t believe in superintelligence
I’m sure this has been discussed elsewhere ad nauseam, but this view always struck me as extremely overconfident: We have no idea what a superintelligence could discover using today’s data or experiment throughput capacity, and “close to human level” seems a priori unlikely.

Kurt H. Pieper 11 Apr 2026 10:32 UTC
5 points
3
on: Dario probably doesn’t believe in superintelligence
“One possibility suggested by this essay is that Dario does not really believe in superintelligence (the hypothesis we are currently examining). Another is that he does, but has chosen to dissemble for strategic purposes. While I don’t think Dario is above communicating strategically, I do in fact think this is roughly his mainline worldview, and it follows pretty clearly from his beliefs in 2017. Maybe there are other possibilities apart from those two, though I haven’t figured out what they might be.”
You have possibly dismissed the “strategic purposes” hypothesis prematurely here.
Saying you want to build “superintelligence” may be really bad for political capital. To me it seems entirely plausible that he has “self-censored” his beliefs around superintelligence since 2017.
Dario spells this out:
“The result often ends up reading like a fantasy for a narrow subculture, while being off-putting to most people.”
As far as I know^[1], Hassabis has also not explicitly said he aims for superintelligence while being CEO of DeepMind, despite being an early singularitarian.
1. ^
  This is mainly based on asking Gemini 3 Pro, though.

Kurt H. Pieper 9 Apr 2026 17:19 UTC
1 point
0
in reply to: aphyer’s comment on: We can prevent progress! Conceptual clarity, and inspiration from the FDA
To me, such costly and invasive regulations seem a priori unlikely given the nature of the issue and our epistemics thereof. Can you expand on that?

Kurt H. Pieper 8 Apr 2026 11:51 UTC
2 points
−11
on: We can prevent progress! Conceptual clarity, and inspiration from the FDA
I agree that we can stop or delay the development of some technologies. However, AI does not appear to be one of them, in my view, for several reasons:
(i). The x-risk has little to no political salience, there is no such thing as a “scientific consensus”^[1] yet, and you can’t point to a concrete instance of the problem yet that will convince the uninitiated.^[2]
(ii). Every policy that stops superintelligence from being developed will likely come with massive, concrete economic costs.
(iii). Fast global coordination is needed.
No example in your list has all of these, for instance:
- 1. Doesn’t have (i): The Thalidomide Disaster
- 2. Doesn’t have (i): Meltdowns, “big fireball bad” again.
- 4. & 5: Don’t have (ii): No comparable concrete economic costs
- 9. Doesn’t have any: No global coordination, concrete problem (“people are spying on you”), no comparable economic costs.
1. ^
  You can point to the CAIS statement or similar things, but this is not in the same category as e.g. the consensus around climate change.
2. ^
  For example, you don’t need to understand nuclear physics to understand “big fireball bad”, but you need more theory to be scared about METR reward hacking results.

Kurt H. Pieper 5 Apr 2026 20:57 UTC
1 point
0
on: Steering Might Stop Working Soon
I have almost no intuition whatsoever on why steering should be akin to OCD or schizophrenia, and I don’t think you offer sufficient rationale. I believe steering resistance is downstream of self-repair, which is known to emerge (e.g. from dropout).

hanylo’s Shortform

Kurt H. Pieper 1 Apr 2026 19:40 UTC

1 point

0 comments · 1 min read