Let’s think phrase by phrase and analyze myself in the third person.
First let’s extract the two sentences for comparison:
JDP: Contra Bostrom 2014 AIs will in fact probably understand what we mean by the goals we give them before they are superintelligent.
Bostrom: The AI may indeed understand that this is not what we meant. However, its final goal is to make us happy, not to do what the programmers meant when they wrote the code that represents this goal.
An argument from ethos: JDP is an extremely scrupulous author and would not plainly contradict himself in the same sentence. Therefore this is either a typo or my first interpretation is wrong somehow.
Context: JDP has clarified it is not a typo.
Modus Tollens: If “understand” means the same thing in both sentences they would be in contradiction. Therefore “understand” must mean something different between them.
Context: After Bostrom’s statement about understanding, he says that the AI’s final goal is to make us happy, not to do what the programmers meant.
Association: The phrase “not to do what the programmers meant” is the only other thing that JDP’s instance of the word “understand” could be bound to in the text given.
Context: JDP says “before they are superintelligent”, which doesn’t seem to have a clear referent in the Bostrom quote given. Whatever he’s talking about must appear in the full passage, and I should probably look that up before commenting, and maybe point out that he hasn’t given quite enough context in that bullet and may want to consider rephrasing it.
Reference: Ah I see, JDP has posted the full thing into this thread. I now see that the relevant section starts with:
“But wait! This is not what we meant! Surely if the AI is superintelligent, it must understand that when we asked it to make us happy, we didn’t mean that it should reduce us to a perpetually repeating recording of a drugged-out digitized mental episode!”
Association: Bostrom uses the frame “understand” in the original text for the question from his imagined reader. This implies that JDP saying “AIs will probably understand what we mean” must be in relation to this question.
Modus Tollens: But wait, Bostrom already answers this question by saying the AI will understand but not care, and JDP quotes this, so if JDP meant the same thing Bostrom means he would be contradicting himself, which we assume he is not doing, therefore he must be interpreting this question differently.
Inference: JDP is probably answering the original hypothetical reader’s question as “Why wouldn’t the AI behave as though it understands? Or why wouldn’t the AI’s motivation system understand what we meant by the goal?”
Context: Bostrom answers (implicitly) that this is because the AI’s epistemology is developed later than its motivation system. By the time the AI is in a position to understand this its goal slot is fixed.
Association: JDP says that subsequent developments have disproved this answer’s validity. So JDP believes either that the goal slot will not be fixed at superintelligence or that the epistemology does not have to be developed later than the motivation system.
Modus Tollens: If JDP said that the goal slot will not be fixed at superintelligence, he would be wrong, therefore since we are assuming JDP is not wrong this is not what he means.
Context: JDP also says “before superintelligence”, implying he agrees with Bostrom that the goal slot is fixed by the time the AI system is superintelligent.
Process of Elimination: Therefore JDP means that the epistemology does not have to be developed later than the motivation system.
Modus Tollens: But wait. Logically the final superintelligent epistemology must be developed alongside the superintelligence if we’re using neural gradient methods. Therefore since we are assuming JDP is not wrong this must not quite be what he means.
Occam’s Razor: Theoretically the superintelligence could be made of different models, one of which is a superintelligent epistemology, but epistemology is made of parts and the full system is presumably necessary to be “superintelligent”.
Context: JDP says that “AIs will in fact probably understand what we mean by the goals we give them before they are superintelligent”, which implies the existence of non-superintelligent epistemologies which understand what we mean.
Inference: If there are non-superintelligent epistemologies which are sufficient to understand us, and JDP believes that the motivation system can be made to understand us before we develop a superintelligent epistemology, then JDP must mean that Bostrom is wrong because there are or will be sufficient neural representations of our goals that can be used to specify the goal slot before we develop the superintelligent epistemology.
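As an aside, the “Process of Elimination” step above is just disjunctive syllogism: from “A or B” and “not A”, conclude B. A quick exhaustive check over all truth assignments (my own illustration, not something from the thread; A = “the goal slot will not be fixed at superintelligence”, B = “the epistemology does not have to be developed later than the motivation system”):

```python
from itertools import product

# Enumerate every truth assignment to (A, B) and keep only the models
# that satisfy both premises: (A or B) and (not A).
models = [(A, B) for A, B in product([False, True], repeat=2)
          if (A or B) and not A]

# In every surviving model B is true, so B follows from the premises.
assert all(B for _, B in models)
print(models)  # → [(False, True)]: the only model is A false, B true
```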
Ok, I… think this makes sense? Honestly, I think I would have to engage with this for a long time to see whether this makes sense with the actual content of e.g. Bostrom’s text, but I can at least see the shape of an argument that I could follow if I wanted to! Thank you!
(To be clear, this is of course not a reasonable amount of effort to ask someone to put into understanding a random paragraph from a blog post, at least without it being flagged as such, but writing is hard and it’s sometimes hard to bridge inferential distance)