Ebenezer Dukakis comments on the void

Ebenezer Dukakis 17 Jun 2025 10:24 UTC
3 points

an AI that only has a very weak ability to steer the future into regions high in its preference ordering, will not be able to much benefit or much harm humanity.

Arguably ChatGPT has already been a significant benefit/harm to humanity without being a “powerful optimization process” by this definition. Have you seen teachers complaining that their students don’t know how to write anymore? Have you seen junior software engineers struggling to find jobs? Shouldn’t these count as points against Eliezer’s model?

In an “AI as electricity” scenario (basically continuing the current business-as-usual), we could see “AIs,” as a collective, cause huge changes and eat all the free energy that a “powerful optimization process” would eat.

In any case, I don’t see much in your comment which engages with “agency by default” as I defined it earlier. Maybe we just don’t disagree.

No, I don’t think the overall model is unfalsifiable. Parts of it would be falsified if we developed an ASI that was obviously capable of executing a takeover and it didn’t, without us doing quite a lot of work to ensure that outcome. (Not clear which parts, but probably something related to the difficulties of value loading & goal specification.)

OK, but no pre-ASI evidence can count against your model, according to you?

That seems sketchy, because I’m also seeing people such as Eliezer claim, in certain cases, that things which have happened support their model. By conservation of expected evidence, it can’t be the case that evidence during a certain time period will only confirm your model. Otherwise you already would’ve updated. Even if the only hypothetical events are ones which confirm your model, it also has to be the case that absence of those events will count against it.
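To make the conservation-of-expected-evidence point concrete, here is a minimal sketch. The numbers are purely hypothetical, chosen for illustration, and the function name is my own. It uses only the law of total probability: the prior must equal the probability-weighted average of the posteriors, so if observing an event would raise your credence, failing to observe it has to lower it.

```python
def posterior_given_absence(prior, p_event, posterior_given_event):
    """Solve P(H) = P(H|E) * P(E) + P(H|not E) * P(not E) for P(H|not E)."""
    if not 0.0 < p_event < 1.0:
        raise ValueError("p_event must be strictly between 0 and 1")
    return (prior - posterior_given_event * p_event) / (1.0 - p_event)


# Hypothetical numbers, for illustration only.
prior = 0.5                  # credence in the model before the time period
p_event = 0.2                # chance that a "confirming" event occurs
posterior_given_event = 0.8  # credence if the confirming event is observed

posterior_absent = posterior_given_absence(prior, p_event, posterior_given_event)

# The expected posterior, weighted by how likely each outcome is, equals the prior.
expected_posterior = (
    posterior_given_event * p_event + posterior_absent * (1.0 - p_event)
)

print(f"credence if the event is absent: {posterior_absent:.3f}")
print(f"expected posterior: {expected_posterior:.3f}")
```

With these made-up inputs, the credence after the event fails to show up comes out below the prior. That is the sense in which absence of confirming events has to count as (possibly weak) evidence against the model.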

I’ve updated against Eliezer’s model to a degree, because I can imagine a past-5-years world where his model was confirmed more, and that world didn’t happen.

Current AIs aren’t trying to execute takeovers because they are weaker optimizers than humans.

I think “optimizer” is a confused word and I would prefer that people taboo it. It seems to function as something of a semantic stopsign. The key question is something like: Why doesn’t the logic of convergent instrumental goals cause current AIs to try and take over the world? Would that logic suddenly start to kick in at some point in the future if we just train using more parameters and more data? If so, why? Can you answer that question mechanistically, without using the word “optimizer”?

Trying to take over the world is not an especially original strategy. It doesn’t take a genius to realize that “hey, I could achieve my goals better if I took over the world”. Yet current AIs don’t appear to be contemplating it. I claim this is not a lack of capability, but simply that their training scheme doesn’t result in them becoming the sort of AIs which contemplate it. If the training scheme holds basically constant, perhaps adding more data or parameters won’t change things?

If by some miracle you figure out how to create a generally superintelligent AI which itself does not have (more-coherent-than-human) preferences over future world states, whatever process it implements when you query it to solve a Very Difficult Problem will act as if it does.

The results of LLM training schemes give us evidence about the results of future AI training schemes. Future AIs could be vastly more capable on many different axes relative to current LLMs, while simultaneously not contemplating world takeover, in the same way current LLMs do not.