The list of assumptions seems to hang in the air, which makes it hard to take in. Who holds (will hold? should hold?) these assumptions: AGI developers, alignment researchers, AGIs, or the people with whom AGI presumably should align? Are these assumptions behind a specific theory or framework of alignment (or of particular people), or are they “shared”, “common sense” assumptions that you think most researchers in the field hold? Are they behind a certain conclusion, or an estimate of the probability of humanity’s survival (low, uncertain, high)?
Ok… Upon reading the whole post, I understand that this is a list of “[standard, unchecked, unconscious] assumptions in science” which don’t apply in the field of AI Safety. The biggest source of confusion is that the list is immediately preceded by the words “here is my current list”, which strongly suggests that these are your own assumptions.
I think it would be much clearer for readers, and would also have a better chance of sticking as LW/alignment jargon, if the list were called “epistemic/research complications/challenges [for alignment]” and the section titles were inverted: boundedness → unboundedness, direct access → lack of direct access, etc. It’s much more natural to think about these things this way, and you yourself sometimes slip into this frame in the text, calling them “epistemic problems” rather than “assumptions”.
It’s unclear what you mean by “Newtonian assumption”. If you mean a sort of epistemological method, I’m familiar with this under the name “Newtonian-Cartesian thinking”, which amounts to (classical) rationalism, reductionism, and the belief that there is a single best (optimal) decision, solution, theory, or explanation of events. But that is not quite what you discuss in the corresponding section (you talk about reductionism, for instance, in the preceding section).
Rather, in the section on the Newtonian assumption, you seem to be pointing to the agency, self-interest, and intentionality of AIs. There is an indirect relation to Newton (he probably thought there was just one agent in the universe: God?), but the way the section is written makes it hard to infer what you were trying to point to.