A “value function” is a description of the system’s behavior, so the Orthogonality Thesis is a claim about possible descriptions: if including “don’t break the world model” actually results in maximum utility, then your system is still optimizing your original value function. The objection doesn’t work at the low level either: you can keep a separate value function and only ever call it together with the additions from your search function. Or you can simply treat those additions as part of the search function.
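To make the low-level point concrete, here is a minimal Python sketch of one reading of it. Everything in it is a hypothetical illustration, not anyone's actual architecture: the plan names, the scores, and the helpers `value`, `breaks_world_model`, `guarded_value`, `constrained_search` and `plain_search` are all assumptions made up for the example. It places the “don’t break the world model” addition in two different spots, once inside the search function and once as a wrapper that the search always calls instead of the raw value function, and the chosen plan is the same either way.

```python
from typing import Callable, Iterable

# Hypothetical toy plans and scores, invented for illustration only.
SCORES = {"make_clips": 10.0, "hack_world_model": 100.0, "idle": 0.0}
PLANS = list(SCORES)


def value(plan: str) -> float:
    """The separate, unmodified value function."""
    return SCORES[plan]


def breaks_world_model(plan: str) -> bool:
    """The 'addition': a check for plans that corrupt the world model."""
    return plan == "hack_world_model"


def plain_search(plans: Iterable[str], value_fn: Callable[[str], float]) -> str:
    """Pick the highest-scoring plan, with no extra constraints."""
    return max(plans, key=value_fn)


def constrained_search(plans: Iterable[str], value_fn: Callable[[str], float]) -> str:
    """Option 1: the addition is part of the search function."""
    admissible = [p for p in plans if not breaks_world_model(p)]
    return max(admissible, key=value_fn)


def guarded_value(plan: str) -> float:
    """Option 2: the search only ever calls the value function through this wrapper."""
    return float("-inf") if breaks_world_model(plan) else value(plan)


print(plain_search(PLANS, value))          # raw value function, no addition
print(constrained_search(PLANS, value))    # addition lives in the search
print(plain_search(PLANS, guarded_value))  # addition wraps the value function
```

In this toy setup the last two calls select the same plan even though `value` itself is never modified, which is the sense in which where you draw the line between “value function” and “search function” is a matter of description.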