Compared to other people on this site, this is part of my alignment optimism. I think there are natural abstractions in the moral landscape that make agents converge towards cooperation and similar behaviours. I read a post recently where Leo Gao made the argument that concave agents generally don't exist, because they tend to stop existing. I think there are pressures that push agents toward certain parts of the value landscape.
Like, I agree that the orthogonality thesis is presumed to be true too often. It is more of an argument that benevolence may not happen by default, but I'm also uncertain about how much evidence it actually gives you.
The orthogonality thesis says that it's invalid to conclude benevolence from the premise of powerful optimization; it gestures at counterexamples. It's entirely compatible with benevolence being very likely in practice. You then might want to separately ask yourself whether it's in fact likely. But you do need to ask; that's the point of the orthogonality thesis, its narrow scope.
Yeah, I agree with what you just said; I should have been more careful with my phrasing.
Maybe something like: "The naive version of the orthogonality thesis, where we assume that AIs can't converge towards human values, is taken to be true too often."
Could you help me understand how that is possible? Why should an intelligent agent care about humans instead of defending against unknown threats?