My impression is that basically no one knows how reasoning works, so people either make vague statements (I don’t know what shard theory is supposed to be, but when I’ve looked briefly at it, it’s seemed either vague or obvious), or retreat to functional descriptions like “the AI follows a policy that achieves high reward”, “the AI is efficient relative to humans”, or “the AI pumps outcomes” (see e.g. here: https://www.lesswrong.com/posts/7im8at9PmhbT4JHsW/ngo-and-yudkowsky-on-alignment-difficulty?commentId=5LsHYuXzyKuK3Fbtv ).