Also, there’s the possibility that the relevant thing that is tested is not just “which of these designs work better?” but “is my best attempt good enough to trust or do I need to spend more resources/give up?”.
Also, there’s the possibility that the relevant thing that is tested is not just “which of these designs work better?” but “is my best attempt good enough to trust or do I need to spend more resources/give up?”.