Depends on what you mean by “valid”. It can certainly be called an alignment benchmark, but I would not be confident in how good a benchmark it is (that is, how well a score on it correlates with the probability of alignment). The Minecraft context will obviously let the LLM know it is inside a game, and we have seen LLMs be more willing to deceive or cause harm inside a game.