Note that (2) and (3) are formal tasks where the optimizer has access to the full set of rules. My understanding is that a lot of chips designed by optimizing a simulator have been pretty lousy in the real world, either being complete failures that only worked in the simulator because of bugs, or being real solutions, but not being usefully robust.
Actually, for (2) the optimizer didn’t know the set of rules; it played the game as if it were a normal player, controlling only the keyboard. It in fact started exploiting “bugs” of which its creator was unaware. (E.g. in Super Mario, Mario can stomp enemies in mid-air, from below, as long as he is already falling at the moment of collision.)
It knows the rules in the sense that the game is built into the optimizer. There’s a reason “time travel” is in the title of the paper.
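To make the "built into the optimizer" point concrete, here is a minimal sketch of a search that treats the game as a black box but can still save and restore its state. Everything in it is hypothetical (the `ToyGame` class, the `PITS` positions, the `score` objective), and it is only a toy illustration of the save-state idea, not the paper's actual method.

```python
import copy

PITS = {3, 6}  # hypothetical hazard positions in a toy 1-D level


class ToyGame:
    """Hypothetical stand-in for an emulator: deterministic, with save states."""

    def __init__(self):
        self.x = 0
        self.alive = True

    def step(self, action):
        # The only interface a normal player has: one input per frame.
        if not self.alive:
            return
        self.x += 2 if action == "jump" else 1
        if self.x in PITS:
            self.alive = False

    def save(self):
        # "Time travel": capture the full game state.
        return copy.deepcopy(self.__dict__)

    def load(self, state):
        # Rewind to a previously captured state.
        self.__dict__.update(copy.deepcopy(state))


def score(game):
    # Assumed objective: progress to the right, with death scored below any progress.
    return game.x if game.alive else -1


def search(game, steps, actions=("right", "jump")):
    """Greedy one-step lookahead: try each input, rewind, then commit to the best."""
    chosen = []
    for _ in range(steps):
        snapshot = game.save()
        results = {}
        for action in actions:
            game.load(snapshot)
            game.step(action)
            results[action] = score(game)
        best = max(actions, key=lambda a: results[a])
        game.load(snapshot)
        game.step(best)
        chosen.append(best)
    return chosen


game = ToyGame()
plan = search(game, 6)
```

The search never reads `PITS` directly; it only observes outcomes. But because it can rewind after every failed attempt, it effectively has the game's rules available by querying them, which is a very different position from a human player who has to live with each mistake.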