Speedrun ruiner research idea

Edit Apr 14: To be perfectly clear, this is another cheap thing you can add to your monitoring/control system; this is not a panacea or a deep insight, folks. Just a Good Thing You Can Do™.

  • Central claim: If you can make a tool to prevent players from glitching games *in the general case*, then it will probably also work pretty well for RL with (non-superintelligent) advanced AI systems.

    • Alternative title: RL reward+environment autorobustifier

  • Problem addressed: essentially every RL system ever trained has found glitches/edge-cases in the reward function or the game/physics-sim/etc. and exploited them until the glitches were manually patched

    • Months ago I saw a tweet from someone at OpenAI saying, yes, of course this happens with RLHF as well. (I can’t find it. Anyone have it bookmarked?)

  • Obviously analogous ‘problem’: Most games get speedrun into oblivion by gamers.

[Embedded video: “Portal: How To Get Outside Without Cheats (360)”]
  • Idea: Make a software system that can automatically detect glitchy behavior in the RAM of **any** game (like a cheat engine in reverse) and thereby ruin the game’s speedrunability. (A toy sketch of what I mean, plus the scoring rubric below as code, appears right after this bullet list.)

    • You can imagine your system gets a score from a human on a given game:

      • Game is unplayable:
        score := -1

      • Blocks glitch:
        score += 10 * [importance of that glitch]

      • Blocks unusually clever but non-glitchy behavior:
        score -= 5 * [in-game benefit of that behavior]

      • Game is laggy:[1]
        score := score * (1 - [proportion of frames dropped])

      • Tool requires non-glitchy runs on a game as training data:
        score -= 2 * [human hours required to make non-glitchy runs]
        / [human hours required to discover the glitch]

  • Further defense of the analogy between a general anti-speedrun tool and a general RL reward+environment robustifier:

    • Speedrunners are smart as hell

    • Both have similar fuzzy boundaries that are hard to formalize:
      ‘player played game very well’ vs ‘player broke the game and didn’t play it’
      is like
      ‘agent did the task very well’ vs ‘agent broke our sim and did not learn to do what we need it to do’

      • In other words, you don’t want to punish talented-but-fair players.

    • Both must run tolerably fast (can’t afford to kill the AI devs’ research iteration speed or increase training costs much)

    • Both must be ‘cheap enough’ to develop & maintain

  • Breakdown of the analogy: I think such a tool could work well up through GPT-AlphaZero-5-level systems, but probably not GodAI-6
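
To make the “cheat engine in reverse” idea and the scoring rubric above less hand-wavy, here is a deliberately dumb sketch of both, in Python. Everything in it is made up for illustration: the per-address min/max bounds, the watched-RAM snapshot format, the zero-the-reward response, and all class/field names and weights. It is a sketch of the interface I have in mind, not a design.

```python
import numpy as np

class ReverseCheatEngine:
    """Toy 'cheat engine in reverse': learn per-address value ranges from
    non-glitchy runs, then flag game states that fall outside them. A real
    tool would need something much smarter than per-address min/max bounds;
    this only illustrates the interface."""

    def __init__(self, margin: float = 0.1):
        self.margin = margin  # slack around observed ranges, to avoid punishing fair play
        self.lo = None
        self.hi = None

    def fit(self, clean_snapshots: np.ndarray) -> None:
        # clean_snapshots: (num_frames, num_watched_addresses) array of RAM values
        # from human-verified non-glitchy runs (the training data the rubric charges for).
        lo = clean_snapshots.min(axis=0)
        hi = clean_snapshots.max(axis=0)
        span = hi - lo
        self.lo, self.hi = lo - self.margin * span, hi + self.margin * span

    def is_glitchy(self, snapshot: np.ndarray) -> bool:
        # Flag any watched address whose value is outside the range ever seen in clean play.
        return bool(np.any((snapshot < self.lo) | (snapshot > self.hi)))

# Hypothetical use inside an RL training loop: zero the reward (or reset the
# episode) whenever the environment state looks out-of-distribution.
#   detector = ReverseCheatEngine()
#   detector.fit(clean_run_snapshots)
#   if detector.is_glitchy(current_ram_snapshot):
#       reward = 0.0
```

And the human-assigned score from the bullet list, transcribed literally (the ordering of the penalties, e.g. whether the lag multiplier applies last, is a judgment call I made up):

```python
from dataclasses import dataclass, field

@dataclass
class ToolEvaluation:
    game_unplayable: bool = False
    blocked_glitch_importances: list = field(default_factory=list)  # human-judged importance of each blocked glitch
    blocked_fair_play_benefits: list = field(default_factory=list)  # in-game benefit of each clever-but-legit behavior blocked
    frames_dropped_fraction: float = 0.0                            # 0.0 = no lag (see footnote on “laggy”)
    hours_making_clean_runs: float = 0.0                            # human hours spent producing non-glitchy runs
    hours_to_discover_glitch: float = 1.0                           # human hours the glitch originally took to find

def tool_score(ev: ToolEvaluation) -> float:
    if ev.game_unplayable:
        return -1.0
    s = 0.0
    s += 10 * sum(ev.blocked_glitch_importances)                        # blocks glitches
    s -= 5 * sum(ev.blocked_fair_play_benefits)                         # punished talented-but-fair play
    s -= 2 * ev.hours_making_clean_runs / ev.hours_to_discover_glitch   # training data was expensive
    s *= 1 - ev.frames_dropped_fraction                                 # lag penalty
    return s
```

Per-address bounds would obviously be defeated instantly by real speedrunners; the point is only the loop: fit on clean runs, flag out-of-distribution states, and charge the tool (via the rubric) for false positives and for the human hours the clean runs cost.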

(Also, if a random reader wants to fund this idea, I don’t have plans for May-July yet.)

metadata = {
  "effort: "just thought of this 20 minutes ago",
  "seriousness": "total",
  "checked if someone already did/said this": false,
  "confidence that": {
    "idea is worth doing at all": "80%",
    "one can successfully build a general anti-speedrun thing": "25%",
    "tools/methods would transfer well to modern AI RL training": "50%"
  }
}
  1. ^

    Note that “laggy” is indeed the correct/useful notion, not e.g. “average CPU utilization increase”, because “lagginess” conveniently bundles the key performance issues in both the game-playing and RL-training cases: loading time between levels/tasks is OK; more frequent & important actions being slower is very bad; turn-based things can be extremely slow as long as they’re faster than the agent/player; etc.
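
    For concreteness, one hypothetical way to operationalize “proportion of frames dropped” in this spirit (the frame-time instrumentation and the 16.7 ms budget are assumptions for illustration, not anything a particular engine provides):

    ```python
    def dropped_frame_fraction(frame_times_ms, is_loading_frame=None, budget_ms=16.7):
        """Toy lagginess metric: fraction of *gameplay* frames that miss their frame
        budget. Loading frames are excluded because slow loads between levels/tasks
        are fine; slow responses to frequent, important actions are not."""
        if is_loading_frame is None:
            is_loading_frame = [False] * len(frame_times_ms)
        gameplay = [t for t, loading in zip(frame_times_ms, is_loading_frame) if not loading]
        if not gameplay:
            return 0.0
        return sum(t > budget_ms for t in gameplay) / len(gameplay)
    ```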