Let’s say I want to evaluate an algorithmic Texas Hold’em player against a field of algorithmic opponents.
The simplest approach I could take would be pure Monte Carlo: run the strategy for 100 million hands and see how it does. This works, but it wastes compute.
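The baseline is just averaging results over many independent hands. A minimal sketch, where `play_hand` is a made-up stand-in for a full poker simulator (the Gaussian is obviously not real poker):

```python
import random

def play_hand():
    # Hypothetical placeholder for simulating one full hand;
    # returns the hero's result in big blinds.
    return random.gauss(0.05, 5.0)

def pure_monte_carlo(num_hands):
    # EV estimate = average winnings per hand over many independent hands.
    return sum(play_hand() for _ in range(num_hands)) / num_hands

random.seed(0)
ev = pure_monte_carlo(100_000)  # the post imagines 100 million
```

Every hand gets the same amount of compute, whether its outcome is a foregone conclusion or genuinely uncertain; that uniformity is the waste.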
Alternatively, I could use the importance sampled approach:
Start with 100,000 pre-flop scenarios (i.e., all players have received their pocket cards, button position is fixed, no community cards yet)
Do 100 rollouts from each scenario
Most rollouts won’t be “interesting” (e.g., player has 7-2o UTG, player folds every time → EV = 0BB). If the simulation hits these states, you can say with high confidence how they’ll turn out, so additional rollouts won’t significantly change your EV estimate—you’ve effectively “locked in” your EV for the “boring” parts of the pre-flop possibility space.
Pick the 10,000 highest-variance pre-flop scenarios. These are cases where your player doesn’t always fold (and opponents didn’t all fold to your player’s raise), e.g.
AA in position where you 3-bet and everyone folds → consistently +3BB → low variance despite the premium hand, so it gets locked in early.
KQs facing a 3-bet → sometimes win big, sometimes lose big → super high variance
Run 1,000 rollouts for each of these high-variance scenarios.
Figure out which flops generate the highest variance over those 1,000 rollouts.
If your player completely missed the flop while another player connected and bet aggressively, your player will fold every time—low variance, predictable EV.
If your player flopped two pair or your opponent is semi-bluffing a draw, those are high-variance situations where additional rollouts provide valuable information.
And so on through each major decision point in the game.
By skipping rollouts once I know what the outcome is likely to be, I can focus a lot more compute on the remaining scenarios and come to a much more precise estimate of EV (or whatever other metric I care about).
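The whole loop above can be sketched in a few lines. Everything here is a hypothetical stand-in (`rollout`, the Gaussian placeholder, the parameter names) for a real poker simulator; the point is just the allocation logic: screen everything cheaply, then spend extra rollouts only where the screening pass showed high variance:

```python
import random
import statistics

def rollout(scenario):
    # Placeholder: pretend each scenario has a true mean EV and a spread.
    # In a real evaluator this would play out one hand from the scenario.
    mean, spread = scenario
    return random.gauss(mean, spread)

def adaptive_ev(scenarios, screen_n=100, top_k=10, refine_n=1000):
    """Screen every scenario with a few rollouts, then refine only the
    highest-variance ones; low-variance EVs are 'locked in' as-is."""
    results = {}
    for s in scenarios:
        results[s] = [rollout(s) for _ in range(screen_n)]
    # Rank scenarios by the sample variance of their screening rollouts.
    ranked = sorted(results, key=lambda s: statistics.variance(results[s]),
                    reverse=True)
    for s in ranked[:top_k]:  # extra compute goes only to the noisy ones
        results[s].extend(rollout(s) for _ in range(refine_n))
    # EV per scenario = mean over all of its rollouts.
    return {s: statistics.fmean(v) for s, v in results.items()}

# Usage: two boring scenarios (tight spread) and one swingy one.
random.seed(0)
scenarios = [(0.0, 0.1), (3.0, 0.2), (0.5, 8.0)]
evs = adaptive_ev(scenarios, screen_n=100, top_k=1, refine_n=1000)
```

With `top_k=1`, the swingy scenario soaks up ten times the rollouts of each boring one, which is exactly the 100-rollouts-then-1,000-rollouts cascade described above applied at a single decision point.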
Thanks, I get it now.
Would this help with the simulation goal hypothesized in the OP? It’s asking how often different types of AGIs would be created. A lot of the variance is probably carried in what sort of species and civilization is making the AGI, but some of it is carried by specific twists that happen near the creation of AGI. Getting a president like Trump and having him survive the (fairly likely) assassination attempt(s) is one such impactful twist. So I guess sampling around those uncertain, impactful twists would be valuable in refining the estimate of, say, how frequently a relatively wise and cautious species would create misaligned AGI due to bad twists, and vice versa.
Hm.
New EA cause area just dropped: Strategic variance reduction in timelines with high P(doom).
BRB applying for funding
Great example!