Archimedes comments on Contra Yudkowsky on Doom from Foom #2

Archimedes 30 Apr 2023 5:20 UTC
1 point
0
791556 is nowhere near the strongest network available. It’s packaged with lc0 as a nice small net. The BT2 net currently playing at tcec-chess.com is several hundred Elo stronger than T79 and likely close to superhuman level, depending on the time control. It’s not the very latest and greatest, but it is publicly available for download and should work with the 0.30.0-rc1 pre-release version of lc0, which supports the newer transformer architecture, if you want to try it yourself. If you only want completely “official” nets, at least grab one of the latest networks from the main T80 run.
I’m not confident that BT2 is strictly superhuman using pure policy but I’m pretty sure it’s at least close. LazyBot is a Lichess bot that plays pure policy but uses a T80 net that is likely at least 100 Elo weaker than BT2.
- GoteNoSente 30 Apr 2023 12:14 UTC
  1 point
  0
  Thanks for the information. I’ll try out BT2. Against LazyBot I was just now able to get a draw in a blitz game with a 3-second increment, which I don’t think I could do within a few tries against an opponent of, say, low grandmaster strength (with low grandmaster strength still being quite far away from superhuman). Since pure policy does not improve with thinking time, I think my chances would be much better at longer time controls. Certainly its Lichess rating at slow time controls suggests that T80 is not more than master strength when its human opponents have more than 15 minutes for the whole game.
  
  Self-play Elo vastly exaggerates playing strength differences between different networks, so I would not expect a BT2 vs T80 difference of 100 Elo points in self-play to translate into anything close to a 100 Elo difference in playing strength against humans.
  - Archimedes 30 Apr 2023 17:04 UTC
    1 point
    0
    Yes, clearly the less time the human has, the better Leela will do in relative terms. One thing to note, though, is that Lichess Elo isn’t completely comparable across different time controls. If you look at the player leaderboard, you can see that the top ratings for bullet are ~600 higher than for classical, so ratings need to be interpreted in context.
    
    Self-play Elo inflation is a fair point to bring up, and I don’t have information on how well self-play Elo translates to strength against humans.
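
For a sense of scale on the 100 Elo figure discussed in this thread, here is a minimal sketch of the standard Elo expected-score formula. It is not something either commenter posted, and the function name `expected_score` is made up for illustration. It only shows what a rating gap predicts inside a single rating pool. It says nothing about how much of a self-play gap carries over to games against humans, and it does not apply to ratings from different Lichess time controls, which are separate pools.

```python
def expected_score(rating_diff: float) -> float:
    """Expected score (win = 1, draw = 0.5, loss = 0) for the higher-rated
    side under the standard Elo logistic model, given the rating gap."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


if __name__ == "__main__":
    # Illustrative gaps only; 100 is the figure mentioned in the thread.
    for diff in (50, 100, 200):
        print(f"{diff:>3} Elo gap -> expected score {expected_score(diff):.3f}")
```

Under this model a 100 Elo gap corresponds to an expected score of roughly 0.64 for the stronger side, so if self-play inflates the gap, the real edge against humans would be smaller than that.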