polytope

Karma: 163

polytope 24 Aug 2025 19:21 UTC
5 points
0
in reply to: gwern’s comment on: Shorter Tokens Are More Likely
It may depend on the RL algorithm, but I think would not expect most RL to have this issue to first order if the RL algorithm is producing its rollouts by sampling from the full untruncated distribution at temperature 1.
The issue observed by the OP is a consequence of the fact that typically if you are doing anything other than untruncated sampling at temperature 1, then your sampling is not invariant between, e.g. “choose one of three options: a, b, or c” and “choose one of two options: a, or (choose one of two options: b or c)”.

However many typical on-policy RL algorithms fundamentally derive from sampling/approximation of theorems/algorithms where running one step of the theoretical idealized policy update looks more like:

”Consider the space of possible complete output sequences S, and consider sum_{s in S} P(s) Reward(s). Update model parameters one step in the direction that most steeply overall increases this quantity”.
By itself, this idealized update is invariant to tokenization, because it’s expressed only in terms of complete outputs. Tokenization does come in insofar as it affects the gradient steepness of the policy in different directions of possible generalization and what parts of the space are explored and on which the approximated/sampled update occurs, etc.
Note that the typical mechanism by which RL tends towards entropy decrease and/or mode collapse is also well-explained by the above and does not need any involvement from tokenization. Indeed, consider just applying the above idealized update repeatedly. The model will continue sharpening to try push ever more of the probability mass on to only the sequences s for which Reward(s) is maximal or near-maximal, and push the probability of every other completed sequence to zero. If your reward function (from RLHF or whatever) has any preference for outputs of a given length or style, even if slight, the policy eventually may collapse arbitrarily much to only that part of the distribution that meets that preference.

In some RL algorithms there is, additionally, a sort of Polya’s-urn like tendency (https://en.wikipedia.org/wiki/P%C3%B3lya_urn_model) where among sequences that give similar reward, the particular ones sampled will become consistently more (or less) likely, but I believe that training on advantage rather than raw reward tends to also mitigate or remove this bias to first order as well, although there can still be a random walk-like behavior (just now one of lesser magnitude than before and that can go in either direction).

In any case these and other numerous issues of RL I would tend see as distinct mechanisms from the bias that results in overweighting shorter or more likely tokens when sampling at temperature less than 1, particularly as the latter is a unsoundness/lack-of-invariance that is inherent in the functional form of sampling at temperature less than 1, whereas many of the issues of RL more arise out of e.g. the variance of sampling and approximations, unwanted generalization, imperfect rewards, e.g. rather than being inherently unsound in the functional form itself.

polytope 10 Aug 2025 16:05 UTC
5 points
0
in reply to: faul_sname’s comment on: What would a human pretending to be an AI say?
It’s interesting to note the variation in “personalities” and apparent expression of different emotions despite identical or very similar circumstances.
Pretraining gives models that predict every different kind of text on the internet, and so are very much simulators that learn to instantiate every kind of persona or text-generating process in that distribution, rather than being a single consistent agent. Subsequent RLHF and other training presumably vastly concentrates the distribution of personas and processes instantiated by the model on to a particular narrow cloud of personas that self-identifies as an AI with a particular name, has certain capabilities and quirks depending on that training, has certain claimed self-knowledge of capabilities (but where there isn’t actually very strong of a force tying the claimed self-knowledge to the actual capabilities), etc. But even narrowed, it’s interesting to still see significant variation within the remaining distribution of personas that gets sampled each new conversation, depending on the context.

polytope 17 Jul 2025 8:14 UTC
3 points
2
on: What was so great about Move 37?
I agree with DAL that “move 37” among the lesswrong-ish social circle has maybe become a handle for a concept where the move itself isn’t the best exemplar of that concept in reality, although I think it’s not a terrible exemplar either.
It was surprising to pro commentators at the time. People tended to conceptualize bots as being brute-force engines with human-engineered heuristics that just calculate them out to a massive degree (because historically that’s what they were in Chess), rather than as self-learning entities that excel in sensing vibes and intuition and holistic judgment. As a Go player, it looks to me like the kind of move that you never find just by brute force because there aren’t any critical tactics related to it to solve. The plausible variations and resulting shapes each variation produces are obvious, so it’s a move you choose only if you just have the intuition that you feel good having those resulting shapes being on the board in the long-term. KataGo’s raw policy prior for a recent net puts ~25% mass on the move, so it’s an “intuitive” option for the neural net too, not one discovered by deep search.
On the side of the move not being *too* remarkable, in the eyes of modern stronger bots the evaluation of the position doesn’t change much through that move or the 2-3 moves so it’s not like the game swung on that move. Lee Sedol’s response was also indeed fine, and both players do have other ways to play that are ~equally good, so the move also is not a unique good/best move. And there have since been surprises and things that have had a much bigger impact on pro human play since strong Go bots started becoming available.
Elaborating on the “self-learning entities that excel in vibes and intuition and holistic judgment”—modern Go bots relative to humans are fantastically good at judging and feeling out the best moves when the position is “smooth”, i.e. there are lots of plausible moves with tradeoffs that range through a continuum of goodness with no overly sharp tactics. But they are weak at calculating deep sharp tactics and solving variations that need to be proven precisely (still good and on-average-better than human, but it’s their weakest point). It’s still the case to this day that human pros can occasionally outcalculate the top bots in sharp/swingy tactical lines, while it’s unheard of for a human to outplay the bots through having better judgment in accumulating long-term advantages and making incremental good trades over the course of the game.
Bots excel at adapting their play extremely flexibly given subtle changes to the overall position, so commonly you get the bots suggesting interesting moves that are ever so slightly better on average, that human pro players might not consider. A lot of such moves also rate decently in the raw policy prior, so the neural nets are proposing many of these moves “on instinct” generalizing from their enormous volume of self-play learning, with the search serving after-the-fact to filter away the (also frequent) instances where the initial instinct is wrong and leads to a tactical blunder.
So, specific answers:

> Does understanding Move 37 require the use of extensive brute force search?
When you take into account the fact that AlphaGo had extensive search at its disposal, does that make the creativity of Move 37 significantly less impressive?
No, and brute force isn’t the practically relevant factor here so I’d question the premise. The variations and possible results that the move leads to aren’t too complicated, so the challenge is in the intuitive judgment call of whether those results are will be good over the next 50-100 moves of the game given the global situation, which I expect is beyond anyone’s ability to do solely via brute force (including bots). Pros at the time didn’t have the intuition that this kind of exchange in this kind of position could be good, so it was surprising. To the degree that modern pros could have a different intuition now, it would tend to be due to things like having shaped their subconscious intuition based on feedback and practice with modern bots and modern human post-AI playing styles, rather than mostly via conscious or verbalizable reasons.

> Is Move 37 categorically different from other surprising moves played by human Go experts?

Not particularly. Bots are superhuman in the kind of intuition that backs such moves, but I’d say it’s on a continuum. A top pro player might similarly find interesting situation-specific moves backed by intuition that most strong amateur players would not consider or have in those positions.
> I noticed that Lee Sedol’s Wikipedia page mentions a notable game in which he uses a “broken ladder,” which is “associated with beginner play”—maybe it’s not so uncommon for a professional Go player to do something unconventional every so often.
Given an expert explanation of Move 37, what level of Go expertise would be required to fully understand it, and how long would it take?
What if you had to figure it out without an explanation, just by studying the game?

Because it boils down to fuzzy overall intuition of what positions you prefer over others, it’s probably not the kind of move that can be verbally explained in any practical way in the first place. (It would be hard to give an explanation that’s “real” as opposed to merely curiosity-stopping or otherwise unuseful).
> To what extent have human players been able to learn novel strategies from AI in Go or chess?

The popularity of various opening patterns (“joseki”) has changed a lot. The 60-0 AlphaGo Master series featured many games that as far as bots were concerned were already very bad for the human by the time the opening was done, and I think that would not be as much the case if repeated today. But also I think that change is not so important. Small opening advantages are impactful at the level of top bots but for humans the variance in the rest of the game is large and makes that small difference matter much less. I’d guess the more important thing is the ability to use the bots as rapid and consistent feedback, i.e. just general practice and correcting one’s mistakes, rather than any big strategic change. This is the boring answer perhaps, because it’s also how bots have long been used in chess (but minus the the part about preparing exact opponent-specific opening lines, because Go’s opening is usually too open-ended to prepare specific variations and traps).

(Background: I’m the main developer of KataGo and have accumulated a lot of time looking at bot analysis of games and am a mid-amateur dan player, i.e. expert but not master, maybe would be around the top 15-30%ile of players if I were to attend the annual open US Go congress).

polytope 8 May 2025 12:41 UTC
5 points
−3
on: Chess—“Elo” of random play?
One thing that’s worth keeping in mind with exercises like this is that while you can do this in various ways and get some answers, the answers you get may depend nontrivially on how you construct the intermediate ladder of opponents.
For example, attempts to calibrate human and computer Elo ratings scales often do place top computers around the 3500ish area, and one of the other answers given has indicated by a particular ladder of intermediates that random would then be at 400-500 Elo given that. But there are also human players who are genuinely rated 400-500 Elo on servers whose Elo ratings are also approximately transitively self-consistent within that server. These players can still play Chess—e.g. know how pieces move, and can see captures and move pieces to execute those captures vastly better than chance, etc. I would not be surprised to see such a player consistently destroy a uniform random Chess player. Random play is really, really bad. So there’s a good chance here that we would see a significant nonlinearity/nontransitivity in Elo ratings, such that there isn’t any one consistent rating that we can assign to random play relative to Stockfish.

A good way of framing this conceptually is to say that Elo is NOT a fundamental truth about reality, rather it’s an imperfect model that we as humans invented that depending on the situation may work anywhere from poorly to okay to amazingly good at approximating an underlying reality.
In particular, the Elo model makes a very strong “linearity-like” assumption: that if A beats B with expected odds a:b, and B beats C with expected odds b:c, then A will beat C with expected odds of precisely a:c. (where draws are treated as a half point of each player beating the other, i.e. mathematically equivalent in expectation to if you were to resolve all draws by fair coin flip to determine the winner), and then given the way rating is defined from there, this linearity in odds then implies that the expected score between players follows precisely a sigmoid function f(x) = 1/(1+exp(-x)) of their rating difference up to constant scaling.

Almost any real situation will violate these assumptions at least a little (and even mathematically ideal artificial games will violate it, e.g. a game where players have a fixed mean and variance and compete by sampling from different gaussians to see whose number is higher will violate this assumption!). But in many cases of skill-based competition this works quite well, and there are various ways to justify and explain why this approximation does work pretty well when it does!

But even in games/domains where Elo does approximate realistic player pools amazingly well, it quite commonly stops doing as well at the extremes. For example, two common cases where this happens can include:
- When the pool of players (particularly bots) being compared are all relatively close to being optimal
- When the pool of players being compared cover an extremely wide range of “skill levels”.
The first case can happen when the near-optimal players have some persistent tendencies as to mistakes they still make as well as sharp preferences for various lines. Then you no longer have law-of-large-numbers effects (too few mistakes per game) and also no poisson-like smoothness in the arrival rate of mistakes (mistakes aren’t well-modeled as having an “arrival rate” if they’re sufficiently consistent to a line x bot combination) and the Elo model simply stops being a good model of reality. I’ve seen this empirically be the case on the 9x9 computer go server (9x9 “CGOS”) with a bot equilibrating at one point to be a couple hundred Elo lower than a bot no longer running that it should have been head-to-head equal or stronger than, due to different transitive opponents.
The second case, the one relevant here, can happen because there’s no particular reason to expect that a game will actually have tails that precisely match that of a sigmoid function f(x) = 1/(1+exp(-x)) in expected score in the extreme. Depending the actual tails between different pairs of players of increasingly large ratings differences, particularly whether it tends to be thinner or heavier than exp(-x) in given conditions, when you then try to measure large ratings differences via many transitive steps of intermediate opponents, you then will get different answers depending on the composition of those intermediate players and how many and how big of steps you take.

It’s not surprising when models that are just useful approximations of reality (i.e. “the map, not the territory”) start breaking down at extremes. It can be still worthwhile doing things like this to build intuition or even just for fun and see what numbers you get! While doing so, my personal tendency in such cases would still be to emphasize that at the extremes of questions like “what is the Elo of perfect play” or “what is the Elo of random play”, the numbers you do get can start to be answers that have a lot to do with one’s models and methodologies rather than answers that reflect an underlying reality accurately.

polytope 19 Feb 2025 16:48 UTC
2 points
1
on: Beyond ELO: Rethinking Chess Skill as a Multidimensional Random Variable
Circling back to this with a thing I was thinking about—suppose one wanted to figure out just one additional degree of freedom to the Elo rating a player had (at a given point in time, if you also allow evolution over time) that would add as much improvement as possible. Almost certainly you need more dimensions than that to properly fit real idiosyncratic nonlinearities/nontransitivities (i.e. if you had a playing population with specific pairs of players that were especially strong/weak only against specific other players, or cycles of players where A beats B beats C beats A, etc), but if you just wanted to work out what the “second principal component” might be, what’s a plausible guess?
First, you can essentially reproduce the Elo model by rather than each player having a rating and the winning chance being a function of the difference between their ratings, instead you posit that each player has a rating and when they play a game, they each indepedently sample a random value from a fixed probability distribution centered around their own rating, and the player with the larger sample wins.
I think that you exactly reproduce the Elo model up to scaling if this distribution is a Gumbel distribution, because the difference of two Gumbels is apparently equivalent to a draw from a logistic distribution, and the CDF of the logistic distribution is precisely the sigmoid that the Elo model posits. But in practice, you should end up with almost the same thing if you choose any other reasonable distribution so long as it has the right heaviness of tail.
In particular, I’d expect having linearly-exponential tails is good rather than quadratically-exponential tails like the normal distribution has, because linearly-exponential tails tend to be desirable for real-world ratings models due to being much more outlier-resistant and in the real world you have issues like forfeits, sandbaggers, internet disconnection/timeouts, etc. (If you have a quadratically exponential tail, then a ratings model can put so low probability on an outlier that subject to seeing the outlier, the ratings model is forced to make a too-large update to accommodate it, this should be intuitive from a Bayesian perspective). Outliers and noise and the realities of real world ratings data I’d expect introduces far bigger variation in ratings quality anyways than any minor distribution-shape differences would.
So for example, you could also say each player draws from a logistic distribution, rather than only a Gumbel. The difference of two logistics is not quite a logistic distribution but up to rescaling it should be pretty close so this is nearly the Elo model again.
Anyways, with any reformulation like this, there is a very natural candidate now for a second dimension—that of the variance of the distribution that a player draws their sample from. Rather than each player drawing from a fixed distribution centered around their rating before seeing who has the higher value and wins, we now add a second parameter that allows the variance of that distribution to vary by player. So the ratings model now becomes able to express things like “this player is more variable in performance between games, or prone to blunders uncharacteristic of their skill level than this other player”. This parameter might also improve the rating system’s ability to “explain away” things like sandbagger players by assigning them a high variance, thereby reducing their distortionary impact on other players’ ratings even before manual intervention.

polytope 10 Feb 2025 21:28 UTC
12 points
11
on: Beyond ELO: Rethinking Chess Skill as a Multidimensional Random Variable
I might be misunderstanding, but it looks to me like your proposed extension is essentially just the Elo model with some degrees of freedom that don’t yet appear to matter?
The dot product has the property that <theta_A-theta_B,w> = <theta_A,w> - <theta_B,w>, so the only thing that matters is the <theta_P,w> for each player P, which is just a single scalar. So we are on a one-dimensional scale again where predictions are based on taking a sigmoid of the difference between a single scalar associated with each player.
As far as I can tell, the way that such a model could still be a nontrivial extension of Elo would be if you posited w could vary between games, whether randomly from some distribution or whether via additional parameters associated to players that influence what w is in the games they are involved in, or other things like that. But it seems you would need something like that, or else some source of nonlinearity, because if w is constant then every dimension orthogonal to that fixed w can never have any effect on predictions by the model.

polytope 2 Nov 2024 16:05 UTC
3 points
2
on: Set Theory Multiverse vs Mathematical Truth—Philosophical Discussion
I assume you’re familiar with the case of the parallel postulate in classical geometry as being independent of other axioms? Where that independence corresponds with the existence of spherical/hyperbolic geometries (i.e. actual models in which the axiom is false) versus normal flat Euclidean geometry (i.e. actual models in which it is true).

To me, this is a clear example of there being no such thing as an “objective” truth about the the validity of the parallel postulate—you are entirely free to assume either it or incompatible alternatives. You end up with equally valid theories, it’s just those theories are applicable to different models, and those models are each useful in different situations, so the only thing it comes down to is which models you happen to be wanting to use or explore or prove things about on a given day.
Similarly for the huge variety of different algebraic or topological structures (groups, ordered fields, manifolds, etc) - it is extremely common to have statements that are independent of the axioms, e.g. in a ring it is independent of the axioms whether multiplication is commutative or not. And both choices are valid. We have commutative rings, and we have noncommutative rings, and both are self-consistent mathematical structures that one might wish to study.
Loosely analogous to how one can write a compiler/interpreter for a programming language within other programming languages, some theories can easily simulate other theories. Set theories are particularly good and convenient for simulating other theories, but one can also simulate set theories within other seemingly more “primitive” theories (e.g. simulating it in theories of basic arithmetic via Godel numbering). This might be analogous to e.g. someone writing a C compiler in Brainfuck. Just like how it’s meaningless to talk about whether a programming language or a given sub-version or feature extension of a programming language is more “objectively true” than another, there are many who take the position that the same holds for different set theories.
When you say you’re “leaning towards a view that maintains objective mathematical truth” with respect to certain axioms, is there some fundamental principle by which you’re discriminating the axioms that you want to assign objective truth from axioms like the parallel postulate or the commutativity of rings, which obviously have no objective truth? Or do you think that even in these latter cases there is still an objective truth?

polytope 29 Oct 2024 20:05 UTC
14 points
2
on: AI #87: Staying in Character
This thread analyzes what is going on under the hood with the chess transformer. It is a stronger player than the Stockfish version it was distilling, at the cost of more compute but only by a fixed multiplier, it remains O(1).

I found this claim suspect because this basically is not a thing that happens in board games. In complex strategy board games like Chess, practical amounts of search on top of a good prior policy and/or eval function (which Stockfish has), almost always outperforms any pure forward pass policy model that doesn’t do explicit search, even when that pure policy model is quite large and extensively trained. With any reasonable settings, it’s very unlikely that the distillation of Stockfish into a pure policy model produces a better player than Stockfish.

I skimmed the paper (https://arxiv.org/pdf/2402.04494), and had trouble finding such a claim, and indeed it seems the original poster of that thread later retracted that claim as due to their own mistake in interpreting the data table of the paper. The post where they acknowledge the mistake is much less prominent than the original post, link here: https://x.com/sytelus/status/1848239379753717874 . The chess transformer remains quite a bit weaker than the Stockfish it tries to predict/imitate.

polytope 5 Jul 2024 11:48 UTC
1 point
0
on: OthelloGPT learned a bag of heuristics
Do you think a vision transformer trained on 2-dimensional images of the board state would also come up with a bag of heuristics or would it naturally learn a translation invariant algorithm taking advantage of the uniform way the architecture could process the board? (Let’s say that there are 64 1 pixel by 1 pixel patches, perfectly aligned with the 64 board locations of an 8x8 pixel image, to make it maximally “easy” for both the model and for interpretability work.)

And would it differ based on whether one used an explicit 2D positional embedding, or a learned embedding, or a 1D positional embedding that ordered the patches from top to bottom, right to left?
I know that of course giving a vision transformer the actual board state like this shortcircuits the cool part where OthelloGPT tries to learn its own representation of the board. But I’m wondering if even in this supposedly easy setting it still would end up imperfect with a tiny error rate and a bag-of-heuristics-like way of computing legal moves.

And brainstorming a bit here: a slightly more interesting setting that might not shortcircuit the cool part would be if the input to the vision transformer was a 3D “video” of the moves on the board. E.g. the input[t][x][y] is 1 if on turn t, a move was made at (x,y), and 0 otherwise. Self-attention would presumably be causally-masked on the t dimension but not on x and y. Would we get a bag of heuristics here in the computation of the board state and the legal moves from that state?

polytope 20 Jun 2024 20:13 UTC
21 points
4
in reply to: gwern’s comment on: Beyond the Board: Exploring AI Robustness Through Go
(KataGo dev here, I also provided a bit of feedback with the authors on an earlier draft.)
@gwern—The “atari” attack is still a cyclic group attack, and the ViT attack is also still a cyclic group attack. I suspect it’s not so meaningful to put much weight on the particular variant that one specific adversary happens to converge to.
This is because the space of “kinds” of different cyclic group fighting situations is combinatorically large and it’s sort of arbitrary what local minimum the adversary ends it because it doesn’t have much pressure to find more once it finds one that works. Even among just the things that are easily put into words without needing a diagram—how big is the cycle? Does the cyclic group have a big eye (>= 4 points behaves tactically distinctly) or a small eye (<=3 points), or no eye? Is the eye a two-headed-dragon-style eye, or not? Does it have more loose connections or is it solid? Is the group inside locally dead/unsettled/alive? Is the cycle group racing against an outside group for liberties or only making eyes of its own, or both? How many liberties do all the various groups each have? Are there ko-liberties? Are there approach-liberties? Is there a cycle inside the cycle? etc.
This is the same as how in Go the space of different capturing race situations in general is combinatorically large, with enough complexity that many situations are difficult even for pro players who have studied them for a lifetime.
The tricky bit here is that there seems to not be (enough) generalization between the exponentially large space of large group race situations in Go more broadly and the space of situations with cyclic groups. So whereas the situations in “normal Go” get decently explored by self-play, cyclic groups are rare in self-play so there isn’t enough data to learn them well, leaving tons of flaws, even for some cases humans consider “simple”. A reasonable mental model is that any particular adversary will probably find one or two of them somewhat arbitrarily, and then rapidly converge to exploit that, without discovering the numerous others.
The “gift” attack is distinct and very interesting. There isn’t a horizon effect involved, it’s just a straightforward 2-3 move blind spot of both the policy and value heads. Being only 2-3 moves this is why it also gets fixed more easily by search than cyclic groups. As for why it happens, as a bit of context I think these are true:
- In 99.9...% of positions, the flavor of “superko” rule doesn’t affect the value of a position or the correct move.
- The particular shape used by the gift adversary and similar shapes do occur with reasonable frequency in real games without the superko rule being relevant (due to different order of moves), in which case the “gift shape” actually is harmless rather than being a problem.
I’ve taken a look at the raw neural net outputs and it’s also clear that the neural net has no idea that the superko rule matters—predictions don’t vary as you change the superko rule in these positions. So my best guess is that that the neural net perhaps “overgeneralizes” and doesn’t easily learn that in this one specific shape with this specific order of moves, the superko rule, which almost never matters, suddenly does matter and flips the result.

polytope 14 Feb 2024 23:05 UTC
1 point
0
in reply to: gwern’s comment on: Skepticism About DeepMind’s “Grandmaster-Level” Chess Without Search
Apparently not a writeup (yet?), but there appears to be a twitter post here from LC0 with an comparison plot of accuracy on tactics puzzles: https://x.com/LeelaChessZero/status/1757502430495859103?s=20

polytope 22 Jan 2024 2:26 UTC
4 points
3
in reply to: Throwaway2367’s comment on: Another Non-Anthropic Paradox: The Unsurprising Rareness of Rare Events
Yes, rather than resolving the surprise of “the exact sequence HHTHTTHTTH” by declaring that it shouldn’t be part of the set of events, I would prefer to resolve it via something like:
- It should be part of the set of events I’m allowed to consider just like any other subset of all 10-flip sequences.
- We do observe events (or outcomes that if constructed as singleton events) all the time that would we would have predicted to be exceedingly improbable (while they may be improbable individually, a union of them may not be).
- Observing some particular unlikely event like “the exact sequence HHTHTTHTTH occurs” should in fact raise my relative belief in any hypothesis by a large factor if that hypothesis would have uniquely predicted that to occur, as compared to others that would have made a far more non-specific prediction. (up to a factor of at most 2^10 unless the other hypothesis considered that sequence to be unlikelier than uniform)
- Even if all this is true, I still do not and should not feel surprised in such a case because I think surprise has more to do the amount by which something shifts the beliefs I have that my brain intuits to be important for various reasons. It has little to do with the likelihood of events I observe, other than how it affects those beliefs. I didn’t have any prior reason to assign any meaningful weight to hypotheses about the coin that would predict that exact sequence and no others, such that even after scaling them by a large factor, my overall beliefs about the coin and the distribution of likely future flips should remain very similar to before, therefore I feel little surprise.
- By contrast I might feel a little more surprise seeing “HHHHHHHHHH”. And again the reason is not really because of the likelihood or unlikelihood of that sequence, and it also has little to do with which sequences I’m being told I can define to be a mathematical event or not. Rather I think it’s closer to something like “this coin is biased heads” or “this coin always flips heads” are competing hypotheses to “this coin is fair” that while initially extremely unlikely would not be outlandish to consider, and if true it would affect my conception of the coin and predictions of its future flips. So this time the large relative boost would come closer to shifting my beliefs in a way that would impact how I think about the coin and make future predictions, therefore I feel more surprise.

polytope 11 Jan 2024 21:07 UTC
4 points
1
on: An Actually Intuitive Explanation of the Oberth Effect
Here’s my intuition-driving example/derivation.
Fix a reference frame and suppose you are on a frictionless surface standing next to a heavy box equal to your own mass, and you and the box always start at rest relative to one another. In every example, you will push the box leftward, adding 1 m/s leftward velocity to the box, and adding 1 m/s rightward velocity to yourself.
Let’s suppose we didn’t know what “kinetic energy” is, but let’s suppose such a concept exists, and that whatever it is, an object of your mass has 0 units of it when at rest, and it is a continuous monotonic function of the absolute value of that object’s velocity. Let’s also take as an assumption that when you perform such a push like the above, you are always adding precisely 1 unit of this thing called “kinetic energy” to you and the box combined.
Okay so suppose the box and you are at rest and you perform this push, and start moving at 1m/s left and right, respectively. You and the box started with 0 units of kinetic energy, and you added 1 unit total. Since you and the box have the same absolute value of velocity, your energies are equal, so you each must have gotten ¹⁄₂ of a unit. Great, therefore we derive 1 m/s is ¹⁄₂ unit of kinetic energy.
Now suppose you and the box start out at a velocity of 1m/s rightward, so you have ¹⁄₂ unit of energy each, for a total of 1 unit. You perform the same push, bringing the total kinetic energy to 2 units. The box ends up at 0 m/s, so it has 0 units of energy now. You end up going 2m/s rightward, with all the energy. Great, therefore we derive 2 m/s is 2 units of kinetic energy.
Now suppose you and the box start out at a velocity of 2m/s rightward, so you have 2 units of energy each, for a total of 4 units. You perform the same push, bringing the total kinetic energy to 5 units. The box ends up at 1 m/s, so it has ¹⁄₂ unit of energy now, since we derived earlier that 1m/s is ¹⁄₂ unit of energy. You end up going 3m/s rightward. So you must have the other 4.5 units of energy. Therefore we derive 3 m/s is 4.5 units of kinetic energy.
We can continue this indefinitely, without running into any inconsistencies or contradictions. This “kinetic energy” thing so far seems to be a self-consistent concept given these assumptions! In general, we derive that an object of our mass moving at velocity v is has a kinetic energy of ¹⁄₂ v^2 units.
And I hope this makes it clearer why kinetic energy has to behave quadratically. A quadratic function f is precisely the kind of function such the quantity f(x+c) + f(x-c) − 2f(x) is constant with respect to x. It’s the only function that satisfies the property that a fixed “amount of push” of the propellant you are carrying away from you always adds the same total energy into the combined system of you + propellant.
And it also gives some intuition for why you end up with more energy when you fire the propellant while moving faster. When you pushed the box while initially at 0 m/s, your kinetic energy went from 0 units to 0.5 units (+0.5), but when you pushed the box while initially at 1 m/s, your kinetic energy went from 0.5 units to 2 units (+1.5), and when you pushed the box while initially at 2 m/s, your kinetic energy went from 2 units to 4.5 units (+2.5) and in all cases you only added 1 unit of energy yourself. Where does the extra energy come from? From slowing down the box’s rightward motion, and/or from not speeding up the box to go leftward from rest.

polytope 7 Dec 2023 2:50 UTC
1 point
0
in reply to: JBlack’s comment on: On Trust
> lack of sufficient evidence.

Perhaps more specifically, evidence that is independent from the person that is to be trusted or not. Presumably when trusting someone else that something is true, often one does so due to believing that the other person is being honest and reliable enough such that that their word is sufficient evidence to then take some action. It’s just that there isn’t sufficient evidence without that person’s word.

polytope 24 Jul 2023 14:30 UTC
11 points
0
in reply to: gwern’s comment on: Even Superhuman Go AIs Have Surprising Failure Modes
I am also curious why the zero-shot transfer is so close to 0% but not 0%. Why do those agents differ so much, and what do the exploits for them look like?
The exploits for the other agents are pretty much the same exploit, they aren’t really different. From what I can tell as an experienced Go player watching the adversary and other human players use the exploit, the zero shot transfer is not so high because the adversarial policy overfits to memorize specific sequences that let you set up the cyclic pattern and learns to do so in a relatively non-robust way.
All the current neural-net-based Go bots share the same massive misevaluations in the same final positions. Where they differ is that they may have arbitrarily different preferences among almost equally winning moves, so during the long period that the adversary is in a game-theoretically-lost position, any different victim all the while still never realizing any danger, may nonetheless just so happen to choose different moves. If you consider a strategy A that might broadly minimize the number of plausible ways a general unsuspecting victim might mess up your plan by accident, and a strategy B that leaves more total ways open but those ways are not the ones that small set of victim networks you are trained to exploit would stumble into (because you’ve memorized their tendencies enough to know they won’t), the adversary is incentivized more towards B than A.
This even happens after the adversary “should” win. Even after it it finally reaches a position that is game-theoretically winning, it often blunders several times and plays moves that cause the game to be game-theoretically lost again, before eventually finally winning again. I.e. it seems overfit to the fact that the particular victim net is unlikely to take advantage of its mistakes, so it never learns that they are in fact mistakes. In zero-shot transfer against a different opponent this unnecessarily may give the opponent, who shares the same weakness but may just so happen to play in different ways, chances to stumble on a refutation and win again. Sometimes even without the victim even realizing that it was a refutation of anything and that they were in trouble in the first place.
I’ve noticed human exploiters play very differently than that. Once they achieve a game-theoretic-winning position they almost always close all avenues for counterplay and stop giving chances to the opponent that would work if the opponent were to suddenly become aware.
Prior to that point, when setting up the cycle from a game-theoretically lost position, most human players I’ve seen also play slightly differently too. Most human players are far less good at reliably using the exploit, because they haven’t practiced and memorized as much the ways to get any particular bot to not accidentally interfere with them as they do so. So the adversary does much better than them here. But as they learn to do better, they tend do so in ways that I think transfer better (i.e. from observation my feeling is they maintain a much stronger bias towards things like “strategy A” above).

polytope 7 Jul 2023 4:43 UTC
14 points
5
in reply to: Ege Erdil’s comment on: When do “brains beat brawn” in Chess? An experiment
(I’m the main KataGo dev/researcher)
Just some notes about KataGo—the degree to which KataGo has been trained to play well vs weaker players is relatively minor. The only notable thing KataGo does is in some self-play games to give up to an 8x advantage in how many playouts one side has over the other side, where each side knows this. (Also KataGo does initialize some games with handicap stones to make them in-distribution and/or adjust komi to make the game fair). So the strong side learns to prefer positions that elicit higher chance of mistakes by the weaker side, while the weak side learns to prefer simpler positions where shallower search doesn’t harm things as much.
This method is cute because it adds pressure to only learn “general high-level strategies” for exploiting a compute advantage, instead of memorizing specific exploits (which one might hypothesize to be less likely to generalize to arbitrary opponents). Any specific winning exploit learned by the stronger side that works too well will be learned by the weaker side (it’s the same neural net!) and subsequently will be avoided and stop working.
And it’s interesting that “play for positions that a compute-limited yourself might mess up more” correlates with “play for positions that a weaker human player might mess up in”.
But because this method doesn’t adapt to exploit any particular other opponent, and is entirely ignorant of a lot of tendencies of play shared widely across all humans, I would still say it’s pretty minor. I don’t have hard data, but from firsthand subjective observation I’m decently confident that top human amateurs or pros do a better job playing high-handicap games (> 6 stones) against players that more than that many ranks weaker than them than KataGo would, despite KataGo being stronger in “normal” gameplay. KataGo definitely plays too “honestly”, even with the above training method, and lacks knowledge of what weaker humans find hard.

If you really wanted to build a strong anti-human handicap game bot in Go, you’d absolutely start by learning to better model human play, using the millions of games available online.
(As for the direct gap with the very best pro players, without any specific anti-bot exploits, at tournament-like time controls I think it’s more like 2 stones rather than 3-4. I could believe 3-4 for some weaker pros, or if you used ultra-blitz time controls, since shorter time controls tend to favor bots over humans).

polytope 4 Mar 2023 1:27 UTC
LW: 5 AF: 4
0
AF
in reply to: DanielFilan’s comment on: Inside the mind of a superhuman Go model: How does Leela Zero read ladders?
There’s (a pair of) binary channels that indicate whether the acting player is receiving komi or paying it. (You can also think of this as a “player is black” versus “player is white” indicator, but interpreting it as komi indicators is equivalent and is the natural way you would extend Leela Zero to operate on different komi without having to make any changes to the architecture or input encoding).
In fact, you can set the channels to fractional values strictly between 0 and 1 to see what the model thinks of a board state given reduced komi or no-komi conditions. Leela Zero is not trained on any value other than the 0 or 1 endpoints corresponding to komi +7.5 or komi −7.5 for the acting player, so there is no guarantee that the behavior for fractional values is reasonable, but I recall people found that many of Leela Zero’s models do interpolate their output for the fractional values in a not totally unreasonable way!
If I recall right, it tended to be the smaller models that behaved well, whereas some of the later and larger models behaved totally nonsensically for fractional values. If I’m not mistaken about that being the case, then as a total guess perhaps that’s something to do with later and larger models having more degrees of freedom with which to fit/overfit arbitrarily to arbitrarily give rise to non-interpolating behavior in between, and/or having more extreme differences in activations at the end points that constrain the middle less and give it more room to wiggle and do bad non-monotone things.

polytope 1 Mar 2023 4:52 UTC
LW: 21 AF: 10
3
AF
on: Inside the mind of a superhuman Go model: How does Leela Zero read ladders?
This is very cool, thanks for this post.
Some remarks:
Playing at an intersection with no liberty is forbidden, unless the play results in capture
This is true, but the capture can be of your own stones. That is, Leela Zero is trained under Tromp-Taylor rules where self-capture is legal. So there isn’t any forbidding of moves due to just liberties. Single stone suicide is still illegal, but only by virtue of the fact that self-capture of a single stone would repeat the board position, but you can suicide multiple stones.
However, there is still of course a question of how liberty counting works. See https://github.com/leela-zero/leela-zero/issues/877 for some past discoveries of positions where Leela Zero is unable to determine when a large group is in atari or would become in atari after certain move(s). This suggests that the neural nets do not learn a general and correct algorithm, instead they learn a bunch of heuristics on top of heuristics that work almost all of the time but have enough combinatorically many nooks and crannies that are not all forced to be correct due to rarity of some of the cases.
Note that if you are going to investigate liberty counting, you should expect that the neural net likely counts a much more fuzzy and intricate concept than literal liberties. As an expert Go player I would expect it almost certainly primarily focuses on concepts that are much more tactically relevant, like “fighting liberties”, e.g. how many realistic moves the opponent would need to fill a group accounting for necessary approach moves, recaptures, etc. For example a group with literal 2 liberties but where one liberty was an eye and the other would be self-atari by the opponent unless they made a preparatory connection first would have 3 fighting liberties because actually capturing the group would take 3 moves under realistic play, not 2.
The fighting liberty count is also somewhat multidimensional, because it can vary depending on the particular tactical objective. You might have 4 fighting liberties against a particular group, but 6 against a different group because 2 of the liberties or approach moves required are only “effective” for one objective and not the other. Also purely integer values don’t suffice because fighting liberty count can depend on things like a ko.
The fact that this not-entirely-rigorously-defined concept of “fighting liberties” is what expert play in Go cares about (at least, expert human players) perhaps makes it also less surprising why a net might not implement a “correct” and “general” algorithm for liberty counting but might instead end up with a pile of hacks and heuristics that don’t always work.
I suspect will be easier to investigate the counting of eyes than liberties, and might suggest to focus on that first if you investigate further. The presence of an eye is much more discrete and less multidimensional than the fighting liberty count, and you only need to count up to 2, so there are fewer behavior patterns in the activations that will need to be distinguished by an analysis.
A particularly interesting application would be to understand Wang et al (2022)’s adversarial policies against KataGo. For example, one of the adversarial policies essentially amounts to confusing the model about the number of liberties that a large circular group has, and capturing it “without the model noticing”. This attack somewhat transfers to Leela Zero as well. Can we find a mechanism for liberty-counting that explains the success of this attack?
See https://www.lesswrong.com/posts/Es6cinTyuTq3YAcoK/there-are-probably-no-superhuman-go-ais-strong-human-players?commentId=gAEovdd5iGsfZ48H3 where I give a hypothesis for what the mechanism is, which I think is at a high-level more likely than not to be roughly what’s happening.
I also currently believe based some playing around with the nets a long time ago and seeing the same misevaluations across all 4 different independently trained AlphaZero-based agents I tested that the misevaluation probably transfers very well, if not the attack. I.e. if the attack doesn’t transfer perfectly, it’s probably to do with the happenstance of the preferences of the agent earlier—for example it’s probably easier to exploit agents that are trying to also sharply maximize score since they will tend to ignore your moves more and give you more moves in a row to do things—and not because any of these replications anticipate or evaluate the attack itself much better or worse.
However, here again I would strongly recommend investigating eyes, not liberties. All end-to-end observational experimentation with the models I’ve done suggests that exactly the same misevaluation is happening with respect to eyes. And the analysis of any miscounting of eyes on the predictions should be far crisper than for liberties.
In particular, if you overcount eyes by propagating a partial eye count multiple times around a cycle, you will predict a group is absolutely alive even when it isn’t alive, because for groups with 2,3,4,5,… eyes, the data is nearly absolutely consistent that such groups are alive independent of anything else around them.
By contrast, if you overcount liberties and think a group has a “very large” number of liberties, you might not always predict the group is alive. For example what if every opposing group has 2 eyes and is therefore immortally strong, whereas the large-liberty group is surrounded and has no eyes? The training data probably doesn’t so sharply constrain how different nets will generalize to rare “over-large” liberty counts because generally how liberty counts affect statuses is contextual rather than absolute like eye count. So one would expect that when a net mistakenly overcounts liberties, the effect could also be harder to analyze due to being contextual and not always consistent.
Right before the game ends, the value of the board should come down to which player has more territory. Is the model calculating each player’s territory, and if so, how?
I would say almost certainly yes, but possibly not “exactly” territory, maybe something like certainty-adjusted or variance-adjusted territory, with hacks for different kinds of sources of uncertainty. But yes, almost certainly something that correlates very well with territory.
I did some visualizations of activations of a smaller net way back in the past that shows something pretty much like this. https://github.com/lightvector/GoNN#global-pooled-properties-dec-2017
Although this architecture isn’t quite a plain resnet (in particular, this is the visualization of the conv layer output prior to a global pooling layer) , it shows that the neural net learned a concept very recognizable to a Go player as having to do with predicted ownership or territory. And it learned this concept without ever being trained to predict the game outcome or score! In this case, the relevant net was trained solely to predict the next move a human player played in a position.
The fact that this kind of concept can arise automatically from pure move prediction also gives some additional intuitive force for why the original AlphaGoZero work found such a huge improvement from using the same neural net to predict both value and policy. Pure policy prediction is already capable of automatically generating the internal feature you would need to get basic value prediction working.

polytope 22 Feb 2023 17:26 UTC
9 points
6
in reply to: dxu’s comment on: There are (probably) no superhuman Go AIs: strong human players beat the strongest AIs
I think there are two “simple” abstractions possible here, where the amount of data to distinguish which one is right is minuscule under natural play in Go, and therefore can easily be overwhelmed by small constant factors in the “ease” of converging to either one due to inductive bias.
- Abstraction 1: the size of the set of all empty spaces adjacent to a group
- Abstraction 2: the sum of the number of empty spaces next to a given stone on the group, plus, recursively, this same sum for each of the neighboring stones of the same player to the north, south, east, west, where this recursion doesn’t “backtrack”, plus a penalty if a nearby empty space is bordered by the same group from multiple sides.
Abstraction 1 is mathematically more elegant, but abstraction 2 is algorithmically simpler to implement. If you modify abstraction 2′s algorithm to include a globally remembered “set” such that traversed spaces and stones are added to this set and are not re-traversed, you can make it equivalent to abstraction 1, but that requires additional intermediate memory during the computation and so is more complex.
Indeed, when programmers write iterative or recursive algorithms that operate on graphs, it’s easy and it’s usually less lines of code and less local state if you forget to include proper checks on prohibiting return to the global set of locations previously visited during that instance of that algorithm, and then your algorithm may work fine on trees but fails on cycles. The “less memory or state required” part, plus the fact that in Go groups with large cycles that would distinguish these two possibilites are rare, is what leads me to hypothesize right now (without explicit evidence or confirmation) that convergence to abstraction 2 is morally what’s going on here.

polytope 22 Feb 2023 16:20 UTC
6 points
0
in reply to: LawrenceC’s comment on: There are (probably) no superhuman Go AIs: strong human players beat the strongest AIs
Keep in mind that the adversary was specifically trained against KataGo, whereas the performance against LeelaZero and ELF is basically zero-shot. It’s likely the case that an adversary trained against LeelaZero and ELF would also win consistently.
I’ve run LeelaZero and ELF and MiniGo (yet another independent AlphaZero replication in Go) by hand in particular test positions to see what their policy and value predictions are, and they all very massively misevaluate cyclic group situations just like KataGo. Perhaps by pure happenstance different bots could “accidentally” prefer different move patterns that make it harder or easier to form the attack patterns (indeed this is almost certainly something that should vary between bots as they do have different styles and preferences to some degree), but probably the bigger contributor to the difference is explicit optimization vs zero shot.
So all signs point to this misgeneralization being general to AlphaZero with convnets, not one particular bot. In another post here https://www.lesswrong.com/posts/Es6cinTyuTq3YAcoK/there-are-probably-no-superhuman-go-ais-strong-human-players?commentId=gAEovdd5iGsfZ48H3 I explain why I think it’s intuitive how and why a convnet would learn the an incorrect algorithm first and then get stuck on it given the data.
(As for whether, e.g. a transformer architecture would have less of an issue—I genuinely have no idea, I think it could go either way, nobody I know has tried it in Go. I think it’s at least easier to see why a convnet could be susceptible to this specific failure mode, but that doesn’t mean other architectures wouldn’t be too)