According to this Nature paper, the Atlantic Meridional Overturning Circulation (AMOC), the “global conveyor belt”, is likely to collapse this century (mean 2050, 95% confidence interval is 2025-2095).
Another recent study finds that it is “on tipping course” and predicts that after a collapse, average February temperatures in London will decrease by 1.5 °C per decade (15 °C over 100 years). Bergen (Norway) February temperatures will decrease by 35 °C. This is a rate of temperature change about an order of magnitude faster than current global warming (0.2 °C per decade), but in the other direction!
This seems like a big deal? Anyone with more expertise in climate sciences want to weigh in?
When working with SAE features, I’ve usually relied on a linear intuition: a feature firing with twice the strength has about twice the “impact” on the model. But while playing with an SAE trained on the final layer, I was reminded that the actual direct impact on the relative token probabilities grows exponentially with activation strength. A feature’s additive contribution to the logits is indeed linear in its activation strength. However, the ratio of the probabilities of two competing tokens, P(A)/P(B), equals the exponential of their logit difference, exp(logit(A) − logit(B)), because the softmax normalization cancels out of the ratio.
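A quick numerical check of the identity P(A)/P(B) = exp(logit(A) − logit(B)) under a plain softmax. The logits here are arbitrary toy values, not taken from any real model.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=-1, keepdims=True)

# Toy logits over a 4-token vocabulary; token A is index 0, token B is index 1.
logits = np.array([2.0, 0.5, -1.0, 0.3])
A, B = 0, 1

probs = softmax(logits)
ratio = probs[A] / probs[B]

# The softmax denominator is shared by every token, so it cancels in the ratio,
# leaving only the exponential of the logit difference.
predicted = np.exp(logits[A] - logits[B])
print(ratio, predicted)  # both are exp(1.5), roughly 4.48
```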
Suppose a feature boosts logit(A) but not logit(B), and we multiply its activation strength by 5. This doesn’t multiply its effect on P(A)/P(B) by 5. Instead, it raises that effect to the 5th power. If the feature previously made token A three times as likely as token B, it now makes A 3^5 = 243 times as likely! This might partly explain why a feature’s lower activations are often less interpretable than its top activations: their direct impact on the relative token probabilities is exponentially smaller.
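The 3 → 243 example can be reproduced with a toy setup. This is a minimal sketch under assumptions: the feature’s logit-space effect `feature_effect` is hypothetical, chosen to add ln(3) to logit(A) per unit of activation and nothing to logit(B).

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax for a 1-D logit vector."""
    shifted = logits - logits.max()
    exps = np.exp(shifted)
    return exps / exps.sum()

A, B = 0, 1
base_logits = np.array([1.0, 1.0, 0.0])  # A and B start out equally likely

# Hypothetical feature: adds ln(3) to logit(A) per unit of activation, leaves logit(B) alone.
feature_effect = np.array([np.log(3.0), 0.0, 0.0])

def ab_ratio(activation: float) -> float:
    """P(A)/P(B) after adding the feature's scaled logit contribution."""
    logits = base_logits + activation * feature_effect
    probs = softmax(logits)
    return probs[A] / probs[B]

for activation in [1.0, 2.0, 5.0]:
    print(activation, ab_ratio(activation))  # roughly 3, 9, 243
```

It is the ratio P(A)/P(B) that grows exponentially. P(A) itself saturates toward 1.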
Note that this only holds for the direct ‘logit lens’-like effect of a feature. The intuition therefore mostly applies to features in the final layers of a model, because the impact of earlier features is probably mostly mediated by their effect on later layers.
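For a final-layer feature, the direct effect can be sketched by projecting the feature’s decoder direction through the unembedding matrix. The toy sizes, `decoder_direction`, and `W_U` below are random stand-ins rather than weights from any real SAE or model. The sketch also ignores the final LayerNorm, which in a real model rescales the residual stream nonlinearly, so the linearity shown here is only approximate in practice.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_vocab = 16, 50  # toy sizes, not from any real model

# Hypothetical SAE decoder direction for one feature and a hypothetical unembedding matrix.
decoder_direction = rng.normal(size=d_model)
W_U = rng.normal(size=(d_model, d_vocab))

# Direct ('logit lens'-like) effect per unit of activation on every token's logit.
# Assumption: the final LayerNorm is ignored (treated as the identity).
logit_effect_per_unit = decoder_direction @ W_U  # shape (d_vocab,)

def direct_log_odds(activation: float, a: int, b: int) -> float:
    """Feature's direct contribution to logit(a) - logit(b), i.e. to ln(P(a)/P(b))."""
    return activation * (logit_effect_per_unit[a] - logit_effect_per_unit[b])

A, B = 3, 7
for act in [1.0, 2.0, 4.0]:
    log_odds = direct_log_odds(act, A, B)
    # Doubling the activation doubles the log-odds contribution and squares the odds factor.
    print(act, log_odds, np.exp(log_odds))
```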