cubefox comments on How to game the METR plot

cubefox 21 Dec 2025 12:57 UTC
8 points
1
Here R (the square root of R²) is the Pearson correlation, which checks for linear association. A better measure here would be the Spearman correlation on the original data, which checks for any monotonic association. Spearman is more principled than first transforming the data with some monotonic function (e.g. various sigmoids) and then applying Pearson.
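To illustrate the distinction, here is a minimal sketch in plain Python. The data (`xs`, `ys`) and the helper names `pearson`, `ranks`, and `spearman` are made up for illustration and are not from the METR analysis. It shows that Spearman is just Pearson applied to ranks, so it is unchanged by any monotonic transform of the data, while Pearson is not.

```python
import math

def pearson(xs, ys):
    """Pearson correlation: measures linear association."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical data: strictly decreasing but far from linear.
xs = [1, 2, 3, 4, 5, 6]
ys = [math.exp(-x) for x in xs]

pearson_r = pearson(xs, ys)
spearman_r = spearman(xs, ys)
# Cubing is monotonic on positive values, so the ranks do not change.
spearman_r_cubed = spearman(xs, [y ** 3 for y in ys])

print(pearson_r, spearman_r, spearman_r_cubed)
```

On this toy data Spearman is −1 (up to floating-point error) both before and after the transform, whereas Pearson stays strictly between −1 and 0 because the relationship is not linear.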
- Lukas Finnveden 21 Dec 2025 15:37 UTC
  5 points
  0
  Hm.
  R² = 1 − (mean squared error / variance)
  Mean squared error seems pretty principled. Normalizing by variance to make it more comparable to other distributions seems pretty principled.
  I guess after that it seems more natural to take the square root (to get RMSE normalized by standard deviation) than to subtract the ratio from 1. But the latter is a simple enough transformation, and it makes the result comparable to the (better-motivated) R² for linear models, which is presumably why it is more commonly reported than RMSE/STD.
  Anyway, Spearman r is −0.903 (square 0.82) and −0.710 (square 0.50), so basically the same.
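The relationship between the two normalizations discussed in this reply can be sketched as follows. This is a hedged illustration with made-up numbers (`y_true`, `y_pred`) and hypothetical helper names, assuming population variance (dividing by n) for both terms. It computes R² as 1 − MSE/variance and shows that RMSE/STD is simply the square root of 1 − R².

```python
import math

def mse_over_variance(y_true, y_pred):
    """Mean squared error divided by the (population) variance of y_true."""
    n = len(y_true)
    mean = sum(y_true) / n
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    variance = sum((t - mean) ** 2 for t in y_true) / n
    return mse / variance

def r_squared(y_true, y_pred):
    """R² = 1 − (mean squared error / variance)."""
    return 1 - mse_over_variance(y_true, y_pred)

def rmse_over_std(y_true, y_pred):
    """RMSE normalized by standard deviation: sqrt(MSE / variance)."""
    return math.sqrt(mse_over_variance(y_true, y_pred))

# Hypothetical observations and model predictions.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 1.9, 3.2, 3.8, 5.1]

r2 = r_squared(y_true, y_pred)
ratio = rmse_over_std(y_true, y_pred)

print(r2, ratio)
```

Either quantity determines the other, since `ratio ** 2 == 1 - r2` up to rounding, so the choice between them is a matter of convention rather than information.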