There’s something qualitatively different going on for tasks longer than 1 minute, something the log-linear fit clearly doesn’t explain.
Perhaps, for longer tasks, the humans generating training data take cognitive steps that are opaque to these models, or at least relatively harder to learn?
I’d wager 1:1 that this sort of abstraction-domain mismatch between human training data and LLMs is causing more of the HCAST weirdness than skewed finetuning investment.
Bounties (partial payouts distributed in good faith if you solve part of a problem):
$1,500 for an algorithm that individuates a sufficient portion of activation space into semantically meaningful polytopes (or fuzzy loci) such that we can detect steganography during training with minimal human oversight, in polynomial time (constant exponent across architectures) or faster (see the sketch after this list)
$750 for strong handles on the sorts of downstream activation patterns by which we can cluster upstream polytopes, and an additional $300 for a polynomial-time (or faster) clustering algorithm
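Not a solution, just to make the shape of the first two bounties concrete: a minimal Python sketch under the standard (but not obviously sufficient) assumption that ReLU sign patterns can serve as polytope identifiers. Everything in it, the `upstream`/`downstream` arrays, the shapes, the cluster count, is a made-up stand-in, not a claim about any particular model.

```python
# Toy sketch only: treat each distinct ReLU sign pattern at an upstream layer
# as a candidate "polytope", then cluster polytopes by the mean downstream
# activation they induce. All data here is synthetic and illustrative.
import numpy as np
from collections import defaultdict
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Stand-ins for cached activations: pre-activations at an upstream layer and
# activations at a downstream layer, one row per input sample.
upstream = rng.normal(size=(10_000, 16))                    # (n_samples, d_up)
downstream = np.tanh(upstream @ rng.normal(size=(16, 32)))  # (n_samples, d_down)

# A ReLU net is piecewise linear: inputs sharing the sign pattern of upstream
# pre-activations lie in the same linear region (polytope) of activation space.
sign_patterns = upstream > 0

# Individuate polytopes by grouping samples on their sign pattern.
# This pass is O(n_samples * d_up), i.e. comfortably polynomial.
buckets = defaultdict(list)
for i, pattern in enumerate(sign_patterns):
    buckets[pattern.tobytes()].append(i)

# Summarize each polytope by the mean downstream activation of its samples.
keys = list(buckets)
summaries = np.stack([downstream[buckets[k]].mean(axis=0) for k in keys])

# Cluster upstream polytopes by downstream behavior. A polytope whose
# downstream signature drifts away from its cluster over training is the
# kind of candidate you'd flag for steganography review.
labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(summaries)
print(f"{len(keys)} polytopes grouped into {labels.max() + 1} clusters")
```

The hard part the bounty actually asks for is the "semantically meaningful" clause: raw sign patterns explode combinatorially and carry no meaning on their own, so treat this as a skeleton showing where a real individuation criterion would slot in.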
Happy to fund solutions to other subproblems as well. Comment or DM.