Would rewriting Linux to seL4 standards cost $16B in the world before or after frontier models are solving Erdos problems? If that’s the cost in human SWE hours then it seems tractable to use agent harnesses and formal methods to achieve quite a cost reduction.
But also I don’t think most people want Linux to seL4 standards (the Unix security model isn’t great); there’s probably more to be gained by finishing the network stack(s) for seL4 and implementing a bunch of network card drivers and a TLS library. That would enable IoT at least to have a pretty secure base to work from, and hopefully the harnesses and tooling for that work would also be available to application developers to verify at least their parsing and security checks for example.
Similar to vision/audio tokens the system and thinking tokens should be distinct from user and output. This would have to happen after pretraining with, e.g. the first assistant training examples appearing with some thinking/output tokens intermixed with the pretraining corpus tokens, slowly increasing the ratio of specialized to normal tokens until the model only outputs thinking tokens in the thinking section, output tokens in the output. Precisely when to begin adjusting the ratios could actually happen earlier, e.g. have a 1:1 mapping of “user” and “output” tokens and calculate the loss function on output tokens despite the LLM seeing “user” tokens, but making translation between token sets straightforward. Finally, on inference always strip incorrect-domain tokens from the tagged output sections, and of course don’t let users input non-user tokens.