This was a very interesting and entertaining read—thank you! The concept of the monitorability tax seems critical as you say—if this becomes significant (making some models not competitive) it will be an extremely powerful and negative force on this technology. The degree of faithfulness to CoT is super interesting. Clearly its useful to LLMs, but also clearly an output and not always the real scratchpad, or at least not always capturing all the scratches. Could a more faithful scratchpad be developed with thinkish? Perhaps it’s the formal use of human language that makes it “unnatural” for an LLM to use as a true scratchpad fully driving the output. Maybe there is a compromise in this race that keeps things interpretable. A structured intermediate language (more compressed than English, more readable than Neuralese) to keep the tax manageable.
This was a very interesting and entertaining read—thank you! The concept of the monitorability tax seems critical as you say—if this becomes significant (making some models not competitive) it will be an extremely powerful and negative force on this technology. The degree of faithfulness to CoT is super interesting. Clearly its useful to LLMs, but also clearly an output and not always the real scratchpad, or at least not always capturing all the scratches. Could a more faithful scratchpad be developed with thinkish? Perhaps it’s the formal use of human language that makes it “unnatural” for an LLM to use as a true scratchpad fully driving the output. Maybe there is a compromise in this race that keeps things interpretable. A structured intermediate language (more compressed than English, more readable than Neuralese) to keep the tax manageable.