Introspective RSI vs Extrospective RSI

On the left, a blue man solves a Rubik's cube, with a thought bubble depicting himself thinking about the cube. On the right, a blue man is being studied with medical equipment by other blue men.
| Introspective RSI | Extrospective RSI |
| --- | --- |
| The meta-cognition and meso-cognition occur within the same entity. | The meta-cognition and meso-cognition occur in different entities. |

Much like a human, an AI performing introspective RSI (I-RSI) will observe, analyze, and modify its own cognitive processes. And this capacity is privileged: the AI can make self-observations and self-modifications that can’t be made from outside.

In extrospective RSI (E-RSI), AIs will automate various R&D tasks that humans currently perform to improve AI, using similar workflows (studying prior literature, forming hypotheses, writing code, running experiments, analyzing data, drawing conclusions, publishing results).

Here are some differences between them (I mark the most important with *):

  1. *Monitoring opportunities: During E-RSI, information flows between the AIs through external channels (e.g. API calls) which humans can monitor. But in I-RSI, there are fewer monitoring channels.

  2. *Generalisation from non-AI R&D: E-RSI involves AIs performing AI R&D in the same way they perform non-AI R&D, such as drug discovery and particle physics. So we can train the AIs in those non-AI R&D domains and hope that the capabilities and propensities generalise to AI R&D.

  3. Latency: I-RSI may be lower latency, because the metacognition is “closer” to the mesocognition in some sense, e.g. in the same chain-of-thought.

  4. Parallelisation: E-RSI may scale better through parallelisation, because it operates in a distributed manner.

  5. Diminishing returns: Humans have already performed lots of AI R&D, so there might be diminishing returns to E-RSI (cf. ideas are harder to find). But I-RSI would be a novel manner of improving AI cognition, so there may be low-hanging fruit leading to rapid progress.

  6. Historical precedent: Humans performing AI R&D provides a precedent for E-RSI. So we can apply techniques for securing human AI R&D to mitigate risks from E-RSI, such as research sabotage.

  7. Verification standards: To evaluate E-RSI, we can rely on established mechanisms for verifying human AI R&D, e.g. scientific peer review. However, we don’t have good methods to verify I-RSI.

  8. *Transition continuity: I think the transition from humans leading AI R&D to E-RSI will be gradual, as AIs take greater leadership in the workflows. But I-RSI might be a more sudden transition.

  9. Representation compatibility: Because of the transition continuity, I expect that E-RSI will involve internal representations which are compatible with human concepts.

  10. *Transferability: E-RSI will produce results which can more readily be shared between instances, models, and labs. But improvements from I-RSI might not transfer between models, perhaps not even between instances.
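The monitoring asymmetry in point 1 can be made concrete with a minimal sketch: a hypothetical logging proxy that sits on the external channel between two AI agents during E-RSI and records every message for human audit. All names here (`MonitoredChannel`, the agent labels, the payload fields) are illustrative assumptions, not a real system or API.

```python
import time

class MonitoredChannel:
    """Hypothetical external channel between AI agents during E-RSI.

    Every message passes through this proxy, so humans can log and
    audit it. Illustrative sketch only; names are assumptions.
    """

    def __init__(self):
        self.log = []  # human-auditable record of all inter-agent traffic

    def send(self, sender, receiver, payload):
        # Record the message before delivering it. This interception point
        # is the monitoring opportunity that I-RSI lacks, since its
        # "messages" stay inside one model's activations or chain-of-thought.
        self.log.append({
            "time": time.time(),
            "sender": sender,
            "receiver": receiver,
            "payload": payload,
        })
        return payload  # deliver unchanged

channel = MonitoredChannel()
channel.send("researcher-AI", "experiment-runner-AI",
             {"task": "run ablation", "config": {"lr": 3e-4}})
assert len(channel.log) == 1  # the exchange is now visible to overseers
```

The design point is that the proxy is outside both agents: neither agent has to cooperate for the log to be complete, which is exactly what breaks down when the meta-cognition and meso-cognition share one entity.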

See here for a different taxonomy — it separates E-RSI into “scaffolding-level improvement” and other types of improvements.