There are many ways self-modification can be restricted: only certain numerical parameters may be modified, or only some of the source may be modified while the rest remains a black box. And if the AI has to implement its own interpreter, that’s not a “mild performance penalty”, it’s a gargantuan one; not to mention that it can be made impossible outright.
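For concreteness, here is a minimal sketch of the first kind of restriction. Everything in it is hypothetical (the class, the whitelist, the bounds); it only shows the shape of the idea: the agent may tune a few whitelisted numbers, and its code is never writable.

```python
# Hypothetical sketch: self-modification confined to a whitelist of
# numeric parameters. The agent's source code itself is never writable.

class ParameterSandbox:
    # Only these parameters may be changed, and only within these bounds.
    WHITELIST = {
        "learning_rate": (1e-6, 1.0),
        "exploration":   (0.0, 1.0),
    }

    def __init__(self):
        self.params = {"learning_rate": 0.01, "exploration": 0.1}

    def request_update(self, name: str, value: float) -> bool:
        """Apply an agent-requested change only if the whitelist allows it."""
        if name not in self.WHITELIST:
            return False  # everything else stays a black box
        lo, hi = self.WHITELIST[name]
        if not lo <= value <= hi:
            return False  # out-of-bounds values are rejected
        self.params[name] = value
        return True
```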
If you place too many restrictions you will probably never reach human-like intelligence.
You can also freeze its self-modification abilities at any point and examine the frozen machine to evaluate its intelligence.
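In sketch form (agent, benchmark, and threshold are all placeholders), the procedure is just an audit loop with one knob, the interval between freezes:

```python
# Hypothetical sketch: alternate self-modification with frozen audits.
import copy
import random

HUMAN_BASELINE = 100.0  # placeholder threshold for "human-like" performance

class Agent:
    """Toy stand-in; a real system would be vastly more complex."""
    def __init__(self):
        self.capability = 0.0
        self.may_modify = False

    def run(self, steps: int):
        if self.may_modify:
            self.capability += sum(random.random() for _ in range(steps))

def evaluate_intelligence(agent: Agent) -> float:
    return agent.capability  # stand-in for a real benchmark

def supervised_run(agent: Agent, audit_interval: int, max_audits: int) -> Agent:
    for _ in range(max_audits):
        agent.may_modify = True        # let it improve itself...
        agent.run(audit_interval)
        agent.may_modify = False       # ...then freeze
        frozen = copy.deepcopy(agent)  # examine a frozen copy
        if evaluate_intelligence(frozen) >= HUMAN_BASELINE:
            return frozen              # halt before the transition is missed
    return agent
```

Note that all the difficulty hides in `audit_interval`: nothing in the loop prevents capability from crossing the threshold well within a single interval, between audits.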
If you do it frequently, you won’t reach human-like intelligence in a reasonable span of time. If you do it infrequently, you will miss the transition to superhuman intelligence, and by then it will be too late.
These are only examples, but I think we are much too far from constructing an AI to assume that the first ones will be introspective or highly self-modifying. And by the time we start building one, we’ll know it, and we’ll be able to prepare procedures to put in place… a coherent, large, well-funded effort will build something, with many theories and partial proof-of-concept prototypes along the way to guide its safety procedures.
A coherent, large, well-funded effort can still make a fatal mistake. The Challenger was such an effort. The Chernobyl power plant was such an effort. Trouble is, this time the stakes are much higher.