Well, my idea is that Quirrell’s interest is to teach you a valuable lesson, by showing you a program whose safety is genuinely non-obvious to you, so the first program may or may not be safe; but if it happens to be unsafe and you double down anyway, Quirrell is the type of teacher who thinks you’ll learn the best lesson about your overconfidence if you do suffer the consequences of self-destruction.
That’s just the story, though. What I need formally is a setting where there’s a clear separation between “safe” and “unsafe” rewrites, and our program has to decide whether a proposed rewrite is definitely safe or possibly unsafe. For this toy setting, I wanted that to be the only choice the program has to make, because if there were other choices, you could argue that you should only accept a rewrite if the rewrite is good at making those other choices, and good at only accepting rewrites which are good at these other choices, etc. -- which is a problem that will need to be solved, but not something I wanted to focus on in this post.
Well, my idea is that Quirrell’s interest is to teach you a valuable lesson, by showing you a program whose safety is genuinely non-obvious to you, so the first program may or may not be safe; but if it happens to be unsafe and you double down anyway, Quirrell is the type of teacher who thinks you’ll learn the best lesson about your overconfidence if you do suffer the consequences of self-destruction.
That’s just the story, though. What I need formally is a setting where there’s a clear separation between “safe” and “unsafe” rewrites, and our program has to decide whether a proposed rewrite is definitely safe or possibly unsafe. For this toy setting, I wanted that to be the only choice the program has to make, because if there were other choices, you could argue that you should only accept a rewrite if the rewrite is good at making those other choices, and good at only accepting rewrites which are good at these other choices, etc. -- which is a problem that will need to be solved, but not something I wanted to focus on in this post.