Btw, what are some ways we can incorporate heuristics into our algorithm while staying on levels 1-2?
We don’t know how to prove the required desiderata about the heuristic, but we can still reasonably conjecture them and support the conjectures with empirical tests.
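As a toy illustration of this empirical-testing route, here is a minimal Python sketch (all names, e.g. `heuristic_suggest` and `desideratum_holds`, are hypothetical stand-ins, not anything from the post): we conjecture a desideratum about the heuristic and then stress-test it on randomly generated instances, which gives evidence for the conjecture without proving it.

```python
import random

def heuristic_suggest(instance, rng):
    """Hypothetical heuristic: return the best option among a small random sample."""
    sample = rng.sample(instance, k=min(3, len(instance)))
    return max(sample, key=lambda option: option[0])

def desideratum_holds(instance, suggestion):
    """Conjectured desideratum (illustrative only): the suggestion's value is
    within a factor of 2 of the best available option."""
    best = max(value for value, _ in instance)
    return 2 * suggestion[0] >= best

def empirical_test(num_trials=10_000, seed=0):
    """Monte Carlo evidence-gathering: count violations of the conjecture on
    random instances. Zero or few violations support, but do not prove, it."""
    rng = random.Random(seed)
    violations = 0
    for _ in range(num_trials):
        n = rng.randint(1, 20)
        instance = [(rng.random(), i) for i in range(n)]
        suggestion = heuristic_suggest(instance, rng)
        if not desideratum_holds(instance, suggestion):
            violations += 1
    return violations

if __name__ == "__main__":
    print("violations out of 10000:", empirical_test())
```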
We can’t prove or even conjecture anything useful-in-itself about the heuristic, but the way the heuristic is incorporated into the overall algorithm makes it safe. For example, maybe the heuristic produces suggestions together with formal certificates of their validity. More generally, we can imagine an oracle machine (where the heuristic is slotted into the oracle) about which we cannot necessarily prove something like a regret bound w.r.t. the optimal policy, but we can prove (or at least conjecture) a regret bound w.r.t. some fixed simple reference policy. That is, the safety guarantee shows that no matter what the oracle does, the overall system is not worse than “doing nothing”. Perhaps this holds only modulo weak provable assumptions about the oracle, e.g. that it satisfies a particular computational complexity bound.
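A minimal sketch of the certificate version of this idea, under entirely hypothetical interfaces (`heuristic_propose` as the untrusted oracle, `check_certificate` as the trusted verifier): the overall system only acts on a suggestion whose certificate verifies, and otherwise falls back to a fixed reference policy, so whatever the oracle does, performance is lower-bounded by that of the reference policy.

```python
from typing import Callable, Optional, Tuple

Action = str
Certificate = object  # whatever formal evidence the heuristic attaches

def safe_wrapper(
    state: object,
    heuristic_propose: Callable[[object], Optional[Tuple[Action, Certificate]]],
    check_certificate: Callable[[object, Action, Certificate], bool],
    reference_policy: Callable[[object], Action],
) -> Action:
    """Untrusted heuristic, trusted verifier, safe fallback.

    The heuristic (oracle) may output anything, including garbage or
    adversarial suggestions. We follow it only when the attached certificate
    passes the trusted check; otherwise we act according to the fixed
    reference policy ("doing nothing"), so the worst case is bounded by the
    reference policy regardless of the oracle's behaviour."""
    proposal = heuristic_propose(state)
    if proposal is not None:
        action, certificate = proposal
        if check_certificate(state, action, certificate):
            return action
    return reference_policy(state)

# Hypothetical usage: a verifier that rejects everything makes the system
# degenerate to the reference policy, which is exactly the safety floor.
if __name__ == "__main__":
    noop = lambda state: "do_nothing"
    reject_all = lambda state, action, cert: False
    print(safe_wrapper({"obs": 0}, lambda s: ("risky_action", None), reject_all, noop))
```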
[Epistemic status: very fresh idea, quite speculative but intriguing.] We can’t find even a guarantee like the above for a worst-case computationally bounded oracle. However, we can prove (or at least conjecture) some kind of “average-case” guarantee. For example, maybe we have a high probability of safety for a random oracle. However, assuming a uniformly random oracle is quite weak. More optimistically, maybe we can prove safety even for any oracle that is pseudorandom against some complexity class C1 (where we want C1 to be as small as possible). Even better, maybe we can prove safety for any oracle in some complexity class C2 (where we want C2 to be as large as possible) that has access to another oracle which is pseudorandom against C1. If our heuristic is not actually in this category (in particular, if C2 is smaller than P and our heuristic doesn’t lie in C2), this doesn’t formally guarantee anything, but it does provide some evidence for the “robustness” of our high-level scheme.
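To make the “average-case” flavour a bit more concrete, here is a hedged Python sketch (everything here, e.g. `scheme_is_safe` and `make_prg_oracle`, is a hypothetical toy, with SHA-256 merely standing in for “pseudorandom against C1”): estimate the probability of safety when the oracle’s answers are uniformly random versus pseudorandom, as weak empirical evidence about the robustness of the high-level scheme rather than any kind of proof.

```python
import hashlib
import random

def make_uniform_oracle(rng):
    """Oracle whose answers are i.i.d. uniform bits."""
    return lambda query: rng.getrandbits(1)

def make_prg_oracle(seed):
    """Oracle whose answers are pseudorandom bits derived from a short seed
    (SHA-256 here is only a stand-in for 'pseudorandom against C1')."""
    def oracle(query):
        digest = hashlib.sha256(f"{seed}:{query}".encode()).digest()
        return digest[0] & 1
    return oracle

def scheme_is_safe(oracle, horizon=64):
    """Hypothetical stand-in for the overall algorithm: it queries the oracle
    for `horizon` steps and counts as 'unsafe' only if the oracle answers 1
    on every step, an event that should be rare for any random-looking oracle."""
    return not all(oracle(t) == 1 for t in range(horizon))

def estimate_safety(make_oracle, trials=10_000):
    """Monte Carlo estimate of the probability of safety."""
    return sum(scheme_is_safe(make_oracle(i)) for i in range(trials)) / trials

if __name__ == "__main__":
    print("uniform oracle:", estimate_safety(lambda i: make_uniform_oracle(random.Random(i))))
    print("PRG oracle:    ", estimate_safety(make_prg_oracle))
```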