[Question] What’s wrong with these analogies for understanding Informed Oversight and IDA?

In Can HCH epistemically dominate Ramanujan?, Alex Zhu wrote:

If HCH is ascription universal, then it should be able to epistemically dominate an AI theorem-prover that reasons similarly to how Ramanujan reasoned. But I don’t currently have any intuitions as to why explicit verbal breakdowns of reasoning should be able to replicate the intuitions that generated Ramanujan’s results (or any style of reasoning employed by any mathematician since Ramanujan, for that matter).

And I answered:

My guess is that HCH has to reverse engineer the theorem prover, figure out how/why it works, and then reproduce the same kind of reasoning.

And then I followed up my own comment with:

It occurs to me that if the overseer understands everything that the ML model (that it’s training) is doing, and the training is via some kind of local optimization algorithm like gradient descent, the overseer is essentially manually programming the ML model by gradually nudging it from some initial (e.g., random) point in configuration space.
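As a toy illustration of this "programming by nudging" picture, here is a minimal sketch (my own, not from the original discussion) of gradient descent as a sequence of small nudges to a parameter vector. The quadratic loss, the `nudge` helper, and all parameter values are hypothetical, chosen only to make the dynamics visible:

```python
# Toy sketch: gradient descent as repeated small "nudges" through
# configuration space. The loss and names here are illustrative only.

def loss_gradient(theta, target):
    """Gradient of the toy loss L(theta) = 0.5 * sum((theta_i - target_i)^2)."""
    return [t - g for t, g in zip(theta, target)]

def nudge(theta, target, learning_rate=0.1):
    """One gradient-descent step: a small nudge of the parameters."""
    grad = loss_gradient(theta, target)
    return [t - learning_rate * g for t, g in zip(theta, grad)]

def train(theta, target, steps=200):
    """Apply many small nudges, starting from an initial point."""
    for _ in range(steps):
        theta = nudge(theta, target)
    return theta

# Starting from an arbitrary initial point, repeated nudges carry the
# parameters toward the target configuration. An overseer who understood
# each step would, on this picture, effectively be steering the model.
initial = [5.0, -3.0, 2.0]
target = [1.0, 1.0, 1.0]
final = train(initial, target)
print(final)
```

The point of the sketch is only that each individual update is a local, comprehensible adjustment; an overseer who understands every such adjustment is, in aggregate, directing where the model ends up in configuration space.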

No one answered my comments with either a confirmation or denial as to whether these guesses of how to understand Universality / Informed Oversight and IDA are correct. I’m surfacing this question as a top-level post because if “Informed Oversight = reverse engineering” and “IDA = programming by nudging” are good analogies for understanding Informed Oversight and IDA, it seems to have pretty significant implications.

In particular, it seems to imply that there’s not much hope for IDA to be competitive with ML-in-general. If IDA is analogous to a highly constrained method of “manual” programming, it seems unlikely to be competitive with less constrained methods of “manual” programming (i.e., AIs designing and programming more advanced AIs in more general ways, similar to how humans do most programming today), which itself is presumably not competitive with general (unconstrained-by-safety) ML (otherwise ML would not be the competitive benchmark).

If these are not good ways to understand Informed Oversight and IDA, can someone please point out why?