• Distillation: We train an ML agent to implement a function from questions to answers based on demonstrations (or incentives) provided by a large tree of experts […]. The trained agent […] only replicates the tree’s input-output behavior, not individual reasoning steps.

Why do we decompose in the first place? If the training data for the next agent consists only of root questions and root answers, it doesn’t matter whether they represent the tree’s input-output behaviour or the input-output behaviour of a small group of experts who reason in the normal human high-context, high-bandwidth way. The latter is certainly more efficient.

There seems to be a circular problem, and I don’t understand how it is not circular or where my understanding goes astray: We want to teach an ML agent aligned reasoning. This is difficult if the training data consists of high-level questions and answers. So instead we write down explicitly, in small steps, how we reason.

Some tasks are hard to write down in small steps. In these cases we write down a naive decomposition that takes exponential time. A real-world agent can’t use this to reason, because it would be too slow. To work around this, we train a higher-level agent on just the input-output behaviour of the slower agent. Now the training data consists of high-level questions and answers again. But this is exactly what we wanted to avoid, and the reason we started writing down small steps in the first place.

Decomposition makes sense to me in the high-bandwidth setting where the task is too difficult for a human, so the human only divides it and combines the sub-results. I don’t see the point of decomposing a human-answerable question into even smaller low-bandwidth subquestions if we then throw away the tree and train an agent on the top-level question and answer.

# Job description for an independent AI alignment researcher

13 Jul 2019 9:47 UTC
• Good point. I experimented for ten minutes with saving the HTML, changing it and loading it again in the browser. But it doesn’t work for LessWrong. The article appears briefly and then switches to: ‘Sorry, we couldn’t find what you were looking for.’ I didn’t feel like figuring this out.

• without having to hover over the link to view its URL

Indeed, that’s something I do all the time.

costs, such as breaking up the flow of the text

On the other hand, it breaks the flow of reading (on paper) if I have to open the article on my computer and find the link to hover over it.

the effort of typing or copy/pasting the article title (especially on mobile)

How much effort is this compared to the effort of all the readers who have to look up what is behind a link?

• where links are treated like footnotes or words or phrases

Unfortunately they are not rendered as footnotes when printed.

There is also a curse-of-knowledge issue. The author knows what is behind their link, how important it is, and whether it is a reference, a definition or “further reading”. The reader has no idea. So the least I’m likely to do for any non-descriptive link is hover over it to see what URL it points to. This wouldn’t be necessary if the link text were close to the title of its target.

it’s often a style choice

And it’s best to choose a style that supports the function, right? I don’t mind “punctuation style” in most ordinary blog posts. But it doesn’t work for (semi-)scientific material that is likely to be printed, especially by beginners like me. Maybe more advanced people can just tear through an article on, say, Benign model-free RL, but I need the aid of pages spread out on my desk.

11 Jul 2019 7:47 UTC
• In the pseudocode, it would make more sense to initialize A ← Distill(H), wouldn’t it? Otherwise, running Amplify with the randomly initialized A in the next step wouldn’t be helpful.
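To make the suggestion concrete, here is a toy sketch of the IDA loop with `A ← Distill(H)` as the initialization. Everything in it is a hypothetical stand-in, not the pseudocode from the post: questions are tuples of integers whose answer is their sum, the “human” `H` can only answer tuples of length at most 2 directly, `Amplify(H, A)` lets it split a longer tuple in half and ask `A` about each half, and `Distill` is modelled as a lookup table of the teacher’s answers on a fixed training set rather than a trained ML model.

```python
from typing import Callable, List, Optional, Tuple

Question = Tuple[int, ...]
Answer = Optional[int]  # None stands for "cannot answer"
Agent = Callable[[Question], Answer]


def human(q: Question) -> Answer:
    """Hypothetical overseer H: can only sum very short tuples by itself."""
    return sum(q) if len(q) <= 2 else None


def amplify(h: Agent, a: Agent) -> Agent:
    """Amplify(H, A): H answers short questions directly; otherwise it splits
    the question in half, asks A about each half and adds the two answers."""
    def amplified(q: Question) -> Answer:
        if len(q) <= 2:
            return h(q)
        mid = len(q) // 2
        left, right = a(q[:mid]), a(q[mid:])
        if left is None or right is None:
            return None
        return left + right
    return amplified


def distill(teacher: Agent, training_set: List[Question]) -> Agent:
    """Toy Distill: record the teacher's input-output behaviour on the
    training set in a lookup table (a real system would train a model)."""
    table = {q: teacher(q) for q in training_set}
    return lambda q: table.get(q)


def halvings(q: Question) -> List[Question]:
    """q together with every sub-question obtained by repeatedly halving it."""
    if len(q) <= 2:
        return [q]
    mid = len(q) // 2
    return [q] + halvings(q[:mid]) + halvings(q[mid:])


def ida(h: Agent, training_set: List[Question], rounds: int) -> Agent:
    a = distill(h, training_set)  # A <- Distill(H) instead of a random init
    for _ in range(rounds):
        b = amplify(h, a)  # B <- Amplify(H, A)
        a = distill(b, training_set)  # A <- Distill(B)
    return a


training = halvings(tuple(range(1, 9)))
print(ida(human, training, rounds=1)(tuple(range(1, 9))))  # None
print(ida(human, training, rounds=2)(tuple(range(1, 9))))  # 36
```

In this toy, each round doubles the length of the tuples the distilled agent can answer: after one round it handles length 4, after two rounds length 8. Starting from an untrained `A` instead, the first Amplify step would combine meaningless subanswers, so the loop would fall one round behind.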

• I’ve added specifics. I hope this improves things. If not, feel free to edit it out.

Thanks for pointing out the problems with my question. I see now that I was wrong to combine strong language with no specifics and a concrete target. I would amend it, but then the context for the discussion would be gone.

# [Question] How to deal with a misleading conference talk about AI risk?

27 Jun 2019 21:04 UTC
• In the alternative algorithm for the five-and-ten problem, why should we use the first proof that we find? How about this algorithm:

```
A2 :=
    Spend some time t searching for proofs of sentences of the form
        "A2() = a → U() = x"
    for a ∈ {5, 10}, x ∈ {0, 5, 10}.
    x* := −∞
    For each found proof and corresponding pair (a, x):
        if x > x*:
            a* := a
            x* := x
    Return a*
```


If A2 searches long enough (depending on how complicated U is), it will return 10, even if the non-spurious proofs are longer than the spurious ones.
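The selection step of A2 can be sketched in Python. Proof search itself is not implemented here: the sketch assumes a hypothetical list of the `(a, x)` pairs for which a proof of “A2() = a → U() = x” was found within time t, in the order they were found, and returns the action with the highest proven utility. The function name `a2_select` is made up for illustration.

```python
import math
from typing import Iterable, Optional, Tuple


def a2_select(found: Iterable[Tuple[int, int]]) -> Optional[int]:
    """Return the action a* whose proven utility x is highest.

    `found` holds (a, x) pairs, one per proof of "A2() = a -> U() = x"
    discovered during the search, in the order they were found.
    """
    best_action: Optional[int] = None
    best_utility = -math.inf  # x* starts below every possible utility
    for a, x in found:
        if x > best_utility:
            best_action, best_utility = a, x
    return best_action


# Hypothetical search result in which a spurious proof turns up first.
found = [(10, 0), (5, 5), (10, 10)]
print(a2_select(found))  # 10
```

Because every found proof is considered rather than just the first, the order in which proofs turn up only matters for ties (the strict `>` keeps the earliest pair with the highest x).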

• How I understand the main point:

The goal is to get superhuman performance aligned with the human values $R$. How might we achieve this? By learning the human values. Then we can use a perfect planner to find the best actions to align the world with the human values. This will have superhuman performance, because humans’ planning algorithms are not perfect: they don’t always find the best actions to align the world with their values.

How do we learn the human values? By observing human behaviour, i.e. their actions in each circumstance. This is modelled as the human policy $\pi$.

Behaviour is the known outside view of a human, and values + planner is the unknown inside view. We need to learn both the values $R$ and the planner $p$ such that $p(R)$ equals the human policy $\pi$.

Unfortunately, this equation is underdetermined. We only know the policy; the planner and the values can vary independently.
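A toy illustration of the underdetermination, in the spirit of the article’s argument. The states, actions and reward below are hypothetical; a planner `p` maps a reward function `R` (the values) to a policy `π` (the behaviour). Three quite different pairs, a rational planner with the “true” reward, an anti-rational planner with the negated reward, and a planner that ignores the reward entirely, all produce exactly the same observable policy.

```python
from typing import Callable, Dict

State = str
Action = str
Reward = Callable[[State, Action], float]  # R: the values
Policy = Dict[State, Action]  # pi: the observed behaviour
Planner = Callable[[Reward], Policy]  # p: turns values into behaviour

STATES = ["hungry", "tired"]
ACTIONS = ["eat", "sleep"]


def true_reward(s: State, a: Action) -> float:
    """Hypothetical 'true' values: eat when hungry, sleep when tired."""
    return 1.0 if (s, a) in {("hungry", "eat"), ("tired", "sleep")} else 0.0


def negated(r: Reward) -> Reward:
    return lambda s, a: -r(s, a)


def zero_reward(s: State, a: Action) -> float:
    return 0.0


def rational(r: Reward) -> Policy:
    """Picks the reward-maximising action in every state."""
    return {s: max(ACTIONS, key=lambda a: r(s, a)) for s in STATES}


def anti_rational(r: Reward) -> Policy:
    """Picks the reward-minimising action in every state."""
    return {s: min(ACTIONS, key=lambda a: r(s, a)) for s in STATES}


def make_indifferent(fixed: Policy) -> Planner:
    """Returns a planner that ignores R and always outputs `fixed`."""
    return lambda r: dict(fixed)


observed = rational(true_reward)  # the only thing we get to see
candidates = [
    (rational, true_reward),
    (anti_rational, negated(true_reward)),
    (make_indifferent(observed), zero_reward),
]
print(all(p(r) == observed for p, r in candidates))  # True
```

Observing behaviour alone cannot tell these candidates apart, since each `p(R)` reproduces `π` exactly.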

Are there differences among the candidates? One thing we could look at is their Kolmogorov complexity. Maybe the true candidate has the lowest complexity. But according to the article, this is not the case.