IRL in General Environments

Here is a proposal for Inverse Reinforcement Learning in General Environments. (2½ pages; very little math.)

Copying the introduction here:

The eventual aim of IRL is to understand human goals. However, typical algorithms for IRL assume the environment is finite-state Markov, and it is often left unspecified how raw observational data would be converted into a record of human actions, along with the space of actions available. For IRL to learn human goals, the AI has to consider general environments, and it has to have a way of identifying human actions. Lest these extensions appear trivial, I consider one of the simplest proposals, and discuss some difficulties that might arise.
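To make the finite-state Markov assumption concrete, here is a minimal sketch (my own illustration, not from the proposal) of the interface a typical tabular IRL algorithm expects: a small discrete state and action set, known transition probabilities, and demonstrations already segmented into (state, action) pairs. None of these ingredients is directly available when the input is raw observational data of a human.

```python
import numpy as np

# Hypothetical tabular setup assumed by typical IRL algorithms:
# finite states, finite actions, known Markov dynamics.
n_states, n_actions = 4, 2
rng = np.random.default_rng(0)

# Known dynamics: P[s, a, s'] = probability of landing in s'
# after taking action a in state s.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))

# Demonstrations arrive pre-digested as discrete (state, action)
# pairs -- the step from raw observations to this record is exactly
# what is usually left unspecified.
demos = [[(0, 1), (2, 0), (3, 1)], [(1, 0), (2, 1)]]

# A common first step (e.g., in feature-matching IRL variants):
# empirical state-visitation frequencies of the demonstrator.
visits = np.zeros(n_states)
for traj in demos:
    for s, _ in traj:
        visits[s] += 1
visits /= visits.sum()

print(visits)
```

The point of the sketch is the interface, not the algorithm: everything downstream of `demos` presumes the hard problems (general environments, identifying human actions) are already solved.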