james.lucassen comments on [Intro to brain-like-AGI safety] 5. The “long-term predictor”, and TD learning

james.lucassen 1 Apr 2022 14:30 UTC
5 points
AF
I don’t think I understand how the scorecard works. From:
[the scorecard] takes all that horrific complexity and distills it into a nice standardized scorecard—exactly the kind of thing that genetically-hardcoded circuits in the Steering Subsystem can easily process.
And this makes sense. But when I picture how it could actually work, I bump into an issue. Is the scorecard learned, or hard-coded?
If the scorecard is learned, then it needs a training signal from Steering. But if it’s useless at the start, it can’t provide a training signal. On the other hand, since the “ontology” of the Learning subsystem is learned-from-scratch, then it seems difficult for a hard-coded scorecard to do this translation task.
- Steven Byrnes 1 Apr 2022 17:14 UTC
  LW: 4 AF: 3
  AF Parent
  The categories are hardcoded, the function-that-assigns-a-score-to-a-category is learned. Everybody has a goosebumps predictor, everyone has a grimacing predictor, nobody has a debt predictor, etc. Think of a school report card: everyone gets a grade for math, everyone gets a grade for English, etc. But the score-assigning algorithm is learned.
  So in the report card analogy, think of a math TA ( = Teaching Assistant = Thought Assessor) who starts out assigning math grades to students randomly, but the math professor (=Steering Subsystem) corrects the TA when its assigned score is really off-base. Gradually, the math TA learns to assign appropriate grades by looking at student tests. In parallel, there’s an English class TA (=Thought Assessor), learning to assign appropriate grades to student essays based on feedback from the English professor (=Steering Subsystem).
  The TAs (Thought Assessors) are useless at the start, but the professors aren’t. Back to biology: If you get shocked, then the Steering Subsystem says to the “freezing in fear” Thought Assessor: “Hey, you screwed up, you should have been sending a signal just now.” The professors are easy to hardwire because they only need to figure out the right answer in hindsight. You don’t need a learning algorithm for that.