Raemon comments on Looking back on my alignment PhD

Raemon 5 Jul 2022 7:53 UTC
4 points
I wanted to be more like Eliezer Yudkowsky and Buck Shlegeris and Paul Christiano. They know lots of facts and laws about lots of areas (e.g. general relativity and thermodynamics and information theory). I focused on building up dependencies (like analysis and geometry and topology) not only because I wanted to know the answers, but because I felt I owed a debt, that I was in the red until I could at least meet other thinkers at their level of knowledge.
But rationality is not about the bag of facts you know, nor is it about the concepts you have internalized. Rationality is about how your mind holds itself, it is how you weigh evidence, it is how you decide where to look next when puzzling out a new area.
If I had been more honest with myself, I could have nipped the “catching up with other thinkers” mistake in 2018. I could have removed the bad mental habits using certain introspective techniques; or at least been aware of the badness.
It makes sense to me that the generator here wasn’t ideal, but I’m not currently convinced your actions were actually wrong. The first quoted paragraph brings to mind the virtue of scholarship (“If you swallow enough sciences the gaps between them will diminish and your knowledge will become a unified whole”), or that John Wentworth quote I can’t find now about AI alignment work requiring some depth of knowledge in some domains. The reasons you cite in your second paragraph don’t seem very connected to whether the actions in the first paragraph mattered.
- TurnTrout 11 Jul 2022 0:41 UTC
  4 points
  It’s not that my actions were wrong; it’s that I did them for the wrong reasons, and that really does matter. Under my model, the cognitive causes (e.g. wanting to be like EY) of externally visible actions (e.g. studying math) are very important, because I think the responsible cognition gets reinforced into my future action-generators.
  For example, since I wanted to be like EY, I learned math; since I learned math, I got praised on LessWrong; since I got praised, my social-reward circuitry activated; since the social-reward circuitry activated, credit assignment kicked in and strengthened all of the antecedent thoughts I just listed, making me more of the kind of person who does things because he wants to be like EY.
  I can write a similar story for doing things because they are predicted to make me more respected. Therefore, over time, I became more of the kind of person who cares about being respected, and not so much about succeeding at alignment or truly becoming stronger.
  What links here?
  - Alex Turner's comment on [Updated] What’s the theory of change of “Come to the bay over the summer!”? by Luise (EA Forum; 11 Jul 2022 20:15 UTC; 8 points)