Some rough notes from a metacognition workshop that @Raemon ran 1-2 weeks ago.
Claim: Alignment research is hard by default.
The empirical feedback loops may not be great.
Doing object-level research can be costly and time-consuming, so it’s expensive to iterate.
It’s easy to feel like you’re doing something useful in the moment.
It’s much harder to do something that will turn out to have been useful. That requires identifying the key bottleneck and working directly on it.
The most important emotional skill may be patience, i.e. NOT doing things unless you have a model of how you’ll update based on the results.
Thus, we need to practise the skill of solving hard problems with little empirical feedback.
Claim: For the most part, you can only do this by ‘meta-learning’, i.e. trying to get better at hard things which you haven’t done before, but relying mostly on personal intuitions / thinking rather than
Claim: A good way to get better here is to identify useful ‘meta-strategies’. These are broad approaches to doing / thinking things, e.g. ‘break it down’, ‘make optimistic plan’, ‘work backwards’
Register predictions ahead of time
If you have to do things, surprise yourself as quickly as possible
Specific recommendations
Use Fatebook to register predictions ahead of time and notice when you’re surprised, to improve future calibration
Write down plans, envision outcomes, and assign probabilities both to the plan working and to being surprised by the result
When something works, reflect on what ‘meta-strategy’ you used to make it work
When something doesn’t work, reflect on how you could have maybe predicted that in advance (and why you didn’t)
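To make the prediction-and-surprise loop above concrete, here is a minimal sketch of a local prediction log. It is not Fatebook's API or data format. The `Prediction` class, the `brier_score` and `surprises` helpers, the 0.25 surprise threshold, and the example claims are all hypothetical and only for illustration. The Brier score (mean squared gap between stated probability and outcome, lower is better) is one common way to summarise calibration; the workshop did not necessarily use it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Prediction:
    """One registered prediction (hypothetical structure, not Fatebook's)."""
    claim: str                      # what you expect to happen
    probability: float              # your stated chance that the claim is true
    outcome: Optional[bool] = None  # filled in once you know the result


def brier_score(predictions):
    """Mean squared gap between stated probability and outcome (0 is perfect)."""
    resolved = [p for p in predictions if p.outcome is not None]
    if not resolved:
        raise ValueError("no resolved predictions yet")
    return sum((p.probability - float(p.outcome)) ** 2 for p in resolved) / len(resolved)


def surprises(predictions, threshold=0.25):
    """Resolved predictions where the actual outcome was given less than `threshold`."""
    surprising = []
    for p in predictions:
        if p.outcome is None:
            continue  # unresolved predictions cannot be surprising yet
        prob_of_what_happened = p.probability if p.outcome else 1.0 - p.probability
        if prob_of_what_happened < threshold:
            surprising.append(p)
    return surprising


# Made-up example entries, written down BEFORE doing the work.
log = [
    Prediction("Plan A works on the first try", 0.8, outcome=False),
    Prediction("The experiment finishes within a day", 0.6, outcome=True),
    Prediction("The result changes my next step", 0.3),  # not yet resolved
]

print(round(brier_score(log), 3))
print([p.claim for p in surprises(log)])
```

The entries returned by `surprises` are the natural ones to reflect on: how could this have been predicted in advance, and why wasn't it?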