Daniel Tan comments on Daniel Tan’s Shortform

Daniel Tan 6 Mar 2025 23:26 UTC
Some rough notes from a metacognition workshop that @Raemon ran 1-2 weeks ago.
Claim: Alignment research is hard by default.
- The empirical feedback loops may not be great.
- Doing object-level research can be costly and time-consuming, so it’s expensive to iterate.
- It’s easy to feel like you’re doing something useful in the moment.
- It’s much harder to do something that will turn out to have been useful. That requires identifying the key bottleneck and working directly on it.
- The most important emotional skill may be patience, i.e. NOT doing things unless you have a model of how you’ll update based on the results.
Thus, we need to practise the skill of solving hard problems with little empirical feedback.
- Claim: For the most part, you can only do this by ‘meta-learning’, i.e. trying to get better at hard things which you haven’t done before, but relying mostly on personal intuitions / thinking rather than empirical feedback.
- Claim: A good way to get better here is to identify useful ‘meta-strategies’. These are broad approaches to doing / thinking things, e.g. ‘break it down’, ‘make optimistic plan’, ‘work backwards’
- Register predictions ahead of time
- If you have to do things, surprise yourself as quickly as possible
Specific recommendations
- Use Fatebook to register predictions ahead of time and notice when you’re surprised, to improve future calibration
- Write down plans, envision outcomes, and assign probabilities to the plan working and to being surprised by the result
- When something works, reflect on what ‘meta-strategy’ you used to make it work
- When something doesn’t work, reflect on how you could have maybe predicted that in advance (and why you didn’t)
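
To make the "register predictions, then notice surprise" loop concrete, here is a rough Python sketch of a personal prediction log. It is not Fatebook's API or anything shown at the workshop; the `Prediction` class, the `brier_score` helper, and the example claims are all hypothetical names invented for illustration. It assumes binary outcomes, uses the Brier score (mean squared error between stated probability and outcome, lower is better) as one possible calibration measure, and measures surprise in bits.

```python
from dataclasses import dataclass
from math import log2
from typing import List, Optional


@dataclass
class Prediction:
    """One registered prediction (hypothetical structure, not Fatebook's)."""
    claim: str
    probability: float              # stated probability that the claim is true
    outcome: Optional[bool] = None  # None until the prediction is resolved

    def __post_init__(self) -> None:
        # Exclude 0 and 1 so that surprise is always finite.
        if not 0.0 < self.probability < 1.0:
            raise ValueError("probability must be strictly between 0 and 1")

    def surprise_bits(self) -> float:
        """Surprise at the observed outcome: -log2 of the probability given to it."""
        if self.outcome is None:
            raise ValueError("prediction is not resolved yet")
        p = self.probability if self.outcome else 1.0 - self.probability
        return -log2(p)


def brier_score(predictions: List[Prediction]) -> float:
    """Mean squared gap between stated probability and outcome (resolved only)."""
    resolved = [p for p in predictions if p.outcome is not None]
    if not resolved:
        raise ValueError("no resolved predictions to score")
    return sum((p.probability - float(p.outcome)) ** 2 for p in resolved) / len(resolved)


# Illustrative entries only; the claims and numbers are made up.
log = [
    Prediction("The plan works as written", 0.8, outcome=True),
    Prediction("The experiment finishes within a week", 0.7, outcome=False),
    Prediction("The result changes my next step", 0.5),  # still unresolved
]

print(f"Brier score: {brier_score(log):.3f}")
for p in log:
    if p.outcome is not None:
        print(f"{p.claim}: {p.surprise_bits():.2f} bits of surprise")
```

The entries with the most bits of surprise are natural candidates for the reflection step: how could that outcome have been predicted in advance, and why wasn't it?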