Thomas Kwa comments on Thomas Kwa’s MIRI research experience

Thomas Kwa 2 Oct 2023 20:44 UTC
9 points
5
I’m less concerned that there might be a dozen different problems (which is why I don’t have an explicit list), and more concerned that we don’t understand the mathematical structure behind metacognition (if there even is something to find), and therefore can’t yet characterize it or engineer it to be safe. We were trying to make a big list early on, but gradually shifted to resolving confusions and trying to build concrete models of the systems and of how these problems arise.
Off the top of my head, by metacognition I mean something like: reasoning that chains through models of yourself, or applying your planning process to your own planning process.
On why reflection/metacognition might be connected to general science ability, I don’t like to speculate on capabilities, but just imagine that the scientists in the Apollo program were unable to examine and refine their research processes—I think they would likely fail.
- Daniel Kokotajlo 3 Oct 2023 0:55 UTC
  2 points
  0
  I agree that just because we’ve thought hard and made a big list, that doesn’t mean the list is exhaustive. Indeed, the longer the list we CAN find, the higher the probability that there are additional things we haven’t found yet...
  
  But I still think having a list would be pretty helpful. If we are trying to grok the shape of the problem, it helps to have many diverse examples.
  
  Re: metacognition: OK, that’s a pretty broad definition I guess. Makes the “why is this important for doing science” question easy to answer. Arguably GPT-4 already does metacognition to some extent, at least in ARC Evals and when in an AutoGPT harness, though probably not very skillfully.
  
  ETA: so, to be clear, I’m not saying you were wrong to move from the draft list to making models; I’m saying if you have time & energy to write up the list, that would help me along in my own journey towards making models & generally understanding the problem better. It would probably help other readers too.