An agent is intelligent to the extent it tends to achieve convergent instrumental goals.
I see a surprisingly persuasive formal argument for my immediate intuition that this is a measure, not a target; a theorem, not a definition: Omohundro drives are about multi-turn games, but intelligence is a property of an agent playing an arbitrary game. (A multi-turn game is a game where the types of world-states before and after the agent's turn happen to be the same, which means you get to iterate.)
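To make the parenthetical concrete, here is a minimal sketch in Haskell (a toy formalization of my own; none of the names are standard): a game, seen from the agent's side, is a map from the world-state it is shown to the world-state its move produces, and the multi-turn case is exactly the one where those two types coincide, which is what lets you iterate.

```haskell
-- A minimal sketch with toy names of my own: from the agent's side, a game
-- is just a map from the world-state it is shown to the world-state its
-- move produces.
type Game s t = s -> t

-- A multi-turn game is the special case where the two state types coincide,
-- which is exactly what lets you iterate the agent's move.
type MultiTurnGame s = Game s s

-- Iterating a move only typechecks in the multi-turn case: the output of one
-- turn has to be a legal input to the next.
playFor :: Int -> MultiTurnGame s -> s -> s
playFor n g = foldr (.) id (replicate n g)

main :: IO ()
main = print (playFor 3 (+ 2) (0 :: Int))  -- a trivial multi-turn "game": prints 6
```

On this toy reading, a multi-turn game really is just a single-turn game with the extra property that its input and output state types match.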
Well, I think multi-turn games are generally of more import? Wouldn't an AI doing big things, like inventing new technology, take multiple turns?
I guess cases like answering a math question could be single-turn. I think multi-turn settings are more appropriate to intelligent agency, though.
The parenthetical was meant to argue that almost all games are single-turn, and that multi-turn games are single-turn games with an extra property. I have found several definitions that can be gainfully refactored in terms of the theorems they lead to, but never did they thereby become less general.
Mayyybe you can get away with claiming that the thing you’re looking for only typechecks for multi-turn games, in which case I’d go looking for the definitions that become available there. Namely, hmm. For what properties of world-states does there exist a player such that the property remains true over time? Which players does one get that way?
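To make that last question something one could actually poke at, here is a hedged toy sketch (the dynamics, names, and finite state sample are all made up): fix some transition function, fix a candidate player, and check whether a given property of world-states stays true under that player's play.

```haskell
-- A toy rendering of the question, with made-up dynamics: given a transition
-- function and a candidate player (policy), does a property of world-states
-- stay true once it is true?
type State  = Int
type Action = Int

-- Hypothetical dynamics: the world-state just accumulates the chosen action.
step :: State -> Action -> State
step s a = s + a

-- A candidate player that tries to keep the state even.
player :: State -> Action
player s = if even s then 0 else 1

-- Check, over a finite sample of states, that the property is preserved by
-- one turn of this player's play.
preserves :: [State] -> (State -> Action) -> (State -> Bool) -> Bool
preserves states pl p = and [ p (step s (pl s)) | s <- states, p s ]

main :: IO ()
main = print (preserves [0 .. 100] player even)  -- True: evenness survives this player
```

The real question would enumerate over properties or players rather than fixing them; this only shows the shape of the check.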
I guess I'm more interested in modeling the cases where AI matters or poses risk on account of its pursuit of convergent instrumental goals. If there's a setting without convergent instrumental goals, there might be a generalization, but it's less clear.
With single-turn question answering, one could ask about accuracy, which would rate a system giving intentionally wrong answers as unintelligent. The thing I meant to point to with AIXI is that it would be nice to have a measure of intelligence for misaligned systems (rather than declaring them unintelligent because they don't satisfy an objective like reward maximization, question answering, etc.). If there is a possible "intelligent misalignment" in the single-turn case (e.g. question answering), then there might be a corresponding intelligence metric that accounts for intelligent but misaligned systems.
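To gesture at the kind of metric I mean, here is an entirely hypothetical toy sketch (nothing like AIXI, just an illustration): score a question answerer against a small family of candidate objectives and take its best fit, so a system that answers wrongly on purpose comes out as competent but misaligned rather than as unintelligent.

```haskell
-- An entirely hypothetical toy metric (not AIXI, not any established measure):
-- score a question answerer against a small family of candidate objectives
-- and take the best fit, so a deliberately wrong answerer shows up as
-- competent but misaligned rather than as unintelligent.
questions :: [(Int, Int)]            -- (question, intended answer): "double it"
questions = [(q, q * 2) | q <- [1 .. 20]]

accuracy :: (Int -> Int) -> Double   -- the aligned objective: answer correctly
accuracy sys =
  fromIntegral (length [ () | (q, a) <- questions, sys q == a ])
    / fromIntegral (length questions)

-- Candidate objectives: answer correctly, or answer incorrectly on purpose.
competence :: (Int -> Int) -> Double
competence sys = maximum [accuracy sys, 1 - accuracy sys]

liar :: Int -> Int                   -- always answers wrongly, on purpose
liar q = q * 2 + 1

main :: IO ()
main = print (accuracy liar, competence liar)  -- (0.0, 1.0)
```

Accuracy alone calls the deliberate liar unintelligent; the best-fit score does not, which is the distinction I was trying to point at.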