When you say ‘that weakness’, you mean the inability to identify a subtask as alignment-related?
Mainly the “bad actors can split their work...” point with current LLMs, but yeah, also identifying/guessing the overall intentions of the humans giving the subtasks.