Jiaxin Wen comments on Current AIs seem pretty misaligned to me

Jiaxin Wen 13 May 2026 21:58 UTC
1 point
0
i think you are conflating two distinct problems: elicitation and scalable oversight. unsupervised / weakly supervised elicitation methods definitely have the potential to scale to the superhuman regime.
- Bronson Schoen 14 May 2026 17:17 UTC
  2 points
  0
  Sorry, I wasn’t claiming that they can’t; indeed, that’s the primary regime they’re intended to be useful for. My question was more specific: unless you’re designing the environments themselves to be used towards these problems, it’s unclear to me how more RL environments for other contexts solve this problem.
  - Jiaxin Wen 14 May 2026 23:22 UTC
    1 point
    0
    are you saying that the unsupervised/weakly supervised elicitation methods discovered on a series of tasks A, B, C, … would not generalize to held-out tasks X, Y, Z?
- Jiaxin Wen 13 May 2026 22:40 UTC
  1 point
  0
  And it’s also plausible to build RL environments to solve scalable oversight in a way that 1) might generalize to the superhuman regime, or 2) directly works in the superhuman regime, since there are certain superhuman tasks for which you can find workarounds to get objective ground truths.
  - Bronson Schoen 14 May 2026 17:20 UTC
    2 points
    0
    Do you mean potentially building environments where humans directly construct the environments to train models such that they generalize to the superhuman regime for alignment research, without some intermediate stage where you’re building RL environments for the agents themselves to do the research?
    - Jiaxin Wen 14 May 2026 23:26 UTC
      1 point
      0
      no, i meant building RL environments for the agents to solve superhuman tasks where ground-truth labels are available, e.g. predicting research results, predicting subtext, etc.