The first part of your criticism makes me more excited, not less. We have considered doing the variations you suggested, and more, to tease apart which parts of the changes are leading to which aspects of behavior.
I also think we can get info without robust operationalizations of concepts involved, but robust operationalizations would certainly allow us to get more info.
I am not one to shy away from hard problems because they’re hard. Especially if it seems like increasing hardness levels lead to increasing bits gleaned.
> I also think we can get info without robust operationalizations of concepts involved, but robust operationalizations would certainly allow us to get more info.
I think unless you’re extremely lucky and this turns out to be a highly human-visible thing somehow, you’d never notice what you’re looking for among all the other complicated changes happening that nobody has analysis tools or even vague definitions for yet.
Which easier methods do you have in mind?
Dunno. I was just stating a general project-picking heuristic I have, and noting that it’s eyeing your proposal with some skepticism. Maybe search the literature for simpler problems and models with which you might probe the difference between RL and non-RL training. Something even a shallow MLP can handle, ideally.
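To gesture at the kind of thing I mean, here’s a minimal sketch (the toy task, architecture, and plain-REINFORCE objective are all placeholder assumptions of mine, not anything from your proposal): train the same shallow MLP from the same initialization once with supervised cross-entropy and once on a reward signal, then diff the weights.

```python
import torch
import torch.nn as nn

def make_mlp():
    torch.manual_seed(0)  # identical initialization for both training modes
    return nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 4))

def train(mlp, xs, ys, mode, steps=1000, lr=1e-2):
    opt = torch.optim.SGD(mlp.parameters(), lr=lr)
    for _ in range(steps):
        logits = mlp(xs)
        if mode == "supervised":
            # ordinary cross-entropy on the labels
            loss = nn.functional.cross_entropy(logits, ys)
        else:
            # REINFORCE: sample an action, reward 1 if it matches the label,
            # with a mean-reward baseline to cut gradient variance
            dist = torch.distributions.Categorical(logits=logits)
            acts = dist.sample()
            reward = (acts == ys).float()
            loss = -(dist.log_prob(acts) * (reward - reward.mean())).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return mlp

xs = torch.randn(256, 8)
ys = (xs[:, 0] > 0).long() * 2 + (xs[:, 1] > 0).long()  # arbitrary 4-class toy task
sup = train(make_mlp(), xs, ys, "supervised")
rl = train(make_mlp(), xs, ys, "rl")
for (name, p), (_, q) in zip(sup.named_parameters(), rl.named_parameters()):
    print(name, (p - q).abs().mean().item())  # where do the two runs diverge?
```

At this scale you can stare at every parameter, so if RL-vs-supervised training leaves any characteristic fingerprint, you at least have a chance of noticing it.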
Good ideas! I worry that a shallow MLP wouldn’t be capable enough for us to see a rich signal in the direction of increasing agency, but we should certainly try the easy version first.
> I think unless you’re extremely lucky and this turns out to be a highly human-visible thing somehow, you’d never notice what you’re looking for among all the other complicated changes happening that nobody has analysis tools or even vague definitions for yet.
I don’t think I’m seeing the complexity you’re seeing here. For instance, one method we plan on trying is taking sets of heads and MLPs and reverting them to their original values to see that set’s qualitative influence on behavior. I don’t think this requires rigorous operationalizations.
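Concretely, something like this minimal sketch (the module-path prefixes here are hypothetical placeholders; the real names depend on the architecture):

```python
import torch

def revert_components(tuned_model, base_model, prefixes):
    """Copy the original (pre-finetuning) weights back into a chosen set
    of heads/MLPs of the tuned model, leaving everything else untouched."""
    base = dict(base_model.named_parameters())
    with torch.no_grad():
        for name, param in tuned_model.named_parameters():
            if any(name.startswith(p) for p in prefixes):
                param.copy_(base[name])  # revert this component to its original values

# e.g. revert_components(tuned, base, ["blocks.5.attn", "blocks.7.mlp"])
```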
An example: in a chess-playing context, this will lead to different moves, or to out-of-action-space behavior. The various kinds of out-of-action-space behavior, or biases in how the moves change, seem like they’d give us insight into what the head set was doing, even if we don’t understand the mechanisms used inside it.
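And one crude readout for the chess case, as a sketch (the legal_moves helper is hypothetical):

```python
def out_of_action_space_rate(model, positions, legal_moves):
    """Fraction of greedy moves falling outside the legal action space;
    compare this before and after reverting a head set."""
    illegal = sum(
        1 for pos in positions
        if model(pos).argmax().item() not in legal_moves(pos)
    )
    return illegal / len(positions)
```

Shifts in this rate, or in which moves change, would be the kind of qualitative signal I mean.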
> I don’t think I’m seeing the complexity you’re seeing here. For instance, one method we plan on trying is taking sets of heads and MLPs and reverting them to their original values to see that set’s qualitative influence on behavior. I don’t think this requires rigorous operationalizations.
That sounds to me like it would give you a very rough, microscope-level view of all the individual things the training is changing around. I am skeptical that by looking at this ground-level data, you’d be able to separate out the things-that-are-agency from everything else that’s happening.
As an analogy, looking at what happens if you change the wave functions of particular clumps of silicon atoms doesn’t help you much in divining how the IBM 608 divides numbers, if you haven’t even worked out yet that the atoms in the machine are clustered into things like transistors and cables, and actually, you don’t even really know how dividing numbers works on a piece of paper, you just think of division as “the inverse of multiplication”.