Right now, the incentives to get useful feedback on my research push me toward the opposite of the policy I would like: publishing on the AF as late as I can.
Ideally, I would want to use the AF as my main source of feedback, as it’s public, is read by more researchers than I know personally, and I feel that publishing there helps the field grow.
But I’m forced to admit that publishing anything on the AF means I can’t really send it to people anymore (because the ones I ask for feedback read the AF, so that feels wrong socially), and yet I don’t get any valuable feedback 99% of the time. More specifically, I don’t get any feedback 99% of the time. Whereas when I ask for feedback directly on a gdoc, I always end up with some useful remarks.
I also feel bad that I’m basically using a privileged policy, in the sense that a newcomer cannot use it.
Nonetheless, because I believe in the importance of my research, and I want to know if I’m doing stupid things or not, I’ll keep to this policy for the moment: never ever post something on the AF for which I haven’t already got all the useful feedback I could ask for.
I think there are a number of features LW could build to improve this situation, but first I’m curious for more detail on “what feels wrong about explicitly asking individuals for feedback after posting on AF”, similar to how you might ask for feedback on a gDoc?
Not Adam, but:
1. Maybe there’s a sense in which everyone has already implicitly declared that they don’t want to give feedback, because they could have if they wanted to, so it feels like more of an imposition.
2. Maybe it feels like “I want feedback for my own personal benefit” when it’s already posted, as opposed to “I want feedback to improve this document which I will share with the community” when it’s not yet posted. So it feels more selfish, instead of part of a community project. For that problem, maybe you’d want to frame it as “I’m planning to rewrite this post / write a follow-up to this post / give a talk based on this post / etc., can you please offer feedback on this post to help me with that?” (Assuming that’s in fact the case, of course, but most posts have follow-up posts...)
curious for more detail on “what feels wrong about explicitly asking individuals for feedback after posting on AF” similar to how you might ask for feedback on a gDoc?
My main reason is steve’s first point:
Asking someone for feedback on work posted somewhere I know they read feels like I’m whining about not having feedback (and maybe whining about them not giving me feedback). On the other hand, sending a link to a gdoc feels like “I thought that could interest you”, which seems better to me.
There’s also the issue that when the work is public, you don’t know whether someone has read it and not found it interesting enough to comment, has not read it but plans to later, or has read it and plans to comment later. Depending on which case they are in, my asking for feedback can trigger even more problems (like them being annoyed because they feel I didn’t give them the time to do it on their own). Whereas when I share a doc, there’s only one state of knowledge for the other person (not having read the doc and not knowing it exists).
Concerning steve’s second point:
2. Maybe it feels like “I want feedback for my own personal benefit” when it’s already posted, as opposed to “I want feedback to improve this document which I will share with the community” when it’s not yet posted. So it feels more selfish, instead of part of a community project. For that problem, maybe you’d want to frame it as “I’m planning to rewrite this post / write a follow-up to this post / give a talk based on this post / etc., can you please offer feedback on this post to help me with that?” (Assuming that’s in fact the case, of course, but most posts have follow-up posts...)
I don’t feel that personally. I basically take a stance of trying to do things I feel are important for the community, so if I publish something, I don’t feel like feedback is for my own benefit. Indeed, I would gladly have only constructive negative feedback on my posts instead of no feedback at all; this is pretty bad personally (in terms of ego, for example) but great for the community, because it puts my ideas to the test and forces me to improve them.
Now I want to go back to Raemon.
I think there are a number of features LW could build to improve this situation
Agreed. My diagnosis of the situation is that to ensure consistent feedback, it probably needs to be at least slightly an obligation. The two examples of processes producing valuable feedback that I have in mind are gdoc comments and peer review for conferences/journals. In both cases, the reviewer has an obligation to do the review (a social obligation for the gdoc, because it was shared explicitly with you, and a community obligation for peer review, because that’s a part of your job and the conference/journal editor asked you to review the paper). Without this element of obligation, it’s far too easy to not give feedback, even when you might have something valuable to say!
Note that I’m part of the problem: this week, I spent a good couple of hours commenting in detail on a 25-page technical gdoc for a fellow researcher who asked me, but I haven’t published any decent feedback on the AF for quite some time. And when I look at my own internal process, this sense of commitment and obligation is a big reason why. (I ended up liking the work, but even that wouldn’t have ensured that I commented on it to the extent that I did.)
This makes me think that a “simple” solution could be a review process on the AF. Now, I’ve been talking about a proper review process with Habryka among others; getting a clearer idea of how we should judge research for such a review is a big incentive for the trial run of a review that I’m pushing for (and I’m currently rewriting a post about a framing of AI Alignment research that I hope will help a lot for that).
Yet after thinking about it further yesterday and today, it might be possible to split the establishment of such a review process for the AF into two steps.
Step 1: Everyone with a post on the AF can ask for feedback. This is not considered peer review, just the sort of thing that a fellow researcher would say if you shared the post with them as a gdoc. On the other side, a group of people (working researchers, let’s say) volunteer to give such feedback at a given frequency (once a week, for example). After that, we probably only need to find a decent enough way to order requests for feedback (prioritizing posts with no feedback, prioritizing people without the network to ask personally for feedback...), and it could be up and running.
Step 2: Establish a proper peer-review system, where you can ask for peer review on a post, and if the review is good enough, it gets a “peer-review” tag that is managed by admins only. Doing this correctly will probably require standards for such a review, a stronger commitment by reviewers (and so finding more incentives for them to participate), and additional infrastructure (code, managing the review process, maybe sending a newsletter?).
In my mind, step 1 is there for getting some feedback on your work, and step 2 is for getting prestige. I believe that both are important, but I’m starving more for feedback. And I also think that doing step 1 could be really fast, and even if it fails, there’s no big downside for the AF (whereas fucking up step 2 seems more fraught with bad consequences).
Also, another point about the difference in ease of giving feedback on gdocs vs posts: implicitly, almost all shared gdocs come with a “Come at me bro” request. But when I read a post, it’s not always clear whether the poster wants me to come at them or not. You also tend to know the people who share gdocs with you a bit better than posters on the AF. So being able to signal “I really want you to come at me” might help, although I doubt it’s the complete solution.
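The ordering of feedback requests mentioned in step 1 could be sketched as a simple sort. This is only an illustration: the `FeedbackRequest` fields, the `author_has_network` flag, and the tie-break on waiting time are assumptions made for the example, not an existing LW/AF feature.

```python
from dataclasses import dataclass

@dataclass
class FeedbackRequest:
    post_title: str
    feedback_count: int       # feedback comments received so far
    author_has_network: bool  # hypothetical flag: can the author ask researchers directly?
    days_waiting: int         # days since the request was made

def order_requests(requests):
    """Order requests: posts with no feedback first, then authors
    without a network, then (as an assumed tie-break) longest wait."""
    return sorted(
        requests,
        key=lambda r: (r.feedback_count > 0, r.author_has_network, -r.days_waiting),
    )
```

A volunteer reviewer would then pick from the front of the returned list each week. How to weigh the two priorities against each other is left open here; the tuple ordering above is just one choice.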
Could this be solved just by posting your work and then immediately sharing the link with people you specifically want feedback from? That way there’s no expectation that they would have already seen it. (Granted, this is slightly different from a gdoc in that you can share a gdoc with one person, get their feedback, then share with another person, while what I suggested requires asking everyone you want feedback from all at once.)
Thanks for the idea! I agree that it probably helps, and it solves my issue with the state of knowledge of the other.
That being said, I don’t feel like this solves my main problem: it still feels to me like pushing too hard. Here the reason is that I post on a small venue (rarely more than a few posts per day) that I know the people I’m asking for feedback read regularly. So if I send them such a message at the moment I publish, it feels a bit like I’m saying that they wouldn’t read and comment on it without that, which is a bit of a problem.
(I’m interested to know if researchers on the AF agree with that feeling, or if it’s just a weird thing that only exists in my head. When I try to think about being at the other end of such a message, I see myself as annoyed, at the very least).
I can’t find a reference for how to test if an inferred (or just given) reward function for a system can be used to predict decently enough what the system actually does.
What I can find are references about the usual IRL/preference learning setting, where there is a true reward function, known to us but unknown to the inference system; then the inferred reward function is evaluated by training a policy on it and seeing how much reward (or how much regret) it gets on the true reward function.
But that’s a good setting for checking if the reward is good for learning to do the task, not for checking if the reward is good for predicting what this specific system will do.
The best thing I have in mind right now is to find a bunch of different initial conditions, train on the reward from these conditions, mix all the resulting policies together to get a distribution over actions at each state, and compare that with what the system actually does. It seems decent enough, but I would really like to know if someone has done something similar in the literature.
(Agents and Devices points in the right direction, but it’s focused on predicting which of the agent mixture or the device mixture is more probable in the posterior, which is a different problem.)
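The "mix the policies and compare" part of that idea could be sketched as follows. This is a minimal sketch under stated assumptions: the policies are taken as already trained (the RL training from different initial conditions is not shown), each policy is represented as a function from a state to a dict of action probabilities, the mixture uses equal weights, and the comparison is done with average log-likelihood. None of these choices come from a reference; they are just one way to make the comparison concrete.

```python
import math
from collections import Counter

def mixture_policy(policies, state):
    """Equal-weight mixture of the policies' action distributions at one state."""
    mixed = Counter()
    for policy in policies:
        for action, prob in policy(state).items():
            mixed[action] += prob / len(policies)
    return dict(mixed)

def average_log_likelihood(policies, observed, floor=1e-9):
    """Mean log-probability the mixture assigns to observed (state, action) pairs.

    `floor` avoids log(0) when the mixture gives an observed action zero mass.
    """
    if not observed:
        raise ValueError("need at least one observed (state, action) pair")
    total = 0.0
    for state, action in observed:
        prob = mixture_policy(policies, state).get(action, 0.0)
        total += math.log(max(prob, floor))
    return total / len(observed)
```

A higher score would mean the reward function, through the policies trained on it, predicts the system's behavior better. The score is mostly meaningful relative to a baseline, for instance the same score computed for policies trained on a different candidate reward.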
A month after writing my post on Focus as a crucial component of goal-directedness, I think I see its real point more clearly. You can decompose the proposal in my post into two main ideas:
How much a system S is trying to accomplish a goal G can be captured by the distance of S to the set of policies maximally goal-directed towards G.
The set of policies maximally directed towards G is the set of policies trained by RL (for every amount of resource above a threshold) on the reward corresponding to G.
The first idea is what focus is really about. The second doesn’t work as well, and multiple people pointed out issues with it. But I still find powerful the idea that focus on a goal measures the similarity with a set of policies that only try to accomplish this goal.
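To make the first idea concrete, here is a minimal sketch of a "distance to a set of policies". The specific choices are assumptions for illustration, not the definition from the post: policies are functions from a state to a dict of action probabilities, the distance between two policies is the total variation distance averaged over a finite list of states, and the distance to the set is the minimum over its members. The reference set is simply taken as given, since how to define it is exactly the open question.

```python
def total_variation(p, q):
    """Total variation distance between two action distributions (dicts)."""
    actions = set(p) | set(q)
    return 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in actions)

def policy_distance(policy_a, policy_b, states):
    """Average total variation distance between two policies over `states`."""
    return sum(total_variation(policy_a(s), policy_b(s)) for s in states) / len(states)

def distance_to_set(system, reference_policies, states):
    """Distance from `system` to the closest policy in the reference set.

    Under this sketch, a smaller value means more focus on the goal.
    """
    return min(policy_distance(system, ref, states) for ref in reference_policies)
```

Other distances between policies (for example, ones weighted by how often each state is visited) would fit the same structure; only `policy_distance` would change.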
Now the big question left is: can we define the set of policies maximally goal-directed towards G in a clean way that captures our intuitions?