Won’t we have AGI that is slightly less able to jump into existing human roles before we have AGI that can jump into existing human roles? (Borrowing intuitions from Christiano’s Takeoff Speeds) [Edited to remove typo]
Obviously, we wouldn’t notice the slowness from the inside, any more than the characters in a movie would notice that your DVD player is being choppy.
Do you have a causal understanding of why this is the case? I am a bit confused by it.
Re: 1, I think it may be important to note that adoption has gotten quicker (e.g. as visualized in Figure 1 here; linking this instead of the original source since you might find other parts of the article interesting). Does this update you, or were you already taking this into account?
When the network is randomly initialized, there is a sub-network that is already decent at the task.
From what I can tell, the paper doesn’t demonstrate this. That is, I don’t think they ever test the performance of a sub-network with its random initial weights; rather, they test the performance of a sub-network after training only that sub-network. Though maybe this isn’t what you meant, in which case you can ignore me :)
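To make the distinction concrete, here is a toy sketch (not the paper's setup) of the two evaluations on a linear model. Everything in it is an assumption for illustration: the data, the `mask`, and the helper names `mse` and `train_masked` are hypothetical, and the mask is hand-picked rather than found by pruning.

```python
import numpy as np

def mse(w, X, y):
    """Mean squared error of the linear model X @ w against targets y."""
    return float(np.mean((X @ w - y) ** 2))

def train_masked(w_init, mask, X, y, lr=0.1, steps=200):
    """Gradient descent that only updates weights where mask == 1.

    Weights outside the mask are zeroed and stay zero, so this trains
    the sub-network in isolation, starting from its initial values.
    """
    w = w_init * mask
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad * mask
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))
true_w = np.array([1.5, -2.0, 0.0, 0.0, 0.7, 0.0, 0.0, 0.0])
y = X @ true_w

w_init = rng.normal(size=8)            # random initialization
mask = (true_w != 0).astype(float)     # hypothetical, hand-picked sub-network

# (a) Sub-network evaluated with its random initial weights, no training.
loss_untrained = mse(w_init * mask, X, y)

# (b) Sub-network evaluated after training only the sub-network.
loss_trained = mse(train_masked(w_init, mask, X, y), X, y)

print(f"untrained sub-network loss: {loss_untrained:.4f}")
print(f"trained sub-network loss:   {loss_trained:.4f}")
```

A good result in (b) shows only that the sub-network is trainable from its initial values. The claim that it is "already decent at the task" at initialization is a statement about (a), which is a separate measurement.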
Thanks a lot for this—I’m doing a lit. review for an interpretability project and this is definitely coming in handy :)
Random note: the paper “Are Visual Explanations Useful? A Case Study in Model-in-the-Loop Prediction” is listed twice in the master list of summarized papers.
I agree, and thanks for the reply. I also agree that even a small chance of catastrophe is not robust. I asked because I still care about the probability of things going badly, even if I think that probability is worryingly high. I see now (thanks to you!) that in this case our prior that SGD will find look-ahead is still relatively high, and that this belief won’t change much by thinking about it more, because it is sensitive to complicated details we can’t easily know.
Anyway, the question here isn’t whether lookahead will be perfectly accurate, but whether the post-lookahead distribution of next words will allow for improvement over the pre-lookahead distribution.
Can you say a bit more about why you only need look-ahead to improve performance? SGD favors better improvements over worse improvements—it feels like I could think of many programs that are improvements but which won’t be found by SGD. Maybe you would say there don’t seem to be any improvements that are this good and this seemingly easy for SGD to find?
For the risk question, is it asking about positive and negative risk, or just negative risk?