I like the style of this post, thanks for writing it! Some thoughts:
model scaling stops working
Roughly what probability would you put on this? I see it as really unlikely (perhaps <5%), so much so that 'scaling stops working' isn't part of my model over the next 1-2 years.
I will be slightly surprised if by end of 2024 there are AI agents running around the internet that are meaningfully in control of their own existence, e.g., are renting their own cloud compute without a human being involved.
Only slightly surprised? IMO being able to autonomously rent cloud compute seems quite significant (technically and legally), and I'd be very surprised if something like this happened on a 1-year horizon. I'd be negatively surprised if the US government didn't institute regulation on the operation of autonomous agents of this type by the end of 2024, given their potential for misuse and their economic value. It may help to know how you're operationalizing AIs that are 'meaningfully in control of their own existence'.
Quick clarifying question: being able to figure out the direction in weight space in which an update should be applied in order to modify a neural net's values seems like it would require a very strong understanding of mechanistic interpretability, far beyond current human levels. Is this an underlying assumption for a model that is able to direct how its values will be systematised?
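To make the question concrete, here's a minimal sketch (in PyTorch) of what I mean, with a hypothetical `value_direction` vector standing in for whatever the interpretability work would actually have to produce. The point is that applying the update is trivial; identifying the right direction is the part that seems to need the strong interpretability:

```python
import torch
import torch.nn as nn

# Toy model standing in for a larger network whose "values" we want to nudge.
model = nn.Linear(16, 16)

# Hypothetical: a unit vector in weight space that (by assumption) corresponds
# to the desired change in the model's values. Finding this vector is the hard
# part; here it is just a random placeholder.
flat_params = torch.nn.utils.parameters_to_vector(model.parameters())
value_direction = torch.randn_like(flat_params)
value_direction = value_direction / value_direction.norm()

# Once the direction is known, applying the update is a one-liner.
step_size = 0.01
with torch.no_grad():
    new_params = flat_params + step_size * value_direction
    torch.nn.utils.vector_to_parameters(new_params, model.parameters())
```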