Taylor G. Lunt comments on My AI Predictions for 2027

Taylor G. Lunt 3 Sep 2025 2:16 UTC
1 point
Does this not mean the following though?
1. In layer n, the feed-forward network at token position t may waste computation redoing work that layer n already did at earlier token positions i < t.
2. This limits how well different layers can specialize in different levels of abstraction, because both layer n and layer n+1 now need to be able to detect whether something “seems political”, not just layer n.
3. The network needs to be deeper as the number of tokens grows, because token t has to wait until layer n+1 to see whether token t-1 had the feature “seems political”, token t+1 has to wait until layer n+2 to see whether token t had it, and so on.
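
To make point 3 concrete, here is a toy sketch of the worst case I have in mind. It assumes that a feature computed at layer k for one token only becomes readable by later tokens at layer k+1, and that each token's feature depends on the previous token's feature. The function name and the `base_layer` parameter are made up for illustration; this is not a model of any real architecture.

```python
def earliest_layer(num_tokens: int, base_layer: int) -> list[int]:
    """Earliest layer at which each token's feature can exist.

    Assumption: token 0 computes the feature at `base_layer`, and every
    later token must first read the previous token's feature, which is
    only visible one layer after it was computed.
    """
    layers: list[int] = []
    for t in range(num_tokens):
        if t == 0:
            layers.append(base_layer)
        else:
            # Token t reads token t-1's feature one layer later.
            layers.append(layers[t - 1] + 1)
    return layers


# Four tokens, with the feature first computed at layer 3.
print(earliest_layer(4, 3))
```

Under these assumptions the required depth grows by one layer per token in the chain. If the feature at each token does not depend on the previous token's feature, the chain never forms and the depth stays flat.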