I think that in theory there is nothing wrong with having your memory wiped every iteration, and that such an architecture could in theory get us to SC. I just think it’s not very efficient and there would be a lot of repeated computation happening between predicting each word.
I mean yeah, totally agree re: repeated computation and inefficiency. But there’s no rule that says the first SC has to be close to the limits of efficiency. On the contrary, just as how the first viable airplanes were extremely shitty compared to the airplanes of today, the first viable SC will probably be shitty in various ways (e.g. data-efficiency) and perhaps this’ll be one of those ways.
I think that in theory there is nothing wrong with having your memory wiped every iteration, and that such an architecture could in theory get us to SC. I just think it’s not very efficient and there would be a lot of repeated computation happening between predicting each word.
I mean yeah, totally agree re: repeated computation and inefficiency. But there’s no rule that says the first SC has to be close to the limits of efficiency. On the contrary, just as how the first viable airplanes were extremely shitty compared to the airplanes of today, the first viable SC will probably be shitty in various ways (e.g. data-efficiency) and perhaps this’ll be one of those ways.