Since CoDI throws away the hidden state and only uses the KV values at the <|eocot|> token, the accuracy drop after latent 5 could simply mean the KV values can't store any more information.