You’re describing a data-augmentation variant of teacher-student knowledge distillation: the large teacher model labels extra (possibly augmented) inputs, and the small student is trained on those soft labels. It can work well.
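A minimal sketch of the core distillation objective, in NumPy. The function names and the temperature value are my own choices, not from a specific framework; the idea is simply to train the student against the teacher's temperature-softened output distribution:

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax: higher T gives softer targets."""
    z = np.asarray(logits, dtype=np.float64) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy of the student's softened predictions against the
    teacher's softened output distribution (the teacher-student signal)."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    return -np.sum(p_teacher * np.log(p_student + 1e-12))
```

In practice this term is usually mixed with the ordinary hard-label loss; the loss is minimized when the student's distribution matches the teacher's.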
16 bits/parameter is the most commonly supported precision, but 8-bit quantization can also be used.
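A minimal sketch of symmetric 8-bit post-training quantization in NumPy, to show what "8 bits/parameter" means concretely. The function names are illustrative, not a particular library's API: each float weight is mapped to an int8 code via a single per-tensor scale, and dequantization multiplies the codes back by that scale:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric 8-bit quantization: one scale maps floats to int8."""
    w = np.asarray(w, dtype=np.float32)
    scale = np.abs(w).max() / 127.0        # largest magnitude maps to +/-127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 codes."""
    return q.astype(np.float32) * scale
```

The round-trip error per weight is at most half the scale, which is why 8-bit storage typically costs little accuracy while halving memory relative to 16-bit weights.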
Performance depends not only on the number of parameters but also on the architecture.
High-end smartphones commonly include special-purpose processors for neural networks, so their performance is not bad.