Curriculum learning could also be tried when training the LoRA: start training entirely unmasked, then gradually introduce the masking, either layer by layer or by shrinking the context. That should give masked-LoRA performance a modest boost.
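A minimal sketch of such a curriculum schedule (all names and parameters here are hypothetical; it assumes the masking strength can be expressed as a single fraction that is ramped linearly after an unmasked warmup phase):

```python
def mask_fraction(step, total_steps, warmup_frac=0.2):
    """Curriculum schedule: no masking during an initial warmup phase,
    then linearly ramp the masked fraction from 0 to 1 over the
    remainder of training. Hypothetical sketch, not a fixed recipe."""
    warmup = int(total_steps * warmup_frac)
    if step < warmup:
        return 0.0
    return min(1.0, (step - warmup) / max(1, total_steps - warmup))


def layers_to_mask(step, total_steps, num_layers):
    """Layer-by-layer variant: mask the first k layers, where k grows
    with the schedule above until all layers are masked."""
    return int(round(mask_fraction(step, total_steps) * num_layers))
```

The same `mask_fraction` value could instead drive the context-reduction variant, e.g. by masking out that fraction of the context tokens at each step; the warmup fraction and ramp shape are knobs one would tune.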