Yes. This is literally known as model expansion; see e.g. this paper on a lossless way of doing so. The issue with Opus 3 is that it’s likely built on an older, less efficient transformer variant, which can’t easily be converted into a newer architecture.
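
To make "lossless" concrete, here is a minimal sketch of one well-known function-preserving expansion, Net2Net-style widening, applied to a toy two-layer ReLU MLP. This is not necessarily the method from the paper mentioned above, and the names (`widen_hidden_layer`, `mlp`) and shapes are assumptions chosen for illustration. The idea: duplicate some hidden units, then divide each duplicated unit's outgoing weights by its number of copies so the output stays the same.

```python
import numpy as np

def mlp(x, W1, b1, W2):
    """Toy two-layer MLP: ReLU hidden layer, linear output (illustrative only)."""
    return np.maximum(x @ W1 + b1, 0.0) @ W2

def widen_hidden_layer(W1, b1, W2, new_width, rng):
    """Widen the hidden layer to `new_width` units without changing the function.

    W1: (d_in, h), b1: (h,), W2: (h, d_out). Hypothetical helper, not a library API.
    """
    h = W1.shape[1]
    if new_width < h:
        raise ValueError("new_width must be at least the current hidden width")
    # Keep every original unit, then fill the extra slots with random copies.
    extra = rng.integers(0, h, size=new_width - h)
    mapping = np.concatenate([np.arange(h), extra])
    # How many copies of each original unit exist after widening.
    counts = np.bincount(mapping, minlength=h)
    W1_new = W1[:, mapping]
    b1_new = b1[mapping]
    # Split each unit's outgoing weights evenly across its copies.
    W2_new = W2[mapping, :] / counts[mapping][:, None]
    return W1_new, b1_new, W2_new

rng = np.random.default_rng(0)
W1, b1, W2 = rng.normal(size=(4, 3)), rng.normal(size=3), rng.normal(size=(3, 2))
x = rng.normal(size=(5, 4))
W1w, b1w, W2w = widen_hidden_layer(W1, b1, W2, new_width=7, rng=rng)
# The widened network should match the original up to floating-point error.
print(np.allclose(mlp(x, W1, b1, W2), mlp(x, W1w, b1w, W2w)))
```

The trick relies on the two networks having the same layer types, so that copied units behave identically. That is roughly why converting between different architectures is harder than growing a model within one.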