Yeah, there’s a reason I specified ‘compensation’ rather than ‘profit’. :) Executive function assistants of some kind could be useful for me too, but whether it’d be useful enough to put the work into it as its own reward … well, that’s a question.
And, yeah, if you want to either rent the GPU yourself or have someone do training for you, and you don’t mind the training data going into the cloud, that’s the best way to do it. Tuning takes more compute than inference, in general.
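To put rough numbers on 'more compute': full fine-tuning also needs a lot more memory, since you hold gradients and optimizer state alongside the weights, while inference only needs the weights plus a small KV cache. A back-of-the-envelope sketch (my assumptions: fp16 weights, Adam optimizer; LoRA-style tuning narrows this gap considerably):

```python
# Rough memory arithmetic for fp16 models; a sketch, not a benchmark.
def gib(n_bytes: int) -> float:
    return n_bytes / 2**30

def memory_estimate(n_params: int, training: bool = True) -> float:
    """Very rough GiB estimate; ignores activations and KV cache."""
    weights = n_params * 2          # fp16 weights: 2 bytes/param
    if not training:
        return gib(weights)
    grads = n_params * 2            # fp16 gradients
    adam_state = n_params * 8       # fp32 momentum + variance: 4 + 4 bytes/param
    return gib(weights + grads + adam_state)

for billions in (7, 13):
    n = int(billions * 1e9)
    print(f"{billions}B params: inference ~{memory_estimate(n, training=False):.0f} GiB, "
          f"full tuning ~{memory_estimate(n, training=True):.0f} GiB")
```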
(I don’t think personally identifying training data is particularly helpful for tuning; you’re trying to get methods, approach and formatting down, not so much memory, though it may pick up on a few things. Not to mention it’d be a liability if you ever felt like letting it out of the box and sharing your helpful assistant. Outside of pretraining, retrieval-augmented context is the better route for memory anyway.)
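For concreteness, a toy sketch of what I mean by retrieval-augmented context; the notes and the bag-of-words 'embedding' below are stand-ins, and a real setup would use a proper embedding model and vector store:

```python
# Toy retrieval-augmented context: keep "memory" outside the model,
# retrieve the most relevant notes at query time, prepend them to the prompt.
import math
from collections import Counter

NOTES = [
    "Dentist appointment is the first Tuesday of each month.",
    "Prefers task lists broken into steps of 15 minutes or less.",
    "Project Alpha status reports are due on Fridays.",
]

def embed(text: str) -> Counter:
    """Stand-in embedding: bag of words. Swap in a real embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def build_prompt(query: str, k: int = 2) -> str:
    q = embed(query)
    ranked = sorted(NOTES, key=lambda note: cosine(q, embed(note)), reverse=True)
    return "Relevant notes:\n" + "\n".join(ranked[:k]) + f"\n\nUser: {query}\nAssistant:"

print(build_prompt("When are my status reports due?"))
```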
Quantized models have a good performance/size tradeoff: a 4, 5 or 6 bit quantization of a larger model still captures most of its performance advantage over the smaller models (of otherwise equivalent quality) that would fit in the same memory unquantized. You can indeed run inference on much larger models than you can train.
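If you go the local route, llama.cpp-style runtimes make running those quantizations pretty painless. A minimal sketch with llama-cpp-python, assuming you've already downloaded a GGUF file (the path and settings here are placeholders):

```python
# Run a 4-bit GGUF quantization of a large model locally; the model path
# below is a placeholder for whatever quantization you downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="models/big-model.Q4_K_M.gguf",  # ~4.5 bits/weight quant
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers to GPU if they fit; 0 for CPU-only
)

out = llm("Summarize my plan for today in three bullet points.", max_tokens=128)
print(out["choices"][0]["text"])
```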