Cool! Unfortunately I’m not really sure if the idea itself is compatible with turning a profit—modern business models would push for it to leak data or include ads in ways that would defeat the purpose.
I’ll eventually get one of the good Macs if I have to, but I’m giving that decision another year or so, to let it become clearer whether that’ll really be necessary in the long run.
I’ve also heard some very promising things about eventually being able to do a one-time investment of renting fancy compute for initial training, and then compressing the trained model to run on smaller hardware.
Yeah, there’s a reason I specified ‘compensation’ rather than ‘profit’. :) Executive function assistants of some kind could be useful for me too, but whether it’d be useful enough to put the work into it as its own reward … well, that’s a question.
And, yeah, if you want to either rent the GPU yourself or have someone do training for you, and you don’t mind the training data going into the cloud, that’s the best way to do it. Tuning takes more compute than inference, in general.
(I don’t think personally identifying training data is particularly helpful for tuning; you’re trying to get methods, approach, and formatting down, not so much memory, though it may pick up on a few things. That also matters if you ever felt like letting it out of the box and sharing your helpful assistant. Retrieval-augmented context is a better way to handle memory outside pretraining.)
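To make the retrieval-augmented idea concrete, here’s a minimal sketch of the pattern: keep personal notes outside the model, score them against the query, and paste the best matches into the prompt at inference time. This is a toy, assuming a bag-of-words cosine similarity stands in for a real embedding model; the note text and function names are made up for illustration.

```python
# Sketch of retrieval-augmented context: memory lives in an external
# note store, not in the model's weights. Toy similarity via bag-of-words.
import math
from collections import Counter

def similarity(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words vectors of two strings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

def retrieve(query: str, notes: list[str], k: int = 2) -> list[str]:
    """Return the k notes most similar to the query."""
    return sorted(notes, key=lambda n: similarity(query, n), reverse=True)[:k]

notes = [
    "Dentist appointment is on the 14th at 3pm.",
    "Prefers tea over coffee in the morning.",
    "Project deadline moved to next Friday.",
]
context = retrieve("when is my dentist appointment?", notes, k=1)
prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: when is my dentist appointment?"
```

In a real setup you’d swap the word-overlap scoring for embeddings and a vector index, but the shape is the same: the model stays generic, and memory is just text fetched per query.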
Quantized models have good performance/speed tradeoffs: a 4-, 5-, or 6-bit quantization of a larger model still captures most of its quality advantage over a smaller, unquantized model that would fit in the same memory. You can indeed run inference on much larger models than you can train.
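The memory argument is easy to see with back-of-envelope numbers. A rough sketch (weights only, ignoring the KV cache and activations; the parameter counts are illustrative, not tied to any particular model):

```python
# Approximate memory needed to hold just the weights of a model,
# given its parameter count and the bits used per weight.
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Decimal GB for the weights alone: params * bits / 8 bytes."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# A 70B model at fp16 vs. 4-bit, compared with a 13B model at fp16:
print(weight_gb(70, 16))  # 140.0 GB -- out of reach for most single machines
print(weight_gb(70, 4))   # 35.0 GB  -- fits in 48 GB of (V)RAM
print(weight_gb(13, 16))  # 26.0 GB  -- similar footprint, much smaller model
```

So for roughly the memory a 13B model needs at full fp16 precision, you can fit a 4-bit 70B model, and the quantized large model generally wins on quality.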