This looks very useful, although I think the performance improvements in more recent open-weight, smaller, quantized models (like Gemma-2, Qwen-2.5, or Phi-3.5) have made it much more reasonable to run such a model locally for this purpose rather than calling a remote API, since sending data about the webpages users visit to OpenAI is a repulsive idea to many people. (Running locally would also have cost benefits over huge models like GPT-4, though the gain in benefit/cost ratio over budget proprietary models like Gemini-2.0-Flash would be epsilon.)
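For concreteness, here is a minimal sketch of what that swap could look like, assuming an Ollama server running locally with a quantized Qwen-2.5 pulled (`ollama pull qwen2.5`). Ollama exposes an OpenAI-compatible endpoint, so only the base URL and model name need to change; the `page_text` variable is a hypothetical stand-in for whatever page content the tool extracts:

```python
# Sketch: point the standard OpenAI client at a local Ollama server
# instead of api.openai.com, so page content never leaves the machine.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",  # the client requires a key, but Ollama ignores it
)

page_text = "Example extracted webpage text..."  # hypothetical placeholder

response = client.chat.completions.create(
    model="qwen2.5",  # local quantized model in place of e.g. gpt-4
    messages=[
        {"role": "system", "content": "Summarize the following webpage text."},
        {"role": "user", "content": page_text},
    ],
)
print(response.choices[0].message.content)
```

Since the request shape is identical, a tool written against the OpenAI API could offer this as a configuration option rather than a separate code path.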