Personally, I’m just self-hosting a bunch of stuff:
litellm proxy, to connect to any llm provider
langfuse for observability
faster whisper server, the v3 turbo CTranslate2 version takes about 900 MB of VRAM and transcribes roughly 10 times faster than I speak
open-webui: as it’s connected to litellm and ollama, I avoid provider lock-in and keep all my messages on my own backend instead of having some at OpenAI, some at Anthropic, etc. It also supports artifacts and a bunch of other nice features, lets me craft my perfect prompts, and jailbreak when needed.
piper for TTS for now, but I plan to switch to a self-hosted fish audio.
for extra privacy I run a bunch of ollama models too. Mistral Nemo seems quite capable; otherwise a few llama3, qwen2, etc.
for embeddings, either bge-m3 or some self-hosted jina.ai models.
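For anyone wondering how the litellm piece glues things together: the proxy speaks the standard OpenAI chat-completions protocol, so any client can point at it with just a base URL change. A minimal stdlib-only sketch — the URL, model alias, and key below are placeholders for your own deployment, not my actual config:

```python
# Minimal sketch of talking to a litellm proxy. It exposes an
# OpenAI-compatible /v1/chat/completions endpoint, so the request
# body is the standard chat-completions shape.
import json
import urllib.request

LITELLM_URL = "http://localhost:4000/v1/chat/completions"  # placeholder

def build_payload(model: str, prompt: str) -> dict:
    # the same body works whether the alias routes to a local ollama
    # model or a remote provider behind the proxy
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def ask(model: str, prompt: str, api_key: str = "sk-local") -> str:
    req = urllib.request.Request(
        LITELLM_URL,
        data=json.dumps(build_payload(model, prompt)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Swapping the `model` alias is all it takes to move a conversation from a local model to a hosted one.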
I made a bunch of scripts to pipe my microphone / speaker / clipboard / LLMs together for productivity. For example I press shift four times, speak, press shift again, and voilà: what I said is turned into an Anki flashcard.
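To give a flavour of that pipeline, here's a rough sketch of the record-then-add-to-Anki part. Every command, URL, deck name, and field here is illustrative rather than my exact setup; it assumes `arecord` for capture and the AnkiConnect add-on listening on its default port:

```python
# Hypothetical sketch of the hotkey pipeline: record audio, transcribe
# it via a self-hosted whisper server, then push a flashcard to Anki.
# All paths, URLs, and names are placeholders.
import json
import subprocess
import urllib.request

# faster-whisper servers typically expose an OpenAI-style
# /v1/audio/transcriptions endpoint; the transcription POST is
# omitted here for brevity.
WHISPER_URL = "http://localhost:8000/v1/audio/transcriptions"
ANKI_CONNECT = "http://localhost:8765"  # AnkiConnect's default port

def record(path: str = "/tmp/note.wav", seconds: int = 10) -> str:
    # grab audio from the default microphone (assumes ALSA's arecord)
    subprocess.run(["arecord", "-f", "cd", "-d", str(seconds), path], check=True)
    return path

def add_note_payload(front: str, back: str, deck: str = "Default") -> dict:
    # AnkiConnect's addNote action (API version 6)
    return {
        "action": "addNote",
        "version": 6,
        "params": {
            "note": {
                "deckName": deck,
                "modelName": "Basic",
                "fields": {"Front": front, "Back": back},
                "tags": ["voice"],
            }
        },
    }

def add_note(front: str, back: str) -> None:
    req = urllib.request.Request(
        ANKI_CONNECT,
        data=json.dumps(add_note_payload(front, back)).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```

The LLM step in between just reformats the raw transcript into a question/answer pair before `add_note` is called.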
As providers, I mostly rely on openrouter.ai, which lets me swap between providers without issue. These last few months I’ve been using Sonnet 3.5, but I switch as soon as there’s a new frontier model.
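Since openrouter keeps one API across providers, "switching to the new frontier model" really is just changing a model string. One way to make that explicit is a small preference list — the model ids below are illustrative, check openrouter's catalogue for the current ones:

```python
# Sketch: frontier-model swapping as a preference list. When a new
# model comes out, prepend its id and everything else stays the same.
# Model ids are illustrative placeholders.
PREFERRED = [
    "anthropic/claude-3.5-sonnet",  # current favourite
    "openai/gpt-4o",                # fallback
]

def pick_model(available: set) -> str:
    # walk the preference list, take the first model that's up
    for model in PREFERRED:
        if model in available:
            return model
    raise RuntimeError("no preferred model available")
```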
For interacting with codebases I use aider.
So in the end, all my costs come from API calls and none from subscriptions.