I’d like to finetune or (maybe more realistically) prompt engineer a frontier LLM to imitate me. Ideally not just stylistically: it should reason like me, drop anecdotes like me, etc., so it performs at something like my 20th percentile of usefulness/insightfulness.
Is there a standard setup for this?
Example use cases include receiving an email and sending[1] a reply that sounds like me (rather than a generic email), reading Google Docs or EA Forum posts and giving relevant comments/replies, etc.
More concretely, things I do that I think current generation LLMs are in theory more than capable of:
Read a Google Doc and identify a subtleish reasoning fallacy the poster made, or one of my pet peeves
Read a Forum post and mention some comment thread or post I’ve written before that addresses the point made.
Talk about why some email/Twitter post/etc. relates to one of my ~300 favorite historical facts/my ~top 100 favorite jokes
Drop in semi-relevant facts about naked mole rats or Sparta etc., ~unprompted
[1] The computer use part is not essential. I don’t need it to be fully automated, and I’m not even sure I want to give Opus et al. access to my email account anyway.
Have you tried RAG?
Curate a dataset of lots of your own texts from multiple platforms. Split them into 1k-character chunks and generate embeddings for each chunk.
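For the chunking step, here is a minimal sketch. The 1k character limit comes from the suggestion above; the function name `chunk_text` and the choice to prefer paragraph breaks (blank lines) over cutting mid-sentence are my own assumptions, not necessarily how the repo mentioned in this reply does it.

```python
def chunk_text(text: str, max_chars: int = 1000) -> list[str]:
    """Split text into chunks of at most max_chars characters.

    Paragraphs (separated by blank lines) are packed together greedily;
    a single paragraph longer than max_chars is hard-split.
    """
    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        # Hard-split any paragraph that cannot fit in one chunk.
        while len(para) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(para[:max_chars])
            para = para[max_chars:]
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # Adding this paragraph would overflow: close the current chunk.
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks
```

Keeping paragraphs intact where possible should mean each retrieved chunk reads as a coherent sample of your writing rather than a fragment.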
When a query text comes in, do an embedding search to find the most similar past texts, then give these to the LLM along with the query text and ask it to generate a novel text in the same style.
OpenAI's text-embedding-3-small works fine. I have a repo I could share if the dataset is large or in a complex format or whatever.
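A rough sketch of the retrieval and prompting steps, in case it helps. Everything named here (`toy_embed`, `build_index`, `top_k`, `build_prompt`, and the prompt wording) is hypothetical. In particular, `toy_embed` is a hashed bag-of-words stand-in so the example runs offline; in a real setup you would swap in a function that calls an actual embedding model such as text-embedding-3-small.

```python
import hashlib
import math
import re
from typing import Callable

Vector = list[float]
Embedder = Callable[[str], Vector]


def toy_embed(text: str, dim: int = 4096) -> Vector:
    """Stand-in embedder (assumption): hashed bag of words, L2-normalised."""
    vec = [0.0] * dim
    for word in re.findall(r"[a-z0-9']+", text.lower()):
        idx = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def build_index(chunks: list[str], embed: Embedder = toy_embed) -> list[tuple[str, Vector]]:
    """Embed every chunk once, up front, so queries only embed the query."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def top_k(query: str, index: list[tuple[str, Vector]],
          embed: Embedder = toy_embed, k: int = 3) -> list[str]:
    """Return the k chunks whose embeddings are most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query: str, examples: list[str]) -> str:
    """Assemble a prompt asking for a new text in the style of the examples."""
    shown = "\n\n".join(f"Example {i}:\n{ex}" for i, ex in enumerate(examples, 1))
    return (
        "Below are past texts written by one author.\n\n"
        f"{shown}\n\n"
        "Write a new reply to the following text, in the same style and "
        "with the same kind of reasoning as the examples.\n\n"
        f"Text to reply to:\n{query}"
    )


past_texts = [
    "Naked mole rats are eusocial mammals that live in colonies.",
    "Sparta had two kings at the same time.",
    "My favorite joke is about a horse walking into a bar.",
]
index = build_index(past_texts)
query = "Tell me about the kings of Sparta"
prompt = build_prompt(query, top_k(query, index, k=1))
```

The resulting `prompt` string is what you would send to the LLM. With a large dataset you would probably want a vector store instead of sorting the whole index on every query, but a linear scan is fine for a few thousand chunks.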