Afaict the idea is that base models are all about predicting text, and therefore extremely sensitive to “tropes”; e.g. if you start a paragraph in the style of a Wikipedia page, it’ll continue in that style, no matter the subject.
Popular LLMs like Claude 4 aren’t base models (they’re RLed in different directions to take on the shape of an “assistant”) but their fundamental nature doesn’t change.
Sometimes the “base model character” will emerge (e.g. you might tell it about a medical problem and it’ll say “ah yes that happened to me too”, which isn’t assistant behavior but IS in line with the online trope of someone asking a medical question on a forum).
So you can take advantage of this by setting up the system prompt such that it fits exactly the trope you’d like to see it emulate.
E.g. if you stick a list of LessWrong vernacular into it, it’ll simulate “being inside a LessWrong post” even within the context of being an assistant.
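(For concreteness, here's a minimal sketch of what that looks like in practice, assuming the Anthropic Python SDK; the vernacular list and model name are illustrative, not anyone's actual prompt:)

```python
# Minimal sketch of trope activation via the system prompt,
# using the Anthropic Python SDK.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# A system prompt seeded with LessWrong-ish vernacular: the hypothesis is
# that these tokens pull the model toward the "inside a LessWrong post" trope.
system_prompt = (
    "Vocabulary in active use: epistemic status, steelman, inside view, "
    "Chesterton's fence, motte-and-bailey, updating, priors, crux."
)

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # whichever Claude 4 model is current
    max_tokens=500,
    system=system_prompt,
    messages=[{"role": "user", "content": "What should I think about akrasia?"}],
)
print(response.content[0].text)
```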
Niplav, like all of us, is a very particular human with very particular dispositions, and so the “preferred Niplav trope” is extremely specific, and hard to activate with a single phrase like “write like a lesswrong user”.
So Niplav has to write a “semantic soup” containing a slurry of words that are an approximation of “Niplav’s preferred trope”, and the idea is that each of these words will put the LLM in the right “headspace” and make it think it’s inside whatever this mix ends up pointing at.
It’s a very schizo-Twitter way of thinking, where sometimes posts will literally just be a series of disparate words attempting to arrive at some vague target or other. You can try it out! What are the ~100 words, links, concepts that best define your world? The LLM might be good at understanding what you mean if you feed it this.
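If you want to try the construction programmatically, it's just this; every entry below is a hypothetical placeholder, substitute your own ~100 words:

```python
# Sketch of a "semantic soup" system prompt: no instructions, no persona
# description, just the slurry of words/links/concepts that define your world.
soup = [
    "forecasting", "mereology", "cellular automata", "type theory",
    "https://example.com/your-blog", "cryonics", "whale songs",
    # ... continue to ~100 entries
]

# Join into a single system prompt; the hypothesis is that each token
# nudges the model toward the intended "headspace".
system_prompt = "\n".join(soup)
```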
I wanted to say that I particularly appreciated this response, thank you.