Hey Adele—Geodesic checking in here. We plan to just use a completely new token. We'll have Aaron and his team create the data with something like [token] and then pass just this synthetic dataset through a new tokenizer. So, our final model will have a vocabulary one larger than our control's, and that extra token never appears in the original pre-training corpus.
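For concreteness, here's a minimal sketch of what the vocabulary extension could look like, assuming we go through the Hugging Face transformers tokenizer/model classes (the "[token]" string and the gpt2 checkpoint are placeholders, not decisions):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Placeholder checkpoint; swap in whatever base model we settle on.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Register the brand-new token so the tokenizer treats it as a single unit.
num_added = tokenizer.add_tokens(["[token]"], special_tokens=True)
assert num_added == 1  # vocab is now exactly one entry larger than the control

# Grow the embedding matrix to cover the new vocabulary entry.
model.resize_token_embeddings(len(tokenizer))
```

Happy to adjust if Aaron's team ends up training the tokenizer from scratch instead of extending the existing one.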