Sure—I am currently on my phone, but I can paint a quick picture.
Local Memory—I keep my own internal predictions on Fatebook and have them synced locally to my Obsidian (a local markdown file manager). Then I use Claude’s Obsidian MCP to help me write down my daily notes from work, plus a jumbled context of messages with coworkers, random web comments, and other messaging services, so it can help me keep my profiles on my friends and projects up to date. (It is, again, glued together with more MCPs that have limited access to my chat logs with friends.) Of course, with a human in the loop.
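The sync half is just a small script. Here is a minimal sketch of the idea (the Fatebook route, response shape, and vault path are illustrative assumptions, not the documented API—check Fatebook's actual docs before using):

```python
# Sketch: pull Fatebook predictions into a markdown note in an Obsidian vault.
# The endpoint, query params, and response fields are guesses for illustration.
from pathlib import Path

import httpx

FATEBOOK_API_KEY = "..."  # from your Fatebook account settings
NOTE = Path.home() / "Obsidian" / "vault" / "predictions.md"

def sync_predictions() -> None:
    r = httpx.get(
        "https://fatebook.io/api/v0/getQuestions",  # assumed route
        params={"apiKey": FATEBOOK_API_KEY},
        timeout=10,
    )
    r.raise_for_status()
    lines = ["# Fatebook predictions", ""]
    for q in r.json().get("items", []):  # assumed response shape
        lines.append(f"- {q['title']} (resolves {q.get('resolveBy', '?')})")
    NOTE.write_text("\n".join(lines) + "\n")

if __name__ == "__main__":
    sync_predictions()
```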
Delphi—I wrote a simple MCP that basically just runs the Delphi method with LLMs. Usually facilitated by Claude, it calls a panel of experts: the top-K ranked models on the LLM arena. It generates a questionnaire based on my question, hands it out, aggregates the consensus, and decides if one is reached! Again, it gets the context it needs from me through my Obsidian. I use this for questions that are more personal, or for which there isn’t good liquidity on prediction markets.
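Roughly, the loop looks like this (a minimal sketch rather than the actual MCP; the panel slugs and the bare-verdict convention are stand-ins, and OpenRouter's OpenAI-compatible endpoint is what the client talks to):

```python
# Sketch of a two-iteration Delphi loop over a panel of frontier models.
from collections import Counter

from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="...")

PANEL = [  # stand-ins for whatever tops the arena leaderboard today
    "google/gemini-2.5-pro",
    "anthropic/claude-3.7-sonnet",
    "deepseek/deepseek-r1",
]

def ask(model: str, prompt: str) -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def delphi(question: str, context: str, rounds: int = 2) -> str:
    # Facilitator step: turn the question (plus Obsidian context) into a
    # questionnaire whose final line is a bare yes/no verdict.
    questionnaire = ask(
        PANEL[0],
        "Turn this into a short forecasting questionnaire whose final "
        f"line is a bare yes/no verdict.\nContext: {context}\nQuestion: {question}",
    )
    feedback = ""
    for _ in range(rounds):
        # Hand the questionnaire to each panelist, with last round's summary.
        answers = [
            ask(m, f"{questionnaire}\n\nPanel feedback so far: {feedback}")
            for m in PANEL
        ]
        # Naive consensus check: compare the final verdict lines.
        verdicts = Counter(a.strip().splitlines()[-1].lower() for a in answers)
        if len(verdicts) == 1:  # consensus reached
            break
        feedback = ask(
            PANEL[0],
            "Summarize the disagreements:\n\n" + "\n---\n".join(answers),
        )
    return verdicts.most_common(1)[0][0]
```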
I will bite.
First of all, I appreciate the effort of trying to communicate better and hammering down the neat borders of words and how they are used across domains—especially for words that are often used interchangeably and carelessly.
TLDR: Sometimes posts just get unlucky! Also, your style is on the verbose side, and I am still somewhat confused about your value proposition.
It seems like your frustration comes from a lack of responses—often, a lack of response is just luck and how the LW algorithm works (exponential time decay). Maybe you posted at a time when most forum users were asleep, and/or you were running against the headwind of literally one of the most popular articles of all time. Sometimes you just get unlucky! Twice, even! However, I can also say that the content is not written very legibly. It took me a long time to get to the actual punch line and understand what you really want—“Can someone technical say something about how you want these terms to be used?”
In addition, the post is written very verbosely. It takes a long time to get to the point and is not clear about what it wants until the very end. It doesn’t say how you are going to do this, or whether it is even worth engaging with you, because it is not clear how you’d help delineate the word boundaries from your position. I am still unsure about the exact value proposition: how would better delineating these words lead to a reduced p(doom)?
Keep trying!
I mostly use the Claude desktop client with MCPs (like additional plugins and tooling for Claude to use; a sketch of one such server follows the list) for:
2-iter Delphi method involving calling Gemini 2.5 Pro plus whatever is top of the LLM arena that day, through OpenRouter
Metaculus, Kalshi and Manifold search for quick intuition on subjects
Smart fetch (for OCR’ing PDFs, images, etc.)
Local memory
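For a sense of how little glue is involved, here is a minimal sketch of the market-search piece as an MCP server, using the official MCP Python SDK’s FastMCP. The Manifold endpoint and parameter names are from memory and should be treated as assumptions:

```python
# Sketch: expose a Manifold market search as an MCP tool for Claude Desktop.
# Endpoint and params are assumptions; check Manifold's current API docs.
import httpx
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("prediction-markets")

@mcp.tool()
def search_manifold(term: str, limit: int = 5) -> str:
    """Search Manifold markets and return question/probability pairs."""
    r = httpx.get(
        "https://api.manifold.markets/v0/search-markets",  # assumed route
        params={"term": term, "limit": limit},
        timeout=10,
    )
    r.raise_for_status()
    return "\n".join(
        f"{m.get('question')}: {m.get('probability')}" for m in r.json()
    )

if __name__ == "__main__":
    mcp.run()  # Claude Desktop talks to this over stdio
```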
I like where this wants to go, but I don’t want to get there with bad arguments.
To me, ank, this proposal is neither complete, nor comprehensive, and also not 100% safe.It is not complete because “Ideally those non-agentic GPUs are like sandboxes that run an LLM internally and can only spit out the text or image as safe output” is just a rephrasing of the oracle[1] problem. Rephrasing does not a solution make.
It is not comprehensive because it relies on GPU manufacturers to do the right thing. If Nvidia realizes it can just bypass the protocol and take over the world itself, the pressure for responsible action would suddenly evaporate.
It is not 100% safe because it is neither complete nor comprehensive.
In addition, “But lets not make the perfect be the enemy of good” from your comment below seems like a subtle bait and switch. In the original post, 100% safety is waxed poetic about. And yet in your response to Buck, that goal is truncated to the softer (and IMO more reasonable) stance that we’d want to be “50% sure we’ll have 99% security in most cases”. The charitable reading is that your proposal is the only way to get 100% comprehensive safety, even if we can’t get there right away. However, the ending of your comment—“Let’s not give up because we are not 100% sure we’ll have 100% security”—feels too motte-and-bailey for me; you are the person who suggested that this is the “only” way towards “100% safe[ty]”, not us.
My intuition was that it was a semi-public meeting. My memory is hazy, but there were a lot of people, and they were on Twitter Spaces (a mostly “public forum” style audio chatroom). So I don’t think it is a secret per se.
I like this post. But I think it is somewhat common for a CEO to feel insecure despite being all-powerful and beyond any threat from you.
Case Study: Elon Musk and a Disgruntled Twitter Engineer
Elon Musk is the majority shareholder (alongside his consortium) of Twitter, yet he was left flustered by an engineer asking technical questions in an aggressive manner. He had all the room to be magnanimous, caring, and respectful—he could have conceded that he knew little about the technical stack and turned an enemy into an ally, à la Julius Caesar. Yet, after mumbling for a bit, he just called the person a jackass and removed him from the call. In his world, face is extremely important, and he simply thought very little of the existing Twitter engineers and their competency, independent of the truth.
The world of public opinion and the perception of power is a funny thing. I think for many CEOs who have built their personal understanding of themselves around their force of will, it is often hard to take any criticism or doubt from their underlings even when they are technically invulnerable—they simply don’t want to appear weak (whether the weakness is real or imagined).
“Tokenizer is the root of all evil”—Abraham Lincoln before the Battle of Midway
If “you can make a decision while still being uncertain about whether it is the right decision”, then why can’t you think about “was that the right decision”? (Literal quote above vs. your original wording.)
It seems like what you want to say is—be doubtful or not, but follow through with full vigour regardless. If that is the case, I find it to be reasonable. Just that the words you use are somewhat irreconcilable.
I’m very confused, because it seems like for you a decision should not only clarify matters and narrow possibilities, but also eliminate all doubt entirely and prune off all possible worlds where the counterfactual can even be contemplated.
Perhaps that’s indeed how you define the word. But by such a stringent definition, I’d have to say I’ve never decided anything in my life. This doesn’t seem like the most useful way to understand “decision”—it diverges enough from common usage, and mismatches the hyperdimensional cloud of meaning around the word, to be useless in conversation with most people.
I am curious, and you have probably thought much about this: how would the transition happen from the existing economy to this new economy? How do you convince existing property owners to give up their “out-of-proportion” ownership claims? (Would it just be political coercion, as in the post? Then who would convince the state, and how?)
Note for posterity: “Let’s think step by step” is a joke.
I downvoted this and I feel the urge to explain myself—the LLMism in the writing is uncanny.
The combination of “Let’s think step by step”, “First…” and “Not so fast…” gives me a subtle but dreadful impression that a highly valued member of the community is being finetuned by model output in real time. This emulation of the “Wait, but!” pattern is a bit too much for my comfort.
My comment doesn’t have much to do with the content, and more with how unsettled I feel. I don’t think LLM outputs are all necessarily infohazardous—but I am beginning to see the potential failure modes that people have been gesturing at for a while.
Prolific as ever! Small nitpick—the SBF interview link appears to be pointing at something else?
FYI, I do find that aider, using mixed routing between R1 and o3-mini-high as the architect model with Sonnet as the editor model, is slightly better than Cursor/Windsurf, etc.
Or, for a minimal setup, this is what ranks highest on the aider polyglot benchmark:
aider --architect --model openrouter/deepseek/deepseek-r1 --editor-model sonnet
Is the bet for a general-purpose model still open? I guess it depends on the specific resolver/resolution criteria—considering that OpenAI has gotten the answers and solutions to most of the hard questions. Does o3’s 25% even count?
The “biologically imposed minimal wage” is definitely going into my arsenal of verbal tools. This is one of the clearest illustrations of the same position that has been argued since the dawn of LW.
I think this is a rather legitimate question to ask—I often dream about retiring to an island for the last few months of my life, hanging out with friends and reading my books, and then looking at the setting sun until my carbon and silicon are repurposed atom by atom.
However, that is just a dream. I suspect the moral of the story is often at the end:
“Don’t panic. Don’t despair. And don’t give up.”
“Winning” is certainly underpracticed, but undervalued!? I find that a bit hard to believe. (Currently reading the transcript; will change my opinion if “winning” means something different from what Eliezer often means.)
> Rationality is Systematized Winning by Eliezer Yudkowsky