Imitation of error handling with try-catch slop is not “defensive programming”, it’s “friendly-fire programming” 🤌
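To make the quip concrete, here is a minimal Python sketch of the pattern I mean (my own toy example, not from any particular codebase): the “slop” version catches everything, fakes success, and lets the failure detonate far away from its cause.

```python
import json

def load_config_sloppy(path: str) -> dict:
    """'Friendly-fire' version: swallows every error and fakes success."""
    try:
        with open(path) as f:
            return json.load(f)
    except Exception:
        # the caller now limps on with an empty config
        # and fails somewhere far away from the actual cause
        return {}

def load_config(path: str) -> dict:
    """Letting FileNotFoundError / json.JSONDecodeError propagate
    keeps the failure right next to its cause."""
    with open(path) as f:
        return json.load(f)
```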
Hypothesis: Claude (the character, not the ocean) genuinely thinks my questions (most questions from anyone) are so great and interesting … because it’s me who remembers all of my other questions, but Claude has only seen all the internet slop and AI slop from training so far, and compared to that, any of my questions are probably actually more interesting than whatever it has seen so far 🤔?
Observation about working with coding agents: as I review outputs of claude code today (reading the plan and the git diff before commit), my threshold for asking for a second opinion has moved way, WAY up. I perceive stuff as both “clearly questionable” and “good enough” at the same time. I used to ask other stakeholders (other devs sitting next to me in the office, the tester assigned to the issue, or the PO, depending on who was more conveniently available and how far my “best guess” could have been from “other plausible interpretations of the known requirements”) for their concrete opinions about concrete “unimportant trivia” from time to time. But now it feels like I am the second person, as if claude would ask other people if I said “I don’t know, sounds good enough” about something… but it won’t. And I know it won’t, yet somehow it still doesn’t feel like my job to follow up on those feelings when something is “good enough”. I might have followed up on some of that during a coffee break before, but now I know I won’t.
“merely(!)” is my new favourite word
it’s from https://gradual-disempowerment.ai/mitigating-the-risk … I’ve used “just” (including scare quotes) for the concept of something being very hard, yet simpler than the thing it’s compared to, and now that concept has more color/flavour/it sparked a glimmer of joy for me (despite/especially because it was used to illuminate such a dark and depressing scene: gradual disempowerment is like putting a dagger to one’s liver where the mere(!) misaligned ASI was a stab between the ribs, lose thy hope mere mortals, you were grabbing for water)
I think https://thedailymolt.substack.com/p/when-the-bots-found-god was written by openclaw with very substantial input from its wetware component
My perception of time is like sampling from a continuous 2D plane of conceptual space, something akin to git railways but with a thickness to the lines, like electron orbital probability clouds that are dense around plausible points of view of what happened in the past and thin around conspiracy theories, like different linear mind-map perspectives of people standing around a sculpture, each only inferring what’s on the other side, but together they can prune down non-overlapping minority reports, sticking to the consensus but never deleting (git) history.
My sense of beauty finds it displeasing to read articles with only point measurements and only n-point predictions, and to look at charts drawn for a single interpretation / scenario. I have to hallucinate the gaps, infer the systemic bias, and imagine the size of the error bars due to “randomness”, both as if the authors were well-intentioned and as if they had an agenda to prove a point: would they stumble upon convenient evidence before they stopped looking?
But alternative timelines that are infinitesimally thin and split only on known unknowns would imply perfect Bayesian approximators: an impossible standard, uncomputable. No one has ever made that kind of precise prediction, so why do we allow prediction-readers to behave as if prediction-writers could have made infinitely precise, measurable, decidable statements with completely non-ambiguous semantics that will be evaluated to a non-reversible Boolean?
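To make “imagine the size of the error bars” concrete, here is the kind of minimal uncertainty treatment I wish accompanied every point estimate, sketched on made-up data (my own toy example, not anyone’s methodology):

```python
import numpy as np

rng = np.random.default_rng(42)
measurements = rng.normal(loc=3.2, scale=0.8, size=25)  # pretend these are the article's raw data

point_estimate = measurements.mean()

# bootstrap the sampling distribution of the mean instead of reporting one number
boot = rng.choice(measurements, size=(10_000, measurements.size), replace=True).mean(axis=1)
low, high = np.percentile(boot, [2.5, 97.5])

print(f"point estimate: {point_estimate:.2f}")
print(f"95% bootstrap interval: [{low:.2f}, {high:.2f}]")
```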
Would it be useful to think about (pre-trained) LLMs as approximating the wave function collapse algorithm? (the one from game dev, not quantum stuff)
Logits as partially solved constraints under a finite compute budget, and the output is a mostly-random-but-weighted-towards-most-likely sample, without actually collapsing it fully, without backtracking, and with each node evaluated to a random level of precision. Basically a somewhat stupid way to sample from that data structure: if you don’t follow it up by fixing the violated constraints and only keep the first pass of a quick heuristic, there will be incompatible nodes next to each other… as in hallucinations, harmful mixing of programming paradigms in the same codebase, and 80%-good-enough stuff that could not possibly be precise in edge cases.
And stuff like RLHF or RLVR will still only improve the first-pass heuristic, not actually fix the inconsistencies… “agentic” scaffolds for coding assistants, with multiple passes, running the linters and tests, and multiple rounds of “does it make sense”, sound like they should be helpful, but doing it in tokens instead of logits (where the actual constraints live before collapsing them to a quasi-random instantiated sample) sounds… inefficient?
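A toy sketch of the analogy in Python (my own illustration, nothing about real model internals, and deliberately not a faithful WFC: a proper implementation would propagate constraints and backtrack). One left-to-right pass, each cell sampled from weighted candidates, constraints only partially honoured, no backtracking, so incompatible neighbours slip through:

```python
import random

# candidate "tiles" with prior weights (the analogue of a softmax over logits)
WEIGHTS = {"grass": 5, "sand": 3, "water": 2}

# adjacency constraints: water must not touch grass directly
ALLOWED = {
    ("grass", "grass"), ("grass", "sand"),
    ("sand", "grass"), ("sand", "sand"), ("sand", "water"),
    ("water", "sand"), ("water", "water"),
}

def collapse_one_pass(length: int, budget: float = 0.7) -> list[str]:
    """One pass, no backtracking. With probability `budget` we bother to filter
    candidates against the left neighbour (a partially solved constraint);
    otherwise we just sample from the prior."""
    out: list[str] = []
    for _ in range(length):
        if out and random.random() < budget:
            candidates = {t: w for t, w in WEIGHTS.items() if (out[-1], t) in ALLOWED}
        else:
            candidates = WEIGHTS
        out.append(random.choices(list(candidates), weights=candidates.values(), k=1)[0])
    return out

def violations(cells: list[str]) -> int:
    """Count adjacent pairs that break the constraints -- the 'hallucinations' of this toy."""
    return sum((a, b) not in ALLOWED for a, b in zip(cells, cells[1:]))

sample = collapse_one_pass(30)
print(sample)
print("violated constraints:", violations(sample))
```

Running it a few times, the violation count usually stays above zero, which is the point: a better prior (bigger budget) helps, but only a second pass that actually fixes constraints would drive it to zero.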
🌶️take inspired by https://www.anthropic.com/engineering/code-execution-with-mcp
agents should (be trained to actually) RTFM before touching existing code (i.e. do the equivalent of a mouse hover for signatures and docs of just-faded-from-memory functions) instead of vibing the type from far-away context (or, let’s be honest, just guessing from the function name and the code so far and being good at guessing)
I hope the next fashion wave will go for short short-term memory while really “using tools” instead of long short-term memory with “yolo tools as special tokens never seen in pre-training”
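What the “mouse hover” could look like as a tool, as a hypothetical sketch (the name `hover`, the dotted-name lookup, and the use of Python’s `inspect` are all my invention; a real scaffold would more likely talk to an LSP server), just to show how cheap “read the signature before you call it” is compared to guessing:

```python
import importlib
import inspect

def hover(dotted_name: str) -> str:
    """Return the signature and the first docstring line for `package.module.attr`."""
    module_name, _, attr = dotted_name.rpartition(".")
    obj = getattr(importlib.import_module(module_name), attr)
    try:
        sig = str(inspect.signature(obj))
    except (TypeError, ValueError):
        sig = "(signature unavailable)"
    doc = (inspect.getdoc(obj) or "").splitlines()
    return f"{dotted_name}{sig}\n  {doc[0] if doc else '(no docstring)'}"

# the agent would call this for every just-faded-from-memory symbol it is about to use
print(hover("json.dumps"))
print(hover("os.path.join"))
```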
I am Peter. I am Aprillion. A 40 year old married man who used to be a techno-optimist. A construct for programming and writing. Embodied soul who will one day be no more. Information who will find myself in the Dust.
Also the standard SI unit for the integer power of 1000^x between x=March and x=May!
Non-deterministic batch calculations in LLMs imply the possibility of side-channel attacks, so it’s best to run private queries in private batches, however implausible an actual exploit might be… if there is any BENEFIT from cross-query contamination, SGD would ruthlessly latch onto any loss reduction. Maybe “this document is about X, other queries in the same batch might be about X too, let’s tickle the weights in a way that the non-deterministic matrix multiplication is ever so slightly biased towards X in random other queries in the same batch” is a real-signal gradient 🤔
How to test that?
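Not a test of the cross-batch hypothesis itself, but a toy illustration (my own, in numpy) of the mechanism it relies on: floating-point reductions aren’t associative, so the same numbers reduced with a different order/tiling (as a different batch composition could cause) don’t have to come out bit-identical.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000).astype(np.float32)

a = float(np.sum(x))                              # one reduction order
b = float(np.sum(x[::-1]))                        # same numbers, reversed accumulation
c = float(x.reshape(100, 100).sum(axis=1).sum())  # blocked reduction, like a different tiling

print(a, b, c)          # typically not bit-identical
print(a == b, a == c)   # usually at least one False
```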
all the scaffold tools, system prompt, and whatnot add context for the LLM… but what if I want to know what the context is too?
Pushing writing ideas to external memory for my less burned out future self:
agent foundations need a path-dependent notion of rationality
economic world of average expected values / amortized big O if f(x) can be negative or you start very high
vs min-maxing / worst case / risk-averse scenarios if there is a bottom (death), see the simulation sketch at the end of this list
pareto recipes
alignment is a capability
they might sound different in the limit, but the difference disappears in practice (even close to the limit? 🤔)
in a universe with infinite Everett branches, I was born in the subset that wasn’t destroyed by nuclear winter during the cold war, no matter how unlikely it was that humanity didn’t destroy itself (they could have done that in most worlds and I wasn’t born in such a world, I live in the one where Petrov heard the Geiger counter beep in some particular pattern that made him more suspicious or something… something something anthropic principle)
similarly, people alive in 100 years will find themselves in a world where AGI didn’t destroy the world, no matter what the odds are, as long as there is at least 1 world with non-zero probability (something something Born rule … only if any decision along the way is a wave function, not if all decisions are classical and the uncertainty comes from subjective ignorance)
if you took quantum risks in the past, you now live only in the branches where you are still alive and didn’t die (but you could be in pain or whatever)
if you personally take a quantum risk now, your future self will find itself only in a subset of the futures, but your loved ones will experience all your possible futures, including the branches where you die … and you will experience everything until you actually die (something something s-risk vs x-risk)
if humanity finds itself in unlikely branches where we didn’t kill our collective selves in the past, does that bring any hope for the future?
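A sketch for the “average expected values vs. a bottom” item above (toy numbers, my own illustration): a repeated bet whose ensemble expectation is positive every round, while the large majority of individual paths still end at the floor. The path-dependent and the expected-value notions of “rational” disagree exactly here.

```python
import random

def run_path(rounds: int = 100, start: float = 100.0) -> float:
    """Each round: 50% chance of +60%, 50% chance of -50%.
    Ensemble expectation per round is +5%, but once a path hits the bottom it never recovers."""
    wealth = start
    for _ in range(rounds):
        wealth *= 1.6 if random.random() < 0.5 else 0.5
        if wealth < 1.0:   # the bottom (ruin / death): this is where path-dependence kicks in
            return 0.0
    return wealth

paths = [run_path() for _ in range(10_000)]
print("average final wealth :", sum(paths) / len(paths))                     # propped up by a few lucky paths
print("share of ruined paths:", sum(p == 0.0 for p in paths) / len(paths))   # the large majority hit the bottom
```

The printed average is noisy from run to run because it is dominated by a handful of extreme paths, which is kind of the point.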