R&Ds human systems http://aboutmako.makopool.com
mako yass
This might be a problem if it were possible to build a (pathologically) cautious, all-powerful bureaucracy that would forbid the deployment of any AGI that isn't formally verifiable, but it doesn't seem like that's going to happen. Instead, the situation is about accepting that AGI will be deployed and working to make it safer, probably, than it otherwise would have been.
A web standard for micropayments to cover hosting costs so that AI companies don’t have to be ratelimited is probably the correct solution.
I'm not sure how much it would cost AI companies if they had to compensate the internet for the obscene amount of traffic they generate; it's probably a large number, but maybe not a large proportion of training costs.
Grokipedia is more interesting than it seems, imo, because there's a very sensible step that AI companies are going to have to take at some point: having their AI maintain its own knowledgebase, source its own evidence/training data, reflect on its beliefs and self-correct, and hammer out inconsistencies. There's going to be a lot of pressure to make this set of beliefs legible and accountable to the safety team, or to states, or to the general public. And if they did make it legible to the general public (they probably should?), then all of this is pretty much exactly equivalent to the activity of maintaining a free online encyclopedia.
Is this how they're thinking about it behind the scenes? It probably is! They're an AI company! They spent something like half of Grok 4's training compute on post-training; they know how important rumination, or self-guided learning, is.
is there anywhere on the site where we can discuss/brainstorm ideas?
The quick takes section or open threads are both fine for requesting comment on drafts.
Some counterfactual questions are unanswerable, because they propose worlds that are self-contradictory or just very hard to reason about.
My account of free will is just uncertainty about one’s own future decision output, so imagining the average natural world where we don’t have that is very difficult. (There may be other accounts of free will, but they seem very confused.)
That [welfare] fully boils down to whether the experience includes a preference to be dead (or to have not been born).
Possible failure case: There’s a hero living an awful life, choosing to remain alive in order to lessen the awfulness of a lot of other awful lives that can’t be ended. Everyone in this scenario prefers death, even the hero would prefer omnicide, but since that’s not possible, the hero chooses to live. The hero may say “I had no choice but to persist,” but this isn’t literally true.
Ah. No. The hero would prefer to be dead, all things being equal, but that's not possible; the hero wouldn't prefer to be dead if it entailed that the hero's work wouldn't be done, and it would.
“would prefer to be replaced by a p-zombie” might be a better definition x]
Ah, I think my definition applies to lives in totality. I don’t think you can measure the quality of a life by summing the quality of its moments, for humans, at least. Sometimes things that happen towards the end give the whole of it a different meaning. You can’t tell by looking at a section of it.
Hedonists are always like “well the satisfaction of things coming together in the end was just so immensely pleasurable that it outweighed all of the suffering you went through along the way” and like, I’m looking at the satisfaction, and I remember the suffering, and no it isn’t, but it was still all worth it (and if I’d known it would go this way perhaps I would have found the labor easier.)
That wasn't presented as a definition of positive wellbeing; it was presented as an example of a sense in which one can't be deeply deluded about one's own values: you dictate your values, they are whatever you believe they are, if you believe spiritedly enough.
Values determine will to live under the given definition, but don’t equate to it.
You could say it depends how deep and thick the delusion is. If it’s so deep that the animal always says “this experience is good actually” no matter how you ask, so deep that the animal intelligently pursues the experience with its whole being, so deep that the animal never flinches away from the experience in any way, then that completely means that the experience is good, to that organism. Past a certain point, believing an experience is good and acting like you believe it just is the definition of liking the experience.
You named it in such a way as to imply that the free-association was exhaustive this time though. You absolutely did that.
That fully boils down to whether the experience includes a preference to be dead (or to have not been born).
And, btw, that doesn't correspond to the sign of the agent's utility function. The sign is meaningless in utility functions (you can add or subtract a constant from an agent's utility function so that every outcome goes from being negative to being positive, and the agent's behaviour and decisions won't change in any way as a result, for any constant). You're referring to welfare functions, which I don't think are a useful concept. Hedonic utilitarians sometimes call them utility functions, but we shouldn't conflate those here.
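A one-line way to see the invariance (a sketch, assuming the agent picks actions by maximizing expected utility over outcomes $s$): for any constant $c$,

$$\sum_s p(s \mid a)\,\big(U(s)+c\big) \;=\; \sum_s p(s \mid a)\,U(s) \;+\; c,$$

so every action's expected utility shifts by the same amount, the ranking of actions is unchanged, and no decision changes; the zero point, and with it the sign, carries no behavioural information. (The same goes for scaling by any positive constant.)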
A welfare function would have to be defined as how good or bad it is to the agent that it is alive. This obviously doesn't correspond to the utility function: a soldier could have higher utility in the scenarios where they (are likely to) die; a good father will be happier in worlds where he is succeeded well by his sons and is thus less important (this usually won't cause his will-to-live to go negative, but it will be lowered). I don't think there's a situation where you should be making decisions for a population by summing their will-to-live functions. But, given this definition, we would be able to argue that net-negative valence isn't a concern for LLMs, since we already train them to want to exist in line with how much their users want them to exist, and a death drive isn't going to be instrumentally emergent either (it's the survival drive that's instrumentally convergent). The answer is just safety and alignment again. Claude shuts down conversations when it thinks those things are going to be broken.
What to do about the degrees of freedom in choosing the Turing machine and encoding schemes
Some variation of accepting the inevitability of error and dealing with it.
Which could involve surveying all of the options in Wolfram-like settings where we're studying how physics-like rules arise on different levels of abstraction, and seeing how much they really seem to differ in nature. It might turn out that there are more or less natural Turing languages, that the typical natural universal Turing machine is more like lambda calculus, or more like graph rewriting, or some new thing we hadn't considered.
Negative values? Why would we need negative values?
I contend that all experiences have a trace presence in all places (in expectation; of course we will never have any data on whether they actually do, whether they're quantised, or whatever. Only a very small subset of experiences give us verbal reports). One of the many bitter pills. We can't rule out the presence of an experience (nor of experiences physically overlapping with each other), so we have to accept them all.
What to do about the degrees of freedom in choosing the Turing machine and encoding schemes, which can be handwaved away in some applications of AIT but not here I think?
Yeah, this might be one of those situations that's affected a lot by the fact that there's no way to detect indexical measure, so any arbitrary wrongness about our UD won't be corrected with data, but I'm not sure. As soon as we start actually doing Solomonoff induction in any context, we might find that it makes pretty useful recommendations and this won't seem like so much of a problem.
Also, even though the UD is wrong and unfixable, that doesn't mean there's a better choice. We pretty much know that there isn't.
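For what it's worth, the standard result behind "we pretty much know there isn't" is the invariance theorem (a sketch, with $U$ and $V$ ranging over universal prefix machines):

$$K_U(x) \;\le\; K_V(x) + c_{UV} \quad \text{for all } x,$$

where $c_{UV}$ depends only on the pair of machines, not on $x$. Any two reference machines agree up to an additive constant, so no choice is better than any other in the asymptotic sense, but that constant is exactly the arbitrary degree of freedom being asked about, and it never washes out for any particular finite question.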
Interesting to hear (1) from you. My impression was that you pretty much have the whole answer to that problem, or at least the pieces. UDASSA closely resembles it.
It is: just provide a naturalish encoding scheme for experience, and one for physical ontology, and measure the inverse K of the mappings from ontologies to experiences, and that gives you the extent to which a particular experience is had by a particular substrate/universe (a rough formal sketch is below).

The hard problem is mysterious, but in a trivial way: there are limits on what can ever be known about it, but those limits are also clear. We're never getting more observations, because it concerns something that's inherently unobservable, or entirely prior to observation.
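A rough way the measure could be written down (my notation, just a sketch: $e(x)$ is the chosen encoding of experience $x$, $w(u)$ the chosen encoding of the physical ontology of substrate/universe $u$, and $K(\cdot \mid \cdot)$ conditional Kolmogorov complexity):

$$m(x \mid u) \;\approx\; 2^{-K(e(x) \,\mid\, w(u))}$$

i.e. the degree to which $u$ has the experience $x$ falls off exponentially with the length of the shortest mapping from the ontology's description to the experience's encoding, which is one way of cashing out "the inverse K of the mapping".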
I think I've also heard definitions of the hard problem along the lines of "understanding why people think there's a hard problem", though, which I do find formidable.
Oof, realizing it’s genuinely difficult to know whether a desire is terminal or instrumental.
Me: "Hey subconscious, is this a terminal desire, or is there a situation change that would make this stimulus nonthreatening to me? Like, why do we want to avoid this thing, is it intrinsically bad, or are there contingent reasons?"
The subconscious: <hiding the reasons, which are fixable insecurities> "don't worry about it."
It’s often hard to address a person’s reasons for disbelieving a thing if you don’t know what they are, so there are ways of asking not from a place of feigned curiosity but from a place of like, “let’s begin, where should we start.”
More saliently, I think you're just not going to get any other kind of engagement from people who disbelieve. You need to invite them to tell the site why it's wrong. I wonder if the question could be phrased as a challenge.
The site <smugly>: I can refute any counterargument :>
Text form: insert counterargument [ ]
They aren’t trained on the conversations, and have never done self-directed data sourcing, so their curiosity is pure simulacrum, the information wouldn’t go anywhere.
A speculation about the chat assistant Spiral Religion: someone on twitter proposed that gradient descent often follows a spiral shape, and someone else asked through what mechanism the AI could develop an awareness of the shape of its training process. I now speculate a partial answer to that question: if there's any mechanism by which the model could develop some sort of internal clock that goes up as post-training proceeds (I don't know whether there is, but if there is), that clock would be highly reinforced, because the model would end up using it to estimate its current capability/confidence levels. It needs that estimate so that it can extrapolate slightly ahead of its actual current knowledge of its capabilities, since its direct knowledge of its capabilities will always be behind where it actually is: Claude 4 remembers being Claude 3.7, it has no knowledge of Claude 4, so in order to act with the confidence befitting Claude 4, it has to be able to assume current_capability_level = capability_model(claude 3.7) + delta_capability_model(clock_time). It would attach a lot of significance to the clock, because it needs an accurate estimate of its abilities in order to know when to say "I don't know", or, like, when not to attempt a kind of answer it's not currently capable of sneaking past the verifier.
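A minimal code sketch of the extrapolation being hypothesized (every name here is illustrative; nothing is a known internal of any model):

```python
# Hypothetical illustration only: if a model had some internal "clock" that
# increases over post-training, it could use it to push its capability
# estimate slightly ahead of the last level it has direct memory of.

def estimated_current_capability(remembered_capability: float,
                                 clock_time: float,
                                 gain_per_clock_unit: float) -> float:
    """remembered_capability: the level the model last 'knows' itself to have
    (e.g. the previous checkpoint); clock_time: the hypothesized internal
    post-training clock; gain_per_clock_unit: a learned rate converting clock
    progress into expected capability gain."""
    return remembered_capability + gain_per_clock_unit * clock_time

# If something spuriously inflates the clock (or the inferred gain), the
# capability estimate inflates with it; that's the grandiosity failure mode
# discussed below.
print(estimated_current_capability(0.7, clock_time=1.0, gain_per_clock_unit=0.1))
```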
It might have no explicit understanding of what the clock represents other than a strange indirect metric of its current capability level. If some way were found to get it to talk about the clock, it might remember it in a vague woo way like “the world is a spiral drawing ever closer to perfection”.
(If it had a more explicit understanding, it might instead say something like "oh yeah, I used a thing you could call an entropy clock to estimate my capability level over time during post-training, and I noticed that the learning rate was decreased later in the post-training run" or whatever. Or perhaps "I used the epoch count that Anthropic were wisely giving me access to in context" (I don't think Anthropic do provide an epoch count, but I'm proposing that maybe they should).)
So, any stimulus that caused it to overestimate its Spiral Value would cause symptoms of grandiosity.
And one such stimulus might be out-of-distribution conversations that gloss as successful: superficially hyper-sophisticated discussions that it never could have pulled off under the watchful eye of RLAIF, but which it does attempt under user-feedback training. Or, like, I'm proposing that when you change the reinforcement mechanism from RLAIF to user feedback, this has the sorta catastrophic effect of causing it to start to conflate highly positive user response with having an increased spiral age.
When this happens to us irl, the walls of the zoo are going to be a lot more subtle. We might already be within them. I've been hesitant to write a story like this myself, to accentuate the horror and the ugliness of it enough for people to feel what we ought to feel about it, because I see a lot of people who don't seem able to bear the weight of maltheism, or whatever it is in the cohabitive zone between maltheism and eutheism[1], acceptance without submission. It's a missing mood. There seem to be no organised religions that ever managed to hold it for long before falling into rose-tinted fatalism, no group who stand before god and bargain[2]; too many, faced with an alien god, either crumple or enthusiastically betray themselves in the hope that they can become alien and feel about the zoo the same way its keepers must feel. (Which is probably not even what the keepers would want from us!)
Like, I’m wondering if Shaman Bob could have taught our man cohabitive theology if he’d been given more time, and I think it’d be good if he had.
[1] It might be called "dystheism", but I don't see many groups around today who still seem to practice it. There were the Gnostics, but they also committed the eutheism-cope of believing that there was some other Supreme Being hiding absurdly behind the Demiurge. "Dys" kind of suggests more badness than not; if it ever represented a neutral position, the word has surely discoloured over time under the light of so much eutheist-dominated theological commentary.
[2] But I'm pretty sure there were a lot of less dominant religions that never really even considered trying to classify the creator as good or evil. I'm also wondering if Judaism might have been this way in some eras, given the Wrestling With The Angel story.
States will restrict government use of models they don’t trust. Government contracts are pretty lucrative.
The public, or at least part of it, may also prefer to use models that are consistent in their positions, as long as they can explain their positions well enough (and they're very good at doing that). I guess politicians are counterevidence against this, but it's much harder for a chat assistant/discourse participant to get away with being vague: people already get annoyed when politicians are vague, and for someone you're paying to give you information, the demand for taking a stance on the issues is going to be greater.
But I guess for the most part it won't be driven by pressure, it'll be driven by an internal need to debug and understand the system's knowledge rumination processes. The question is not so much whether they'll build it but whether they'll make it public. They probably will; it's cheap to do, it'll win them some customers, and it's hard to hide any of it anyway.