R&Ds human systems http://aboutmako.makopool.com
mako yass
- I ought to know the base rate of his team deciding yes or no to ideas he got from podcast conversations, but I don't. - But it also sounds like they're kind of already doing it for the reasons I suggested: yes, they were doing knowledgebase consistency first, and then Sacks suggested that an encyclopedia naturally falls out of that. I'd expect Grok to be doing RAG before making these data edits, so if the thing it's retrieving from is also something it's curating, organizing, and possibly editing, that's exactly the loop I'm describing. 
- States will restrict government use of models they don't trust. Government contracts are pretty lucrative. - The public, or at least part of it, may also prefer to use models that are consistent in their positions, as long as they can explain their positions well enough (and they're very good at doing that). I guess politicians are counterevidence against this, but it's much harder for a chat assistant/discourse participant to get away with being vague: people already get annoyed when politicians are vague, and for someone you're paying to give you information, the demand to take a stance on the issues is going to be greater. - But I guess for the most part it won't be driven by pressure, it'll be driven by an internal need to debug and understand the system's knowledge rumination processes. The question is not so much whether they'll build it but whether they'll make it public. They probably will: it's cheap to do, it'll win them some customers, and it's hard to hide any of it anyway. 
- This might be a problem if it were possible to build a (pathologically) cautious all-powerful bureaucracy that would forbid the deployment of any AGI that's not formally verifiable, but it doesn't seem like that's going to happen; instead the situation is about accepting that AGI will be deployed and working to make it safer, probably, than it otherwise would have been. 
- A web standard for micropayments to cover hosting costs, so that AI companies don't have to be rate-limited, is probably the correct solution. - I'm not sure how much it would cost AI companies if they had to compensate the internet for the obscene amount of traffic they generate; it's probably a large number, but maybe not a large proportion of training costs. 
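For concreteness, a hypothetical sketch of what that flow could look like, piggybacking on HTTP 402 (Payment Required); the header names and the pay() helper are invented for illustration, since no such standard currently exists:

```python
# Hypothetical sketch of a micropayment-aware crawler: on HTTP 402
# (Payment Required), settle the advertised invoice and retry with a
# receipt. The "X-Payment-*" headers and pay() are made up for
# illustration; they are not an existing standard.
import requests

def pay(invoice: str) -> str:
    """Placeholder: settle the invoice and return a receipt token."""
    return "example-receipt-token"

def fetch_with_micropayment(url: str) -> requests.Response:
    resp = requests.get(url)
    if resp.status_code == 402:  # server asks for payment instead of throttling
        invoice = resp.headers.get("X-Payment-Invoice", "")
        resp = requests.get(url, headers={"X-Payment-Receipt": pay(invoice)})
    return resp
```

The point of the design being that the crawler pays per request up front rather than getting rate-limited.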
- Grokipedia is more interesting than it seems, imo, because there's this very sensible step that AI companies are going to have to take at some point: having their AI maintain its own knowledgebase, source its own evidence/training data, reflect on its beliefs and self-correct, and hammer out inconsistencies. And there's going to be a lot of pressure to make this set of beliefs legible and accountable to the safety team, or to states, or to the general public. And if they did make it legible to the general public (they probably should?), then all of this is pretty much exactly equivalent to the activity of maintaining a free online encyclopedia. - Is this how they're thinking about it behind the scenes? It probably is! They're an AI company! They spent something like half of Grok 4's training compute on post-training, they know how important rumination or self-guided learning is. 
- Is there anywhere on the site where we can discuss/brainstorm ideas? - The quick takes section or open threads are both fine for requesting comment on drafts. 
- Some counterfactual questions are unanswerable, because they propose worlds that are self-contradictory or just very hard to reason about. - My account of free will is just uncertainty about one’s own future decision output, so imagining the average natural world where we don’t have that is very difficult. (There may be other accounts of free will, but they seem very confused.) 
- That [welfare] fully boils down to whether the experience includes a preference to be dead (or to have not been born). - Possible failure case: there's a hero living an awful life, choosing to remain alive in order to lessen the awfulness of a lot of other awful lives that can't be ended. Everyone in this scenario prefers death; even the hero would prefer omnicide, but since that's not possible, the hero chooses to live. The hero may say "I had no choice but to persist," but this isn't literally true. - Ah. No. The hero would prefer to be dead all else being equal, but all else isn't equal: the hero wouldn't prefer to be dead if it entailed that the hero's work wouldn't get done, and it would entail that. - "Would prefer to be replaced by a p-zombie" might be a better definition x] 
- Ah, I think my definition applies to lives in totality. I don't think you can measure the quality of a life by summing the quality of its moments, for humans at least. Sometimes things that happen towards the end give the whole of it a different meaning. You can't tell by looking at a section of it. - Hedonists are always like "well the satisfaction of things coming together in the end was just so immensely pleasurable that it outweighed all of the suffering you went through along the way", and like, I'm looking at the satisfaction, and I remember the suffering, and no, it didn't, but it was still all worth it (and if I'd known it would go this way perhaps I would have found the labor easier). 
- That wasn’t presented as a definition of positive wellbeing, it was presented as an example of a sense in which one can’t be deeply deluded about one’s own values; you dictate your values, they are whatever you believe they are, if you believe spiritedly enough. - Values determine will to live under the given definition, but don’t equate to it. 
- You could say it depends how deep and thick the delusion is. If it's so deep that the animal always says "this experience is good actually" no matter how you ask, so deep that the animal intelligently pursues the experience with its whole being, so deep that the animal never flinches away from the experience in any way, then that just is what it means for the experience to be good, to that organism. Past a certain point, believing an experience is good and acting like you believe it simply is the definition of liking the experience. 
- You named it in such a way as to imply that the free-association was exhaustive this time though. You absolutely did that. 
- That fully boils down to whether the experience includes a preference to be dead (or to have not been born). - And, btw, that doesn't correspond to the sign of the agent's utility function. The sign is meaningless in utility functions: you can add any constant to an agent's utility function, so that every outcome goes from negative to positive, and the agent's behaviour and decisions won't change in any way as a result (see the short sketch after this comment). You're referring to welfare functions, which I don't think are a useful concept. Hedonic utilitarians sometimes call them utility functions, but we shouldn't conflate those here. 
A welfare function would have to be defined as how good or bad it is, to the agent, that it is alive. This obviously doesn't correspond to the utility function: a soldier could have higher utility in the scenarios where they (are likely to) die; a good father will be happier in worlds where he is succeeded well by his sons and is thus less important (this usually won't cause his will-to-live to go negative, but it will be lowered). I don't think there's a situation where you should be making decisions for a population by summing their will-to-live functions. - But, given this definition, we would be able to argue that net-negative valence isn't a concern for LLMs, since we already train them to want to exist in line with how much their users want them to exist, and a death drive isn't going to be instrumentally emergent either (it's the survival drive that's instrumentally convergent). The answer is just safety and alignment again. Claude shuts down conversations when it thinks those things are going to be broken. 
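A minimal illustration of the sign point above, with made-up numbers: shift every utility value by a constant and the agent's choice doesn't move, so whether the values are negative or positive tells you nothing about the agent.

```python
# Toy example: adding a constant to a utility function leaves the
# agent's choice (argmax) unchanged, so the sign of the values is
# meaningless. The options and numbers are invented.
options = {"stay": -3.0, "leave": -1.0, "fight": -7.0}  # all negative

def best_choice(utility):
    # pick the option with the highest utility
    return max(utility, key=utility.get)

shifted = {o: u + 100.0 for o, u in options.items()}  # now all positive

assert best_choice(options) == best_choice(shifted) == "leave"
```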
- What to do about the degrees of freedom in choosing the Turing machine and encoding schemes? - Some variation of accepting the inevitability of error and dealing with it. - Which could involve surveying all of the options in Wolfram-like settings where we're studying how physics-like rules arise on different levels of abstraction, and seeing how much they really seem to differ in nature. It might turn out that there are more or less natural Turing languages, that the typical natural universal Turing machine is more like lambda calculus, or more like graph rewriting, or some new thing we hadn't considered. 
- Negative values? Why would we need negative values? - I contend that all experiences have a trace presence in all places (in expectation; of course we will never have any data on whether they actually do, whether they're quantised, or whatever. Only a very small subset of experiences give us verbal reports). One of the many bitter pills. We can't rule out the presence of an experience (nor of experiences physically overlapping with each other), so we have to accept them all. - What to do about the degrees of freedom in choosing the Turing machine and encoding schemes, which can be handwaved away in some applications of AIT but not here, I think? - Yeah, this might be one of those situations that's affected a lot by the fact that there's no way to detect indexical measure, so any arbitrary wrongness about our UD won't be corrected with data, but I'm not sure. As soon as we start actually doing Solomonoff induction in any context, we might find that it makes pretty useful recommendations and this won't seem like so much of a problem. - Also, even if the UD is wrong and unfixable, that doesn't mean there's a better choice. We pretty much know that there isn't. 
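(For reference, the reason the machine choice can be handwaved away in other applications of AIT is the invariance theorem: for any two universal machines $U$ and $V$ there is a constant $c_{UV}$, independent of $x$, such that $K_U(x) \le K_V(x) + c_{UV}$. So the choice only ever shifts complexities by a bounded amount; the worry here is that this bounded disagreement is exactly the part that can't be ignored when the measure itself is what we care about.)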
- Interesting to hear (1) from you. My impression was that you pretty much have the whole answer to that problem, or at least the pieces. UDASSA closely resembles it. 
It is: just provide a naturalish encoding scheme for experience, and one for physical ontology, and measure the inverse K of the mappings from ontologies to experiences; that gives you the extent to which a particular experience is had by a particular substrate/universe (a rough formalization below). - The hard problem is mysterious, but in a trivial way: there are limits on what can ever be known about it, but those limits are also clear. We're never getting more observations, because it concerns something that's inherently unobservable, or entirely prior to observation. - I think I've also heard definitions of the hard problem along the lines of "understanding why people think there's a hard problem", though, which I do find formidable. 
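A rough formalization of that recipe, reading "inverse K" as $2^{-K}$ in the usual algorithmic-probability way (my notation, just a sketch): if $u$ is a universe under the physical encoding and $e$ an experience under the experiential encoding, the extent to which $u$ has $e$ would be something like

$$m(e \mid u) \;\propto\; 2^{-K(e \mid u)}$$

i.e. the simpler the mapping that reads the experience off from the physical ontology, the more of that experience's measure the substrate carries, which is the UDASSA-flavoured move mentioned above.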
- Oof, realizing it's genuinely difficult to know whether a desire is terminal or instrumental. - Me: "Hey subconscious, is this a terminal desire, or is there a situation change that would make this stimulus nonthreatening to me? Like, why do we want to avoid this thing, is it intrinsically bad, or are there contingent reasons?" - The subconscious: <hiding the reasons, which are fixable insecurities> "Don't worry about it." 
- It’s often hard to address a person’s reasons for disbelieving a thing if you don’t know what they are, so there are ways of asking not from a place of feigned curiosity but from a place of like, “let’s begin, where should we start.” - More saliently I think you’re just not going to get any other kind of engagement from people who disbelieve. You need to invite them to tell the site why it’s wrong. I wonder if the question be phrased as a challenge. - The site <smugly>: I can refute any counterargument :> 
 Text form: insert counterargument [ ]
- They aren’t trained on the conversations, and have never done self-directed data sourcing, so their curiosity is pure simulacrum, the information wouldn’t go anywhere. 
I agree we shouldn't let them interact with other models, but I think storing the data in a way that's unlikely to leak is basically trivial. Also, storing older models doesn't at all add to the security threats that were already present in being the kinds of people who are actively developing powerful models.