I operate by Crocker’s rules. All LLM output is explicitly designated as such. I have made no self-hiding agreements.
niplav
Hm, I am unsure how much to believe this, even though my intuitions go the same way as yours. As a correlational datapoint, I tracked my success from cold approach and the time I’ve spent meditating (including a 2-month period of usually ~2 hours of meditation/day), and don’t see any measurable improvement in my success rate from cold approach:
(Note that the linked analysis also includes a linear regression with slope −6.35e-08, but with p=0.936, so it could easily be noise.)
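Concretely, the kind of regression I mean is roughly the following (a minimal sketch; the file name and column names are made up for illustration, the actual numbers come from the linked analysis):

```python
# Sketch only: assumes a hypothetical CSV with one row per approach,
# a 0/1 "success" column, and cumulative meditation minutes up to that
# approach; the real data layout in my logs differs.
import pandas as pd
from scipy import stats

df = pd.read_csv("approaches.csv")

# Regress per-approach success (0/1) on cumulative meditation time.
res = stats.linregress(df["cumulative_meditation_minutes"], df["success"])
print(f"slope={res.slope:.3g}, p={res.pvalue:.3g}")
```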
If meditation does do something to your vibe-reading of other people, I would guess that I’d approach women who are more open to being approached. I haven’t dug deeper into my fairly rich data on this, and the data doesn’t include many post-retreat approaches, but I still find the data I currently have instructive.
I wish more people tracked and analyzed this kind of data, but I seem alone in this so far. I do feel some annoyance at everyone (the, ah, “cool people”?) in this area making big claims (and sometimes money off of those claims) without even trying to track any data and analyze it, leaving it basically to me to scramble together some DataFrames and effect sizes next to my dayjob.[1]
So start meditating for an hour a day for 3 months using The Mind Illuminated as an experiment (getting some of the cool skills mentioned in Kaj Sotala’s sequence?) and see what happens?
Do you have any concrete measurable predictions for what would happen in that case?
1. I often wonder if empiricism is just incredibly unintuitive for humans in general, and experimentation and measurement even more so. Outside the laboratory very few people do it; see e.g. Aristotle’s claims about the number of women’s teeth, or his theory of ballistics, which went un(con)tested for almost 2000 years. What is going on here? Is empiricism really that hard? Is it about what people bother to look at? Is making shit up just so much easier that everyone stays in that mode, which is a stable equilibrium? ↩︎
Reminds me of one of my favourite essays, Software engineers solve problems (Drew DeVault, 2020).
I’m revisiting this post after listening to this section of this recent podcast with Holden Karnofsky.
Seems like this post was overly optimistic about what RSPs would be able to enforce, and not quite clear on the different scenarios for what “RSP” could refer to. Specifically, this post was equivocating between “RSP as a regulation that gets put into place” and “RSP as a voluntary commitment”: we got the latter, but not really the former (except maybe in the form of the EU Codes of Practice).
Even at Anthropic, the way the RSP is now put into practice basically excludes a scaling pause from the picture:
RSPs are pauses done right: if you are advocating for a pause, then presumably you have some resumption condition in mind that determines when the pause would end. In that case, just advocate for that condition being baked into RSPs!
Interview:
That was never the intent. That was never what RSPs were supposed to be; it was never the theory of change and it was never what they were supposed to be… So the idea of RSPs all along was less about saying, ‘We promise to do this, to pause our AI development no matter what everyone else is doing’
and
But we do need to get rid of some of this unilateral pause stuff.
Furthermore, what apparently happens now is that really difficult commitments either don’t get made or get walked back:
Since the strictest conditions of the RSPs only come into effect for future, more powerful models, it’s easier to get people to commit to them now. Labs and governments are generally much more willing to sacrifice potential future value than realized present value.
Interview:
So I think we are somewhat in a situation where we have commitments that don’t quite make sense… And in many cases it’s just actually, I would think it would be the wrong call. In a situation where others were going ahead, I think it’d be the wrong call for Anthropic to sacrifice its status as a frontier company
and
Another lesson learned for me here is I think people didn’t necessarily think all this through. So in some ways you have companies that made commitments that maybe they thought at the time they would adhere to, but they wouldn’t actually adhere to. And that’s not a particularly productive thing to have done.
I guess the unwillingness of the government to turn RSPs into regulation is what ultimately blocked this. (Though maybe today even a US-centric RSP-like regulation would be considered “not that useful” because of geopolitical competition.) We got RSP-like voluntary commitments from a surprising number of AI companies (so good job on predicting the future on this one), but those didn’t get turned into regulation.
It’s a bit of a travesty there’s no canonical formal write-up of UDASSA, given all the talk about it. Ugh, TODO for working on this I guess.
My understanding is that UDASSA doesn’t give you unbounded utility, by virtue of directly assigning measure $m(x)$ to each experience $x$ via the universal distribution, with the total utility proportional to $\sum_x m(x)\,U(x)$. The whole dance I did was in order to be able to have unbounded utilities. (Maybe you don’t care about unbounded utilities, in which case UDASSA seems like a fine choice.)
(I think that the other horn of de Blanc’s proof is satisfied by UDASSA, unless the proportion of non-halting programs bucketed by simplicity declines faster than any computable function. Do we know this? “Claude!…”)
Edit: Claude made up plausible nonsense, but GPT-5, upon request, got it right: the proportion of halting programs declines more slowly than some computable functions.
Edit 2: Upon some further searching (and soul-searching) I think UDASSA is currently underspecified wrt whether its utility is bounded or unbounded. For example, the canonical explanation doesn’t mention utility at all, and none of the other posts about it mention how exactly utility is defined.
Makes sense, but in that case, why penalize by time? Why not just directly penalize by utility? Like the leverage prior.
Huh. I find the post confusingly presented, but if I understand correctly, 15 logical inductor points to Yudkowsky₂₀₁₃—I think I invented the same concept from second principles.
Let me summarize to check my understanding: My speed prior on both the hypotheses and the utility functions is trying to emulate just discounting utility directly (because in the case of binary tapes and integers, penalizing both exponentially in their runtime gets you exactly an upper bound on the utility), and a cleaner way is to set the prior to be proportional to $\frac{1}{U(h)}$, i.e. to penalize by the utility directly. That avoids the “how do we encode numbers” question that naturally arises.
Does that sound right?
(The fact that I reinvented this looks like a good thing, since that indicates it’s a natural way out of the dilemma.)
I think the upper bound here is set by a program “walking” along the tape as far as possible while setting the tape to $1$s and then setting a last bit before halting (thus creating the binary number $\underbrace{1\ldots1}_{t}$, where $t$ is the number of steps taken[1]). If we interpret that number as a utility, the utility is exponential in the number of steps taken, which is why we need to penalize by $2^{-t}$ instead of just $\frac{1}{t}$[2]. If you want to write $m$ on the tape you have to make at least $\log_2(m)$ steps on a binary tape (and $\log_n(m)$ on an $n$-ary tape).
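To spell the bound out in rough notation (writing $t(p)$ for the runtime of a program $p$ and $U(p)$ for the number it leaves on the tape, read as a utility; a sketch, not a careful statement): a program that halts after $t(p)$ steps can have written at most $t(p)$ tape cells, so

$$U(p) \le 2^{t(p)} \quad\Longrightarrow\quad 2^{-t(p)}\,U(p) \le 1,$$

whereas a $\frac{1}{t(p)}$-style penalty leaves $\frac{2^{t(p)}}{t(p)}$, which still blows up.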
1. Technically the upper bound is $\Sigma$, the busy beaver score function. ↩︎
2. Thanks to GPT-5 for this point. ↩︎
epistemic status: Going out on a limb and claiming to have solved an open problem in decision theory[1] by making some strange moves. Trying to leverage Cunningham’s law. Hastily written.
p(the following is a solution to Pascal’s mugging in the relevant sense)≈25%[2].
Okay, the setting (also here in more detail): You have a Solomonoff inductor with some universal semimeasure as a prior. The issue is that the utility of programs can grow faster than your universal semimeasure can penalize them, e.g. a complexity prior has busy-beaver-like programs that produce $\mathrm{BB}(\ell)$ amounts of utility with a program of length $\ell$, while only being penalized by $2^{-\ell}$. The more general results are de Blanc 2007, de Blanc 2009 (LW discussion on the papers from 2007). We get this kind of divergence of expected utility on the prior if
- the prior is bounded from below by a computable function, and
- the utility function is computable and unbounded.
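As a concrete instance of these two conditions (an illustration in my notation, not a construction taken from the papers): write $\mathrm{BB}(\ell)$ for the largest number output by any halting program of length $\ell$, and take the utility of an output to just be the number it encodes. Then under the complexity prior the expected utility on priors is at least

$$\sum_{\ell} 2^{-\ell}\,\mathrm{BB}(\ell) = \infty,$$

since $\mathrm{BB}(\ell)$ eventually outgrows $2^{\ell}$ (indeed any computable function).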
The next line of attack is to use the speed prior as the prior. That prior is not bounded from below by a computable function (because for programs of length $\ell$ whose runtime is busy-beaver-large it falls below any computable function of $\ell$), so we escape into one of de Blanc’s horns. (I don’t think having a computable lower bound is that important, because K-complexity was never computable in the first place.)
But there’s an issue: What if our hypotheses output strings that are short, but are evaluated by our utility function as being high-value anyway? That is, the utility function takes in some short string of length $\ell$ and outputs something like $\mathrm{BB}(\ell)$ as its utility. This is the case if the utility function itself is a program of some computational power; in the most extreme case the utility function is Turing-complete, and our hypotheses “parasitize” on this computational power of our utility function to be a Pascal’s mugging. So what we have to do is to also consider the computation of our utility function as being part of what’s penalized by the prior. That is,
$P(p) \propto 2^{-\ell(p)-t(p)-t_U(p)}$, for $t_U(p)$ being the time it takes to run the utility function on the output of $p$, $t(p)$ the runtime of $p$, and $\ell(p)$ its length. I’ll call this the “reflective speed prior”. Note that if you don’t have an insane utility function which is Turing-complete, the speed penalty for evaluating the output of $p$ should be fairly low most of the time.
Pascal’s mugging can be thought of in two parts:
1. My expected utility diverges on priors; that is, not having observed any information or done any Bayesian updating, my expected utility can get arbitrarily big. I think this one is a problem.
2. My expected utility can diverge after updating on adversarially selected information. I think this case should be handled by improving your epistemology.
I claim that the reflective speed prior solves 1., but not 2. Furthermore, and this is the important thing, if you use the reflective speed prior, the expected utility is bounded on priors, but you can have arbitrarily high maximal expected utilities after performing Bayesian updating. So you get all the good aspects of having unbounded utility without having to worry about actually getting mugged (well, unless you have something controlling the evidence you observe, which is its own issue).
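A sketch of why the prior expectation stays bounded, in the notation from above and assuming a prefix-free program encoding so that the Kraft inequality $\sum_p 2^{-\ell(p)} \le 1$ applies: the utility function has to spend at least one step per bit of the number it outputs, so $U(p) \le 2^{t_U(p)}$, and therefore

$$\sum_p 2^{-\ell(p)-t(p)-t_U(p)}\,U(p) \;\le\; \sum_p 2^{-\ell(p)-t(p)} \;\le\; \sum_p 2^{-\ell(p)} \;\le\; 1.$$

(This only addresses divergence on priors, i.e. point 1.; it says nothing about what happens after adversarial updating.)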
(Next steps: Reading the two de Blanc papers carefully, trying to suss out a proof, writing up the argument in more detail. Thinking/arguing about what it means to update your prior in this strange way, and specifically about penalizing hypotheses by how long it takes your utility function to evaluate them. Figuring out which of these principles are violated (on an initial read: definitely Anti-Timidity). Changing one’s prior in a “superupdate” has been discussed here and here.)
Edit: Changed from penalizing the logarithm of the runtime and the utility-runtime to penalizing them linearly, after feedback from GPT-5.
Best I can tell, the risk of psychosis is much higher with Goenka-style retreats, although I don’t have hard numbers, only anecdotal evidence and theory that suggests it should be more common.
My experience has been that anything short of long intensive retreats doesn’t move my mind out of its default attractor state, and that I probably waited too long to do long retreats, even though all the advice goes in the opposite direction.
I mention this because all the talk of the downsides of meditation has me thinking of a tweet that goes roughly like “why do both Republicans and Democrats pretend HRT does anything”. Goenka retreats have medium-strength effects on me; an intensive one-month retreat at home had a decent effect. I may be doing something wrong.
Yup, that’s correct if I remember the sources correctly. I guess the tone surrounding it doesn’t match that particular bit of content. I should also turn the pledged/received numbers into a table for easier reading.
Yup, it’s a regionalism that I mis-/over-generalized. I’ll avoid it from now on.
It is, and it’s the thing I’d most like Smil to read if I could recommend something to him.
I’ll have to go and fix that on Wikipedia as well; that’s what misled me in the first place. Thanks again for checking this! The best paper I’m finding is this one with six scenarios, which puts the cost between −$17 trillion and −$35 trillion, so I wasn’t off by a factor of ten but instead by a factor of two to five.
Yep, I intended that to mean “trillion”.
Aw man I shouldn’t have trusted that number! It seemed a bit sketchy. I’ll edit it into something sane.
Edit: Fixed.
Vaclav Smil is great on this; I really liked his book Growth. He takes a very numerate but still very different view of history (e.g., ah, fitting a sigmoid to GDP numbers in the book).
Humanity Learned Almost Nothing From COVID-19
@Eric Drexler you were mentioned in the parent comment.
Sonnet 4.5 writes its private notes in slop before outputting crisp text. I think humans are largely like this as well?
This is missing that, after the third paragraph, the scratchpad content shown to the user starts getting summarised by a smaller model, to prevent people from distealing Claude’s chain of thought. You can see a clear phase transition where the scratchpad turns from crisp, detailed content into slop. That’s where the summarization starts.
Possible synthesis (not including the newest models):