I endorse and operate by Crocker’s rules.
I have not signed any agreements whose existence I cannot mention.
I think the verdict might surprise you.
What is the verdict then?
What episode of Doom Debates?
True, but to give a quantitative counterexample: two years ago I read a semi-private memo by a well-known LWer, and it has stuck in my mind to a degree comparable to maybe the top 30 LW posts on that metric.
Thanks for the elaboration. Do you have historical examples of new ontology unlocking new notation?
Possibly one factor is that the evident versatility of ASCII (it suffices for nearly all programming languages, and also for stuff like LaTeX) made people less inclined to invent new notation.
I think there’s a general bias in Western culture arising from the problems of physicalism that gets people to consider realist ontology not worth seriously pursuing.
Can you elaborate?
Relatedly: string diagrams (with Penrose’s tensor notation apparently being seen as a precursor)
Thanks for writing this.
I tried thinking about ~6 examples of community misconduct disputes around me, and indeed none of them involved disagreements about “material happenings”.
But it also seems to me that the post lumps together under “disputing the character” two things that are important to distinguish. One is defending/attacking the parties via non-central examples and the halo effect. The other is arguing about the person’s intentions, or, more broadly, about the psychological and other factors that led them to behave the way they did (and sometimes also about what implications this has for how they are likely to behave in the future). Both are involved in community disputes, but it seems to me that the “halo defense” is the second line of defense, used when one cannot credibly claim to have had good intentions, or has caused too much damage for people to care much about intentions.
discourages people from speaking out about their experiences, both because they may be reluctant to ‘ruin the person’s life’ over something non-catastrophic
Not sure that I remember hearing this sort of explicit discouragement. To the extent that this happens, it seems bad that our culture (or the broader macro-culture it is embedded in) doesn’t leave a line of retreat in the form of some “absolution process”, which would weaken this pressure. IDK how to do it. Seems hard. But I think “absolution rituals” were somewhat common in pre-modern cultures? (E.g., the biblical parable of the prodigal son.)
https://intelligence.org/2018/02/28/sam-harris-and-eliezer-yudkowsky/ and I also recall seeing this in some tweets.
What Hume observed is that there are some sentences that involve an “is,” some sentences that involve an “ought,” and if you start from sentences that only have “is” you can’t get sentences that involve “oughts” without an ought-introduction rule, or assuming some other previous “ought.” Like: it’s currently cloudy outside. That’s a statement of simple fact. Does it therefore follow that I shouldn’t go for a walk? Well, only if you previously have the generalization “When it is cloudy, you should not go for a walk.” Everything that you might use to derive an ought would be a sentence that involves words like “better” or “should” or “preferable,” and things like that. You only get oughts from other oughts. That’s the Hume version of the thesis.
The way I would say it is that there’s a separable core of “is” questions. In other words: okay, I will let you have all of your “ought” sentences, but I’m also going to carve out this whole world full of “is” sentences that only need other “is” sentences to derive them.
Sam: I don’t even know that we need to resolve this. For instance, I think the is-ought distinction is ultimately specious, and this is something that I’ve argued about when I talk about morality and values and the connection to facts. But I can still grant that it is logically possible (and I would certainly imagine physically possible) to have a system that has a utility function that is sufficiently strange that scaling up its intelligence doesn’t get you values that we would recognize as good. It certainly doesn’t guarantee values that are compatible with our wellbeing. Whether “paperclip maximizer” is too specialized a case to motivate this conversation, there’s certainly something that we could fail to put into a superhuman AI that we really would want to put in so as to make it aligned with us.
Eliezer: I mean, the way I would phrase it is that it’s not that the paperclip maximizer has a different set of oughts, but that we can see it as running entirely on “is” questions. That’s where I was going with that. There’s this sort of intuitive way of thinking about it, which is that there’s this sort of ill-understood connection between “is” and “ought” and maybe that allows a paperclip maximizer to have a different set of oughts, a different set of things that play in its mind the role that oughts play in our mind.
Sam: But then why wouldn’t you say the same thing of us? The truth is, I actually do say the same thing of us. I think we’re running on “is” questions as well. We have an “ought”-laden way of talking about certain “is” questions, and we’re so used to it that we don’t even think they are “is” questions, but I think you can do the same analysis on a human being.
Eliezer: The question “How many paperclips result if I follow this policy?” is an “is” question. The question “What is a policy such that it leads to a very large number of paperclips?” is an “is” question. These two questions together form a paperclip maximizer. You don’t need anything else. All you need is a certain kind of system that repeatedly asks the “is” question “What leads to the greatest number of paperclips?” and then does that thing. Even if the things that we think of as “ought” questions are very complicated and disguised “is” questions that are influenced by what policy results in how many people being happy and so on.
Ok, looking now at the transcript, it looks like he’s saying that wiring together certain “is” questions can produce “wanting” that we label “ought”. I think he’s prematurely deflating the argument, because, IIUC, in this ontology, the “ought” questions are about what “is” questions to have wired together in one’s brain.
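To make the “wiring together certain ‘is’ questions” picture concrete, here is a toy sketch (entirely my own illustration, not anything from the transcript; the world model and the policy names are invented). Both functions below answer purely factual questions, yet wiring them into an “evaluate, pick the max, act” loop yields maximizer-like behavior:

```python
from dataclasses import dataclass

@dataclass
class ToyWorldModel:
    # Hypothetical toy model: maps each candidate policy to the number of
    # paperclips the model predicts that policy would produce.
    predicted_outcomes: dict

    def paperclips_if(self, policy: str) -> int:
        # "How many paperclips result if I follow this policy?" -- an "is" question.
        return self.predicted_outcomes[policy]

def best_policy(world: ToyWorldModel) -> str:
    # "What is a policy that leads to a very large number of paperclips?"
    # -- also an "is" question.
    return max(world.predicted_outcomes, key=world.paperclips_if)

world = ToyWorldModel({"mine_iron": 10, "build_factory": 1000, "do_nothing": 0})
print(best_policy(world))  # -> build_factory; nothing labeled "ought" appears above
```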
Several times, I’ve heard Eliezer say something like “a powerful consequentialist AI could run on ‘is’ statements only, without any ‘ought’ statements”, and I don’t think I’ve ever heard him explain clearly what the difference is between the two categories of statements that he’s tracking.
The classical Humean distinction seems to posit that all “motivational force” is derived from “ought” statements, so it seems like he thinks about it differently than Hume.
Has this been explained anywhere?
I claim you can have asymptotic alignment without having a formally certified proof of asymptotic alignment, but that it would be surprising to be able to have empirical asymptotic alignment without the model confidently telling you that it expects that someday, it or a successor will be able to give a formal proof of alignment.
Reminded me of James’s https://www.lesswrong.com/posts/akuMwu8SkmQSdospi/working-through-a-small-tiling-result
tl;dr it seems that you can get basic tiling to work by proving that there will be safety proofs in the future, rather than trying to prove safety directly.
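My attempt to compress the shape of the trick (a paraphrase, assuming a sound theory $T$ with the standard provability predicate $\Box_T$; not the post’s exact statement): the direct criterion “act on $b$ only if $T \vdash \mathrm{Safe}(b)$” requires, for tiling, $T \vdash \Box_T \mathrm{Safe}(b) \to \mathrm{Safe}(b)$, and Löb’s theorem turns any such proof into a proof of $\mathrm{Safe}(b)$ outright, for every $b$, safe or not. The indirect criterion “act on $b$ only if $T \vdash \Box_T \mathrm{Safe}(b)$” requires, for tiling, only $T \vdash \Box_T \mathrm{Safe}(b) \to \Box_T \Box_T \mathrm{Safe}(b)$, which is the ordinary provability-logic axiom 4 rather than any Löbian self-trust.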
I know and have known people who don’t think in words. My experience is kinda in-between-ish, with verbal fragments transiently appearing in my consciousness like words on scraps of paper carried by the wind, or sometimes short verbal comments. The exception is when I’m actually trying to put some rigid structure on my thinking: then I do something that feels more like “thinking in full sentences” (or sometimes formal notation).
It seems to me that people with a strong internal monologue (or something like it) tend to assume that their experience is a human universal,[1] similarly to how aphantasia was only “discovered for real” about a decade ago, because aphantasics mostly thought that non-aphantasics were being metaphorical when they talked about “seeing loved ones’ faces in their mind’s eye”, etc.
[ETA: Maybe I misunderstood how strong an emphasis you’re putting on the role of words/language?]
Are you asking about problems that would by default only appear at the SI level, but that have been demonstrated at a sub-SI level via some sort of elicitation?
(Acknowledgment: A guiding frustration here is that imo people posting on LessWrong think way too much in terms of goals.)
FYI, I think this comment might be the best (compressed/short?) illustration I’ve seen of the limitations of thinking in terms of goals for the purpose of understanding agency.
The continuing kick toward higher degrees of agency comes from parts of the brain which have reactions to the predictions made by the cortex. (Otherwise, the cortex just learns to predict the raw reflexes, and we’re stuck imitating our baby selves or something along those lines).
Interesting. This seems to imply a (weak) prediction that defects of (some) “parts of the brain which have reactions to the predictions made by the cortex” might manifest as mental developmental disorders.
Not OP, but a few months ago, I figured that a big thing I needed was more friction when accessing various distracting apps and websites on my desktop, so I did the obvious thing: vibe-coded https://github.com/MatthewBaggins/app-blocker-daemon and https://github.com/MatthewBaggins/site-blocker.
Also, 2 years ago, I installed https://www.minimalistphone.com/, and, from my subjective felt non-QS experience, it helped a lot. In particular, I set up a nearly full-day blocker for Brave (my main web browser), which was the most distracting app. If I really want/have to, I can access the web via Chrome, but it’s higher friction, because I’m not logged into my Google, LW, etc accounts, and also it doesn’t have the adblocker (and I am an ardent ad hater).
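(In case anyone wants to replicate the desktop part: the simplest version of a site blocker is just hosts-file editing. A minimal sketch below, assuming a Unix-like system; this is my illustration of the general idea, not how the linked repos actually work, and the blocked-domain list is made up. It needs root to write /etc/hosts.)

```python
#!/usr/bin/env python3
"""Minimal hosts-file site blocker sketch. Run as root, e.g.:
   sudo python3 blocker.py block    # add entries
   sudo python3 blocker.py unblock  # remove them"""

import sys

HOSTS = "/etc/hosts"
MARKER = "# site-blocker"  # tag our lines so we can undo them later
BLOCKED = ["twitter.com", "news.ycombinator.com"]  # hypothetical list

def block():
    # Point each distracting domain at localhost.
    with open(HOSTS, "a") as f:
        for host in BLOCKED:
            f.write(f"127.0.0.1 {host} {MARKER}\n")

def unblock():
    # Keep only the lines that don't carry our marker.
    with open(HOSTS) as f:
        kept = [line for line in f if MARKER not in line]
    with open(HOSTS, "w") as f:
        f.writelines(kept)

if __name__ == "__main__":
    block() if sys.argv[1:] == ["block"] else unblock()
```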
Claude is still in the bicameral mind stage, so it is probably not conscious yet.
Thanks for posting this.
It is an open question to me to what extent this (assuming it’s as bad as you’re saying) is the effect of (1) boring selection effects; (2) power actually corrupting a lot; (3) people being bad and power revealing them to be so; or (4) something sunk-costs-shaped.
(Regarding (2) and (3), I’ve come to be unsure whether they are meaningfully different.)
I am rather skeptical that (1) explains most of this. Surely it plays a role, but, like, surely not all of the super-powerful are born with very strong tendencies to develop sociopathy or whatever. It’s more plausible that those extremely powerful people who have reasons to repent their past use of extreme power are the ones who had prior psychological predispositions to use power in very bad ways. But still, IDK, it seems to me that ruling well can be extremely hard, so causing a lot of bad while trying to do good[1] doesn’t seem super difficult.
Regarding (4): for example, there’s been a lot of repenting recently in the LW/EA sphere, but I can’t think of anyone other than Habryka among those in (past or current) leadership-ish positions who has said out loud that “yep, we’ve done a lot of bad stuff; maybe it’s net bad overall”. It’s probably just really hard/high-friction to face the truth that one’s past actions have led to really, really nasty stuff, and being at the top of some social pyramid doesn’t make it any easier. It seems plausible that humans derive values from fictitious imputed coherence, so “facing one’s past sins”, especially the ones that one currently counts as one’s most meaningful decisions, runs against the natural grain of human value acquisition.
[1] Let’s hand-wavingly put aside the examples of trying to do good such as the Crusades, or whatever the hell the Soviets were thinking when they stole food from peasants to starve them to death.
I feel like something stronger needs to be said: if this is the case, then you’re probably working on something that the capabilities/product people will need for their capabilities/product work, and therefore you’re plausibly just doing a bit of their work for them, which pushes the world towards x-risk.
It might be worth it anyway, because maybe it’s better for the world if this specific part of the work that you’re doing gets done earlier, relative to the other parts of the work. But, eh, IDK, seems sus.