Thoth Hermes
[Question] What would a post that argues against the Orthogonality Thesis that LessWrong users approve of look like?
I am asking the reader to at least entertain my hypotheticals as I explain them, which is perhaps asking a little too much. It might simply be necessary to provide far more examples, especially for this particular subject.
The thing is, the concept overlaps are going to be very fuzzy, and there is no way around that. These color-meanings cannot be forced to be too precise, which means that, on the whole, over a great many observations, they make only soft impressions over time. It may not strike you as obvious, or as the explanation for a missing piece of data you have always wondered about, unless you are explicitly looking for it.
In my case, I am not sure when or how I first observed it, but it was relatively sudden, and I happened not to be explicitly looking for it.
I’ve provided evidence for all of them—they have to obey algebraic equations.
I don’t really know on what basis you say these are just based on Western culture. Take, for example, the fact that Buddhist monks wear orange robes, or that traffic lights are red, yellow, and green in nearly all countries. There may be a reason that we use these colors for these meanings; my post postulates this, while also noting that, even if it is the case, it is not well-documented at this point.
You shouldn’t simply claim that someone hasn’t provided evidence for something, or has failed to do something obviously called for; you lose a lot of the basis for shared respect that way.
This is an introductory post. I have been advised to keep things short before, but trying to ensure that every possible objection is answered preemptively is not possible within those constraints.
If you try to frame your objections as something that can spur discussion, the comment threads become useful for expanding on the material, which would be a desirable outcome.
Colors Appear To Have Almost-Universal Symbolic Associations
Yes, but the point is that we’re trying to determine whether or not you are under “bad” social circumstances. Those circumstances will not be independent of other aspects of the social group, e.g. the ideology it espouses externally and the things it tells its members internally.
What I’m trying to figure out is to what extent you came to believe you were “evil” on your own, versus being compelled to think that about yourself. You were, and are, compelled to think about the ways in which you act “badly,” while near or adjacent to a community that encourages its members to think about how to act “goodly.” It is not a given that a community devoted explicitly to doing good in the world must label actions as “bad” whenever they fall short of arbitrary standards. It could, instead, decide to label the actions people take as “good” or “gooder” or “really really good,” if it decides that most functional people are normally inclined to behave in ways that aren’t necessarily un-altruistic or harmful to other people.
I’m working on a theory of social-group dynamics which posits that your situation is caused by “negative-selection groups,” or “credential-groups,” which are characterized by their tendency to label only their own activities as actually accomplishing whatever it is they claim to do, e.g., “rationality” or “effective altruism.” If it seems like the group’s ideology or behavior implies that non-membership is tantamount either to not caring about doing well or to being incompetent in that regard, then it is a credential-group.
Credential-groups are bad social circumstances. In a nutshell, they act badly by telling members who they know are not intentionally causing harm that they are harmful or bad people (or mentally ill).
[Question] Why doesn’t the presence of log-loss for probabilistic models (e.g. sequence prediction) imply that any utility function capable of producing a “fairly capable” agent will have at least some non-negligible fraction of overlap with human values?
Ontologies Should Be Backwards-Compatible
This is cool, because what you’re saying carries useful information for model updates regardless of how I choose to model your internal state.
Here’s why it’s really important:
You seem to have been motivated, at some point, to classify your own intentions as “evil,” based entirely on things that were not fully under your own control.
That points to your social surroundings as having pressured you into that conclusion (it seems unlikely to me that you would have come to it on your own, without any social pressure).
So that brings us to the next question: Is it more likely that you are evil, or rather, that your social surroundings were / are?
and that since there continue to be horrible things happening in the world, they must have evil intentions and be a partly-demonic entity.
Did you conclude this entirely because there continue to be horrible things happening in the world, or was this based on other reflective information that was consistent with horrible things happening in the world too?
I imagine that this conclusion must be at least partly based on latent personality factors as well. But if so, I’m very curious how these things square with your desire to be heroically responsible at the same time. E.g., how do evil intentions predict your other actions and intentions regarding AI risk and wanting to avert the destruction of the world?
I will accept the advice from Dentin about adding sections as valid, but I will probably not do it (simply because I don’t think that will cause more people to read it).
I tend to reject advice that is along the lines of “I can’t understand what your post is about, try changing the formatting / structure, and see if I understand it then” as violating “Norm One” (see my commenting guidelines).
For example, a request to add sections (or, in this case, to reduce the size of the overall piece) isn’t technically wrong advice, as those things may well increase the quality of the piece. But when such requests are accompanied by “I didn’t understand it because it lacked _”, I think they become too burdensome on the author: they create a sort of implicit contract in which the author bears the responsibility of performing the requested work in exchange for some probability that the commenter will say “Okay, looks good now” and then offer substantial discussion of the major claims and arguments.
I can summarize one of the major points here, e.g., later on I write:
In that linked piece, for example, he admonishes you not to assume that people can easily interpret what you meant. Therefore, if someone tells you that you aren’t making sense to them, believe them.
In other words, I believe the Sequences claim (wrongly) that the burden is always on the speaker: when someone says they didn’t understand you, you are to assume that you made the error, and it is on you to correct it if you want to be understood. My model of human minds says that the prior odds of the speaker having made a mistake in transmitting the information, relative to the receiver having made a (correctable) mistake or being deceptive, are quite low.
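Stated as prior odds, purely as my own formalization of that last claim (this framing is mine, not the Sequences’ or the original comment’s):

\[
\frac{P(\text{speaker transmitted the message incorrectly})}{P(\text{receiver made a correctable mistake or is being deceptive})} \ll 1
\]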
Where “the Sequences” Are Wrong
Actually, my lowest three comments are:
It seems to be historically the case that “doomers” or “near-doomers” [...] (K −9)
AFAIK, the Secretary-General is a full-time position, e.g., [...] (K −5)
Remove the word “AI” and I think this claim is not really changed in any way. AI systems are the most general systems. [...] (K −5)
The following is simply my own assessment of why these comments were downvoted. For the first one, I assume that it was because of the use of the term “doomers” in a pejorative sense. (This is closer, I believe, to what I called “low-key aggro” in my earlier comment.)
I am not sure why the second one was taken so poorly, and I imagine that whoever downvoted it would claim it was somehow snarkier or more disrespectful than it actually was. This is unfortunate, because I think it serves as evidence that comments are often downvoted because they could be interpreted as more hostile or lower-effort than they actually are. Alternatively, it was downvoted because it was “political.”
The third one is also unfortunate. Disagree-downvoting that comment makes sense, but karma-downvoting does not. If you were to counter that it was somehow 101-material, or a misunderstanding of basic points, I would still have to strongly disagree.
My second-highest comment is about my worry that site norms unfairly disfavor discussions that disagree with points commonly accepted on LessWrong, or taken as catechism. That also supports the idea that, if such norms exist, you will observe comments that disagree in this way being karma-downvoted as well, so as to limit their visibility and discourage discussion of those topics.
This still supports my main point, I believe.
This is pretty dense with metaphors and generalizations, with roughly zero ties to any specific instance, which will always be a mix of these generalities and context-dependent perturbations, often with the specifics overwhelming the generalities.
I disagree that the specifics will necessarily overwhelm the generalities. In this model, we presume that the presence or absence of a credential carries a meaning that would not be overwhelmed by the specifics of any given situation; otherwise it would not exist. People have to come together and decide to use the credential, and after enough time has passed for groups with and without them, there have been enough observations to determine whether noise would overwhelm signal here.
If people use them, we merely conclude that there must be a reason for that.
So as not to be accused of asking for examples without trying to come up with some myself, it seems like higher education is a case that you’d use for this theory. But I don’t think the model applies very well—there is certainly a fair bit of credentialism and disdain for outsiders, but there’s also a lot of symbiosis with “industry” in terms of actual output and value-capture of ideas.
Mainly doctoral-level degree programs, but even more so licenses that are legally required to perform the work in most jurisdictions.
I don’t assume that it applies literally everywhere, for all forms of work, either; my model wouldn’t work if it did. But it has to be significant enough to matter. Computer science is an area where a degree is not always required to work in industry, and that makes it a healthier field, in my opinion. But you do need one to be a psychiatrist, and that is one of the areas I would recommend looking at if we were curious about which fields might be more liable to produce or propagate harmful views.
This is just wrong. Ability to have direct output is not ONLY dependent on ability, but also on opportunity and context. If that is sufficiently gate-kept, there will be plenty of high-ability persons who never prove themselves.
This is only more true in proportion to how much the particular field is optimized to depend on gate-keeping itself, or to how directly success in it is defined as success at passing through gates. The less this is the case, as in computer science, mentioned above, the more directly one can discern one’s own skill. In theory, it should be possible to determine how skilled you are at any given task by assessing the value of your own output.
If I were to draw a scale between Meditation <---> Psychiatry, I would say it represents the same underlying task (mental health), but on the left the task is optimized for reliable self-assessment, and on the right it is optimized for gate-keeping. Only if you define your skill to depend on whether or not you have a psychiatric license would you consider there to be high-ability persons who “never prove themselves.” But this requires you to accept at face value the signal that the credential represents, which, as I said, is why it exists; keep in mind, though, that it is an artificial pseudo-signal.
The Great Ideological Conflict: Intuitionists vs. Establishmentarians
I suppose this location is as good as any for my response to this issue (I note that I currently have a 1-post-per-3-days limit on my account, which means this comment should be fairly long).
Looking back at my own top comments, I have noticed that what the LessWrong community tends to favor are things I would consider more critical in general: not things that “disagree” on an equal footing so much as critiques of a post delivered from a position of being more knowledgeable and more experienced than the person I am critiquing. This means that if I wanted to “farm karma,” so to speak, I could do so by limiting my comments to criticisms flavored with a wee bit of condescension.
Looking at what others consider to be your best contributions presents a somewhat awkward question: are these the same things I would rate as my best work? If not, does that mean I was trying less hard, or taking it less seriously than I should have, when I wrote them? When one receives feedback on one’s work, positive or negative, one must interpret it by one’s own lights in order to incorporate it adequately (assuming one does indeed wish to improve with respect to overall karma).
Note that it would be sort of funny if you, in a moderator capacity, sent someone a message along the following lines:
“[User], you’ve been posting fairly frequently, and my subjective impression as well as voting patterns suggest most people aren’t finding your comments sufficiently helpful. For now I’ve given you a 1-per-day rate limit, and may evaluate as we think more about the new standards. As far as feedback goes, do you think you could try and be a little more critical to people, and flavor your posts with a wee bit of condescension? That would really help make LessWrong the well-kept garden it aims to be!”
In other words, optimize not for being “low-key aggro”, but rather for criticisms flavored with a wee bit of condescension. You don’t want to sound like the Omega, who can’t rule out that he truly deserves to be at the bottom of the status hierarchy just because he received some negative feedback and perhaps some moderation warnings. It’s easier to perform “established group member” by limiting your output to critique that aims to come across as helpful to the community as a whole, rather than to the person you’re delivering it to.
The only problem with that is this: when I am trying to write a post and I want feedback that isn’t just my own judgment, my mind turns to “what would people upvote more?” And my experience so far tells me that if I were simply to alter the tone of my posts and comments, as opposed to their actual content, I could shift the resulting approval substantially more than I would otherwise think it deserves.
It’s not that I disagree with or dislike my own comments that cater to the community’s expected response; it’s that optimizing for that response seems necessary to avoid the restrictions placed on my commenting, and optimizing for it hard enough would take me outside what I consider my own standards and judgment, both for what I actually think and for how expressing it should actually be done.
As a moderator, you have the responsibility of tuning the dials and knobs that change whatever metrics the users are optimizing for; in this case, the weight applied to “impress the community” as opposed to “just speak your mind, using your own standards for what qualifies as good for yourself as well as the community” (or, equivalently, their relative ratio). You have to be pretty sure that increasing that weight is what you want. That weight applies to everyone, of course, so if it is tuned too high, you get a situation in which everyone is optimizing for what they think everyone else thinks is good.
For the record, I don’t think that weight should be zero, but I also think it will be non-zero naturally, so any increases you apply to it through infrastructure, rules, and norms may look like they are being added to zero when they are actually being added to a non-zero initial quantity. You may conclude that the optimal amount is higher than whatever you deem the initial amount to be, and so that some restrictions are still good to have. Please consider subtracting out that base value when estimating how much stricter you want the site’s norms to be.
Deception Strategies
You can prove additional facts about the world with those values; that was the point of using ‘i’ as one of the examples.
For h = 1/0 you can upgrade R to the projectively extended real line. If I’m not mistaken, one needs to do this in order to carry out certain additional proofs in real analysis (or upgrade to this one instead).
You seem to be asking whether doing so in every conceivable case would prove to be useful. I’m saying that we would likely know beforehand whether it would be. Take finding polynomial roots: one might wish that every non-constant polynomial with real coefficients had a full set of roots, and upgrading the space to the complex numbers grants exactly that wish.
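To make the polynomial-roots point concrete, here is a minimal sketch of my own (not from the original exchange), using numpy: x² + 1 = 0 has no real roots, but gains exactly two once we work over the complex numbers.

```python
import numpy as np

# x**2 + 1 has no real roots...
coeffs = [1, 0, 1]           # coefficients of x**2 + 0*x + 1, highest degree first
roots = np.roots(coeffs)     # numpy computes roots over the complex numbers
print(roots)                 # two purely imaginary roots, approximately +1j and -1j

# ...but over C, every non-constant polynomial has a full set of roots
# (the fundamental theorem of algebra), so the "upgrade" buys a real guarantee:
assert all(np.isclose(np.polyval(coeffs, r), 0) for r in roots)
```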
Sadly, I can’t upvote this because I don’t have the karma, but I would if I did.
There is another post about this as well.
Don’t be too taken aback if you receive negative karma or some pushback; unfortunately, that is to be expected for posts on this topic that take a position against the Orthogonality Thesis.