From my (dxu’s) perspective, it’s allowable for there to be “deep fundamental theories” such that, once you understand those theories well enough, you lose the ability to imagine coherent counterfactual worlds where the theories in question are false.
To use thermodynamics as an example: the first law of thermodynamics (conservation of energy) is actually a consequence of Noether’s theorem, which ties conserved quantities in physics to symmetries in physical laws; in this case, energy conservation follows from the time-translation symmetry of the laws themselves. Before someone becomes aware of this, it’s perhaps possible for them to imagine a universe exactly like our own, except that energy is not conserved; once they understand the connection implied by Noether’s theorem, this becomes an incoherent notion: you cannot remove the conservation-of-energy property without changing deep aspects of the laws of physics.
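(For concreteness, here is a minimal sketch of the special case at issue: the standard textbook derivation that time-translation symmetry of the Lagrangian forces energy conservation. The symbols are the usual ones—L for the Lagrangian, q_i for generalized coordinates—and nothing below is specific to this discussion.)

```latex
% Standard derivation: time-translation symmetry implies energy conservation.
% Take a Lagrangian L(q, \dot q) with no explicit time dependence
% (\partial L / \partial t = 0), and define the energy function E:
\[
E \;=\; \sum_i \frac{\partial L}{\partial \dot q_i}\,\dot q_i \;-\; L
\]
% Differentiating along a trajectory and applying the Euler--Lagrange
% equations, d/dt(\partial L / \partial \dot q_i) = \partial L / \partial q_i:
\[
\frac{dE}{dt}
\;=\; \sum_i \left[\frac{d}{dt}\frac{\partial L}{\partial \dot q_i}
      - \frac{\partial L}{\partial q_i}\right]\dot q_i
      \;-\; \frac{\partial L}{\partial t}
\;=\; 0 .
\]
% The bracket vanishes on solutions of the equations of motion; the last
% term vanishes by the symmetry assumption. Remove the symmetry, and the
% conservation law goes with it.
```

The point being: energy conservation is not a free-floating extra property of our universe; it is the time-translation symmetry itself, viewed through a different lemma.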
The second law of thermodynamics is similarly deep: it’s actually a consequence of statistical mechanics combined with a (low-entropy) boundary condition at the beginning of the universe, with no corresponding (low-entropy) boundary condition at any future state. This asymmetry in boundary conditions is what makes entropy appear to increase in one direction—and again, once someone becomes aware of this, it is no longer possible for them to imagine living in a universe which started out in a very low-entropy state, but where the second law of thermodynamics does not hold.
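(Again for concreteness, a compressed version of the standard Boltzmann-style counting argument; S, k_B, and Ω(M) are the usual textbook symbols, and nothing here is original to this comment.)

```latex
% Boltzmann entropy: S counts the microstates \Omega(M) compatible with
% a given macrostate M.
\[
S(M) \;=\; k_B \ln \Omega(M),
\qquad
\frac{\Omega(M_{\mathrm{high}})}{\Omega(M_{\mathrm{low}})}
\;=\; e^{\left(S_{\mathrm{high}} - S_{\mathrm{low}}\right)/k_B} \;\gg\; 1 .
\]
% High-entropy macrostates occupy astronomically more phase-space volume,
% so a system in a low-entropy macrostate almost certainly wanders into
% higher-entropy ones. The underlying dynamics are time-reversible, so
% this counting argument is symmetric in time by itself; the observed
% arrow comes entirely from the low-entropy boundary condition at one end.
```

Which is exactly the asymmetry described above: condition on a low-entropy past, and the second law follows; there is no further moving part one could remove while keeping the rest of the setup intact.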
In other words, thermodynamics as a “deep fundamental theory” is not merely [what you characterized as] a “powerful abstraction that is useful in a lot of domains”. Thermodynamics is a logically necessary consequence of existing, more primitive notions—and the fact that (historically) we arrived at our understanding of thermodynamics via a substantially longer route (involving heat engines and the like), without noticing this deep connection until much later on, does not change the fact that grasping said deep connection allows one to see “at a glance” why the laws of thermodynamics inevitably follow.
Of course, this doesn’t imply infinite certainty, but it does imply a level of certainty substantially higher than what would be assigned merely to a “powerful abstraction that is useful in a lot of domains”. So the relevant question would seem to be: given my above-described epistemic state, how might one convince me that the case for thermodynamics is not as airtight as I currently think it is? I think there are essentially two angles of attack: (1) convince me that the arguments for thermodynamics being a logically necessary consequence of the laws of physics are somehow flawed, or (2) convince me that the laws of physics don’t have the properties I think they do.
Both of these are hard to do, however—and for good reason! And absent arguments along those lines, I don’t think I am (or should be) particularly moved by [what you characterized as] philosophy-of-science-style objections about “advance predictions”, “systematic biases”, and the like. I think there are certain theories for which the object-level case is strong enough that it more or less screens off meta-level objections; and I think this is right, and good.
Which is to say:
The mental move I’m doing for each of these examples is not imagining universes where addition/evolution/other deep theory is wrong, but imagining phenomena/problems where addition/evolution/other deep theory is not adapted. If you’re describing something that doesn’t commute, addition might be a deep theory, but it’s not useful for what you want. Similarly, you could argue that given how we’re building AIs and trying to build AGI, evolution is not the deep theory that you want to use. (emphasis mine)
I think you could argue this, yes—but the crucial point is that you have to actually argue it. You have to (1) highlight some aspect of the evolutionary paradigm, (2) point out [what appears to you to be] an important disanalogy between that aspect and [what you expect cognition to look like in] AGI, and then (3) argue that that disanalogy directly undercuts the reliability of the conclusions you would like to contest. In other words, you have to do things the “hard way”—no shortcuts.
...and the sense I got from Richard’s questions in the post (as well as from the arguments you made in this subthread) very much smells like an attempted shortcut. This is why I wrote, in my other comment, that
I don’t think I have a good sense of the implied objections contained within Richard’s model. That is to say: I don’t have a good handle on the way(s) in which Richard expects expected utility theory to fail, even conditioning on Eliezer being wrong about the theory being useful. I think this is important because—absent a strong model of expected utility theory’s likely failure modes—I don’t think questions of the form “but why hasn’t your theory made a lot of successful advance predictions yet?” move me very much on the object level.
I think I share Eliezer’s sense of not really knowing what Richard means by “deep fundamental theory” or “wide range of applications we hadn’t previously thought of”, and I think what would clarify this for me would have been for Richard to provide examples of “deep fundamental theories [with] a wide range of applications we hadn’t previously thought of”, accompanied by an explanation of why, if those applications hadn’t been present, that would have indicated something wrong with the theory.
I’m… confused by this framing? Specifically, this bit (as well as other bits like it) seems to be coming at the problem with [something like] a baked-in assumption that prosaic alignment is something that Actually Has A Chance Of Working?
And, like, to be clear, obviously if you’re working on prosaic alignment that’s going to be something you believe[1]. But it seems clear to me that EY/MIRI does not share this viewpoint, and all the disagreements you have regarding their treatment of other avenues of research seem to me to be logically downstream of this disagreement?
I mean, it’s possible I’m misinterpreting you here. But you’re saying things that (from my perspective) only make sense with the background assumption that “there’s more than one game in town”—things like “I wish EY/MIRI would spend more time engaging with other frames” and “I don’t like how they treat lack of progress in their frame as evidence that all other frames are similarly doomed”—and I feel like all of those arguments simply fail in the world where prosaic alignment is Actually Just Doomed, all the other frames Actually Just Go Nowhere, and conceptual alignment work of the MIRI variety is (more or less) The Only Game In Town.
To be clear: I’m pretty sure you don’t believe we live in that world. But I don’t think you can just export arguments from the world you think we live in to the world EY/MIRI thinks we live in; there needs to be a bridging step first, where you argue about which world we actually live in. I don’t think it makes sense to try and highlight the drawbacks of someone’s approach when they don’t share the same background premises as you, and the background premises they do hold imply a substantially different set of priorities and concerns.
It also occurs to me that your frustration could stem from the fact that you can’t actually argue this with EY/MIRI directly, because they don’t frequently make themselves available to discuss things. And if something like that’s the case, then I guess what I want to say is… I sympathize with you abstractly, but I think your efforts are misdirected? It’s okay for you and other alignment researchers to have different background premises from MIRI or even each other, and for you and those other researchers to be working on largely separate agendas as a result? I want to say that’s kind of what foundational research work looks like, in a field where (to a first approximation) nobody has any idea what the fuck they’re doing?
And yes, in the end [assuming somebody succeeds] that will likely mean that a bunch of people’s research directions were ultimately irrelevant. Most people, even. That’s… kind of unavoidable? And also not really the point, because you can’t know which line of research will be successful in advance, so all you have to go on is your best guess, which… may or may not be the same as somebody else’s best guess?
I dunno. I’m trying not to come across as too aggressive here, which is why I’m hedging so many of my claims. To some extent I feel uncomfortable trying to “police” people’s thoughts here, since I’m not actually an alignment researcher… but also it felt to me like your comment was trying to police people’s thoughts, and I don’t actually approve of that either, so...
Yeah. Take this how you will.
[1] I personally am (relatively) agnostic on this question, but as a non-expert in the field my opinion should matter relatively little; I mention this merely as a disclaimer that I am not necessarily on board with EY/MIRI about the doomed-ness of prosaic alignment.