The main way complexity of this sort would be addressable is if the intellectual artifact that you tried to prove things about were simpler than the process that you meant the artifact to unfold into. For example, the mathematical specification of AIXI is pretty simple, even though the hypotheses that AIXI would (in principle) invent upon exposure to any given environment would mostly be complex. Or for a more concrete example, the Gallina kernel of the Coq proof engine is small and was verified to be correct using other proof tools, while most of the complexity of Coq is in built-up layers of proof search strategies which don’t need to themselves be verified, as the proofs they generate are checked by Gallina.
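The kernel/search split described above can be made concrete with a toy sketch. The following is not Coq or Gallina, just a hypothetical miniature of the same architecture: a small trusted checker for a system whose only inference rule is modus ponens, plus an untrusted search layer whose bugs can at worst cause failed searches, never wrongly accepted theorems, because every proof it emits is re-checked by the kernel.

```python
def kernel_check(axioms, proof, goal):
    """Trusted core: verify that `proof` derives `goal` from `axioms`.

    A proof is a list of formulas, each either an axiom or obtained
    from two earlier lines by modus ponens: from A and ('->', A, B),
    conclude B. Formulas are strings or ('->', lhs, rhs) tuples.
    """
    derived = []
    for line in proof:
        ok = line in axioms or any(
            prev == ('->', ante, line) and ante in derived
            for prev in derived for ante in derived
        )
        if not ok:
            return False
        derived.append(line)
    return bool(derived) and derived[-1] == goal


def untrusted_search(axioms, goal):
    """Untrusted layer: arbitrarily complex heuristics could go here.
    This one is naive forward chaining. Its output is only trusted
    after kernel_check accepts it."""
    derived = list(axioms)
    for _ in range(100):  # bounded forward-chaining passes
        for p in list(derived):
            if (isinstance(p, tuple) and p[0] == '->'
                    and p[1] in derived and p[2] not in derived):
                derived.append(p[2])
    # Return the prefix ending at the goal, so every modus-ponens
    # step's premises appear before the step itself.
    return derived[:derived.index(goal) + 1] if goal in derived else None


axioms = ['A', ('->', 'A', 'B'), ('->', 'B', 'C')]
proof = untrusted_search(axioms, 'C')
assert kernel_check(axioms, proof, 'C')
```

Only `kernel_check` would need to be verified; the search layer can grow without enlarging the trusted base. This separation is sometimes called the de Bruijn criterion.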
Isn’t that as unbelievable as the idea that you can prove that a particular zygote will never grow up to be an evil dictator? Surely this violates some principles of complexity, chaos [...]
Yes, any physical system could be subverted by a sufficiently unfavorable environment. You wouldn’t want to prove perfection. The thing you would want to prove would be more along the lines of: “will this system become at least roughly as capable of recovering from disturbances, and of going on to achieve a good result, as it would have been if its designers had thought specifically about what to do in each possible case of disturbance?”. (Ideally, this category of “designers” would also bleed over, in a principled way, into the category of “moral constituency”, as in CEV.) Which, in turn, would require a proof of something along the lines of: “the process is highly likely to reach the point where it knows enough about its designers to mostly duplicate their hypothetical reasoning about what it should do, without anything going terribly wrong along the way”.
We don’t know what an appropriate formalization of something like that would look like. But there is reason for considerable hope that such a formalization could be found, and that it would be sufficiently simple that an implementation of it could be checked. This is because a few other aspects of decision-making which were previously mysterious, and which could only be discussed qualitatively, have had powerful and simple core mathematical descriptions discovered for cases where simplifying modeling assumptions perfectly apply. Shannon information was discovered for the informal notion of surprise (with the assumption of independent identically distributed symbols from a known distribution). Bayesian decision theory was discovered for the informal notion of rationality (with assumptions like perfect deliberation and side-effect-free cognition). And Solomonoff induction was discovered for the informal notion of Occam’s razor (with assumptions like a halting oracle and a taken-for-granted choice of universal machine). These simple conceptual cores can then be used to motivate and evaluate less-simple approximations for situations where the assumptions about the decision-maker don’t perfectly apply. For the AI safety problem, the informal notions (for which the mathematical core descriptions would need to be discovered) would be a bit more complex—like the “how to figure out what my designers would want to do in this case” idea above. Also, you’d have to formalize something like our informal notion of how to generate and evaluate approximations, because approximations are more complex than the ideals they approximate, and you wouldn’t want to need to directly verify the safety of any more approximations than you had to. (But note that, for reasons related to Rice’s theorem, you can’t—and therefore shouldn’t want to—lay down universally perfect rules for approximation in any finite system.)
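The first of those examples can be stated in a few lines. Under the i.i.d.-symbols assumption, Shannon’s formalization of “surprise” is the surprisal −log₂ p, and its expectation over a known distribution is the entropy. The function names below are just illustrative:

```python
import math

def surprisal(p):
    """Shannon's core of the informal notion of 'surprise': a symbol
    of probability p carries -log2(p) bits; rarer symbols surprise more."""
    return -math.log2(p)

def entropy(dist):
    """Expected surprisal over a known distribution,
    H = -sum(p * log2(p)): the core quantity behind optimal coding."""
    return sum(p * surprisal(p) for p in dist.values() if p > 0)

# A fair coin: each outcome carries exactly 1 bit of surprise.
fair = {'heads': 0.5, 'tails': 0.5}
# A loaded coin is, on average, less surprising.
loaded = {'heads': 0.9, 'tails': 0.1}
```

Here `entropy(fair)` is 1 bit while `entropy(loaded)` is lower, which matches the informal intuition that a predictable source surprises us less. The point of such a core is exactly what the paragraph above says: once it exists, practical compressors and predictors can be judged as approximations to it.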
Two other related points are discussed in this presentation: the idea that a digital computer is a nearly deterministic environment, which makes safety engineering easier for the stages before the AI is trying to influence the environment outside the computer, and the idea that you can design an AI in such a way that you can tell what goal it will at least try to achieve even if you don’t know what it will do to achieve that goal. Presumably, the better your formal understanding of what it would mean to “at least try to achieve a goal”, the better you would be at spotting and designing to handle situations that might make a given AI start trying to do something else.
(Also: Can you offer some feedback as to what features of the site would have helped you become aware sooner that there were arguments behind the positions you felt were being asserted blindly in a vacuum? The “things can be surprisingly formalizable, here are some examples” argument can be found in lukeprog’s “Open Problems Related to the Singularity” draft and the later “So You Want to Save the World”, though the argument is very short, and its significance is hard to recognize if you don’t already know most of the mathematical formalisms mentioned. A backup “you shouldn’t just assume that there’s no way to make this work” argument is in “Artificial Intelligence as a Positive and Negative Factor in Global Risk”, pp. 12–13.)
what will prevent them from becoming “bad guys” when they wield this much power
That’s a problem where successful/practically applicable formalizations are harder to hope for, so it’s been harder for people to find things to say about it that pass the threshold of being plausible conceptual progress instead of being noisy verbal flailing. See the related “How can we ensure that a Friendly AI team will be sane enough?”. But it’s not like people aren’t thinking about the problem.
I think people’s decision about whether to accept or resist the AGW proposition is being complicated by an implicit negotiation over political power that’s inevitably attached to that decision.
Because the scientific projections are still vague, people feel as if their decision about whether to believe in AGW is underdetermined by the evidence, in such a way that political actors in the future will feel entitled to retrospectively interpret their decision for purposes of political precedent. (“Were they forced by the evidence, or did they feel weak enough that they made a concession they didn’t have to make?”) And the precedent won’t be induced in terms of the mental states that a perfect decision theorist, thinking about the AGW mitigation decision problem, would have had. The precedent will be in terms of the mental states that a normal non-scientifically-trained (but politically active) human would have had. One of those mental states would be uncertainty about whether scientists (unconsciously intuited as potentially colluding with, and/or hoping to become, power-grubbing environmental regulators) are just making AGW up. In that context, agreeing that AGW is probably real feels like ceding one’s right of objection to whatever seizures of power someone’s found some vague scientific way of justifying.
It becomes a signaling game, in which each choice of belief will be read as communicating a particular political move, and the costs of making the wrong political move feel very high. So the belief decisions and the political actions become tangled up.
Roughly, people have no way of saying:
So instead, they say:
If it were possible to negotiate separately about AGW action and about precedents of policy concessions to e.g. scientists’ claims, then you might see less decision-theoretic insanity around the AGW action question itself.
(Note—most of this analysis is not on the basis of such data as opinion polls or controlled studies. It’s just from introspecting on my experience of attempting to empathize with the state of mind of AGW disputants, as recalled mostly from Internet forums.)