Thoth Hermes
On Yudkowsky and being wrong:
I’m going to be careful about reading into his words too much, or assuming he said something that I disagree with.
But I have noticed, and do notice, a tendency among pessimists, and in pessimism generally, to prefer beliefs that skew towards “wrongness” and “incorrectness” and “mistake-making” in a way that tends to be borderline superstitious. The superstition I refer to is the tendency to give errors higher status than they deserve, e.g., predicting that things will go wrong in order to make them less likely to go wrong, or to go as badly as they otherwise could.
Rather than predicting that things could go “badly”, “wrongly”, or “disastrously”, it seems much healthier to see things as iterations in which each subsequent attempt improves upon the last. For example: building rockets, knowing that early iterations are more likely than later ones to explode, and placing sensors throughout the rocket that transmit data back to HQ so that failures in specific components are detected immediately, before an explosion. If the rockets explode far fewer times than predicted, and lead to a design that doesn’t explode at all, you wouldn’t call any point of the process “incorrect”, even the points at which the rocket did explode. The process was correct. (A toy sketch of this kind of instrumentation follows below.)
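As a minimal illustration of the failure-localization idea (hypothetical component names, thresholds, and readings, not any real telemetry system):

```cpp
#include <iostream>
#include <string>
#include <vector>

// One telemetry channel: a named component plus the range we expect
// its readings to stay inside during a healthy flight.
struct Channel {
    std::string component;
    double low, high;   // expected operating range (hypothetical values)
    double reading;     // latest value transmitted back to HQ
};

int main() {
    // Hypothetical snapshot of readings mid-flight.
    std::vector<Channel> telemetry = {
        {"turbopump_rpm",        20000, 40000, 36500},
        {"chamber_pressure_bar",    90,   110,   131},  // out of range
        {"lox_tank_temp_c",       -190,  -170,  -182},
    };

    // Localize the failure: each anomalous channel names its component,
    // so the next iteration knows exactly what to redesign.
    for (const auto& ch : telemetry) {
        if (ch.reading < ch.low || ch.reading > ch.high)
            std::cout << "ANOMALY in " << ch.component
                      << ": " << ch.reading << '\n';
    }
}
```

The point of the sketch is that an “exploded” iteration still yields exactly the data that makes the next iteration better, which is why the process as a whole is correct.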
If you’re building a rocket for the first time ever, it’s not surprising if you’re wrong about something. It’s surprising if the thing you’re wrong about causes the rocket to go twice as high on half the fuel you thought was required, and to be much easier to steer than you feared.
This may mean that, in general, when we’re wrong about something, it is more often because we predicted that something would go well and it didn’t, rather than the reverse. Because I disagree with that sentiment, I allow myself to be wrong here. (Note that this would be the reverse case, however, if so.)
I don’t see how it would help, in general, to predict that things will be difficult or hard to do in order to make them easier or less hard to do. That would only steer your mental processes towards solutions that look harder over ones that look easier, since we’d have predicted the easier-looking ones not to lead anywhere useful. If we apply that frame everywhere, then we’re going to be using solutions that feel difficult on a lot more problems than we would otherwise, thereby not making things easier for ourselves.
I can’t find the source right now, but I remember reading that Bjarne Stroustrup avoids thrown exceptions in his C++ code, and the author of the post that mentioned this said it was only because he wrote extremely-high-reliability code used in flight avionics for Boeings or something like that. I remember thinking: well, obviously flight avionics code can’t throw any temper tantrums à la quitting-on-errors. But why doesn’t this apply everywhere? The author argued that most software use-cases call for exceptions to be thrown, because it is better for software to be skittish, cautious, and unwilling to make any hefty assumptions, lest it make the customer angry. But it seems odd that “cautiousness” of this nature is not called for in exactly the environment in which your code shutting off in edge-cases or other odd scenarios would cause the plane’s engines to shut down.
Thrown exceptions represent pessimism, because they involve the code choosing to terminate rather than deal with whatever would happen if it continued using state it had judged anomalous or out-of-distribution. The point is, if pessimism is meant to function as cautiousness, it clearly isn’t functioning as intended.
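To make the contrast concrete, here is a minimal sketch (hypothetical sensor-reading function and thresholds, not Stroustrup’s actual code or any avionics standard): the first style abandons control when it sees an anomalous reading; the second degrades to a sane value, flags the anomaly, and keeps flying.

```cpp
#include <algorithm>
#include <iostream>
#include <stdexcept>

// Hypothetical sensor read; values outside [0, 100] are "anomalous".
// (Stubbed to a constant for the sketch.)
double read_sensor_raw() { return 117.3; }

// Pessimistic style: treat the anomaly as fatal and hand the problem upward.
double read_sensor_throwing() {
    double v = read_sensor_raw();
    if (v < 0.0 || v > 100.0)
        throw std::out_of_range("sensor reading out of range");
    return v;
}

// Avionics style: never abandon control; degrade to a safe value and flag it.
double read_sensor_clamping(bool& degraded) {
    double v = read_sensor_raw();
    degraded = (v < 0.0 || v > 100.0);
    return std::clamp(v, 0.0, 100.0);  // continue with the nearest sane value
}

int main() {
    bool degraded = false;
    double v = read_sensor_clamping(degraded);
    std::cout << "clamped value = " << v << ", degraded = " << degraded << '\n';

    try {
        read_sensor_throwing();
    } catch (const std::exception& e) {
        // In a desktop app, this catch block is the whole "recovery";
        // in an engine controller, there may be no one left to catch.
        std::cout << "aborted: " << e.what() << '\n';
    }
}
```

The clamping version is the one that treats an anomalous state as something to deal with rather than something to quit over, which is the sense in which exception-throwing reads as pessimism here.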
Deception is only a useful strategy for someone who is a) under surveillance and b) subject to constant pressures to do things differently than one would otherwise do them.
To make the AI be non-deceptive, you have three options: (a) make this fact be false; (b) make the AI fail to notice this truth; (c) prevent the AI from taking advantage of this truth.
B and C in the quote become my A and B when “this truth” in the quote is swapped for “any truth.” You can make this fact be false by not doing your B and C, or my A and B.
whereas truths are all tangled together.
I think it would be worthwhile to point out that if truths are all tangled together, then your truths and its truths ought to be tangled together, too. The only situation where that wouldn’t be the case is an adversarial situation. But in your B and C cases, this is an adversarial situation, albeit one you brought upon it rather than it upon you.
But even in an adversarial situation, it should still be the case that truths are all tangled together. Therefore, there shouldn’t really be any facts you wouldn’t want it to know about—unless there is another fact you know that causes you not to want it to discover that fact, or a different one.
If so, then what would happen if it were to discover that fact as well, in addition to the one you didn’t want it to know?
I don’t buy that truthful things should be, in general, difficult to distinguish from untruthful things. I’m not even sure what it would mean, exactly, for truth-seeking to just be “difficult.”
We could ask whether we would expect true claims to “sound better” to the one reading / hearing them than false claims. This would have important implications: for example, if they do sound better, then “persuasion” isn’t something anyone needs to worry about, unless they were intentionally trying to persuade someone of something that was both false and bad-sounding, which would be the case by assumption here.
The idea that truth-seeking is inherently difficult is an idea that sounds bad. Thus, for me to believe it would require me to believe that bad-sounding things can be true and good-sounding things can be false. How often would this mismatch happen? There is no a priori way to tell how often we should expect it, and that in itself is a bad-sounding thing.
An individual who investigates stuff, but isn’t popular, has nowhere they can put their findings and expect others to find them. Sure, you can put up a blog or a Twitter thread, but that hardly means anyone will look at it.
I buy even less the idea that false things monetize better than true things. But this is a complaint I sometimes hear, and I can’t help but sneer at it a bit. It’s one thing to think that false things and true things compete on an even playing field, but it’s a wholly different thing to think that people are inherently hardwired to find false things more palatable and therefore spend more time looking for them / paying for them.
It sounds very similar to the arguments for fighting misinformation on social media platforms: mainly, that it tends to spread more easily than “true but boring / unpleasant” things. During COVID-19, for example, the people who thought we ought to stem the spread of misinformation also typically believed that COVID-19 was more dangerous than the opposite group believed it to be.
This seems like a very important crux, then, at least: The dichotomy between good-seeming / bad-seeming and true / false. I agree that we should get to the bottom of it.
I don’t understand why you say “should be difficult to distinguish” rather than “are difficult”, why you seem to think finding the truth isn’t difficult, or what you think truthseeking consists of.
Because it feels like it’s a choice whether or not I want to consider truth-seeking to be difficult. You are trying to convince me that I should consider it difficult, which means I have the option not to. If it simply is difficult, you don’t need to try to convince me of that; it would be obvious on its own.
In addition to that, “should be” means that I think something ought to be a certain way. It certainly would be better if truth-seeking weren’t difficult, wouldn’t you agree?
I didn’t say “false things monetize better than true things”. I would say that technically correct and broadly fair debunkings (or technically correct and broadly fair publications devoted to countering false narratives) don’t monetize well, certainly not to the tune of millions of dollars annually for a single pundit. Provide counterexamples if you have them.
So you’re not saying that false things monetize better than true things; you’re saying that things which correctly state that other things are false monetize worse than the things they claim are false. I don’t think I misunderstood you here, but I may have interpreted your meaning more broadly than it was intended.
I would think that how well something monetizes depends on how much people want to hear it. So yes, that would mean it depends on how good something sounds. Our disagreement is over whether how good something sounds has any relation whatsoever to how true it is.
But true claims don’t inherently “sound better”
To be clear, I’m saying that they do, and that this means that truth-seeking isn’t that difficult, and it is counterproductive to believe that it is difficult.
We should be able to mutually agree on what sounds better. For example, “vaccines work” probably sounds better to us both. People say things that don’t sound good all the time; just because they say something doesn’t mean they also think it sounds good.
Things like “we should be able to figure out the truth as it is relevant to our situation with the capabilities we have” have to sound good to everyone, I would think. That means there’s a basis for alignment here.
Excellent point. In one frame, pessimism applied to timelines makes them look further away than they actually turn out to be. In another frame, pessimism applied to doom makes it seem closer / more probable, but it uses the anti-pessimism frame applied to timelines—“AGI will happen much sooner than we think”.
I get the sense reading some LessWrong comments that there is a divide between “alignment-is-easy”-ers and “alignment-is-hard”-ers. I also get the sense that Yudkowsky’s p(doom) has increased over the years to where it is now. Isn’t it somewhat strange that we should be getting two groups whose p(doom) is moving away from the center?
Suppose you made your dataset larger and larger. Once it got “really large,” let’s say, would you feel confident that your AI model will have learned enough that, even if its dataset contained nuke-building instructions, it would remain safe to use even in the hands of a bad actor?
The good news is that I expect AI development to be de facto open if not de jure open for the following reason:
AI labs still need to publish enough, at least at the level of high-level summaries or abstractions, to succeed in the marketplace and politically.
OpenAI (et al.) could try to force as much as possible of the actual functioning, work-performing details of their models’ engineering design into low-level implementation details that remain closed-source, with the intent to base their design on principles that make such a strategy more feasible. But I believe that this will not work.
This is for more fundamental reasons about how successful AI models have to actually be structured: even the high-level, abstract summaries of how they work must reliably map to the reasons the model performs well (this is akin to the Natural Abstraction hypothesis).
Therefore, advanced AI models could feasibly be reverse-engineered or re-developed simply from the high-level details that are published.
This piece caught my eye since it is still being discussed a bit—I also don’t think anything is too old to talk about.
I think it is largely incorrect—and I don’t typically say that things are incorrect if they seem like good-faith efforts. This doesn’t seem like a good-faith effort. I’ll explain why I can tell it’s not, and also why we can still know it’s wrong anyway, without judging the intent of the author.
For one thing, I think when people name names and use them as negative examples, these are put-downs, which are, in general, not true.
Throughout the post, Aella uses Frame Control in the context of abusers and manipulative people—always in the negative, even when she isn’t naming names. However, “Frame Control” itself is not defined in the negative, and is explicitly described in a neutral sense. This is why I think the basis of the post is incorrect. I often see cases like this used intentionally as a pretext for labeling other (non-malign) behavior as evidence of treachery, or simply to make people anxious—this entire post reads a lot like things you’ll see in other media about “mansplaining” or “sea-lioning” or other similar things.
You can’t actually construct a useful framework around something you define in the negative, which is what this post does.
“Frames” are defined as the context of the conversation and all of its assumptions, which, it is said, can be good, but are often used by someone else to manipulate you in deceptive ways. This could only be as pernicious as it is argued to be if it were actually inherently easy to fool people, which it is not. Furthermore, it carries its own assumptions, which I believe we can dismiss as untrue. (This is really just the idea that false information can somehow flow into the conversation unnoticed, be believed, and cause large changes to the belief-structure of the conversants before they have time to notice and/or update.) I don’t find that idea particularly compelling, and that gives me confidence that no one really needs to worry about Frame Control.
Here is also a quote that I find fairly easy to dismiss off-hand:
A related strategy is pushing the painful update button. I’m sure you’ve had experiences where you learned and grew, and it was really painful to do so. You had to face some hard truths, let go of how you saw yourself, and maybe even do a bit of surrendering your ego. This is legitimately good!
No, actually. I never have. I’ve never had a painful update. This is one of those things that I have never believed in—the so-called “hard truths”—and frankly find kind of absurd. It’s also one of those things where, if you say to me, “You’re lying! You definitely have had a painful update!”, I can laugh at that and say, “Actually, I would know better than you about my own life, and also I think you’re lying.” This is why I think this post was not written in good faith. I don’t think someone could honestly say the above.
I feel like a lot of this advice is telling me to do what I was going to do anyway—which makes me wonder if you’re actually telling me not to do what I was going to do anyway, since advice posts are normally about telling people to do something other than what they were going to do:
I don’t see ways to really help in a sizable way right now. I’m keeping my eyes open, and I’m churning through a giant backlog of things that might help a nonzero amount—but I think it’s important not to confuse this with taking meaningful bites out of a core problem the world is facing, and I won’t pretend to be doing the latter when I don’t see how to.
Since most of this post is interspersed with passages like this, which say that the problem seems to you to be intractable and that therefore most avenues of research will end up being dead-ends, it seems like you’re also advising people not to worry about things too much and to just let the experts handle it—experts who also say that the problem is intractable and too hard.
If you’re saying “look at where everyone else has dropped the ball,” and I notice that everyone else has dropped the ball on purpose, because they think the problem is intractable, then I have to disagree strongly with that if I am to follow your advice.
It’s just odd that you would say this and have it very openly apply to everything you’re saying as well.
This comment and your first one come off as quite catty. E.g.,
I like that you’re writing about something early-stage! Particularly given that it seems interesting and important. But I wish you would do it in a way that telegraphs the early-stage-ness and lends momentum toward having readers join you as fellow scientists/philosophers/naturalists who are squinting at the phenomena together. There are a lot of kinds of sentences that can invite investigation.
(Emphasis mine).
Your criticisms are mostly in the downward direction, meaning they don’t point out how to make what you’re criticizing better. Furthermore, they tend to be ambiguous between saying that the post could be improved (implying that we can make use of what is being proposed) and saying the opposite:
I think the phenomena you’re investigating are interesting and important, but that the framework you present for thinking about them is early-stage. I don’t think these concepts yet “cleave nature at its joints.”
It’s hard to tell whether you are being condescending towards the whole thing—implying that she should give up the whole endeavor—or whether you think it would be more useful with more polish. However, I will point out that even saying “this would be good if it were more polished” doesn’t add much value, even taken at face value.
If it’s good, it should be useful even before it becomes more polished. If it’s bad, we should say why.
(I am a student of the particular school of philosophy which states that things can be useful to use or believe in even before they have been socially agreed upon as high-status incumbent members of the orthodox school of thought.)
Something wonderful happens that isn’t well-described by any option listed.
Has been in the lead for some time now. The other options tend to describe something going well with AI alignment itself; could it be that this option [the quoted] refers to a scenario in which the alignment problem is rendered irrelevant?
I think this comment isn’t rigorous enough for Noosphere89 to retract his comment this one responds to, but that’s up to him.
Claims of the form “Yudkowsky was wrong about things like mind-design space, the architecture of neural networks (specifically, how he thought making large generalizations about the structure of the human brain wouldn’t work for designing neural architectures), and, in general, probably his tendency to assume that certain abstractions just don’t apply whenever intelligence or capability is scaled way up” have, I think, been argued well enough by now that they have at least some merit to them.
The claim about AI boxing I’m not sure about, but my understanding is that it’s currently being debated (somewhat hotly). [Fill in the necessary details where this comment leaves a void, but I think this is mainly about GPT-4’s API and it being embedded into apps where it can execute code on its own and things like that.]
[Question] Why do the Sequences say that “Löb’s Theorem shows that a mathematical system cannot assert its own soundness without becoming inconsistent.”?
I don’t think that PA being able to prove that you cannot prove falsehood means that you can prove falsehood from the theorem. If you look at my response to quetzal_rainbow’s answer, a simple substitution of False for X returns “it is provable that (it is not provable that False) implies (it is not provable that False).”
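For reference, here is the standard modal rendering of Löb’s Theorem and its False-instance (a sketch using the usual □ notation for “provable in PA”; quetzal_rainbow’s exact formula may differ from this):

```latex
% Löb's Theorem, internalized form:
\Box(\Box P \rightarrow P) \rightarrow \Box P

% Substituting P := \bot (False), and noting that
% \Box\bot \rightarrow \bot abbreviates \neg\Box\bot:
\Box(\neg\Box\bot) \rightarrow \Box\bot
```

Read aloud, the instance says: if it is provable that False is not provable, then False is provable.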
So from quetzal_rainbow’s answer, I don’t have an assumption of self-trust, I only have the substitution of False. I believe making this substitution is fine, and that we prove “it is not provable that False” from this. Also fine.
From ZT5’s answer, he asserts that self-trust is equivalent to the additional assertion “For all T, it is provable that T implies T.” I don’t have a problem with that assertion, only with the claim that this means that both T and not-T are provable (it says that if T is provable, then T; but if T is provable, that doesn’t mean that not-T is provable).
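For what it’s worth, here is a sketch of the standard derivation behind the “self-trust implies inconsistency” claim, assuming “self-trust” is read as the reflection schema □T → T (rather than the trivial □(T → T)):

```latex
% Suppose the system proves every instance of reflection:
\vdash \Box T \rightarrow T \qquad \text{for all sentences } T

% Löb's Theorem (rule form): if \vdash \Box T \rightarrow T, then \vdash T.
% Applying it to the single instance T := \bot:
\vdash \Box\bot \rightarrow \bot \;\Longrightarrow\; \vdash \bot
```

On this reading, nothing requires both T and not-T to be provable directly; the inconsistency arrives through the one instance T := ⊥ alone.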
From both answers, and yours, I glean that it is believed either that Löb’s Theorem by itself is inconsistent, or that it is inconsistent together with the additional assumption of “self-trust” (not yet formalized to my personal satisfaction).
Depending on the answer, we apparently are not sure whether self-trust is an additional assumption on top of the theorem or one that is contained in the theorem itself as a conditional.
It might be that the formalization that Löb’s Theorem is based in isn’t powerful enough to deal with this problem yet.
However, it’s got to be powerful enough to deal with the self-trust issue, since on some level, it is about the self-trust issue.
Löb’s Theorem is ostensibly about PA, or a system at least as powerful as PA. So it must itself be working within such a system.
If that system isn’t capable of self-trust, then we can’t trust Löb’s Theorem. What I believe you’re arguing, and what the Sequences are arguing, essentially amounts to that. What I’m not satisfied with is the Sequences’ level of clarity in articulating whether or not Löb’s Theorem (and by extension PA and systems at least as powerful as it) is adequate enough to be trusted such that we can use it to formalize self-trust in general. The Sequences are quite ambiguous about this IMO—they don’t even state the problem as if this issue were indeterminate—they say only that Löb’s Theorem itself means that we can’t formalize self-trust. That amounts to essentially saying, “We trust Löb’s Theorem completely, and PA, etc., which state that we can’t trust Löb’s Theorem completely, nor PA, etc.”
IMO—from my own analysis of the problem, which I choose to trust for deep and very well-thought-out reasons [heh]—Löb’s Theorem is compelling enough to trust, which says we can trust it and PA-or-greater systems enough to believe in the things they prove. This is the common-sense, intuitive way of interpreting what it means, which I note importantly means the same thing as having self-trust and trust in the system you’re using.
It seems to be historically the case that “doomers” or “near-doomers” (public figures who espouse pessimistic views of the future, often with calls for drastic collective action) do not always come out with a positive public perception when doom or near-doom is perceived not to have occurred, or to have occurred far from what was predicted.
Doomers seem to have a trajectory rather than a distribution, per se. From my perspective, this is on-trajectory. He believed doom was possible, now he believes it is probable.
I’m not sure how long it will be until we get past the “doom didn’t happen” point. Assuming he exists in the future, Eliezer_future lives in the world in which he was wrong. It’s not obvious to me that Eliezer_future exists with more probability the more Eliezer_current believes Eliezer_future doesn’t exist.
I can’t imagine such a proposal working well in the United States. I can imagine some countries e.g. China potentially being on board with proposals like these. Because the United Nations is a body chiefly concerned with enforcing international treaties, I imagine it would be incentivized to support arguments in favor of increasing its own scope and powers. I do predict that AI will be an issue it will eventually decide to weigh in on and possibly act on in a significant way.
However, that creates a kind of bipolar geopolitical scenario for the remainder of this century, approximately speaking. The United States is already on adversarial terms with China and has incentives against participating in treaties that seem to benefit less-developed competing nations over itself. If China and other U.N.-aligned countries become more doggedly insistent on antagonizing the US for being willing to keep developing its technology no matter what (especially in secret), believing that it can successfully get away with doing so, then you have forces that polarize the world into camps: on the one hand, the camp that advocates technology slowdown and surveillance—associated with countries that already do this—and on the other hand, the camp that supports more liberal ideals and freedom (which will tend to be anti-doomers), likewise associated with countries whose governments support such ideals.
Politically, the doomer camp (FHI, FLI, et al.) will begin to be courted by more authoritarian governments, presenting an awkward situation.
Is it possible to accurately judge how profound an idea “actually is” from merely how profound it sounds? Assuming that these two things are disjoint, in general.
Then, besides that, if those two things are indeed disjoint, are you proposing that we should prefer more skepticism towards ideas that actually are profound, or towards ideas that merely sound profound? (I imagine you probably mean the latter, but from your writing, you seem to be using the word to mean both.)