After hearing the idea, I believe that it is not at all dangerous. However, I think the general strategy of being more cautious than you think you have to be whenever you think you have a dangerous idea is a good one. If shminux’s comment made you feel any negative emotions associated with being too cautious, I would like to cancel those out by applauding your choice to err on the side of caution.
Scott Garrabrant
It seems to me like MIRI hiring, especially researchers in 2015-2017, but also in general, reliably produced hires with a certain philosophical stance (i.e. people who like UDASSA, TDT, etc.) and people with a certain kind of mathematical taste (i.e. people who like reflective oracles, Lob, haskell, etc.).
I think that it selects pretty strongly for the above properties, and doesn’t have much room for “little mental resistance to organizational narratives” (beyond any natural correlations). I think there is also some selection on trustworthiness (e.g. following through with commitments) that is not as strong as the above selection, and that trustworthiness is correlated with altruism (and the above philosophical stance).
I think that altruism, ambition, timelines, agreement about the strategic landscape, agreement about probability of doom, little mental resistance to organizational narratives, etc. are/were basically rounding errors compared to selection on philosophical competence, and thus, by proxy, philosophical agreement (specifically a kind of philosophical agreement that things like agreement about timelines is not a good proxy for).
(Later on, there was probably more selection on opinions about information security, but I don’t think that applies much to you being hired.)
(Perhaps there is a large selection that is happening in the “applying for a job” side of the hiring pipeline. I don’t feel like I can rule that out.)
(I will not be offended by a comment predicting that I believe this largely because of “little mental resistance to organizational narratives”, even if the comment has no further justification.)
(I would also guess that I am somewhere in the bottom quartile of “mental resistance to organizational narratives” among MIRI employees.)
I didn’t really read much of the post, but I think you are rejecting weighting people by simplicity unfairly here.
Imagine you flip a fair coin until it comes up tails, and either A) you suffer if you flip >100 times, or B) you suffer if you flip <100 times. I think you should prefer action A. However, if you think of there as being a countable collection of possible outcomes, one for each possible number of flips, you are creating “infinite” suffering rather than “finite” suffering, so you should prefer B.
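A back-of-the-envelope check on the two options (a minimal sketch; the variable names are mine) shows just how lopsided they are:

```python
from fractions import Fraction

# Flip a fair coin until it comes up tails.
# Option A: you suffer iff the total number of flips exceeds 100.
# Option B: you suffer iff the total number of flips is below 100.

# P(more than 100 flips) = P(first 100 flips are all heads) = 2^-100.
p_suffer_A = Fraction(1, 2) ** 100
# P(fewer than 100 flips) = P(tails arrives within the first 99) = 1 - 2^-99.
p_suffer_B = 1 - Fraction(1, 2) ** 99

# A's suffering is astronomically unlikely; B's is near-certain.
assert p_suffer_A < Fraction(1, 10**30)
assert p_suffer_B > Fraction(999, 1000)
```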
I think the above argument for B is wrong and similar to the argument you are giving.
Note that the choice of where we draw the boundary between outcomes mattered, and similarly the choice of where we draw the boundary between people in your reasoning matters. You need to make choices about what counts as different people vs same people for this reasoning to even make sense, and even if it does make sense, you are still not taking seriously the proposal that we care about the total simplicity of good/bad experience rather than the total count of good/bad experience.
Indeed, I think the lesson of the whole infinite ethics thing is mostly just grappling with we don’t understand how to talk about total count in the infinite case. But I don’t see the argument for wanting to talk about count in the first place. It feels like a property of where you are drawing the boundaries, rather than what is actually there. In the simple cases, we can just draw boundaries between people and declare that our measure is the uniform measure on this finite set, but then once we declare that to be our measure, we interact with it as a measure.
If I roll a 20 sided die until I roll a 1, the expected number of times I will need to roll the die is 20. Also, according to my current expectations, immediately before I roll the 1, I expect myself to expect to have to roll 20 more times. My future self will say it will take 20 more times in expectation, when in fact it will only take 1 more time. I can predict this in advance, but I can’t do anything about it.
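A quick simulation (an illustrative sketch; the function name is mine) makes the memorylessness concrete: the expected total number of rolls is 20, and conditional on having already rolled 30 times without a 1, the expected number of additional rolls is still 20:

```python
import random

rng = random.Random(0)  # fixed seed so the run is reproducible

def rolls_until_one(rng):
    """Roll a fair d20 until a 1 appears; return the total number of rolls."""
    n = 0
    while True:
        n += 1
        if rng.randint(1, 20) == 1:
            return n

trials = [rolls_until_one(rng) for _ in range(200_000)]
mean_total = sum(trials) / len(trials)

# Memorylessness: restricted to trials that lasted more than 30 rolls,
# the expected number of *additional* rolls after roll 30 is still ~20.
remaining = [n - 30 for n in trials if n > 30]
mean_remaining = sum(remaining) / len(remaining)
# Both mean_total and mean_remaining come out close to 20.
```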
I think everyone should spend enough time thinking about this to see why there is nothing wrong with this picture. This is what uncertainty looks like, and it had to be this way.
I want to disagree about MIRI.
Mostly, I think that MIRI (or at least a significant subset of MIRI) has always been primarily directed at agenty systems in general.
I want to separate agent foundations at MIRI into three eras: the Eliezer Era (2001-2013), the Benya Era (2014-2016), and the Scott Era (2017-). The transitions between eras involved an almost complete overhaul of the people involved. In spite of this, I believe that they have roughly all been directed at the same thing, and that John is directed at the same thing.
The proposed mechanism behind the similarity is not transfer, but instead because agency in general is a convergent/natural topic.
I think throughout time, there has always been a bias in the pipeline from ideas to papers towards being more about AI. I think this bias has gotten smaller over time, as the agent foundations research program both started having stable funding, and started carrying less and less of the weight of all of AI alignment on its back. (Before going through editing with Rob, I believe Embedded Agency had no mention of AI at all.)
I believe that John thinks that the Embedded Agency document is especially close to his agenda, so I will start with that. (I also think that both John and I currently have more focus on abstraction than what is in the Embedded Agency document).
Embedded Agency, more so than anything else I have done was generated using an IRL shaped research methodology. I started by taking the stuff that MIRI has already been working on, mostly the artifacts of the Benya Era, and trying to communicate the central justification that would cause one to be interested in these topics. I think that I did not invent a pattern, but instead described a preexisting pattern that originally generated the thoughts.
This is consistent with the pattern being about agency in general; I could also have found such a pattern in ideas that were generated based on agency in AI, but I think this is not the case. I think the use of proof-based systems demonstrates an extreme disregard for the substrate that the agency is made of. I claim that the reason there was a historic focus on proof-based agents is that they formed a system we could actually say stuff about. The fact that real-life agents looked, on the surface, very different from proof-based agents was a shortfall that most people would use to completely reject the system, but MIRI would work in it because what they really cared about was agency in general, and having another system that is easy to say things about could be used to triangulate agency in general. If MIRI were directed at a specific type of agency, they would have rejected the proof-based systems as being too different.
I think that MIRI is often misrepresented as believing in GOFAI because people look at the proof-based systems and think that MIRI would only study those if they thought that is what AI might look like. I think in fact the reason for the proof-based systems is that, at the time, they were the most fruitful models we had, and we were just very willing to use any lens that worked when trying to look at something very, very general.
(One counterpoint here: maybe MIRI didn’t care about the substrate the agency was running on, but did have a bias towards singleton-like agency rather than very distributed systems. I think this is slightly true. Today, I think that you need to understand the distributed systems, because realistic singleton-like agents follow many of the same rules, but it is possible that early MIRI did not believe this as much.)
Most of the above was generated by looking at the Benya Era and trying to justify that it was directed at agency in general at least/almost as much as the Scott Era, which seems like the hardest of the three for me.
For the Scott Era, I have introspection. I sometimes stop thinking in general, and focus on AI. This is usually a bad idea, and doesn’t generate as much fruit, and it is usually not what I do.
For the Eliezer Era, just look at the sequences.
I just looked up and reread, and tried to steelman what you originally wrote. My best steelman is that you are saying that MIRI is trying to develop a prescriptive understanding of agency, and you are trying to develop a descriptive understanding of agency. There might be something to this, but it is really complicated. One way to define agency is as the pipeline from the prescriptive to the descriptive, so I am not sure that prescriptive vs. descriptive agency makes sense as a distinction. As for the research methodology, I think that we all have pretty different research methodologies. I do not think Benya and Eliezer and I have especially more in common with each other than we do with John, but I might be wrong here. I also don’t think Sam and Abram and Tsvi and I have especially more in common in terms of research methodologies, except insofar as we have been practicing working together.
In fact, the thing that might be going on here is that the distinctions in topics are coming from differences in research skills. Maybe proof-based systems are the most fruitful model if you are a Benya, but not if you are a Scott or a John. But this is about what is easiest for you to think about, not about a difference in the shared convergent subgoal of understanding agency in general.
Done. I accidentally hit enter when I had everything done except for the digit question, so it submitted my entry and I was not able to answer that question. :(
This comment is not only about this post, but is also a response to Scott’s model of Duncan’s beliefs about how epistemic communities work, and a couple of Duncan’s recent Facebook posts. It is also a mostly unedited rant. Sorry.
I grant that overconfidence is in a similar reference class as saying false things. (I think there is still a distinction worth making, similar to the difference between lying directly and trying to mislead by saying true things, but I am not really talking about that distinction here.)
I think society needs to be robust to people saying false things, and thus have mechanisms that prevent those false things from becoming widely believed. I think that as little as possible of that responsibility should be placed on the person saying the false things, in order to make it more strategy-proof. (I think that it is also useful for the speaker to help by trying not to say false things, but I am more putting the responsibility on the listener.)
I think there should be pockets of society, (e.g. collections of people, specific contexts or events) that can collect true beliefs and reliably significantly decrease the extent to which they put trust in the claims of people who say false things. Call such contexts “rigorous.”
I think that it is important that people look to the output of these rigorous contexts when e.g. deciding on COVID policy.
I think it is extremely important that the rigorous pockets of society are not “everyone in all contexts.”
I think that society is very much lacking reliable rigorous pockets.
I have this model where in a healthy society, there can be contexts where people generate all sorts of false beliefs, but also sometimes generate gold (e.g. new ontologies that can vastly improve the collective map). If this context is generating a sufficient supply of gold, you DO NOT go in and punish their false beliefs. Instead, you quarantine them. You put up a bunch of signs that point to them and say e.g. “80% boring true beliefs 19% crap 1% gold,” then you have your rigorous pockets watch them, and try to learn how to efficiently distinguish between the gold and the crap, and maybe see if they can generate the gold without the crap. However sometimes they will fail and will just have to keep digging through the crap to find the gold.
One might look at lesswrong, and say “We are trying to be rigorous here. Let’s push stronger on the gradient of throwing out all the crap.” I can see that. I want to be able to say that. I look at the world, and I see all the crap, and I want there to be a good pocket that can be about “true=good”, “false=bad”, and there isn’t one. Science can’t do it, and maybe lesswrong can.
Unfortunately, I also look at the world and see a bunch of boring processes that are never going to find gold. Science can’t do it, and maybe lesswrong can.
And, maybe there is no tradeoff here. Maybe it can do both. Maybe at our current level of skill, we find more gold in the long run by being better at throwing out the crap.
I don’t know what I believe about how much tradeoff there is. I am writing this, and I am not trying to evaluate the claims. I am imagining inhabiting the world where there is a huge trade off. Imagining the world where lesswrong is the closest thing we have to being able to have a rigorous pocket of society, but we have to compromise, because we need a generative pocket of society even more. I am overconfidently imagining lesswrong as better than it is at both tasks, so that the tradeoff feels more real, and I am imagining the world failing to pick up the slack of whichever one it lets slide. I am crying a little bit.
And I am afraid. I am afraid of being the person who overconfidently says “We need less rigor,” and sends everyone down the wrong path. I am also afraid of being the person who overconfidently says “We need less rigor,” and gets flagged as a person who says false things. I am not afraid of saying “We need more rigor.” The fact that I am not afraid of saying “We need more rigor” scares me. I think it makes me feel that if I look too closely, I will conclude that “We need more rigor” is true. Specifically, I am afraid of concluding that and being wrong.
In my own head, I have a part of me that is inhabiting the world where there is a large tradeoff, and we need less rigor. I have another part that is trying to believe true things. The second part is making space for the first part, and letting it be as overconfident as it wants. But it is also quarantining the first part. It is not making the claim that we need more space and less rigor. This quarantine action has two positive effects. It helps the second part have good beliefs, but it also protects the first part from having to engage with the hammer of truth until it has grown.
I conjecture that to the extent that I am good at generating ideas, it is partially because I quarantine, but do not squash, my crazy ideas. (Where ignoring the crazy ideas counts as squashing them.) I conjecture further that an ideal society needs to do similar motions at the group level, not just the individual level. I said at the beginning that you need to put the responsibility for distinguishing in the listener for strategyproofness. This was not the complete story. I conjecture that you need to put the responsibility in the hands of the listener because you need to have generators that are not worried about accidentally having false/overconfident beliefs. You are not supposed to put policy decisions in the hands of the people/contexts that are not worried about having false beliefs, but you are supposed to keep giving them attention, as long as they keep occasionally generating gold.
Personal Note: If you have the attention for it, I ask that anyone who sometimes listens to me keep (at least) two separate buckets: one for “Does Scott sometimes say false things?” and one for “Does Scott sometimes generate good ideas?”, and decide whether to give me attention based on these two separate scores. If you don’t have the attention for that, I’d rather you just keep the second bucket; I concede the first bucket (for now), and think my comparative advantage is to be judged according to the second one, and never be trusted as epistemically sound. (I don’t think I am horrible at being epistemically sound, at least in some domains, but if I only get a one-dimensional score, I’d rather relinquish the right to be epistemically trusted, in order to absolve myself of the responsibility to not share false beliefs, so my generative parts can share more freely.)
This is not a complete answer, but it is part of my picture:
(It is the part of the picture that I can give while being only descriptive, and not prescriptive. For epistemic hygiene reasons, I want to avoid discussions of how much of different approaches we need in contexts (like this one) that would make me feel like I was justifying my research in a way that people might interpret as an official statement from the agent foundations team lead.)
I think that Embedded Agency is basically a refactoring of Agent Foundations in a way that gives one central curiosity-based goalpost, rather than making it look like a bunch of independent problems. It is mostly all the same problems, but it was previously packaged as “Here are a bunch of things we wish we understood about aligning AI,” and is repackaged as “Here is a central mystery of the universe, and here are a bunch of things we don’t understand about it.” It is not a coincidence that they are the same problems, since they were generated in the first place by people paying close attention to what mysteries of the universe related to AI we haven’t solved yet.
I think of Agent Foundations research as having a different type signature than most other AI Alignment research, in a way that looks kind of like Agent Foundations:other AI alignment::science:engineering. I think of AF as more forward-chaining and other stuff as more backward-chaining. This may seem backwards if you think about AF as reasoning about superintelligent agents, and other research programs as thinking about modern ML systems, but I think it is true. We are trying to build up a mountain of understanding, until we collect enough that the problem seems easier. Others are trying to make direct plans on what we need to do, see what is wrong with those plans, and try to fix the problems. One consequence of this is that AF work is more likely to be helpful given long timelines, partially because AF is trying to be the start of a long journey of figuring things out, but also because AF is more likely to be robust to huge shifts in the field.
I actually like to draw an analogy with this: (taken from this post by Evan Hubinger)
I was talking with Scott Garrabrant late one night recently and he gave me the following problem: how do you get a fixed number of DFA-based robots to traverse an arbitrary maze (if the robots can locally communicate with each other)? My approach to this problem was to come up with and then try to falsify various possible solutions. I started with a hypothesis, threw it against counterexamples, fixed it to resolve the counterexamples, and iterated. If I could find a hypothesis which I could prove was unfalsifiable, then I’d be done.
When Scott noticed I was using this approach, he remarked on how different it was than what he was used to when doing math. Scott’s approach, instead, was to just start proving all of the things he could about the system until he managed to prove that he had a solution. Thus, while I was working backwards by coming up with possible solutions, Scott was working forwards by expanding the scope of what he knew until he found the solution.
(I don’t think it quite communicates my approach correctly, but I don’t know how to do better.)
A consequence of the type signature of Agent Foundations is that my answer to “What are the other major chunks of the larger problem?” is “That is what I am trying to figure out.”
I think the comments here point out just how much we do not have common knowledge about this thing that we are pretending we have common knowledge about.
I am not a fan of unbounded utilities, but it is worth noting that most (all?) of the problems with unbounded utilities are actually a problem with utility functions that are not integrable with respect to your probabilities. It feels basically okay to me to have unbounded utilities as long as extremely good/bad events are also sufficiently unlikely.
The space of allowable probability functions that go with an unbounded utility can still be closed under finite mixtures and conditioning on positive-probability events. Indeed, if you think of utility functions as coming from VNM, and you have a space of lotteries closed under finite mixtures but not arbitrary mixtures, I think there are VNM preferences that can only correspond to unbounded utility functions, where the space of lotteries is such that you can’t make St. Petersburg paradoxes. (I am guessing; I didn’t check this.)
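To make the integrability point concrete, here is a hypothetical example (the particular lotteries are my own choice, not from the comment): keep the unbounded utility U(n) = 2^n, but compare St. Petersburg probabilities 2^-n, whose expected-utility partial sums grow without bound, against thinner-tailed probabilities 2·3^-n, which also sum to 1 but make the same unbounded utility integrable:

```python
from fractions import Fraction

def partial_expected_utility(prob, utility, terms):
    """Partial sum of E[U] over outcomes n = 1..terms, in exact arithmetic."""
    return sum(prob(n) * utility(n) for n in range(1, terms + 1))

utility = lambda n: Fraction(2) ** n          # unbounded utility U(n) = 2^n

# St. Petersburg: P(n) = 2^-n. Every term P(n)*U(n) is exactly 1, so the
# partial sums grow linearly and E[U] diverges.
st_petersburg = lambda n: Fraction(1, 2) ** n

# Thinner tail: P(n) = 2 * 3^-n (these also sum to 1). Terms shrink like
# (2/3)^n, so E[U] converges to 4 even though U is unbounded.
thin_tail = lambda n: 2 * Fraction(1, 3) ** n

assert partial_expected_utility(st_petersburg, utility, 50) == 50
assert partial_expected_utility(thin_tail, utility, 50) < 4
```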
So I think my orientation on seeking out disagreement is roughly as follows. (This is going to be a rant I write in the middle of the night, so might be a little incoherent.)
There are two distinct tasks: 1) generating new useful hypotheses/tools, and 2) selecting between existing hypotheses/filtering out bad hypotheses.
There are a bunch of things that make people good at both these tasks simultaneously. Further, each of these tasks is partially helpful for doing the other. However, I still think of them as mostly distinct tasks.
I think skill at these tasks is correlated in general, but possibly anti-correlated after you filter on enough g correlates, in spite of the fact that they are each common subtasks of the other.
I don’t believe this (anti-correlated given g) very confidently, but I do think it is good to track your own and others’ skill in the two tasks separately, because it is possible to have very different scores (and because side effects of judging generators on reliability might make them less generative, as a result of being afraid of being wrong, and similarly vice versa).
I think that seeking out disagreement is especially useful for the selection task, and less useful for the generation task. I think that echo chambers are especially harmful for the selection task, but can sometimes be useful for the generation task. Working with someone who agrees with you on a bunch of stuff and shares your ontology allows you to build deeply faster. Someone with a lot of disagreement with you can cause you to get stuck on the basics and not get anywhere. (Sometimes disagreement can also be actively helpful for generation, but it is definitely not always helpful.)
I spend something like 90+% of my research time focused on the generation task. Sometimes I think my colleagues are seeing something that I am missing, and I seek out disagreement, so that I can get a new perspective, but the goal is to get a slightly different perspective on the thing I am working on, and not really to filter based on which view is more true. I also sometimes do things like double-crux with people with fairly different world views, but even there, it feels like the goal is to collect new ways to think, rather than to change my mind. I think that for this task a small amount of focusing on people who disagree with you is pretty helpful, but even then, I think I get the most out of people who disagree with me a little bit, because I am more likely to be able to actually pick something up. Further, my focus is not really on actually understanding the other person; I just want to find new ways to think, so I will often translate things to something near my ontology, and thus learn a lot, but still not be able to pass an ideological Turing test.
On the other hand, when you are not trying to find new stuff, but instead e.g. evaluate various different hypotheses about AI timelines, I think it is very important to try to understand views that are very far from your own, and take steps to avoid echo chamber effects. It is important to understand the view, the way the other person understands it, not just the way that conveniently fits with your ontology. This is my guess at the relevant skills, but I do not actually identify as especially good at this task. I am much better at generation, and I do a lot of outside-view style thinking here.
However, I think that currently, AI safety disagreements are not about two people having mostly the same ontology and disagreeing on some important variables, but rather trying to communicate across very different ontologies. This means that we have to build bridges, and the skills start to look more like generation skill. It doesn’t help to just say, “Oh, this other person thinks I am wrong, I should be less confident.” You actually have to turn that into something more productive, which means building new concepts, and a new ontology in which the views can productively dialogue. Actually talking to the person you are trying to bridge to is useful, but I think so is retreating to your echo chamber, and trying to make progress on just becoming less confused yourself.
For me, there is a handful of people who I think of as having very different views from me on AI safety, but are still close enough that I feel like I can understand them at all. When I think about how to communicate, I mostly think about bridging the gap to these people (which already feels like an impossibly hard task), and not as much the people that are really far away. Most of these people I would describe as sharing the philosophical stance I said MIRI selects for, but probably not all.
If I were focusing on resolving strategic disagreements, I would try to interact a lot more than I currently do with people who disagree with me. Currently, I am choosing to focus more on just trying to figure out how minds work in theory, which means I only interact with people who disagree with me a little. (Indeed, I currently also only interact with people who agree with me a little bit, and so am usually in an especially strong echo chamber, which is my own head.)
However, I feel pretty doomy about my current path, and might soon go back to trying to figure out what I should do, which means trying to leave the echo chamber. Often when I do this, I neither produce anything great nor change my mind, and eventually give up and go back to doing the doomy thing where at least I make some progress (at the task of figuring out how minds work in theory, which may or may not end up translating to AI safety at all).
Basically, I already do quite a bit of the “Here are a bunch of people who are about as smart as I am, and have thought about this a bunch, and have a whole bunch of views that differ from me and from each other. I should be not that confident” (although I should often take actions that are indistinguishable from confidence, since that is how you work with your inside view.) But learning from disagreements more than that is just really hard, and I don’t know how to do it, and I don’t think spending more time with them fixes it on its own. I think this would be my top priority if I had a strategy I was optimistic about, but I don’t, and so instead, I am trying to figure out how minds work, which seems like it might be useful for a bunch of different paths. (I feel like I have some learned helplessness here, but I think everyone else (not just MIRI) is also failing to learn (new ontologies, rather than just noticing mistakes) from disagreements, which makes me think it is actually pretty hard.)
It seems quite possible to me that the philosophical stance + mathematical taste you’re describing aren’t “natural kinds” (e.g. the topics you listed don’t actually have a ton in common, besides being popular MIRI-sphere topics).
So, I believe that the philosophical stance is a natural kind. I can try to describe it better, but note that I won’t be able to point at it perfectly:
I would describe it as “taking seriously the idea that you are a computation [Edit: an algorithm].” (As opposed to a collection of atoms, or a location in spacetime, or a Christian soul, or any number of other things you could identify with.) I think that most of the selection for this philosophical stance happens not in MIRI hiring, but instead in being in the LW community. I think that the sequences are actually mostly about the consequences of this philosophical stance, and that the sequences pipeline is largely creating a selection for this philosophical stance.
One can have this philosophical stance without a bunch of math ability, (many LessWrongers do) but when the philosophical stance is combined with math ability, it leads to a lot of agreement in taste in math-philosophy models, which is what you see in MIRI employees.
To make a specific (but hard-to-verify) claim: I think that if you were to take MIRI employees, intervene before they found lesswrong, and show them a lot of things like UDASSA, TDT, and reflective oracles, they would be very interested in them relative to other math/philosophy ideas. Further, if you were to take people in 2000, before the existence of LW, and filter on being interested in some of these ideas, you will find people interested in many of these ideas.
(I listed ideas that came from MIRI, but there are many ideas that did not come from MIRI that people with this philosophical stance (and math ability) tend to be interested in: Logic, Probability, Game Theory, Information Theory, Algorithmic Information Theory)
(I used to not believe this. When I first started working at MIRI, I felt like I was lucky to have all of these mathematical and philosophical interests converge to the same place. I attributed it to a coincidence, but now think it has a common natural cause.)
(I think that this philosophical stance is really not enough to cause people to converge on many strategic questions. For example, I think Eliezer Yudkowsky, Jessica Taylor, Paul Christiano, and Andrew Critch all score very highly on this philosophical stance, and have a wide range of different views on timelines, probability of doom, and the strategic landscape.)
Next year, can we have “something sort of like left-libertarianism-ist” on the big politics question? I think that there are many people here (myself included) who do not know how to categorize ourselves politically, but know that we have a lot in common with Yvain.
I think Chris Langan and the CTMU are very interesting, and I think there is an interesting and important challenge for LW readers in figuring out how (and whether) to learn from Chris. Here are some things I think are true about Chris (and about me) and relevant to this challenge. (I do not feel ready to talk about the object-level CTMU here; I am mostly just talking about Chris Langan.)
Chris has a legitimate claim of being approximately the smartest man alive according to IQ tests.
Chris wrote papers/books built out of a bunch of words that are defined circularly, and they are difficult to follow. It is easy to mistake him for a complete crackpot.
Chris claims to have proven the existence of God.
Chris has been something-sort-of-like-canceled for a long time. (In the way that seems predictable when “World’s Smartest Man Proves Existence of God.”)
Chris has some followers that I think don’t really understand him. (In the way that seems predictable when “World’s Smartest Man Proves Existence of God.”)
Chris acts socially in a very nonstandard way that seems like a natural consequence of having much higher IQ than anyone else he has ever met. In particular, I think this manifests in part as an extreme lack of humility.
Chris is actually very pleasant to talk to if (like me) it does not bother you that he acts like he is much smarter than you.
I personally think the proof of the existence of God is kind of boring. It reads to me as kind of like “I am going to define God to be everything. Notice how this meets a bunch of the criteria people normally attribute to God. In the CTMU, the universe is mind-like. Notice how this meets a bunch more criteria people normally attribute to God.”
While the proof of the existence of God feels kind of mundane to me, Chris is the kind of person who chooses to interpret it as a proof of the existence of God. Further, he also has other more concrete supernatural-like and conspiracy-theory-like beliefs, that I expect most people here would want to bet against.
I find the CTMU in general interesting, (but I do not claim to understand it).
I have noticed many thoughts that come naturally to me that do not seem to come naturally to other people (e.g. about time or identity), where it appears to me that Chris Langan just gets it (as in he is independently generating it all).
For years, I have partially depended on a proxy when judging other people (e.g. for recommending funding) that is something like “Do I, Scott, like where my own thoughts go in contact with the other person?” Chris Langan is approximately at the top according to this proxy.
I believe I and others here probably have a lot to learn from Chris, and arguments of the form “Chris confidently believes false thing X,” are not really a crux for me about this.
IQ is not the real think-oomph (and I think Chris agrees), but Chris is very smart, and one should be wary of clever arguers, especially when trying to learn from someone with much higher IQ than you.
I feel like I am spending (a small amount of) social credit in this comment, in that when I imagine a typical LWer thinking “oh, Scott semi-endorses Chris, maybe I should look into Chris,” I imagine the most likely result is that they will conclude that Chris is a crackpot, and that Scott’s semi-endorsements should be trusted less.
If any users do submit a set of launch codes, tomorrow I’ll publish their identifying details.
If we make it through this, here are some ideas to make it more realistic next year:
1) Anonymous codes.
2) Karma bounty for the first person to press the button.
1+2) Randomly and publicly give some people the same code as each other, and give a karma bounty to everyone who had the code that took down the site.
3) Anyone with button rights can share button rights with anyone, and a karma bounty for sharing with the most other people that only pays out if nobody presses the button.
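The combined scheme in (1+2) can be made concrete with a small sketch. This is a hypothetical illustration only, assuming made-up usernames and a two-code setup; none of these names or functions come from the actual LessWrong implementation:

```python
import random

def assign_codes(users, n_codes, seed=0):
    """Randomly assign each user one of n_codes launch codes.
    Codes are deliberately shared: several users may hold the same code."""
    rng = random.Random(seed)
    return {user: rng.randrange(n_codes) for user in users}

def bounty_recipients(assignments, detonated_code):
    """Everyone who held the code that took down the site shares the karma bounty."""
    return [user for user, code in assignments.items() if code == detonated_code]

# Hypothetical example: four users, two codes.
users = ["alice", "bob", "carol", "dave"]
assignments = assign_codes(users, n_codes=2)

# Suppose code 1 was the one entered into the button:
winners = bounty_recipients(assignments, detonated_code=1)
```

Because codes are assigned randomly and shared, whoever presses the button stays anonymous within the group holding that code, while the bounty can still be paid out publicly to that whole group.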
So, I feel like I am concerned for everyone, including myself, but also including people who do not think that it would affect them. A large part of what concerns me is that the effects could be invisible.
For example, I think that I am not very affected by this, but I recently noticed a connection between how difficult it is to get to work on writing a blog post that I think is good to write, and how much my system one expects some people to receive the post negatively. (This happened when writing the recent MtG post.) This is only anecdotal, but I think that posts that seem like bad PR caused akrasia, even when controlling for how good I think the post is on net. The scary part is that there was a long time before I noticed this. If I believed that there was a credible way to detect when there are thoughts you can’t have in the first place, I would be less worried.
I didn’t have many data points, and the above connection might have been a coincidence, but the point I am trying to make is that I don’t feel like I have good enough introspective access to rule out a large, invisible effect. Maybe others do have enough introspective access, but I do not think that just not seeing the outer incentives pulling on you is enough to conclude that they are not there.
The cover is incorrect :(
EDIT: If you do not understand this post, read essay 268 from the book!
I am actually worried that because I posted it, people will think it is more relevant to AI safety than it really is. I think it is a little related, but not strongly.
I do think it is surprising and interesting. I think it is useful for thinking about civilization and civilizational collapse and what aliens (or maybe AI or optimization daemons) might look like. My inner Andrew Critch also thinks it is more directly related to AI safety than I do. Also if I thought multipolar scenarios were more likely, I might think it is more relevant.
Also it is made out of pieces such that thinking about it was a useful exercise. I am thinking a lot about Nash equilibria and dynamics. I think the fact that Nash equilibria are not exactly a dynamic type of object and are not easy to find is very relevant to understanding embedded agency. Also, I think that modal combat is relevant, because I think that Lobian handshakes are pointing at an important part of reasoning about oneself.
I think it is relevant enough that it was worth doing, and such that I would be happy if someone expanded on it, but I am not planning on thinking about it much more because it does feel only tangentially related.
That being said, many times I have explicitly thought that I was thinking about a thing that was not really related to the bigger problems I wanted to be working on, only to later see a stronger connection.
Hmm, so this seems plausible, but in that case, it seems like the base rate for “little mental resistance to organizational narratives” is very low, and the story should not be “Hired people probably have little mental resistance because they were hired” but should instead be “Hired people probably have little mental resistance because basically everyone has little mental resistance.” (These are explanatory uses of “because”, not causal uses.)
This second story seems like it could be either very true or very false, for different values of “little”, so it doesn’t seem like it has a truth value until we operationalize “little.”

Even beyond the base rates, it seems likely that a potential hire could be dismissed because they seem crazy, including at MIRI, but I would predict that MIRI is pretty far on the “willing to hire very smart crazy people” end of the spectrum.
MIRI is a research org. It is not an advocacy org. It is not even close. You can tell by the fact that it basically hasn’t said anything for the last 4 years. Eliezer’s personal twitter account does not make MIRI an advocacy org.
(I recognize this isn’t addressing your actual point. I just found the frame frustrating.)