The comments here are a storage of not-posts and not-ideas that I would rather write down than not.
Yesterday I noticed that I had a pretty big disconnect from this: There’s a very real chance that we’ll all be around, business somewhat-as-usual in 30 years. I mean, in this world many things have a good chance of changing radically, but automation of optimisation will not cause any change on the level of the industrial revolution. DeepMind will just be a really cool tech company that builds great stuff. You should make plans for important research and coordination to happen in this world (and definitely not just decide to spend everything on a last-ditch effort to make everything go well in the next 10 years, only to burn up the commons and your credibility for the subsequent 20).
Only yesterday when reading Jessica’s post did I notice that I wasn’t thinking realistically/in-detail about it, and start doing that.
Related hypothesis: people feel like they’ve wasted some period of time e.g. months, years, ‘their youth’, when they feel they cannot see an exciting path forward for the future. Often this is caused by people they respect (/who have more status than them) telling them they’re only allowed a small few types of futures.
Responding to Scott’s response to Jessica.
The post makes the important argument that if we have a word whose boundary is around a pretty important set of phenomena that are useful to have a quick handle to refer to, then
It’s really unhelpful for people to start using the word to also refer to a phenomenon with 10x or 100x more occurrences in the world, because then I’m no longer able to point to the specific important parts of the phenomena that I was previously talking about.
e.g. Currently the word ‘abuser’ describes a small number of people during some of their lives. Someone might want to say that technically it should refer to all people all of the time. The argument is understandable, but it wholly destroys the usefulness of the concept handle.
People often have political incentives to push the concept boundary to include a specific case in a way that, if it were applied in a principled manner, would make most of the phenomena in the category useless to talk about. This allows for selective policing by the people with the political incentive.
It’s often fine for people to bend words a little bit (e.g. when people verb nouns), but when it’s in the class of terms we use for norm violation, it’s often correct to impose quite high standards of evidence for doing so, as we can have strong incentives (and unconscious biases!) to do it for political reasons.
These are key points that argue against changing the concept boundary to include all conscious reporting of unconscious bias, and more generally push back against unprincipled additions to the concept boundary.
This is not an argument against expanding the concept to include a specific set of phenomena that share the key similarities with the original set, as long as the expansion does not explode the set. I think there may be some things like that within the category of ‘unconscious bias’.
While it is the case that it’s very helpful to have a word for when a human consciously deceives another human, my sense is that there are some important edge cases that we would still call lying, or at least a severe breach of integrity that should be treated similarly to deliberate conscious lies.
Humans are incentivised to self-deceive in the social domain in order to be able to tell convincing lies. It’s sometimes important that, if it’s found out that someone strategically self-deceived, they be punished in some way.
A central example here might be a guy who says he wants to be in a long and loving committed relationship, only to break up after he is bored of the sex after 6-12 months, and really this was predictable from the start if he hadn’t felt it was fine to make big commitments without introspecting carefully on their truth. I can imagine the woman in this scenario feeling genuinely shocked and lied to. “Hold on, what are you talking about that you feel you want to move out? I am organising my whole life around this relationship, what you are doing right now is calling into question the basic assumptions that you have promised to me.” I can imagine this guy getting a reputation for being untrustworthy and lying to women. I think it is an open question whether it is accurate for the woman cheated by this man to tell other people that he “lied to her”, though I think it is plausible that I want to punish this behaviour in a similar way that I want to punish much more conscious deception, in a way that motivates me to want to refer to it with the same handle, because it gives you basically very similar operational beliefs about the situation (the person systematically deceived me in a way that was clearly for their personal gain and this hurt me and I think they should be actively punished).
I think I can probably come up with an example where a politician wants power and does whatever is required to take it, such that they end up not being in alignment with the values they stated they held earlier in their career, and allow the meaning of words to fluctuate around them in accordance with what the people giving the politician votes and power want, so that they end up saying something that is effectively a lie, but that they don’t care about or really notice. This one is a bit slippery for me to point to.
Another context that is relevant: I can imagine going to a scientific conference in a field that has been hit so hard by the replication crisis, that basically all the claims in the conference were false, and I could know this. Not only are the claims at this conference false, but the whole subfield has never been about anything real (example, example, and of course, example). I can imagine a friend of mine attending such a conference and talking to me afterwards, and them thinking that some of the claims seemed true. And I can imagine saying to them “No. You need to understand that all the claims in there are lies. There is no truth-tracking process occurring. It is a sham field, and those people should not be getting funding for their research.” Now, do I think the individuals in the field are immoral? Not exactly, but sorta. They didn’t care about truth and yet paraded themselves as scientists. But I guess that’s a big enough thing in society that they weren’t unusually bad or anything. While it’s not a central case of lying, it currently feels to me like it’s actively helpful for me to use the phrase ‘lie’ and ‘sham’. There is a systematic distortion of truth that gives people resources they want instead of those resources going to projects not systematically distorting reality.
(ADDED: OTOH I do think that I have myself in the past been prompted to want to punish people for these kinds of ‘lies’ in ways that aren’t effective. I have felt that people who have committed severe breaches of integrity in the communities I’m part of are bad people and felt very angry at them, but I think that this has often been an inappropriate response. It does share other important similarities with lies though. Probably want to be a bit careful with the usage here and signal that the part of wanting to immediately socially punish them for a thing that they obviously did wrong is not helpful, because they will feel helpless and not that it’s obvious they did something wrong. But it’s important for me internally to model them as something close to lying, for the sanity of my epistemic state, especially when many people in my environment will not know/think the person has breached integrity and will socially encourage me to positively weight their opinions/statements.)
My current guess at the truth: there are classes of human motivations, such as those for sex, and for prestigious employment positions in the modern world, that have sufficiently systematic biases in favour of self-deception that it is not damaging to add them to the category of ‘lie’ - adding them is not the same as a rule that admits all unconscious bias consciously reported, just a subset that reliably turns up again and again. I think Jessica Taylor / Ben Hoffman / Michael Arc want to use the word ‘fraud’ to refer to it, I’m not sure.
I will actually clean this up and turn it into a post sometime soon [edit: I retract that, I am not able to make commitments like this right now]. For now let me add another quick hypothesis on this topic whilst crashing from jet lag.
A friend of mine proposed that instead of saying ‘lies’ I could say ‘falsehoods’. Not “that claim is a lie” but “that claim is false”.
I responded that ‘falsehood’ doesn’t capture the fact that you should expect systematic deviations from the truth. I’m not saying this particular parapsychology claim is false. I’m saying it is false in a way where you should no longer trust the other claims, and expect they’ve been optimised to be persuasive.
They gave another proposal, which is to say instead of “they’re lying” to say “they’re not truth-tracking”. Suggest that their reasoning process (perhaps in one particular domain) does not track truth.
I responded that while this was better, it still seems to me that people won’t have an informal understanding of how to use this information. (Are you saying that the ideas aren’t especially well-evidenced? But they sound pretty plausible to me, so let’s keep discussing them and look for more evidence.) There’s a thing where if you say someone is a liar, not only do you not trust them, but you recognise that you shouldn’t even privilege the hypotheses that they produce. If there’s no strong evidence either way, and it turns out the person who told it to you is a rotten liar, then if you wouldn’t have considered it before they raised it, don’t consider it now.
Then I realised Jacob had written about this topic a few months back. People talk as though ‘responding to economic incentives’ requires conscious motivation, but actually there are lots of ways that incentives cause things to happen that don’t require humans consciously noticing the incentives and deliberately changing their behaviour. Selection effects, reinforcement learning, and memetic evolution.
Similarly, what I’m looking for is basic terminology for pointing to processes that systematically produce persuasive things that aren’t true, that doesn’t move through “this person is consciously deceiving me”. The scientists pushing adult neurogenesis aren’t lying. There’s a different force happening here that we need to learn to give epistemic weight to the same way we treat a liar, but without expecting conscious motivation to be the root of the force and thus trying to treat it that way (e.g. by social punishment).
More broadly, it seems like there are persuasive systems in the environment that weren’t in the evolutionary environment for adaptation, that we have not collectively learned to model clearly. Perhaps we should invest in some basic terminology that points to these systems so we can learn to not-trust them without bringing in social punishment norms.
I responded that ‘falsehood’ doesn’t capture the fact that you should expect systematic deviations from the truth.
Is this “bias”?
Yeah good point I may have reinvented the wheel. I have a sense that’s not true but need to think more.
The definitional boundaries of “abuser,” as Scott notes, are in large part about coordinating around whom to censure. The definition is pragmatic rather than objective.*
If the motive for the definition of “lies” is similar, then a proposal to define only conscious deception as lying is therefore a proposal to censure people who defend themselves against coercion while privately maintaining coherent beliefs, but not those who defend themselves against coercion by simply failing to maintain coherent beliefs in the first place. (For more on this, see Nightmare of the Perfectly Principled.) This amounts to waging war against the mind.
Of course, as a matter of actual fact we don’t strongly censure all cases of conscious deception. In some cases (e.g. “white lies”) we punish those who fail to lie, and those who call out the lie. I’m also pretty sure we don’t actually distinguish between conscious deception and e.g. reflexively saying an expedient thing, when it’s abundantly clear that one knows very well that the expedient thing to say is false, as Jessica pointed out here.
*It’s not clear to me that this is a good kind of concept to have, even for “abuser.” It seems to systematically force responses to harmful behavior to bifurcate into “this is normal and fine” and “this person must be expelled from the tribe,” with little room for judgments like “this seems like an important thing for future partners to be warned about, but not relevant in other contexts.” This bifurcation makes me less willing to disclose adverse info about people publicly—there are prominent members of the Bay Area Rationalist community doing deeply shitty, harmful things that I actually don’t feel okay talking about beyond close friends because I expect people like Scott to try to enforce splitting behavior.
Note: I just wrote this in one pass when severely jet lagged, and did not have the effort to edit it much. If I end up turning this into a blogpost I will probably do that. Anyway, I am interested in hearing via PM from anyone who feels that it was sufficiently unclearly written that they had a hard time understanding/reading it.
Good posts you might want to nominate in the 2018 Review
I’m on track to nominate around 30 posts from 2018, which is a lot. Here is a list of about 30 further posts I looked at that I think were pretty good but didn’t make my top list, in the hopes that others who did get value out of the posts will nominate their favourites. Each post has a note I wrote down for myself about the post.
Reasons compute may not drive AI capabilities growth
I don’t know if it’s good, but I’d like it to be reviewed to find out.
The Principled-Intelligence Hypothesis
Very interesting hypothesis generation. Unless it’s clearly falsified, I’d like to see it get built on.
Will AI See Sudden Progress? DONE
I think this post should be considered paired with Paul’s almost-identical post. It’s all exactly one conversation.
Personal Relationships with Goodness
This felt like a clear analysis of an idea and coming up with some hypotheses. I don’t think the hypotheses really capture what’s going on, and most of the frames here seem like they’ve caused a lot of people to do a lot of hurt to themselves, but it seemed like progress in that conversation.
Are ethical asymmetries from property rights?
Again, another very interesting hypothesis.
Incorrect Hypotheses Point to Correct Observations ONE NOMINATION
This seems to me like it’s close to an important point but not quite saying it. I don’t know if I got anything especially new from its framing, but its examples are pretty good.
Whose reasoning can you rely on when your own is faulty?
I really like the questions, and should ask them more about the people I know.
Inconvenience is Qualitatively Bad ONE NOMINATION
I think that the OP is an important idea. I think my comment on it is pretty good (and the discussion below it), though I’ve substantially changed my position since then, and should write up my new worldview once my life calms down. I don’t think I should nominate it because I’m a major part of the discussion.
The abruptness of nuclear weapons
Clearly valuable historical case, simple effect model.
Book Review: Pearl’s Book of Why
Why science has made a taboo of causality feels like a really important question to answer when figuring out how much to trust academia and how to make institutions that successfully make scientific progress, and this post suggests some interesting hypotheses.
Functional Institutions Are the Exception ONE NOMINATION
Was a long meditation on an important idea, that I’ve found valuable to read. Agree with commenter that it’s sorely lacking in examples however.
Strategies of Personal Growth ONE NOMINATION
Oli curated it, he should consider nominating and saying what he found useful. It all seemed good but I didn’t personally get much from it.
Preliminary Thoughts on Moral Weight DONE
A bunch of novel hypotheses about consciousness in different animals that I’d never heard before, which seem really useful for thinking about the topic.
Theories of Pain
I thought this was a really impressive post, going around and building simple models of lots of different theories, and giving a bit of personal experience with the practitioners of the theories. It was systematic and goal oriented and clear.
Clarifying “AI Alignment” DONE
Rohin’s comment is the best part of this post, not sure how best to nominate it.
Norms of Membership for Voluntary Groups
There’s a deep problem here of figuring out norms in novel and weird and ambiguous environments in the modern world, especially given the internet, and this post is kind of like a detailed, empirical study of some standard clusters of norms, which I think is very helpful.
How Old is Smallpox?
Central example of “Things we learned on LessWrong in 2018”. Should be revised though.
“Cheat to Win”: Engineering Positive Social Feedback
Feels like a clear example of a larger and important strategy about changing the incentives on you. Not clear how valuable the post is alone, but I like it a lot.
I don’t know why, but I think about this post a lot.
Meditations on Momentum
I feel like this does a lot of good intuition building work, and I think about this post from time to time in my own life. I think that Jan brought up some good points in the comments about not wanting to cause confusion about different technical concepts all being the same, so I’d like to see the examples reviewed to check they’re all about attachment effects and not conflating different effects.
On Exact Mathematical Formulae
This makes a really important point anyone learns in the study of mathematics, and I think it is generally an important distinction to understand between language and reality. Just because we have words for some things doesn’t make them more real than things we don’t have words for. The point is to look at reality, not to look at the words.
Recommendations vs Guidelines
I think back to this occasionally. Seems like quite a useful distinction, and maybe we should try to encourage people to make more guidelines. Maybe we should build a wiki and have a page type ‘guideline’ where people contribute to make great guidelines.
On the Chatham House Rule DONE
This is one of the first posts that impressed upon me the deeply tangled difficulties of information security, something I’ve increasingly thought a lot about in the intervening years, and expect to think about even more in the future.
Different types (not sizes!) of infinity
Some important conceptual work fundamental to mathematics. Very short and insightful. Not sure if I should allow this though, because if I do am I just allowing all high-level discussion of math to be included?
Expected Pain Parameters
It feels like useful advice and potentially a valuable observation with which to view a deeper problem. But unclear on the last one, and not sure if this post should be nominated just on the first alone.
Research: Rescuers During the Holocaust DONE
The most interesting part about this is the claim that most people who housed Jews during the Holocaust did it because the social situation made the moral decision very explicit and they felt they only had one possible course of action, not because they were proactive moral planners. I would like to see an independent review of this info.
Lessons from the cold war on information hazards: why internal communication is critical DONE
Seems like an important historical lesson.
Problem Solving With Crayons and Mazes
I didn’t find it very useful. Oli was excited when he curated it, should poke him to consider nominating it.
Insights from “Strategy of Conflict”
Seems helpful but weird to nominate, as the book is short and this post explicitly doesn’t contain all the key ideas in the book. I did learn from this that having lots of nukes is more stable than having a small number, and this has stuck with me.
The Bat and Ball Problem Revisited DONE
A curiosity-driven walk through what’s going on with the bat-and-the-ball problem by Kahneman.
Good Samaritans in Experiments
A highly opinionated and very engaging criticism of a study.
Hammertime Final Exam ONE NOMINATION
Was great, I’m not actually sure whether it fits into this review process?
Naming the Nameless DONE
Some people seemed to get a lot out of this, but I haven’t had the time to engage with it much.
Actually, just re-read it, and it’s brilliant, and one of the best 5-10 of the year. Will nominate it myself if nobody else does.
How did academia ensure papers were correct in the early 20th century? DONE
I’m glad I put this down in writing. I found it useful myself. But others should figure out whether to nominate.
Competitive Markets as Distributed Backdrop DONE
I felt great about it when I read this post last time. I’ve not given it a careful re-read, would like to see it reviewed, but I think it’s likely I’ll rediscover it’s a very helpful abstraction.
AI alignment posts you might want to nominate
[Edit: On reflection, I think that the Alignment posts that do not also have implications for human rationality aren’t important to go through this review process, and we’ll likely create another way to review that stuff and make it into books.]
There was also a lot of top-notch AI alignment writing, but I mostly don’t feel well-placed to nominate it. I hope others can look through and nominate selections from these.
Abram’s writing that year
Wei Dai’s writing that year
Scott’s “Fixed Point Discussion”
Feels like one of the few practical posts that can help a large number of people do embedded agency research, so really valuable from that perspective.
Vika’s “Discussion on the Machine Learning Approach to AI safety”
I recently circled for the first time. I had two one-hour sessions on consecutive days, with 6 and 8 people respectively.
My main thoughts: this seems like a great way of getting to know my acquaintances, connecting emotionally, and building closer relationships with friends. The background emotional processing happening in individuals is repeatedly brought forward as the object of conversation, for significantly enhanced communication/understanding. I appreciated getting to poke and actually find out whether people’s emotional states matched the words they were using. I got to ask questions like:
When you say you feel gratitude, do you just mean you agree with what I said, or do you mean you’re actually feeling warmth toward me? Where in your body do you feel it, and what is it like?
Not that a lot of my circling time was spent being skeptical of people’s words; a lot of the time I trusted the people involved to be accurately reporting their experiences. It was just very interesting, when I noticed I didn’t feel like someone was honest about some micro-emotion, to have the affordance to stop and request an honest internal report.
It felt like there was a constant tradeoff between social interaction and noticing my internal state. If all I’m doing is noticing my internal state, then I stop engaging with the social environment and don’t have anything of substance to report on. If I just focus on the social interactions, then I never stop and communicate more deeply about what’s happening for me internally. I kept on having an experience like “Hey, I want to interject to add nuance to what you said-” and then stopping and going “So, when you said <x> I felt a sense of irritation/excitement/distrust/other because <y>”.
One moment that I liked a lot, was around the epistemic skill of not deciding your position a second earlier than necessary. Person A was speaking, and person B jumped in and said something that sounded weirdly aggressive. It didn’t make sense, and then person B said “Wait, let me try to figure out what I mean, I feel I’m not using quite the right words”. My experience was first to feel some worry for person A feeling attacked. I quickly calmed down, noticing how thoroughly out of character it would be for person B to actually be saying anything aggressive. I then realised I had a clear hypothesis for what person B actually wanted to say, and waited politely for them to say it. But then I noticed that actually I didn’t have much evidence for my hypothesis at all… so I moved into a state of only curiosity about what person B was going to say, not holding onto my theory of what they would say. And indeed, it turned out they said something entirely different. (I subsequently related this whole experience to person B during the circle.)
This is really important. Being able to hold off on keeping your favoured theory in the back of your head and counting all evidence as pro- or anti- the theory, and instead keeping the theory separate from your identity and feeling full creative freedom to draw a new theory around the evidence that comes in.
There were other personal moments where I brought up how I was feeling toward my friends and they to me, in ways that allowed me to look at long-term connections and short-term conflicts in a clearer light. It was intense.
Both circles were very emotionally interesting and introspectively clarifying, and I will do more with friends in the future.
I was just re-reading the classic paper Artificial Intelligence as a Positive and Negative Factor in Global Risk. It’s surprising how well it holds up. The following quotes seem especially relevant 13 years later.
On the difference between AI research speed and AI capabilities speed:
The first moral is that confusing the speed of AI research with the speed of a real AI once built is like confusing the speed of physics research with the speed of nuclear reactions. It mixes up the map with the territory. It took years to get that first pile built, by a small group of physicists who didn’t generate much in the way of press releases. But, once the pile was built, interesting things happened on the timescale of nuclear interactions, not the timescale of human discourse. In the nuclear domain, elementary interactions happen much faster than human neurons fire. Much the same may be said of transistors.
On neural networks:
The field of AI has techniques, such as neural networks and evolutionary programming, which have grown in power with the slow tweaking of decades. But neural networks are opaque—the user has no idea how the neural net is making its decisions—and cannot easily be rendered unopaque; the people who invented and polished neural networks were not thinking about the long-term problems of Friendly AI. Evolutionary programming (EP) is stochastic, and does not precisely preserve the optimization target in the generated code; EP gives you code that does what you ask, most of the time, under the tested circumstances, but the code may also do something else on the side. EP is a powerful, still maturing technique that is intrinsically unsuited to the demands of Friendly AI. Friendly AI, as I have proposed it, requires repeated cycles of recursive self-improvement that precisely preserve a stable optimization target.
On funding in the AI Alignment landscape:
If tomorrow the Bill and Melinda Gates Foundation allocated a hundred million dollars of grant money for the study of Friendly AI, then a thousand scientists would at once begin to rewrite their grant proposals to make them appear relevant to Friendly AI. But they would not be genuinely interested in the problem—witness that they did not show curiosity before someone offered to pay them. While Artificial General Intelligence is unfashionable and Friendly AI is entirely off the radar, we can at least assume that anyone speaking about the problem is genuinely interested in it. If you throw too much money at a problem that a field is not prepared to solve, the excess money is more likely to produce anti-science than science—a mess of false solutions.
If unproven brilliant young scientists become interested in Friendly AI of their own accord, then I think it would be very much to the benefit of the human species if they could apply for a multi-year grant to study the problem full-time. Some funding for Friendly AI is needed to this effect—considerably more funding than presently exists. But I fear that in these beginning stages, a Manhattan Project would only increase the ratio of noise to signal.
This long final quote shows the security mindset applied to takeoff speeds, making points Eliezer has often returned to since then.
Let us concede for the sake of argument that, for all we know (and it seems to me also probable in the real world) that an AI has the capability to make a sudden, sharp, large leap in intelligence. What follows from this?
First and foremost: it follows that a reaction I often hear, “We don’t need to worry about Friendly AI because we don’t yet have AI,” is misguided or downright suicidal. We cannot rely on having distant advance warning before AI is created; past technological revolutions usually did not telegraph themselves to people alive at the time, whatever was said afterward in hindsight. The mathematics and techniques of Friendly AI will not materialize from nowhere when needed; it takes years to lay firm foundations. And we need to solve the Friendly AI challenge before Artificial General Intelligence is created, not afterward; I shouldn’t even have to point this out. There will be difficulties for Friendly AI because the field of AI itself is in a state of low consensus and high entropy. But that doesn’t mean we don’t need to worry about Friendly AI. It means there will be difficulties. The two statements, sadly, are not remotely equivalent.
The possibility of sharp jumps in intelligence also implies a higher standard for Friendly AI techniques. The technique cannot assume the programmers’ ability to monitor the AI against its will, rewrite the AI against its will, bring to bear the threat of superior military force; nor may the algorithm assume that the programmers control a “reward button” which a smarter AI could wrest from the programmers; et cetera. Indeed no one should be making these assumptions to begin with. The indispensable protection is an AI that does not want to hurt you. Without the indispensable, no auxiliary defense can be regarded as safe. No system is secure that searches for ways to defeat its own security. If the AI would harm humanity in any context, you must be doing something wrong on a very deep level, laying your foundations awry. You are building a shotgun, pointing the shotgun at your foot, and pulling the trigger. You are deliberately setting into motion a created cognitive dynamic that will seek in some context to hurt you. That is the wrong behavior for the dynamic; write code that does something else instead.
For much the same reason, Friendly AI programmers should assume that the AI has total access to its own source code. If the AI wants to modify itself to be no longer Friendly, then Friendliness has already failed, at the point when the AI forms that intention. Any solution that relies on the AI not being able to modify itself must be broken in some way or other, and will still be broken even if the AI never does modify itself. I do not say it should be the only precaution, but the primary and indispensable precaution is that you choose into existence an AI that does not choose to hurt humanity.
To avoid the Giant Cheesecake Fallacy, we should note that the ability to self-improve does not imply the choice to do so. The successful exercise of Friendly AI technique might create an AI which had the potential to grow more quickly, but chose instead to grow along a slower and more manageable curve. Even so, after the AI passes the criticality threshold of potential recursive self-improvement, you are then operating in a much more dangerous regime. If Friendliness fails, the AI might decide to rush full speed ahead on self-improvement—metaphorically speaking, it would go prompt critical.
I tend to assume arbitrarily large potential jumps for intelligence because (a) this is the conservative assumption; (b) it discourages proposals based on building AI without really understanding it; and (c) large potential jumps strike me as probable-in-the-real-world. If I encountered a domain where it was conservative from a risk-management perspective to assume slow improvement of the AI, then I would demand that a plan not break down catastrophically if an AI lingers at a near-human stage for years or longer. This is not a domain over which I am willing to offer narrow confidence intervals.
I cannot perform a precise calculation using a precisely confirmed theory, but my current opinion is that sharp jumps in intelligence are possible, likely, and constitute the dominant probability. This is not a domain in which I am willing to give narrow confidence intervals, and therefore a strategy must not fail catastrophically—should not leave us worse off than before—if a sharp jump in intelligence does not materialize. But a much more serious problem is strategies visualized for slow-growing AIs, which fail catastrophically if there is a first-mover effect.
My current strategic outlook tends to focus on the difficult local scenario: The first AI must be Friendly. With the caveat that, if no sharp jumps in intelligence materialize, it should be possible to switch to a strategy for making a majority of AIs Friendly. In either case, the technical effort that went into preparing for the extreme case of a first mover should leave us better off, not worse.
Reviews of books and films from my week with Jacob:
The Big Short
Review: Really fun. I liked certain elements of how it displays bad Nash equilibria in finance (I love the scene with the woman from the ratings agency—it turns out she’s just making the best of her incentives too!).
Review: Wow. A simple story, yet entirely lacking in cliche, and so seemingly original. No cliched characters, no cliched plot twists, no cliched humour, all entirely sincere and meaningful. Didn’t really notice that it was animated (while fantastical, it never really breaks the illusion of reality for me). The few parts that made me laugh, made me laugh harder than I have in ages.
There’s a small visual scene, unacknowledged by the ongoing dialogue, between the mouse-baby and the dust-sprites which is the funniest thing I’ve seen in ages, and I had to rewind for Jacob to notice it.
I liked how by the end, the team of characters are all a different order of magnitude in size.
A delightful, well-told story.
Stranger Than Fiction
Review: This is now my go-to film of someone trying something original and just failing. Filled with new ideas, but none executed well, and overall just a flop, and it really phones-it-in for the last 20 minutes. It does make me notice the distinct lack of originality in most other films that I’ve seen though—most don’t even try to be original like this does. B+ for effort, but D for output.
I Love You, Daddy
Review: A great study of fatherhood, coming of age, and honesty. This was my second watch, and I found many new things that I didn’t find the first time, about what it means to grow up and have responsibility. One moment I absolutely loved is when the Charlie Day character (who was in my opinion representing the id) was brutally honest and totally rewarded for it. I might send this one to my mum, I think she’ll get a lot out of it.
My Dinner With Andre
Review: Very thought-provoking. Jacob and I discussed it for a while afterward. I hope to watch it again some day. I spent 25% of the movie thinking about my own response to what was being discussed, 25% imagining how I would create my version of this film (what the content of the conversation would be), and 50% actually paying close attention to the film.
Overall I felt that both characters were good representatives of their positions, and I liked how much they stuck to their observations over their theories (they talked about what they’d seen and experienced more than they made leaky abstractions and focused on those). The main variable that was not discussed was technology. It is the agricultural and industrial revolutions that led to humans feeling so out-of-sorts in the present-day world, not any simple fact of how we socialise today that can simply be fixed / gotten out of. Nonetheless, I expect that the algorithm that Andre is running will help him gain useful insights about how to behave in the modern world. But you do have to actually interface with it and be part of it to have a real cup of coffee waiting for you in the morning, or to lift millions out of poverty.
The last line of Roger Ebert’s review of this was great. Something like: “They’re both trying to get the other to wake up and smell the coffee. Only in Wally’s case, it’s real coffee.”
Books read (only parts of):
Computability and Logic (3rd edition)
I always forget basic definitions of languages and models, so a bunch of time was spent doing that. Jacob and I read half of the chapter on the non-standard numbers, to see how the constructions worked, and I just have the basics down more clearly now. Eliezer’s writings about these numbers connect more strongly to my other learning about first order logic now.
Book is super readable given the subject matter, easy to reference the concepts back to other parts of the book, and all round excellent (though it was the hardest slog on this list). Look forward to reading some more.
Modern Principles: Microeconomics (by Cowen and Tabarrok)
I’ve never read much about supply and demand curves, so it was great to go over them in detail, and how the price equilibrium is reached. We resolved many confusions, which I might end up writing up in a LW post. I especially liked learning how the equilibrium price maximises social good, but is not the maximum for either the supplier or the buyer.
It was very wordy and I’d like to read a textbook that had the goal of this level of intuitiveness, but aimed at readers with an assumed strong math background. I don’t need paragraphs explaining how to read a 2-D graph each time one comes up.
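To make the point about the equilibrium price concrete, here is a minimal sketch with made-up numbers (none of them are from the book). It assumes straight-line curves, demand P = 10 - Q and supply P = 2 + Q, and it assumes that when the price is held away from equilibrium, the units that do get traded go to the buyers who value them most and come from the cheapest sellers. The function names are hypothetical.

```python
def quantity_traded(price, a=10.0, b=1.0, c=2.0, d=1.0):
    """Units traded at a fixed price: the short side of the market wins.

    Demand curve: P = a - b*Q.  Supply curve: P = c + d*Q.
    All four parameters are illustrative assumptions.
    """
    demanded = max(0.0, (a - price) / b)
    supplied = max(0.0, (price - c) / d)
    return min(demanded, supplied)


def surpluses(price, a=10.0, b=1.0, c=2.0, d=1.0):
    """Return (buyer surplus, seller surplus, total) at a fixed price."""
    q = quantity_traded(price, a, b, c, d)
    # Area between the demand curve and the price, over the q units traded.
    buyer = (a - price) * q - b * q ** 2 / 2
    # Area between the price and the supply curve, over the same q units.
    seller = (price - c) * q - d * q ** 2 / 2
    return buyer, seller, buyer + seller


# With these numbers the curves cross at price 6, quantity 4.
for price in (5.0, 6.0, 7.0):
    buyer, seller, total = surpluses(price)
    print(f"price={price}: buyer={buyer}, seller={seller}, total={total}")
```

In this toy market the total is largest at the equilibrium price of 6, but buyers would do better at 5 and sellers would do better at 7, which is the pattern described above. It is only an illustration under the stated assumptions, not a general proof.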
Jacob made a good point about how the book failed to distinguish hypothesis versus empirical evidence, when presenting standard microeconomic theory. Just because you have the theory down doesn’t mean you should believe it corresponds to reality, but the book didn’t seem to notice the difference.
Overall pretty good. I don’t expect to read most chapters in this book, but we also looked through asymmetric information (some of which later tied into our watching of The Big Short), and there were a few others that looked exciting.
I am in love with this book. I remember picking it up when I was about 17 and not being able to handle it at all and just flicking through to the answers—but this time, especially with Jacob, we were both able to notice when we felt we really understood something and wanted to check the answer to confirm, versus when we’d said ‘reasonable’ things but which didn’t really bottom out in our experiences of the world.
“Well, if you draw the force vectors like this, there should be a normal force of this strength, which splits up into these two basis vectors and so the ball should roll down at this speed.” “Why do you get to assume a force along the normal?” “I don’t know.” “Why do you get to break it up into two vectors that sum to the initial vector?” “I don’t know.” “Then I think we haven’t answered the question yet. Let’s think some more about our experience of balls rolling down hills.”
One of the best things about doing it with Jacob was that I often had cached answers to problems (both from studying mechanics in high school and having read the book 4 years ago), but instead on reading a problem I would give Jacob time to get confused about it, perhaps by supplying useful questions. Then eventually I’d propose my “Well isn’t it obviously X” answer, and Jacob would be able to point out the parts I hadn’t justified from first principles, helping me notice them. There’s a problem in discussing difficult ideas where, if people have been taught the passwords, and especially if the passwords have a certain amount of structure that feels like understanding, it’s hard to notice the gaps. Jacob helped me notice those, and then I could later come up with real answers that were correct for the right reasons.
The least good thing about this book is the answers to the problems. Often Jacob and I would come up with an answer, then scrap it and build up a first-principles model that predicted it based on our experiences that we were very confident in, and then also deconstruct the initial false intuition some. Then we’d check the answer, and we were right, but the answer didn’t really address the intuitions in either direction, just gave a (correct) argument for the (correct) solution.
I think it might be really valuable to fully deconstruct the intuition behind why people expect a heavier object to fall faster. I’ve made some progress, but it feels like this is a neglected problem of learning a new field—explaining not only what intuitions you should have, but understanding why you assumed something different.
But the value of the book isn’t the answers—it’s the problems. I’ve never experienced such a coherent set of problems, where you can solve each from first principles (and building off what you’ve learned from the previous problems). With most good books, the more you put in the more you get out, but never have I seen a book where you can get this much out of it by putting so much in (most books normally hit a plateau earlier than this one).
Anyway, we got maybe 1/10th through the book. I can’t wait to work through this more the next time I see Jacob.
It’s already affected our discussions of other topics, how well we notice what we do and don’t understand, and what sorts of explanations we look for.
I’m also tempted, for other things I study, to spend less time writing up the insights and instead spend that time coming up with a problem set that you can solve from first principles.
This book made me think that the natural state of learning isn’t ‘reading’ but ‘play’. Playing with ideas, equations, problems, rather than reading and checking understanding.
Jacob and I now have a ritual of continuing the tradition of trying to understand the world, by going to places in Oxford where great thinkers have learned about the universe, and then solving a problem in this book. We visited a square in Magdalen College where Schroedinger worked on his great works, and solved some problems there.
You only get to read this book once. Use it well.
Hanging out with Jacob:
Grade: A++, would do again in a heartbeat.
Hypothesis: power (status within military, government, academia, etc) is more obviously real to humans, and it takes a lot of work to build detailed, abstract models of anything other than this that feel as real. As a result people who have a basic understanding of a deep problem will consistently attempt to manoeuvre into powerful positions vaguely related to the problem, rather than directly solve the open problem. This will often get defended with “But even if we get a solution, how will we implement it?” without noticing that (a) there is no real effort by anyone else to solve the problem and (b) the more well-understood a problem is, the easier it is to implement a solution.
I think this is true for people who’ve been through a modern school system, but probably not a human universal.
My, that was a long and difficult but worthwhile post. I see why you think it is not the natural state of affairs. Will think some more on it (though can’t promise a full response, it’s quite an effortful post). Am not sure I fully agree with your conclusions.
I’m much more interested in finding out what your model is after having tried to take those considerations into account, than I am in a point-by-point response.
This seems like a good conversational move to have affordance for.
(b) the more well-understood a problem is, the easier it is to implement a solution.
This might be true, but it doesn’t sound like it contradicts the premise of “how will we implement it”? Namely, just because understanding a problem makes it easier to implement, doesn’t mean that understanding alone makes it anywhere near easy to implement, and one may still need significant political clout in addition to having the solution. E.g. the whole infant nutrition thing.
Seems related to Causal vs Social Reality.
Do you have an example of a problem that gets approached this way?
Global warming? The need for prison reform? Factory Farming?
“Being a good person standing next to the development of dangerous tech makes the tech less dangerous.”
It seems that AI safety has this issue less than every other problem in the world, by proportion of the people working on it.
Some double digit percentage of all of the people who are trying to improve the situation, are directly trying to solve the problem, I think? (Or maybe I just live in a bubble in a bubble.)
And I don’t know how well this analysis applies to non-AI safety fields.
I’d take a bet at even odds that it’s single-digit.
To clarify, I don’t think this is just about grabbing power in government or military. My outside view of plans to “get a PhD in AI (safety)” seems like this to me. This was part of the reason I declined an offer to do a neuroscience PhD with Oxford/DeepMind. I didn’t have any secret for why it might be plausibly crucial.
Strong agree with Jacob.
There’s a game for the Oculus Quest (that you can also buy on Steam) called “Keep Talking And Nobody Explodes”.
It’s a two-player game. When playing with the VR headset, one of you wears the headset and has to defuse bombs in a limited amount of time (either 3, 4 or 5 mins), while the other person sits outside the headset with the bomb-defusal manual and tells you what to do. Whereas in other collaboration games you’re all looking at the screen together, in this game the substrate of communication is solely conversation: the other person is providing all of your inputs about how their half is going (i.e. it is not shown on a screen).
The types of puzzles are fairly straightforward computational problems but with lots of fiddly instructions, and require the outer person to figure out what information they need from the inner person. It often involves things like counting numbers of wires of a certain colour, or remembering the previous digits that were being shown, or quickly describing symbols that are not any known letter or shape.
So the game trains you and a partner in efficiently building a shared language for dealing with new problems.
More than that, as the game gets harder, often some of the puzzles require substantial independent computation from the player on the outside. At this point, it can make sense to play with more than two people, and start practising methods for assigning computational work between the outer people (e.g. one of them works on defusing the first part of the bomb, and while they’re computing in their head for ~40 seconds, the other works on defusing the second part of the bomb in dialogue with the person on the inside). This further creates a system which trains the ability to efficiently coordinate on informational work under pressure.
Overall I think it’s a pretty great game for learning and practising a number of high-pressure communication skills with people you’re close to.
There’s a similar free game for Android and iOS called Space Team that I highly recommend.
I use both this game and Space Team as part of training people in the on-call rotation at my company. They generally report that it’s fun, and I love it because it usually creates the kind of high-pressure feelings in people they may experience when on-call, so it creates a nice, safe environment for them to become more familiar with those feelings and how to work through them.
On a related note, I’m generally interested in finding more cooperative games with asymmetric information and a need to communicate. Lots of games meet one or two of those criteria, but very few games are able to meet all simultaneously. For example, Hanabi is cooperative and asymmetric, but lacks much communication (you’re not allowed to talk), and many games are asymmetric and communicative but not cooperative (Werewolf, Secret Hitler, etc.) or cooperative and communicative but not asymmetric (Pandemic, Forbidden Desert, etc.).
+1 – this game is great.
It’s really good with 3-4 people giving instructions and one person in the hot seat. Great for team bonding.
I talked with Ray for an hour about Ray’s phrase “Keep your beliefs cruxy and your frames explicit”.
I focused mostly on the ‘keep your frames explicit’ part. Ray gave a toy example of someone attempting to communicate something deeply emotional/intuitive, or perhaps a Buddhist approach to the world, and how difficult it is to do this with simple explicit language. It often instead requires the other person to go off and seek certain experiences, or practise inhabiting those experiences (e.g. doing a little meditation, or getting in touch with their emotion of anger).
Ray’s motivation was that people often have these very different frames or approaches, but don’t recognise this fact, and end up believing aggressive things about the other person e.g. “I guess they’re just dumb” or “I guess they just don’t care about other people”.
I asked for examples that were motivating his belief—where it would be much better if the disagreers took to heart the recommendation to make their frames explicit. He came up with two concrete examples:
Jim v Ray on norms for shortform, where during one hour they worked through the same reasons-for-disagreement three times.
[blank] v Ruby on how much effort is required to send non-threatening signals during disagreements, where it felt like a fundamental value disagreement that they didn’t know how to bridge.
I didn’t get a strong sense for what Ray was pointing at. I see the ways that the above disagreements went wrong, where people were perhaps talking past each other / on the wrong level of the debate, and should’ve done something different. My understanding of Ray’s advice is for the two disagreers to bring their fundamental value disagreements to the explicit level, and that both disagreers should be responsible for making their core value judgements explicit. I think this is too much of a burden to give people. Most of the reasons for my beliefs are heavily implicit and I cannot make things fully explicit ahead of time. In fact, this just seems not how humans work.
One of the key insights that Kahneman’s System 1 and System 2 distinction makes is that my conscious, deliberative thinking (System 2) is a very small fraction of the work my brain is doing, even though it is the part I have the most direct access to. Most of my world-model and decision-making apparatus is in my System 1. There is an important sense in which asking me to make all of my reasoning accessible to my conscious, deliberative system is an AGI-complete request.
What in fact seems sensible to me is that during a conversation I will have a fast feedback-loop with my interlocutor, which will give me a lot of evidence about which part of my thinking to zoom in on and do the costly work of making conscious and explicit. There is great skill involved in doing this live in conversation effectively and repeatedly, and I am excited to read a LW post giving some advice like this.
That said, I also think that many people have good reasons to distrust bringing their disagreements to the explicit level, and rightfully expect it to destroy their ability to communicate. I’m thinking of Scott’s epistemic learned helplessness here, but I’m also thinking about experiences where trying to crystallise and name a thought I’m having before I know how to fully express it has a negative effect on my ability to think clearly about it. I’m not sure what this is but this is another time when I feel hesitant to make everything explicit.
As a third thing, my implicit brain is better than my explicit reasoning at modelling social/political dynamics. Let me handwave at a story of a nerd attempting to negotiate with a socially-savvy bully/psychopath/person-who-just-has-different-goals, where the nerd tries to repeatedly and helpfully make all of their thinking explicit, and is confused why they’re losing at the negotiation. I think even healthy and normal people have patterns around disagreement and conflict resolution that could take advantage of a socially inept individual trying to only rely on the things they can make explicit.
These three reasons lead me to not want to advise people to ‘keep their frames explicit’: it seems prohibitively computationally costly to do it for all things, many people should not trust their explicit reasoning to capture their implicit reasons, and this is especially true for social/political reasoning.
My general impression of this advice is that it seems to want to make everything explicit all of the time (a) as though that were a primitive operation that can solve all problems and (b) I have a sense that it takes up too much of my working memory when I talk with Ray. I have some sense that this approach implies a severe lack of trust in people’s implicit/unconscious reasoning and only believes explicit/conscious reasoning can ever be relied upon, though that seems a bit of a simplistic narrative. (Of course there are indeed reasons to strongly trust conscious reasoning over unconscious—one cannot unconsciously build rockets that fly to the moon—but I think humans do not have the choice to not build a high-trust relationship with their unconscious mind.)
I find “keep everything explicit” to often be a power move designed to make non-explicit facts irrelevant and non-admissible. This often goes along with burden of proof. I made a claim (real example of this dynamic happening, at an unconference under Chatham House rules: that pulling people away from their existing community has real costs that hurt those communities), and I was told that, well, that seems possible, but I can point to concrete benefits of taking them away, so you need to be concrete and explicit about what those costs are, or I don’t think we should consider them.
Thus, the burden of proof was put upon me, to show (1) that people central to communities were being taken away, (2) that those people being taken away hurt those communities, (3) in particular measurable ways, (4) that then would impact direct EA causes. And then we would take the magnitude of effect I could prove using only established facts and tangible reasoning, and multiply them together, to see how big this effect was.
I cooperated with this because I felt like the current estimate of this cost for this person was zero, and I could easily raise that, and that was better than nothing, but this simply is not going to get this person to understand my actual model, ever, at all, or properly update. This person is listening on one level, and that’s much better than nothing, but they’re not really listening curiously, or trying to figure the world out. They are holding court to see if they are blameworthy for not being forced off of their position, and doing their duty as someone who presents as listening to arguments, of allowing someone who disagrees with them to make their case under the official rules of utilitarian evidence.
Which, again, is way better than nothing! But is not the thing we’re looking for, at all.
I’ve felt this way in conversations with Ray recently, as well. Where he’s willing and eager to listen to explicit stuff, but if I want to change his mind, then (de facto) I need to do it with explicit statements backed by admissible evidence in this court. Ray’s version is better, because there are ways I can at least try to point to some forms of intuition or implicit stuff, and see if it resonates, whereas in the above example, I couldn’t, but it’s still super rough going.
Another problem is that if you have Things One Cannot Explicitly Say Or Consider, but which one believes are important, which I think basically everyone importantly does these days, then being told to only make explicit claims makes it impossible to make many important claims. You can’t both follow ‘ignore unfortunate correlations and awkward facts that exist’ and ‘reach proper Bayesian conclusions.’ The solution of ‘let the considerations be implicit’ isn’t great, but it can often get the job done if allowed to.
My private conversations with Ben have been doing a very good job, especially recently, of doing the dig-around-for-implicit-things and make-explicit-the-exact-thing-that-needs-it jobs.
Given Ray is writing a whole sequence, I’m inclined to wait until that goes up fully before responding in long form, but there seems to be something crucial missing from the explicitness approach.
To complement that: Requiring my interlocutor to make everything explicit is also a defence against having my mind changed in ways I don’t endorse but that I can’t quite pick apart right now. Which kinda overlaps with your example, I think.
I sometimes will feel like my low-level associations are changing in a way I’m not sure I endorse, halt, and ask for something that the more explicit part of me reflectively endorses. If they’re able to provide that, then I will willingly continue making the low-level updates, but if they can’t then there’s a bit of an impasse, at which point I will just start trying to communicate emotionally what feels off about it (e.g. in your example I could imagine saying “I feel some panic in my shoulders and a sense that you’re trying to control my decisions”). Actually, sometimes I will just give the emotional info first. There’s a lot of contextual details that lead me to figure out which one I do.
One last bit is to keep in mind that most (or, many) things can be power moves.
There’s one failure mode, where a person sort of gives you the creeps, and you try to bring this up and people say “well, did they do anything explicitly wrong?” and you’re like “no, I guess?” and then it turns out you were picking up something important about the person-giving-you-the-creeps and it would have been good if people had paid some attention to your intuition.
There’s a different failure mode where “so and so gives me the creeps” is something you can say willy-nilly without ever having to back it up, and it ends up being its own power move.
I do think during politically charged conversations it’s good to be able to notice and draw attention to the power-move-ness of various frames (in both/all directions)
(i.e. in the “so and so gives me the creeps” situation, it’s good to note both that you can abuse “only admit explicit evidence” and “wanton claims of creepiness” in different ways. And then, having made the frame of power-move-ness explicit, talk about ways to potentially alleviate both forms of abuse)
Want to clarify here, “explicit frames” and “explicit claims” are quite different, and it sounds here like you’re mostly talking about the latter.
The point of “explicit frames” is specifically to enable this sort of conversation – most people don’t even notice that they’re limiting the conversation to explicit claims, or where they’re assuming burden of proof lies, or whether we’re having a model-building sharing of ideas or a negotiation.
Also worth noting (which I hadn’t really stated, but is perhaps important enough to deserve a whole post to avoid accidental motte/bailey by myself or others down the road): My claim is that you should know what your frames are, and what would change* your mind. *Not* that you should always tell that to other people.
Ontological/Framework/Aesthetic Doublecrux is a thing you do with people you trust about deep, important disagreements where you think the right call is to open up your soul a bit (because you expect them to be symmetrically opening their soul, or that it’s otherwise worth it), not something you necessarily do with every person you disagree with (especially when you suspect their underlying framework is more like a negotiation or threat than honest, mutual model-sharing)
*also, not saying you should ask “what would change my mind” as soon as you bump into someone who disagrees with you. Reflexively doing that also opens yourself up to power moves, intentional or otherwise. Just that I expect it to be useful on the margin.
Interesting. It seemed in the above exchanges like both Ben and you were acting as if this was a request to make your frames explicit to the other person, rather than a request to know what the frame was yourself and then share it if that seemed like a good idea.
I think for now I still endorse that making my frame fully explicit even to myself is not a reasonable ask slash is effectively a request to simplify my frame in likely-to-be-unhelpful ways. But it’s a lot more plausible as a hypothesis.
I’ve mostly been operating (lately) within the paradigm of “there does in fact seem to be enough trust for a doublecrux, and it seems like doublecrux is actually the right move given the state of the conversation. Within that situation, making things as explicit as possible seems good to me.” (But, this seems importantly only true within that situation)
But it also seemed like both Ben (and you) were hearing me make a more aggressive ask than I meant to be making (which implies some kind of mistake on my part, but I’m not sure which one). The things I meant to be taking as a given are:
1) Everyone has all kinds of implicit stuff going on that’s difficult to articulate. The naively Straw Vulcan failure mode is to assume that if you can’t articulate it it’s not real.
2) I think there are skills to figuring out how to make implicit stuff explicit, in a careful way that doesn’t steamroll your implicit internals.
3) Resolving serious disagreements requires figuring out how to bridge the gap of implicit knowledge. (I agree that in a single-pair doublecrux, doing the sort of thing you mention in the other comment can work fine, where you try to paint a picture and ask them questions to see if they got the picture. But, if you want more than one person to be able to understand the thing you’ll eventually probably want to figure out how to make it explicit without simplifying it so hard that it loses its meaning)
4) The additional, not-quite-stated claim is “I nowadays seem to keep finding myself in situations where there are enough longstanding serious disagreements that are worth resolving that it’s worth Stag Hunting on Learning to Make Beliefs Cruxy and Frames Explicit, to facilitate those conversations.”
I think maybe the phrase “*keep* your beliefs cruxy and frames explicit” implied more of an action of “only permit some things” rather than “learn to find extra explicitness on the margin when possible.”
As far as explicit claims go: My current belief is something like:
If you actually want to communicate an implicit idea to someone else, you either need
1) to figure out how to make the implicit explicit, or
2) you need to figure out the skill of communicating implicit things implicitly… which I think actually can be done. But I don’t know how to do it and it seems hella hard. (Circling seems to work via imparting some classes of implicit things implicitly, but depends on being in-person)
My point is not at all to limit oneself to explicit things, but to learn how to make implicit things explicit (or, otherwise communicable). This is important because the default state often seems to be failing to communicate at all.
(But it does seem like an important, related point that trying to push for this ends up sounding very similar, from the outside, to ‘only explicit evidence is admissible’, which is a fair thing to have an instinctive resistance to)
But, the fact that this is real hard is because the underlying communication is real hard. And I think there’s some kind of grieving necessary to accept the fact that “man, why can’t they just understand my implicit things that seem real obvious to me?” and, I dunno, they just can’t. :/
Agreed it’s a learned skill and it’s hard. I think it’s also just necessary. I notice that the best conversations I have about difficult to describe things definitely don’t involve making everything explicit, and they involve a lot of ‘do you understand what I’m saying?’ and ‘tell me if this resonates’ and ‘I’m thinking out loud, but maybe’.
And then I have insights that I find helpful, and I can’t figure out how to write them up, because they’d need to be explicit, and they aren’t, so damn. Or even, I try to have a conversation with someone else (in some recent cases, you) and share these types of things, and it feels like I have zero idea how to get into a frame where any of it will make any sense or carry any weight, even when the other person is willing to listen by even what would normally be strong standards.
Sometimes this turns into a post or sequence that ends up explaining some of the thing? I dunno.
FWIW, upcoming posts I have in the queue are:
Noticing Frame Differences
Tacit and Explicit Knowledge
Backpropagating Facts into Aesthetics
Keeping Frames Explicit
(Possibly, in light of this conversation, adding a post called something like “Be secretly explicit [on the margin]”)
I’d been working on a sequence explaining this all in more detail (I think there’s a lot of moving parts and inferential distance to cover here). I’ll mostly respond in the form of “finish that sequence.”
But here’s a quick paragraph that more fully expands what I actually believe:
If you’re building a product with someone (metaphorical product or literal product), and you find yourself disagreeing, and you explain “This is important because X, which implies Y”, and they say “What!? But, A, therefore B!” and then you both keep repeating those points over and over… you’re going to waste a lot of time, and possibly build a confused frankenstein product that’s less effective than if you could figure out how to successfully communicate.
In that situation, I claim you should be doing something different, if you want to build a product that’s actually good.
If you’re not building a product, this is less obviously important. If you’re just arguing for fun, I dunno, keep at it I guess.
A separate, further claim is that the reason you’re miscommunicating is because you have a bunch of hidden assumptions in your belief-network, or the frames that underlie your belief network. I think you will continue to disagree and waste effort until you figure out how to make those hidden assumptions explicit.
You don’t have to rush that process. Take your time to mull over your beliefs, do focusing or whatever helps you tease out the hidden assumptions without accidentally crystallizing them wrong.
Meanwhile, you can reference the fact that the differing assumptions exist by giving them placeholder names like “the sparkly pink purple ball thing”.
This isn’t an “obligation” I think people should have. But I think it’s a law-of-the-universe that if you don’t do this, your group will waste time and/or your product will be worse.
(Lots of companies successfully build products without dealing with this, so I’m not at all claiming you’ll fail. And meanwhile there’s lots of other tradeoffs your company might be making that are bad and should be improved, and I’m not confident this is the most important thing to be working on)
But among rationalists, who are trying to improve their rationality while building products together, I think resolving this issue should be a high priority, which will pay for itself pretty quickly.
Thirdly: I claim there is a skill to building up a model of your beliefs, and your cruxes for those beliefs, and the frames that underlie your beliefs… such that you can make normally implicit things explicit in advance. (Or, at least, every time you disagree with someone about one of your beliefs, you automatically flag what the crux for the belief was, and then keep track of it for future reference). So, by the time you get to a heated disagreement, you already have some sense of what sort of things would change your mind, and why you formed the beliefs you did.
You don’t have to share this with others, esp. if they seem to be adversarial. But understanding it for yourself can still help you make sense of the conversation.
Relatedly, there’s a skill to detecting when other people are in a different frame from you, and helping them to articulate their frame.
Literal companies building literal products can alleviate this problem by only hiring people with similar frames and beliefs, so they have an easier time communicating. But, it’s
This seems important because weird, intractable conversations have shown up repeatedly...
in the EA ecosystem
(where even though people are mostly building different products, there is a shared commons that is something of a “collectively built product” that everyone has a stake in, and where billions of dollars and billions of dollars worth of reputation are at stake)
on LessWrong the website
(where everyone has a stake in a shared product of “how we have conversations together” and what truthseeking means)
on the LessWrong development team
where we are literally building a product (a website), and often have persistent, intractable disagreements about UI, minimalism, how shortform should work, is Vulcan a terrible shitshow of a framework that should be scrapped, etc.
every time you disagree with someone about one of your beliefs, you [can] automatically flag what the crux for the belief was
This is the bit that is computationally intractable.
Looking for cruxes is a healthy move, exposing the moving parts of your beliefs in a way that can lead to you learning important new info.
However, there are an incredible number of cruxes for any given belief. If I think that a hypothetical project should accelerate its development time 2x in the coming month, I could change my mind if I learn some important fact about the long-term improvements of spending the month refactoring the entire codebase; I could change my mind if I learn that the current time we spend on things is required for models of the code to propagate and become common knowledge in the staff; I could change my mind if my models of geopolitical events suggest that our industry is going to tank next week and we should get out immediately.
I’m not claiming you can literally do this all the time. [Ah, an earlier draft of the previous comment emphasized that this was all “things worth pushing for on the margin”, and explicitly not something you were supposed to sacrifice all other priorities for. I think I then rewrote the post and forgot to emphasize that clarification]
I’ll try to write up better instructions/explanations later, but to give a rough idea of the amount of work I’m talking about: I’m saying “spend a bit more time than you normally do in ‘doublecrux mode’”. [This can be, like, an extra half hour sometimes when having a particularly difficult conversation].
When someone seems obviously wrong, or you seem obviously right, ask yourself “what cruxes are most loadbearing”, and then:
Be mindful as you do it, to notice what mental motions you’re actually performing that help. Basically, do Tuning Your Cognitive Strategies to the double crux process, to improve your feedback loop.
When you’re done, cache the results. Maybe by writing it down, or maybe just sort of thinking harder about it so you remember it better.
The point is not to have fully mapped out cruxes of all your beliefs. The point is that you generally have practiced the skill of noticing what the most important cruxes are, so that a) you can do it easily, and b) you keep the results computed for later.
Live a life worth leaving Facebook for.
Trying to think about building some content organisations and filtering systems on LessWrong. I’m new to a bunch of the things I discuss below, so I’m interested in other people’s models of these subjects, or links to sites that solve the problems in different ways.
So, one problem you might try to solve is that people want to see all of a thing on a site. You might want to see all the posts on reductionism on LessWrong, or all the practical how-to guides (e.g. how to beat procrastination, Alignment Research Field Guide, etc), or all the literature reviews on LessWrong. And so you want people to help build those pages. You might also want to see all the posts corresponding to a certain concept, so that you can find out what that concept refers to (e.g. what is the term “goodhart’s law” or “slack” or “mesa-optimisers” etc).
Another problem you might try to solve, is that while many users are interested in lots of the content on the site, they have varying levels of interest in the different topics. Some people are mostly interested in the posts on big picture historical narratives, and less so on models of one’s own mind that help with dealing with emotions and trauma. Some people are very interested in AI alignment, some are interested in only the best such posts, and some are interested in none.
I think the first problem is supposed to be solved by Wikis, and the second problem is supposed to be solved by Tagging.
Speaking generally, Wikis allow dedicated users to curate pages around certain types of content, highlighting the best examples, some side examples, writing some context for people arriving on the page to understand what the page is about. It’s a canonical, update-able, highly editable page built around one idea.
Tagging is much more about filtering than about curating.
Let me describe some different styles of tagging.
On the site lobste.rs there are about 100 tags in total. Most tags give a very broad description of an area of interest such as “haskell”, “databases” and “compilers”. These are shown next to posts on the frontpage. Most posts have 1-3 tags. This allows easy filtering by interest.
A site I’ve just been introduced to, and been fairly impressed by the tagging of, is called ‘Gelbooru’, an anime/porn image website where many images have over 100 tags, accurately describing everything contained in the image (e.g. “blue sky”, “leaf”, “person standing”, etc). That is a site where the purpose is to search-by-tags. A key element that allows Gelbooru to function is that, while I think it probably has limited dispute mechanisms for resolving whether a tag is appropriate, that’s fine because all tags are literal descriptions of objects in the image. There are no tags describing e.g. the emotions of people in the images, which would be much less easy to build common knowledge around. I do not really know how the site causes people to tag 100,000s of photos each with such scintillating tags as “arm rest”, “monochrome” and “chair”, but it seems to work quite well.
The first site uses tags as filters when looking at a single feed. As long as there is a manageable number of tags it’s easy for an author to tag things appropriately, or for readers to helpfully tag things correctly. The second site uses tagging as the primary method of finding content on the site: the homepage of the site is a search bar for tags.
In the former style, tags are about filtering for fairly different kinds of content. You might wonder why one should have tags rather than just subreddits, which also filter posts by interest quite well. A key distinction is that subreddits are typically non-overlapping, whereas tags overlap often. In general, a single post can have multiple tags, but a post belongs to a single subreddit. I currently think of tags as different lenses with which to view a single subreddit, and only when your interests are sufficiently non-overlapping with the current subreddit should you go through the effort to build a new subreddit. (With its own tags.)
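To make the tags-versus-subreddits distinction concrete, here is a minimal sketch. The `Post` class, the `filter_by_tag` helper and the example feed are all hypothetical names I made up for illustration, not anything from lobste.rs or Reddit. The point is just that a post has exactly one subreddit but a set of tags, so the same post can show up under several lenses.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    title: str
    subreddit: str                                  # exactly one home
    tags: set[str] = field(default_factory=set)     # zero or more overlapping lenses


def filter_by_tag(posts, tag):
    """Return the titles of posts carrying `tag`, preserving feed order."""
    return [p.title for p in posts if tag in p.tags]


# Hypothetical feed: every post lives in one subreddit, but tags overlap.
feed = [
    Post("Parsing with monads", "programming", {"haskell", "compilers"}),
    Post("Index tuning notes", "programming", {"databases"}),
    Post("A query planner in Haskell", "programming", {"haskell", "databases"}),
]
```

Here the third post appears under both the "haskell" lens and the "databases" lens, which a one-subreddit-per-post scheme cannot express without duplicating it.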
There are some other (key) questions of how to build incentives for users to tag things correctly, and how to solve disputes over whether a tag is correct for a post. If, like lobste.rs above, LW should have a tagging system that only has ~100 tags, and is not attempting to solve disputes on a much larger scale like Wikipedia does, then I think applying a fairly straightforward voting system might suffice. This would look like:
When a post is tagged with “AI alignment”, users can vote on the tag (with the same weight that they vote on a post), to indicate whether it’s a fit for that tag. (This means tag-post objects have their own karma.)
Whoever added the tag to that post gets the karma that the tag-post object gets. (Perhaps a smaller reward proportional to this karma score, if it seems too powerful, but definitely still positive.)
New tags cannot be created by most users. New tags are added by the moderation team, though users can submit new tags to the mod team.
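The three rules above can be sketched roughly as follows. This is only an illustration under my own assumptions: `TagPost`, `add_tag`, `tagger_reward`, the `APPROVED_TAGS` set and the `fraction` parameter are all hypothetical, and a real system would also need storage, permissions and vote-weight rules that are not specified here.

```python
from dataclasses import dataclass, field

# Hypothetical mod-curated tag list; ordinary users cannot extend it.
APPROVED_TAGS = {"AI alignment", "Rationality"}


@dataclass
class TagPost:
    """One (tag, post) pairing, which carries its own karma."""
    tag: str
    post_id: str
    added_by: str
    votes: dict[str, int] = field(default_factory=dict)  # voter -> signed vote weight

    def vote(self, voter: str, weight: int) -> None:
        # Assumption: a repeat vote replaces the voter's earlier vote.
        self.votes[voter] = weight

    @property
    def karma(self) -> int:
        return sum(self.votes.values())


def add_tag(tag: str, post_id: str, user: str) -> TagPost:
    """Attach an existing tag to a post; new tags must go through the mod team."""
    if tag not in APPROVED_TAGS:
        raise ValueError(f"{tag!r} is not an approved tag; submit it to the mods")
    return TagPost(tag, post_id, user)


def tagger_reward(tag_post: TagPost, fraction: float = 1.0) -> float:
    """Karma credited to whoever added the tag, optionally scaled down."""
    return tag_post.karma * fraction
```

With `fraction=1.0` the tagger gets the full karma of the tag-post object; a smaller positive value corresponds to the "smaller reward proportional to this karma score" option.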
If so, when we end up building a tagging system on LessWrong, the goal should be to distinguish the main types of post people are interested in viewing, and create a limited number of tags that determine this. I think that building that would mainly help users who are viewing new content on the frontpage, and that for much more granular sorting of historical content, a wiki would be better placed.
Afterthought on conceptual boundary
The conceptual boundary is something like the following: A tag is literally just a list of posts, where you can just determine whether something is in that list or not. A Wiki is an editable text-field, curate-able with much more depth than a simple list. A Tag is a communal list object, a Wiki Page is a communal text-body object.
I spent an hour or two talking about these problems with Ruby. Here are two further thoughts. I will reiterate that I have little experience with wikis and tagging, so I am likely making some simple errors.
Connecting Tagging and Wikis
One problem to solve is that if a topic is being discussed, users want to go from a page discussing that topic to find a page that explains that topic, and lists all posts that discuss that topic. This page should be easily update-able with new content on the topic.
Some more specific stories:
A user reads a post on a topic, and wants to better understand what’s already known about that topic and the basic ideas
A user is primarily interested in a topic, and wants to make sure to see all content about that topic
The solution for the first is to link to a wiki page on that topic. The solution to the second is to link to a page that contains all other posts on that topic. And one possible solution is to make both of those the same button.
This page is a combination of a Wiki and a Tag. It is a communally editable explanation of the concept, with links to key posts explaining it, and other pages that are related. And below that, it also has a post-list of every post that is relevant, sortable by things like recency, karma, and relevancy. Maybe below that it even has its own Recent Discussion section, for comments on posts that have the tag. It’s a page you can subscribe to (e.g. via RSS), and come back to to see discussion of a particular topic.
Now, to make this work, it’s necessary that all posts that are in the category are successfully listed in the tag. One problem you will run into is that there are a lot of concepts in the space, so the number of such pages will quickly become unmanageable. “Inner Alignment”, “Slack”, “Game Theory”, “Akrasia”, “Introspection”, “Corrigibility”, etc, is a very large list, such that it is not reasonable to scroll through it and check if your post fits into any of them, and expect to do this successfully. You’ll end up with a lot of Wiki pages with very incomplete lists.
This is especially bad, because the other use of the tag system you might be hoping for is the one described in the parent to this comment, where you can see the most relevant tags directly from the frontpage, to help with figuring out what you want to read. If you want to make sure to read all the AI alignment posts, it’s not helpful to give you a tag that sometimes works, because then you still have to check all the other posts anyway.
However, there are three ways to patch this over. Firstly, the thing that will help the Wiki system the most here, is the ability to add posts to the Wiki page from the post page, instead of having to independently visit the Wiki page and then add it in. This helps the people who care about maintaining Wiki pages quite a bit, making their job much easier.
Secondly, you can help organise those tags in order of likely relevance. For example, if you link to a lot of posts that have the tag “AI alignment” then you probably are about AI alignment, so that tag should appear higher.
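As a rough sketch of that second patch: count how often each tag appears on the posts that a new post links to, and show the most frequent tags first. The function name `suggest_tags` and the `tags_by_post` mapping are hypothetical, and counting linked posts' tags is just one simple proxy for "likely relevance" that I am assuming here.

```python
from collections import Counter


def suggest_tags(linked_post_ids, tags_by_post):
    """Order candidate tags by how many of the linked posts carry them.

    `tags_by_post` maps a post id to the set of tags already on that post.
    Ties are broken alphabetically so the ordering is stable.
    """
    counts = Counter()
    for post_id in linked_post_ids:
        # Links to unknown or untagged posts contribute nothing.
        counts.update(tags_by_post.get(post_id, ()))
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in ranked]
```

An author would then see the top few suggestions first, rather than scrolling through hundreds of WikiTags to check each one.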
Thirdly, you can sort tags into two types. The first type is given priority, and is a very controlled set of concepts, that also get used for filtering on the frontpage. This is a small, stable set of tags that people learn, so it is easy to check whether your post belongs under any of them. The second is the much larger, user-generated set of tags that correspond to user-generated wiki pages, and there can be 100s of these.
In this world, wiki pages are split into two types: those that are tags and those that aren’t. Those which are tags have a big post-list item that is searchable, maybe even a recent discussion section, and can be used to tag posts. Those that are not tags do not have these features and properties.
This idea seems fairly promising to me, and I don’t see any problems with it yet. For the below, I’ll call such a page a ‘WikiTag’.
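Here is one way the data model might look, as a sketch only. The `WikiPage` class, its `tagged_posts` and `is_core` fields, and `frontpage_filters` are hypothetical names of mine; the only things taken from the description above are that every page has an editable body, that only some pages also act as tags with a post-list, and that a small controlled subset of tags is used for frontpage filtering.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WikiPage:
    title: str
    body: str = ""                              # communally editable explanation
    tagged_posts: Optional[list[str]] = None    # None => plain wiki page, not a tag
    is_core: bool = False                       # core tags are the small frontpage set

    @property
    def is_tag(self) -> bool:
        return self.tagged_posts is not None

    def tag_post(self, post_id: str) -> None:
        """Add a post to this page's post-list; only WikiTags support this."""
        if self.tagged_posts is None:
            raise ValueError(f"{self.title!r} is a plain wiki page, not a tag")
        if post_id not in self.tagged_posts:
            self.tagged_posts.append(post_id)


def frontpage_filters(pages):
    """Titles of the controlled tag set offered as frontpage filters."""
    return sorted(p.title for p in pages if p.is_tag and p.is_core)
```

Using `None` versus a list for `tagged_posts` keeps "not a tag" distinct from "a tag with no posts yet", which seems like the distinction the two page types need.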
Speaking more generally, my main worry about a lot of systems like Wikis and Tagging is about something that is especially prevalent in science and in the sort of work we do on LessWrong, where we try to figure out better conceptual boundaries to draw in reality, and whereby old concepts get deprecated. I expect that on sites like lobste.rs and Gelbooru, tags rarely turn out to have been the wrong way to frame things. There are rarely arguments about whether something is really a blue sky, or just the absence of clouds. Whereas a lot of progress in science is this sort of subtle conceptual progress, where you maybe shouldn’t have said that the object fell to the ground, but instead that the object and the Earth fell into each other at rates proportional to some function of their masses.
On LessWrong I think we’ve done a lot of this sort of thing.
We used to talk about optimisation daemons, now we talk about the inner alignment problem.
We used to talk about people being stupid and the world being mad, and now we talk about coordination problems.
We used to talk about agent foundations and now we maybe think embedded agency is a better conceptualisation of the problem.
In places like the in-person CFAR space I’ve heard talk of akrasia often deprecated and instead ideas like ‘internal alignment’ are discussed.
We made progress from TDT to UDT.
So I’m generally worried about setting up infrastructure that makes concepts get stuck in place, by e.g. whoever picked the name first.
One problem I was worried about, was that all posts would have to be categorised according to the old names. In particular, posts that have already been tagged ‘optimisation daemons’ would now have a hard time changing to being tagged ‘inner alignment problem’.
However, after fleshing it out, I’m not so sure it’s going to be a problem.
Firstly, it’s not clear that old posts should have their tags updated. If there is a sequence of posts talking about akrasia and how to deal with it, it would be very confusing for those posts to have a tag for ‘internal alignment’, a term not mentioned anywhere in the post nor obviously related to the framing of the posts. Similarly for ‘optimisation daemons’ discussion to be called ‘the inner alignment problem’.
Secondly, there’s a fairly natural thing to do when such conceptual shifts in the conversation occur. You build a new WikiTag. Then you tag all the new posts, and write the wiki entry explaining the concept, and link back to the old concept. It just needs to say something like “Old work was done under the idea that objects fell down to the ground. We now think that the object and the Earth fall into each other, but you can see the old work and its experimental results on this page <link>. Plus here are some links to the key posts back then that you’ll still want to know about today.” And indeed if such a thing happens with agent foundations and embedded agency, or something, then it’ll be necessary to have posts explaining how the old work fits into the current paradigm. That translational work is not done by renaming a tag, but by a person who understands that domain writing some posts explaining how to think about and use the old work, in the new conceptual framework. And those should be prominently linked to on the wiki/tag page.
So I think that this system does not have the problems I thought that it had.
I guess I’m still fairly worried about subtle errors, like if instead of a tag for ‘Forecasting’ we have a tag called ‘Calibration’ or ‘Predictions’, these would shift the discourse in different ways. I’m a bit worried about that. But I think it’s likely that a small community like ours will overall be able to resist such small shifts, and that argument will prevail, even if the names are a little off sometimes. It sounds like a problem that makes progress a little slower but doesn’t push it off the rails. And if the tag is sufficiently wrong then I expect we can do the process above, where we start a new tag and link back to the old tag. Or, if the conceptual shift is sufficiently small (e.g. ‘Forecasting’ → ‘Predictions’) I can imagine renaming the tag directly.
So I’m no longer so worried about conceptual stickiness as a fundamental blocker to Wikis and Tagging as ways of organising the conceptual space.
As a general comment, StackExchange’s tagging system seems pretty perfect (and battle-tested) to me, and I suspect we should just copy their design as closely as we can.
So, on StackExchange any user can edit any of the tags, and then there is a whole complicated hierarchy that exists for how to revert changes, how to approve changes, how to lock posts from being edited, etc.
Which is a solution, but it sure doesn’t seem like an easy or elegant solution to the tagging problem.
I think the peer review queue is pretty sensible in any world where there’s “one ground truth” that you expect trusted users to have access to (such that they can approve / deny edits that cross their desk).
and link back to the old concept.
It’s also important to have the old concept link to the new concept.
I’m currently working through my own thoughts and vision for tagging.
If and when we end up building a tagging system on LessWrong, the goal will be to distinguish the main types of post people are interested in viewing, and create a limited number of tags that determine this. I think building this will mainly help users who are viewing new content on the frontpage, and that for much more granular sorting of historical content, a wiki is better placed.
I’m pretty sure I disagree with this and object to you making an assertion that makes it sound like the team is definitely decided about what the goal of tagging system will be.
I’ll write a proper response tomorrow.
Hm, I think writing this and posting it at 11:35 led to me phrasing a few things quite unclearly (and several of those sentences don’t even make sense grammatically). Let me patch with some edits right now, maybe more tomorrow.
On the particular thing you mention, never mind the whole team, I myself am pretty unsure that the above is right. The thing I meant to write there was something like “If the above is right, then when we end up building a tagging system on LessWrong, the goal should be” etc. I’m not clear on whether the above is right. I just wanted to write the idea down clearly so it could be discussed and have counterarguments/counterevidence brought up.
That clarifies it and makes a lot of sense. Seems my objection rested upon a misunderstanding of your true intention. In short, no worries.
I look forwards to figuring this out together.
I block all the big social networks from my phone and laptop, except for 2 hours on Saturday, and I noticed that when I check Facebook on Saturday, the notifications are always boring and not something I care about. Then I scroll through the newsfeed for a bit and it quickly becomes all boring too.
And I was surprised. Could it be that, all the hype and narrative aside, I actually just wasn’t interested in what was happening on Facebook? That I could remove it from my life and just not really be missing anything?
On my walk home from work today I realised that this wasn’t the case. Facebook has interesting posts I want to follow, but they’re not in my notifications. They’re sparsely distributed in my newsfeed, such that they appear a few times per week, randomly. I can get a lot of value from Facebook, but not by checking once per week—only by checking it all the time. That’s how the game is played.
Anyway, I am not trading all of my attention away for such small amounts of value. So it remains blocked.
I’ve found Facebook absolutely terrible as a way to both distribute and consume good content. Everything you want to share or see is just floating in the opaque vortex of the f%$&ing newsfeed algorithm. I keep Facebook around for party invites and to see who my friends are in each city I travel to. I disabled notifications and check the timeline for less than 20 minutes each week.
OTOH, I’m a big fan of Twitter. (@yashkaf) I’ve curated my feed to a perfect mix of insightful commentary, funny jokes, and weird animal photos. I get to have conversations with people I admire, like writers and scientists. Going forward I’ll probably keep tweeting, and anything that’s a fit for LW I’ll also cross-post here.
This thread is the most bizarrely compelling argument that twitter may be better than FB
In my experience this problem is easily solved if you simply unfollow ~95% of your friends. You can mass unfollow people relatively easily from the News Feed Preferences page in Settings. Ever since doing this a few years ago, my Facebook timeline has had an extremely high signal-to-noise ratio—I’m quite glad to encounter something like 85% of posts. Also, since this 5% only produces ~5-20 minutes of reading/day, it’s easy to avoid spending lots of time on the site.
I did actually unfollow ~95% of my friends once, but then found myself in that situation where suddenly Facebook became interesting again and I was checking it more often. I recommend the opposite: follow as many friends from high school and work as possible (assuming you don’t work at a cool place).
Either way I’ll still only check it in a 2 hour window on Saturdays, so I feel safe trying it out.
Huh, 95% is quite extreme. But I realise this probably also solves the problem whereby if the people I’m interested in comment on *someone else’s* wall, I still get to see it. I’ll try this out next week, thx.
(I don’t get to be confident I’ve seen 100% of all the interesting people’s good content though, the news feed is fickle and not exhaustive.)
Not certain, but I think when your news feed becomes sparse enough it might actually become exhaustive.
My impression is that sparse newsfeeds tend to start doing things you don’t want.
I basically endorse blocking FB (pssst, hey everyone still saying insightful things on Facebook, come on over to LessLong.com!), but fwiw, if you want to keep tabs on things there, I think the most reliable way is to make a friends-list of the people who seem especially high signal-to-noise-ratio, and then create a bookmark for specifically following that list.
Yeah, it’s what I do with Twitter, and I’ll probably start this with FB. Won’t show me all their interesting convo on other people’s walls though. On Twitter I can see all their replies, not on FB.
Reading this post, where the author introspects and finds a strong desire to be able to tell a good story about their career, suggests that how people make decisions will be heavily constrained by the sorts of stories about one’s career that are definitely common knowledge.
I remember at the end of my degree, there was a ceremony where all the students dressed in silly gowns and the parents came and sat in a circular hall while we got given our degrees and several older people told stories about how your children have become men and women, after studying and learning so much at the university.
This was a dumb/false story, because I’m quite confident the university did not teach these people the most important skills for being an adult, and certainly my own development was largely directed by the projects I did on my own dime, not through much of anything the university taught.
But everyone was sat in a circle, where they could see each other listen to the speech in silence, as though it were (a) important and (b) true. And it served as a coordination mechanism, saying “If you go into the world and tell people that your child came to university and grew into an adult, then people will react appropriately and treat your child with respect and not look at them weird asking why spending 3 or 4 years passing exams with no bearing on the rest of their lives is considered worthy of respect.” It lets those people tell a narrative, which in turn makes it seem okay for other people to send their kids to the university, and for the kids themselves to feel like they’ve matured.
Needless to say, I felt quite missed by this narrative, and only played along so my mother could have a nice day out. I remember doing a silly thing—I noticed I had a spot on my face, and instead of removing it that morning, I left it there just as a self-signal that I didn’t respect the ceremony.
Anyway, I don’t really have any narrative for my life at the minute. I recall Paul Graham saying that he never answers the question “What do you do?” with a proper answer (he says he writes Lisp compilers and that usually shuts people up). Perhaps I will continue to avoid narratives. But I think a healthy society would be able to give me a true narrative that I felt comfortable following.
Another solution would be to build a small circle of trusted and supportive friends with whom we share a narrative about me that I endorse, and try to continue to not want to get social support from a wider circle than that.
Peter Thiel has the opinion that many of our stories are breaking down. I’m curious to hear others’ thoughts on what stories we tell ourselves, which ones are intact, and which are changing.
I remember the narrative breaking, really hard, on two particular occasions:
The twin towers attack.
The 2008 mortgage financial crisis.
I don’t think, particularly, that the narrative is broken now, but I think that it has lost some of its harmony (Trump having won the 2016 election, I believe, is a symptom of that).
This is very close to what fellows like Thiel and Weinstein are talking about. In this particular sense, yes, I understand it’s crucial to maintain the narrative, to keep it from breaking entirely (for example, say, in an explosion of the American student debt, or China going awry with its USD holdings), although I don’t know anymore whose job that is.
These stories are not part of any law of our universe, so they are bound to break at any time. It takes only a few smart, uncaring individuals to tear at the fabric of reality until it breaks, and that is not okay!
So that is what I believe is happening at the level of the macro-narrative; but to be more directed towards the individual, which is what your post seems to hint at, I don’t think for a second that your life does not run on narrative; maybe that’s a narrative itself. I believe further that some rituals are important to keep, and that having an individual story is important for being able to do any work we deem important.
(I’m not sure if you meant to reply to Benito’s shortform comment here, or one of Ben’s recent Thiel/Weinstein transcript posts)
It may be more apt for the fifth post in his sequence (Stories About Progress) but it’s not posted yet. But I think it sort-of works in both and it’s more of a shortform comment than anything!
At the SSC Meetup tonight in my house, I was in a group conversation. I asked a stranger if they’d read anything interesting on the new LessWrong in the last 6 months or so (I had not yet mentioned my involvement in the project). He told me about an interesting post about the variance in human intelligence compared to the variance in mice intelligence. I said it was nice to know people read the posts I write. The group then had a longer conversation about the question. It was enjoyable to hear strangers tell me about reading my posts.
I’ve finally moved into a period of my life where I can set guardrails around my slack without sacrificing the things I care about most. I currently am pushing it to the limit, doing work during work hours, and not doing work outside work hours. I’m eating very regularly, 9am, 2pm, 7pm. I’m going to sleep around 9-10, and getting up early. I have time to pick up my hobby of classical music.
At the same time, I’m also restricting the ability of my phone to steal my attention. All social media is blocked except for 2 hours on Saturday, which is going quite well. I’ve found Tristan Harris’s advice immensely useful—my phone is increasingly not something that I give all of my free attention to, but instead something I give deliberate attention and then stop using. Tasks, not scrolling.
Now I have weekends and mornings though, and I’m not sure what to do with myself. I am looking to get excited about something, instead of sitting, passively listening to a comedy podcast while playing a game on my phone. But I realise I don’t have easy alternative options—Netflix is really accessible. I suppose one of the things that a Sabbath is supposed to be is an alarm, showing that something is up, and at the minute I’ve not got enough things I want to do for leisure that don’t also feel a bit like work.
So I’m making lists of things I might like (cooking, reading, improv, etc) and I’ll try those.
So I’m making lists of things I might like (cooking, reading, improv, etc) and I’ll try those
This comment is a bit interesting in terms of its relation to this old comment of yours (about puzzlement over cooking being a source of slack).
I realize this comment isn’t about cooking-as-slack per se, but I’m curious to hear more about your shift in experience there (since before, it didn’t seem like cooking was a thing you did much at all).
Try practicing doing nothing, i.e. meditation, and see how that goes. When I have nothing particular to do, my mind needs some time to switch out of the mode where it tries to distract itself by coming up with new things it wants to do, until finally it reaches a state where it is calm and steady. I consider that state the optimal one to be in, since only then are my thoughts directed deliberately at neglected and important issues rather than exercising learned thought patterns.
I think you’re missing me with this. I’m not very distractable and I don’t need to learn to be okay with leisure time. I’m trying to actually have hobbies, and realising that is going to take work.
I could take up meditation as a hobby, but at the minute I want things that are more social and physical.
Why has nobody noticed that the OpenAI logo is three intertwined paperclips? This is an alarming update about who’s truly in charge...
I think of myself as pretty skilled and nuanced at introspection, and being able to make my implicit cognition explicit.
However, there is one fact about me that makes me doubt this severely, which is that I have never ever ever noticed any effect from taking caffeine.
I’ve never drunk coffee, though in the past two years my housemates have kept a lot of caffeine around in the form of energy drinks, and I drink them for the taste. I’ll drink them any time of the day (9pm is fine). At some point someone seemed shocked that I was about to drink one after 4pm, and I felt like I should feel bad or something, so I stopped. I’ve not been aware of any effects.
But two days ago, I finally noticed. I had to do some incredibly important drudge work, and I had two red bulls around 12-2pm. I finished work at 10pm. I realised that while I had not felt weird in any way, I had also not had any of the normal effects of hanging around for hours, which is getting tired, distracted, needing to walk around, wanting to do something different. I had a normal day for 10 hours solely doing crappy things I normally hate.
So I guess now I see the effect of caffeine: it’s not a positive effect, it just removes the normal negative effects of the day. (Which is awesome.)
Hot take: The actual resolution to the simulation argument is that most advanced civilizations don’t make loads of simulations.
Two things make this make sense:
Firstly, it only matters if they make unlawful simulations. If they make lawful simulations, then it doesn’t matter whether you’re in a simulation or a base reality, all of your decision theory and incentives are essentially the same, you want to take the same decisions in all of the universes. So you can make lots of lawful simulations, that’s fine.
Secondly, they will strategically choose to not make too many unlawful simulations (to the level where the things inside are actually conscious). This is because to do so would induce anthropic uncertainty over themselves. Like, if the decision-theoretical answer is to not induce anthropic uncertainty over yourself about whether you’re in a simulation, then by TDT everyone will choose not to make unlawful simulations.
I think this is probably wrong in lots of ways but I didn’t stop to figure them out.
Your first point sounds like it is saying we are probably in a simulation, but not the sort that should influence our decisions, because it is lawful. I think this is pretty much exactly what Bostrom’s Simulation Hypothesis is, so I think your first point is not an argument for the second disjunct of the simulation argument but rather for the third.
As for the second point, well, there are many ways for a simulation to be unlawful, and only some of them are undesirable—for example, a civilization might actually want to induce anthropic uncertainty in itself, if it is uncertainty about whether or not it is in a simulation that contains a pleasant afterlife for everyone who dies.
I don’t buy that it makes sense to induce anthropic uncertainty. It makes sense to spend all of your compute to run emulations that are having awesome lives, but it doesn’t make sense to cause yourself to believe false things.
I’m not sure it makes sense either, but I don’t think it is accurately described as “cause yourself to believe false things.” I think whether or not it makes sense comes down to decision theory. If you use evidential decision theory, it makes sense; if you use causal decision theory, it doesn’t. If you use functional decision theory, or updateless decision theory, I’m not sure, I’d have to think more about it. (My guess is that updateless decision theory would do it insofar as you care more about yourself than others, and functional decision theory wouldn’t do it even then.)
I just don’t think it’s a good decision to make, regardless of the math. If I’m nearing the end of the universe, I prefer to spend all my compute instead maximising fun / searching for a way out. Trying to run simulations to make it so I no longer know if I’m about to die seems like a dumb use of compute. I can bear the thought of dying dude, there’s better uses of that compute. You’re not saving yourself, you’re just intentionally making yourself confused because you’re uncomfortable with the thought of death.
Well, that wasn’t the scenario I had in mind. The scenario I had in mind was: People in the year 2030 pass a law requiring future governments to make ancestor simulations with happy afterlives, because that way it’s probable that they themselves will be in such a simulation. (It’s like cryonics, but cheaper!) Then, hundreds or billions of years later, the future government carries out the plan, as required by law.
Not saying this is what we should do, just saying it’s a decision I could sympathize with, and I imagine it’s a decision some fraction of people would make, if they thought it was an option.
Thinking more, I think there are good arguments for taking actions that as a by-product induce anthropic uncertainty; these are the standard hansonian situation where you build lots of ems of yourself to do bits of work then turn them off.
But I still don’t agree with the people in the situation you describe because they’re optimising over their own epistemic state, I think they’re morally wrong to do that. I’m totally fine with a law requiring future governments to rebuild you / an em of you and give you a nice life (perhaps as a trade for working harder today to ensure that the future world exists), but that’s conceptually analogous to extending your life, and doesn’t require causing you to believe false things. You know you’ll be turned off and then later a copy of you will be turned on, there’s no anthropic uncertainty, you’re just going to get lots of valuable stuff.
The relevant intuition for the second point there is to imagine you somehow found out that there was only one ground truth base reality, only one real world, not a multiverse or a Tegmark level 4 verse or whatever. And you’re a civilization that has successfully dealt with x-risks and unilateralist action and information vulnerabilities, to the point where you have the sort of unified control to make a top-down decision about whether to make massive numbers of civilizations. And you’re wondering whether to make a billion simulations.
And suddenly you’re faced with the prospect of building something that will make it so you no longer know whether you’re in the base universe. Someday gravity might get turned off because that’s what your overlords wanted. If you pull the trigger, you’ll never be sure that you weren’t actually one of the simulated ones, because there’s suddenly so many simulations.
And so you don’t pull the trigger, and you remain confident that you’re in the base universe.
This, plus some assumptions about all civilizations that have the capacity to do massive simulations also being wise enough to overcome x-risk and coordination problems so they can actually make a top-down decision here, plus some TDT magic whereby all such civilizations in the various multiverses and Tegmark levels can all coordinate in logical time to pick the same decision… leaves there being no unlawful simulations.
My crux here is that I don’t feel much uncertainty about whether or not our overlords will start interacting with us (they won’t and I really don’t expect that to change), and I’m trying to backchain from that to find reasons why it makes sense.
My basic argument is that all civilizations that have the capability to make simulations that aren’t true histories (but instead have lots of weird stuff happen in them) will all be philosophically sophisticated enough to collectively not do so, and so you can always expect to be in a true history and not have weird sh*t happen to you like in The Sims. The main counterargument here is to show that there are lots of civilizations that will exist with the powers to do this but lacking the wisdom to not do it. Two key examples that come to mind:
We build an AGI singleton that lacks important kinds of philosophical maturity, and so makes lots of simulations that ruin the anthropic uncertainty for everyone else.
Civilizations at somewhere around our level get to a point where they can create massive numbers of simulations but haven’t managed to create existential risks like AGI. Even while you might think our civilization is pretty close to AGI, I could imagine alternative civilizations that aren’t, just like I could imagine alternative civilizations that are really close to making masses of ems but that aren’t close enough to AGI. This feels like a pretty empirical question about whether such civilizations are possible and whether they can have these kinds of resources without causing an existential catastrophe / building singleton AGI.
Why appeal to philosophical sophistication rather than lack of motivation? Humans given the power to make ancestor-simulations would create lots of interventionist sims (as is demonstrated by the popularity of games like The Sims), but if the vast hypermajority of ancestor-simulations are run by unaligned AIs doing their analogue of history research, that could “drown out” the tiny minority of interventionist simulations.
That’s interesting. I don’t feel comfortable with that argument, it feels too much like random chance whether or not we should expect ourselves to be in an interventionist universe or not, whereas I feel like I should be able to find strong reasons to not be in an interventionist universe.
Alternatively, “lawful universe” has lower Kolmogorov complexity than “lawful universe plus simulator intervention” and therefore gets exponentially more measure under the universal prior?? (See also “Infinite universes and Corbinian otaku” and “The Finale of the Ultimate Meta Mega Crossover”.)
Now that’s fun. I need to figure out some more stuff about measure, I don’t quite get why some universes should be weighted more than others. But I think that sort of argument is probably a mistake—even if the lawful universes get more weighting for some reason, unless you also have reason to think that they don’t make simulations, there’s still loads of simulations within each of their lawful universes, setting the balance in favour of simulation again.
One big reason why it makes sense is that the simulation is designed for the purpose of accurately representing reality.
Another big reason why (a version of it) makes sense is that the simulation is designed for the purpose of inducing anthropic uncertainty in someone at some later time in the simulation. e.g. if the point of the simulation is to make our AGI worry that it is in a simulation, and manipulate it via probable environment hacking, then the simulation will be accurate and lawful (i.e. un-tampered-with) until AGI is created.
I think “polluting the lake” by increasing the general likelihood of you (and anyone else) being in a simulation is indeed something that some agents might not want to do, but (a) it’s a collective action problem, and (b) plenty of agents won’t mind it that much, and (c) there are good reasons to do it even if it has costs. I admit I am a bit confused about this though, so thank you for bringing it up, I will think about it more in the coming months.
Ugh, anthropic warfare, feels so ugly and scary. I hope we never face that sh*t.
I think in many environments I’m in, especially with young people, the fact that Paul Graham is retired with kids sounds nice, but there’s an implicit acknowledgement that “He could’ve chosen to not have kids and instead do more good in the world, and it’s sad that he didn’t do that”. And it reassures me to know that Paul Graham wouldn’t reluctantly agree. He’d just think it was wrong.
But, like, he is wrong? I mean, in the sense that I expect a post-CEV Paul Graham to regret his choices. The fact that he does not believe so does the opposite of reassuring me, so I am confused about this.
I think part of the problem here is underspecification of CEV.
Let’s say Bob has never been kind to anyone unless it’s in his own self-interest. He has noticed that being selfless is sort of an addictive thing for people, and that once they start doing it they start raving about how good it feels, but he doesn’t see any value in it right now. So he resolves to never be selfless, in order to never get hooked.
There are two ways for CEV to go in this instance. One way is to never allow Bob to make a change that his old self wouldn’t endorse. Another way would be to look at all the potential changes he could make, posit a version of him that has had ALL the experiences and is able to reflect on them, then say “Yeah dude, you’re gonna really endorse this kindness thing once you try it.”
I think the second scenario is probably true for many other experiences than kindness, possibly including having children, enlightenment, etc. From our current vantage point it feels like having children would CHANGE our values, but another interpretation is that we always valued having children, we just never had the qualia of having children so we don’t understand how much we would value that particular experience.
What reasoning do you have in mind when you say you think he’ll regret his choices?
Sometimes I get confused between r/ssc and r/css.
Something I’ve thought about the existence of for years, but imagined was impossible: this 70s song by Italian Adriano Celentano. It fully registers to my mind as English. But it isn’t. It’s like skimming the output of GPT-2.
A thing you can google is “doubletalk”. The blog ‘Language Log’ has a few posts on it.
I’ve been thinking lately that picturing an AI catastrophe is helped a great deal by visualising a world where critical functions in society are performed by software. I was spending a while trying to summarise and analyse Paul’s “What Failure Looks Like”, which led me this way. I think that properly imagining such a world is immediately scary, because software can deal with edge cases badly, like automated market traders causing major crashes, so that’s already a big deal. Then you add ML in, and can talk about how crazy it is to hand critical systems over to code we do not understand and cannot make simple adjustments to, and you’re already hitting catastrophes. Once you then argue that ML can become superintelligent, everything goes from “global catastrophe” to “obvious end of the world”, but the first steps are already pretty helpful.
While Paul’s post helps a lot, it still takes a fair bit of effort for me to concretely visualise the scenarios he describes, and I would be excited for people to take the time to detail what it would look like to hand critical systems over to software: for which systems would this happen, why would we do it, who would be the decision-makers, what would it feel like from the average citizen’s vantage point, etc. A smaller version of Hanson’s Age of Em project, just asking the question “Which core functions in society (food, housing, healthcare, law enforcement, governance, etc) are amenable to tech companies building solutions for, and what would it look like for society to transition to 1%, 10%, 50% and 90% of core functions being automated with 1) human-coded software 2) machine learning 3) human-level general AI?”