# Matthew Barnett’s Shortform

I intend to use my shortform feed for two purposes:

1. To post thoughts that I think are worth sharing that I can then reference in the future in order to explain some belief or opinion I have.

2. To post half-finished thoughts about the math or computer science thing I’m learning at the moment. These might be slightly boring and for that I apologize.

• There is a large set of people who went around, and are still going around, telling people that “The coronavirus is nothing to worry about” despite the fact that robust evidence has existed for about a month that this virus could result in a global disaster. (Don’t believe me? I wrote a post a month ago about it.)

So many people have bought into the “Don’t worry about it” syndrome as a case of pretending to be wise, that I have become more pessimistic about humanity correctly responding to global catastrophic risks in the future. I too used to be one of those people who assumed that the default mode of thinking for an event like this was panic, but I’m starting to think that the real default mode is actually high status people going around saying, “Let’s not be like that ambiguous group over there panicking.”

Now that the stock market has plummeted, which from my perspective appeared entirely predictable given my inside view information, I am also starting to doubt the efficiency of the stock market in response to historically unprecedented events. And this outbreak could be worse than even some of the most doomy media headlines are saying. If epidemiologists like the one in this article are right, and the death rate ends up being 2-3% (which seems plausible, especially if world infrastructure is strained), then we are looking at a mainline death count of between 60 and 160 million people within about a year. That could mark the first time that world population dropped in over 350 years.
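The 60 to 160 million range can be roughly reconstructed with a back-of-the-envelope calculation. The sketch below is only illustrative: the world population of 7.8 billion and the infected fractions of 40% and 70% are assumptions chosen so that the 2-3% death rate reproduces the stated range; they are not figures taken from the cited article.

```python
# Back-of-the-envelope death count: population x infected fraction x fatality rate.
# WORLD_POPULATION and the infected fractions are assumptions for illustration.
WORLD_POPULATION = 7.8e9


def projected_deaths(population, infected_fraction, fatality_rate):
    """Return the number of deaths implied by the three inputs."""
    return population * infected_fraction * fatality_rate


# Low scenario: 40% infected (assumed), 2% death rate (from the text).
low = projected_deaths(WORLD_POPULATION, 0.40, 0.02)
# High scenario: 70% infected (assumed), 3% death rate (from the text).
high = projected_deaths(WORLD_POPULATION, 0.70, 0.03)

print(f"low scenario:  {low / 1e6:.0f} million deaths")
print(f"high scenario: {high / 1e6:.0f} million deaths")
```

Under these assumptions the result is on the order of 1-2% of the world population, so the estimate is as sensitive to the infected fraction as it is to the death rate.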

This is not just a normal flu. It’s not just a “thing that takes out old people who are going to die anyway.” This could be like economic depression-level stuff, and is a big deal!

• Just this Monday evening, a professor at the local medical school emailed someone I know, “I’m sorry you’re so worried about the coronavirus. It seems much less worrying than the flu to me.” (He specializes in rehabilitation medicine, but still!) Pretending to be wise seems right to me, or another way to look at it is through the lens of signaling and counter-signaling:

1. The truly ignorant don’t panic because they don’t even know about the virus.

2. People who learn about the virus raise the alarm in part to signal their intelligence and knowledge.

3. “Experts” counter-signal to separate themselves from the masses by saying “no need to panic”.

4. People like us counter-counter-signal the “experts” to show we’re even smarter / more rational / more aware of social dynamics.

Here’s another example, which has actually happened 3 times to me already:

1. The truly ignorant don’t wear masks.

2. Many people wear masks or encourage others to wear masks in part to signal their knowledge and conscientiousness.

3. “Experts” counter-signal with “masks don’t do much”, “we should be evidence-based” and “WHO says ‘If you are healthy, you only need to wear a mask if you are taking care of a person with suspected 2019-nCoV infection.’”

4. I respond by citing actual evidence in the form of a meta-analysis: medical procedure masks combined with hand hygiene achieved RR of .73 while hand hygiene alone had a (not statistically significant) RR of .86.
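For readers unfamiliar with the notation: a relative risk (RR) below 1 means fewer infections in the intervention group than in the control group, and 1 - RR is the relative risk reduction. A minimal sketch using only the two RR values quoted above (the function name is illustrative):

```python
def relative_risk_reduction(relative_risk):
    """Convert a relative risk into the fraction of risk removed."""
    return 1.0 - relative_risk


# RR values quoted from the meta-analysis in the text.
masks_plus_hygiene = relative_risk_reduction(0.73)
hygiene_alone = relative_risk_reduction(0.86)  # reported as not statistically significant

print(f"masks + hand hygiene: {masks_plus_hygiene:.0%} fewer infections")
print(f"hand hygiene alone:   {hygiene_alone:.0%} fewer infections")
```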

that I have become more pessimistic about humanity correctly responding to global catastrophic risks in the future

Maybe correctly understanding the underlying social dynamics can help us figure out how to solve or ameliorate the problem, for example by deliberately pushing more people toward the higher part of the counter-signaling ladder (but hopefully not so much that another group forms to counter-signal us).

Now that the stock market has plummeted, which from my perspective appeared entirely predictable given my inside view information, I am also starting to doubt the efficiency of the stock market in response to historically unprecedented events.

I used to be a big believer in stock market efficiency, but I guess Bitcoin taught me that sometimes there just are $20 bills lying on the street. So I actually made a sizable bet against the market two weeks ago.

• “Experts” counter-signal to separate themselves from the masses by saying “no need to panic”.

I think the main reason is that the social dynamic is probably favorable to them in the long run. I worry that there is a higher social risk to being alarmist than being calm. Let me try to illustrate one scenario:

My current estimate is that there is only a 15-20% probability of a global disaster (>50 million deaths within 1 year), mostly because the case fatality rate could be much lower than the currently reported rate, and previous illnesses like the swine flu began looking much less serious after more data came out. [ETA: I did a lot more research. I think it’s now like 5% risk of this.]

Let’s say that the case fatality rate turns out to be 0.3% or something, and the illness does start looking like an abnormally bad flu, and people stop caring within months. “Experts” face no sort of criticism since they remained calm and were vindicated. People like us sigh in relief, and are perhaps reminded by the “experts” that there was nothing to worry about.

But let’s say that the case fatality rate actually turns out to be 3%, and 50% of the global population is infected. Then it’s a huge deal, and global recession looks inevitable. “Experts” say that the disease is worse than anyone could have possibly seen coming, and most people believe them. People like us aren’t really vindicated, because everyone knows that the alarmists who predict doom every year will get it right occasionally.

Like with cryonics, the relatively low but still significant chance of a huge outcome makes people systematically refuse to calculate expected value. It’s not a good feature of human psychology.
I’m reminded of the fire alarm essay:

When I observe that there’s no fire alarm for AGI, I’m not saying that there’s no possible equivalent of smoke appearing from under a door. What I’m saying rather is that the smoke under the door is always going to be arguable; it is not going to be a clear and undeniable and absolute sign of fire; and so there is never going to be a fire alarm producing common knowledge that action is now due and socially acceptable.

I think what we’re seeing now is the smoke coming out from under the door and people don’t want to be the first one to cause a scene.

• ETA: I did a lot more research. I think it’s now like 5% risk of this.

I’ve moved in the opposite direction. Please share your research?

• So many people have bought into the “Don’t worry about it” syndrome as a case of pretending to be wise, that I have become more pessimistic about humanity correctly responding to global catastrophic risks in the future.

See also this story which gives another view of what happened:

Most importantly, Italy looked at the example of China, Ms. Zampa said, not as a practical warning, but as a “science fiction movie that had nothing to do with us.” And when the virus exploded, Europe, she said, “looked at us the same way we looked at China.”

BTW can you say something about why you were optimistic before? There are others in this space who are relatively optimistic, like Paul Christiano and Rohin Shah (or at least they were; they haven’t said whether the pandemic has caused an update), and I’d really like to understand their psychology better.

• I’ll take the under for any line you sound like you’re going to set.

“plummeted”? S&P 500 is down half a percent for the last 30 days and up 12% for the last 6 months. Death rate so far seems well under that for auto collisions. Also, I don’t have to pay if I’m dead and you do have to pay if nothing horrible happens.

I don’t think I’d say “don’t worry about it”, though.
Nor would I say that for climate change, government spending, or runaway AI. There are significant unknowns and it could be Very Bad(tm). But I do think it matters _HOW_ you worry about it. Avoid “something must be done and this is something” propositions. Think through actual scenarios and how your behaviors might actually influence them, rather than just making you feel somewhat less guilty about it.

Most of the things I can do on the margin won’t mitigate the severity or reduce the probability of a true disaster (enough destruction that global supply chains fully collapse and everyone who can’t move into and defend their farming village dies). Some of them DO make it somewhat more comfortable in temporary or isolated problems.

• “plummeted”? S&P 500 is down half a percent for the last 30 days and up 12% for the last 6 months.

The last few days have been much more rapid. Here’s the chart I have for the last 1 year, and you can definitely spot the recent trend.

Death rate so far seems well under that for auto collisions.

According to this source, “Nearly 1.25 million people die in road crashes each year.” That comes out to approximately 0.017% of the global population per year. By contrast, unless the sources I provided are seriously incorrect, the coronavirus could kill between 0.78% and 2.0% of the global population. That’s nearly two orders of magnitude of a difference.

Think through actual scenarios and how your behaviors might actually influence them, rather than just making you feel somewhat less guilty about it.

The point of my shortform wasn’t that we can do something right now to reduce the risk massively. It was that people seem irrationally poised to dismiss a potential disaster. This is plausibly bad if this behavior shows up in future catastrophes that kill e.g. billions of people.

• This is plausibly bad if this behavior shows up in future catastrophes that kill e.g. billions of people.
It’s bad if this behavior shows up in future catastrophes IFF different behavior was available (knowable and achievable in terms of coordination) that would have reduced or mitigated the disaster. I argue that the world is fragile enough today that different behavior is not achievable far enough in advance of the currently-believable catastrophes to make much of a difference. If you can’t do anything effective, you may well be better off optimizing happiness experienced both before the disaster occurs and in the potential universes where the disaster doesn’t occur.

• It’s bad if this behavior shows up in future catastrophes IFF different behavior was available (knowable and achievable in terms of coordination) that would have reduced or mitigated the disaster.

Are things only bad if we can do things to prevent them? Let’s imagine the following hypothetical situation:

One month ago I identify a meteor on collision course towards Earth and I point out to people that if it hits us (which is not clear, but there is some pretty good evidence) then over a hundred million people will die. People don’t react. Most tell me that it’s nothing to worry about since it hasn’t hit Earth yet and therefore the death rate is 0.0%.

Today, however, the stock market fell over 3%, following a day in which it fell 3%, and most media outlets are attributing this decline to the fact that the meteor has gotten closer.

I go on Lesswrong shortform and say, “Hey guys, this is not good news. I have just learned that the world is so fragile that it looks highly likely we can’t get our shit together to plan for a meteor even when we can see it coming more than a month in advance.”

Someone tells me that this is only bad IFF different behavior was available that would have reduced or mitigated the disaster. But information was available! I put it in a post and told people about it. And furthermore, I’m just saying that our world is fragile.
Things can still be bad even if I don’t point to a specific policy proposal that could have prevented it.

• Are things only bad if we can do things to prevent them?

Nope. But we should do things to prevent them only if we can do things to prevent them. That seems tautologically obvious to me. If you can suggest things that actually will deflect the meteor (or even secure your mine shaft to further your own chances), that don’t require historically-unprecedented authority or coordination, definitely do so!

• If the stock market indeed fell due to the coronavirus, and traders at the time misunderstood the severity, I say that I could have given actionable information in the form of “Sell your stock now” or something similar.

• If you knew that then, it was actionable. If you know it now, and other traders also do, it’s not.

• [ETA: I’m writing this now to cover myself in case people confuse my short form post as financial advice or something.] To be clear, and for the record, I am not saying that I had exceptional foresight, or that I am confident this outbreak will cause a global depression, or that I knew for sure that selling stock was the right thing to do a month ago. All I’m doing is pointing out that if you put together basic facts, then the evidence points to a very serious potential outcome, and I think it would be irrational at this point to place very low probabilities on doomy outcomes like the global population declining this year for the first time in centuries. People seem to be having weird biases that cause them to underestimate the risk. This is worth pointing out, and I pointed it out before.

• As I said, I wrote a post about the risk about a month ago...

• And how much did you short the market, or otherwise make use of this better-than-median prediction? My whole point is that the prediction isn’t the hard part. The hard part is knowing what actions to take, and to have any confidence that the actions will help.
• Is it really necessary that I personally used my knowledge to sell stock? Why is it that important that I actually made money from what I’m saying? I’m simply pointing to a reasonable position given the evidence: you could have seen a potential pandemic coming, and anticipated the stock market falling. Wei Dai says above that he did it. Do I have to be the one who did it? In any case, I used my foresight to predict that Metaculus’ median estimate would rise, and that seems to have borne out so far.

• I’m not sure exactly what I’m saying about how and whether you used knowledge personally. You’re free to value and do what you want. I’m mostly disagreeing with your thesis that “don’t worry about it” is a syndrome or a serious problem to fix. For people that won’t or can’t act on the concern in a way that actually improves the situation, there’s not much value in worrying about it.

• That’s ok for most people. I can hope that bureaucrats, expert advisers, politicians and e.g. Trump’s internal staff don’t share the same attitude.

• Quite. Those with capability to actually prepare or change outcomes definitely SHOULD do so. But not by worrying; by analyzing and acting. Whether bureaucrats and politicians can or will do this is up for debate. I wish I could believe that politicians and bureaucrats were clever enough to be acting strongly behind the scenes while trying to avoid panic by loudly saying “don’t worry” to the people likely to do more harm than good if they worry. But I suspect not.

• There’s a phenomenon I currently hypothesize to exist where direct attacks on the problem of AI alignment are criticized much more often than indirect attacks. If this phenomenon exists, it could be advantageous to the field in the sense that it encourages thinking deeply about the problem before proposing solutions. But it could also be bad because it disincentivizes work on direct attacks to the problem (if one is criticism averse and would prefer their work be seen as useful).
I have arrived at this hypothesis from my observations: I have watched people propose solutions only to be met with immediate and forceful criticism from others, while other people proposing non-solutions and indirect analyses are given little criticism at all. If this hypothesis is true, I suggest it is partly or mostly because direct attacks on the problem are easier to defeat via argument, since their assumptions are made plain.

If this is so, I consider it to be a potential hindrance on thought, since direct attacks are often the type of thing that leads to the most deconfusion, not because the direct attack actually worked, but because in explaining how it failed, we learned what definitely doesn’t work.

• Nod. This is part of a general problem where vague things that can’t be proven not to work are met with less criticism than “concrete enough to be wrong” things. A partial solution is a norm wherein “concrete enough to be wrong” is seen as praise, and something people go out of their way to signal respect for.

• Did you have some specific cases in mind when writing this? For example, HCH is interesting and not obviously going to fail in the ways that some other proposals I’ve seen would, and the proposal there seems to have gotten better as more details have been fleshed out even if there’s still some disagreement on things that can be tested eventually even if not yet. Against this we’ve seen lots of things, like various oracle AI proposals, that to my mind usually have fatal flaws right from the start, due to misunderstanding something, such that they can’t easily be salvaged.

I don’t want to disincentivize thinking about solving AI alignment directly when I criticize something, but I also don’t want to let pass things that to me have obvious problems that the authors probably didn’t think about or thought about from different assumptions that maybe are wrong (or maybe I will converse with them and learn that I was wrong!).
It seems like an important part of learning in this space is proposing things and seeing why they don’t work so you can better understand the constraints of the problem space to work within them to find solutions.

• Occasionally, I will ask someone who is very skilled in a certain subject how they became skilled in that subject so that I can copy their expertise. A common response is that I should read a textbook in the subject.

Eight years ago, Luke Muehlhauser wrote,

For years, my self-education was stupid and wasteful. I learned by consuming blog posts, Wikipedia articles, classic texts, podcast episodes, popular books, video lectures, peer-reviewed papers, Teaching Company courses, and Cliff’s Notes. How inefficient! I’ve since discovered that textbooks are usually the quickest and best way to learn new material.

However, I have repeatedly found that this is not good advice for me.

I want to briefly list the reasons why I don’t find sitting down and reading a textbook that helpful for learning. Perhaps, in doing so, someone else might appear and say, “I agree completely. I feel exactly the same way” or someone might appear to say, “I used to feel that way, but then I tried this...” This is what I have discovered:

• When I sit down to read a long textbook, I find myself subconsciously constantly checking how many pages I have read. For instance, if I have been sitting down for over an hour and I find that I have barely made a dent in the first chapter, much less the book, I have a feeling of hopelessness that I’ll ever be able to “make it through” the whole thing.

• When I try to read a textbook cover to cover, I find myself much more concerned with finishing rather than understanding. I want the satisfaction of being able to say I read the whole thing, every page. This means that I will sometimes cut corners in my understanding just to make it through a difficult part. This ends in disaster once the next chapter requires a solid understanding of the last.
• Reading a long book feels less like I’m slowly building insights and it feels more like I’m doing homework. By contrast, when I read blog posts it feels like there’s no finish line, and I can quit at any time. When I do read a good blog post, I often end up thinking about its thesis for hours afterwards even after I’m done reading it, solidifying the content in my mind. I cannot replicate this feeling with a textbook.

• Textbooks seem overly formal at points. And they often do not repeat information, instead putting the burden on the reader to re-read things. This makes it difficult to read in a linear fashion, which is straining.

• If I don’t understand a concept I can get “stuck” on the textbook, disincentivizing me from finishing. By contrast, if I just learned as Muehlhauser described, by “consuming blog posts, Wikipedia articles, classic texts, podcast episodes, popular books, video lectures, peer-reviewed papers, Teaching Company courses, and Cliff’s Notes” I feel much less stuck since I can always just move from one source to the next without feeling like I have an obligation to finish.

• I used to feel similarly, but then a few things changed for me and now I am pro-textbook. There are caveats, namely that I don’t work through them continuously.

Textbooks seem overly formal at points

This is a big one for me, and probably the biggest change I made is being much more discriminating in what I look for in a textbook. My concerns are invariably practical, so I only demand enough formality to be relevant; otherwise I am concerned with a good reputation for explaining intuitions, graphics, examples, ease of reading. I would go as far as to say that style is probably the most important feature of a textbook.

As I mentioned, I don’t work through them front to back, because that actually is homework.
Instead I treat them more like a reference-with-a-hook; I look at them when I need to understand the particular thing in more depth, and then get out when I have what I need. But because it is contained in a textbook, this knowledge now has a natural link to steps before and after, so I have obvious places to go for regression and advancement.

I spend a lot of time thinking about what I need to learn, why I need to learn it, and how it relates to what I already know. This does an excellent job of helping things stick, and also of keeping me from getting too stuck because I have a battery of perspectives ready to deploy. This enables the reference approach.

I spend a lot of time on what I have mentally termed triangulating, which is deliberately using different sources/currents of thought when I learn a subject. This winds up necessitating the reference approach, because I always wind up with questions that are neglected or unsatisfactorily addressed in a given source. Lately I really like founding papers and historical review papers right out of the gate, because these are prone to explaining motivations, subtle intuitions, and circumstances in a way instructional materials are not.

• I’ve also been reading textbooks more and experienced some frustration, but I’ve found two things that, so far, help me get less stuck and feel less guilt.

After trying to learn math from textbooks on my own for a month or so, I started paying a tutor (DM me for details) with whom I meet once a week. Like you, I struggle with getting stuck on hard exercises and/or concepts I don’t understand, but having a tutor makes it easier for me to move on knowing I can discuss my confusions with them in our next session. Unfortunately, paying a tutor requires actually having money to spare on an ongoing basis, but I also suspect for some people it just “feels weird”.
If someone reading this is more deterred by this latter reason, consider that basically everyone who wants to seriously improve at any physical activity gets 1-on-1 instruction, but for some reason doing the same for mental activities as an adult is weirdly uncommon (and perhaps a little low status).

I’ve also started to follow MIT OCW courses for things I want to learn rather than trying to read entire textbooks. Yes, this means I may not cover as much material, but it has helped me better gauge how much time to spend on different topics and allowed me to feel like I’m progressing. The major downside of this strategy is that I have to remind myself that even though I’m learning based on a course’s materials, my goal is to learn the material in a way that’s useful to me, not to memorize passwords. Also, because I know how long the courses would take in a university context, I do occasionally feel guilt if I fall behind due to spending more time on a specific topic. Still, on net, using courses as loose guides has been working better for me than just trying to 100 percent entire math textbooks.

• When I try to read a textbook cover to cover, I find myself much more concerned with finishing rather than understanding. I want the satisfaction of being able to say I read the whole thing, every page. This means that I will sometimes cut corners in my understanding just to make it through a difficult part. This ends in disaster once the next chapter requires a solid understanding of the last.

When I read a textbook, I try to solve all exercises at the end of each chapter (at least those not marked “super hard”) before moving to the next. That stops me from cutting corners.

• The only flaw I find with this is that if I get stuck on an exercise, I reach the following decision: should I look at the answer and move on, or should I keep at it?

If I choose the first option, this makes me feel like I’ve cheated. I’m not sure what it is about human psychology, but I think that if you’ve cheated once, you feel less guilty a second time because “I’ve already done it.” So, I start cheating more and more, until soon enough I’m just skipping things and cutting corners again.

If I choose the second option, then I might be stuck for several hours, and this causes me to just abandon the textbook and develop an ugh field around it.

• Maybe commit to spending at least N minutes on any exercise before looking up the answer?

• Perhaps it says something about the human brain (or just mine) that I did not immediately think of that as a solution.

• I was of the very same mind that you are now. I was somewhat against textbooks, but now textbooks are my only way of learning, not only because the knowledge sticks better but also because it is faster.

I think several things were important in switching to textbooks only. First, I dropped my habit of completionism: rather than insisting on finishing a particular book in some field, I swap it for another textbook in the same field if I don’t feel like it’s helping me or if things seem confusing. lukeprog’s post is very handy here.

The idea of changing textbooks has helped me a lot. Sometimes I thought I did not understand something, when apparently I only needed another explanation.

Another important thing is that I take quite a lot of notes as I’m reading. I believe that if someone is just reading a textbook, that person is doing it wrong and doing a disservice to themselves. So I fit as much as I can into my working memory, be it three or four paragraphs of content, and I transcribe those myself in my notes. Coupled with this is making my own questions and answers and then putting them on Anki (a spaced-repetition memory program).

This allows me to learn vast amounts of knowledge in low amounts of time, assuring myself that I will remember everything I’ve learned. I believe textbooks are a key component for this.

• I bet Robin Hanson on Twitter my $9k to his $1k that de novo AGI will arrive before ems. He wrote,

OK, so to summarize a proposal: I’d bet my $1K to your $9K (both increased by S&P500 scale factor) that when US labor participation rate < 10%, em-like automation will contribute more to GDP than AGI-like. And we commit our descendants to the bet.
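To make the odds concrete: staking $9k against $1k only has non-negative expected value if the staker assigns at least 90% probability to winning. A minimal sketch of that arithmetic, ignoring the S&P 500 scaling and any time discounting (both simplifications are assumptions, and the function names are illustrative):

```python
def break_even_probability(stake, payout):
    """Probability of winning at which a bet has zero expected value.

    stake:  amount lost if the bet is lost
    payout: amount won if the bet is won
    """
    return stake / (stake + payout)


def expected_value(p_win, stake, payout):
    """Expected profit of the bet for a given probability of winning."""
    return p_win * payout - (1.0 - p_win) * stake


# The AGI-first side risks $9k to win $1k.
agi_side = break_even_probability(stake=9_000, payout=1_000)
# The em-first side risks $1k to win $9k.
em_side = break_even_probability(stake=1_000, payout=9_000)

print(f"AGI-first side needs at least {agi_side:.0%} confidence")
print(f"em-first side needs at least {em_side:.0%} confidence")
```

So the bet is consistent with one side holding roughly 90% or more on AGI arriving first and the other side holding roughly 10% or more on ems.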

• I get the feeling that for AI safety, some people believe that it’s crucially important to be an expert in a whole bunch of fields of math in order to make any progress. In the past I took this advice and tried to deeply study computability theory, set theory, type theory—with the hopes of it someday giving me greater insight into AI safety.

Now, I think I was taking a wrong approach. To be fair, I still think being an expert in a whole bunch of fields of math is probably useful, especially if you want very strong abilities to reason about complicated systems. But, my model for the way I frame my learning is much different now.

The main model describing my current perspective is that employing a lazy style of learning is superior for AI safety work. Lazy is meant in the computer science sense of only learning something when it seems like you need to know it in order to understand something important. I will contrast this with the model that one should learn a set of solid foundations first before going any further.
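For readers who have not met the computer science sense of “lazy”, here is a small sketch of the contrast. The prerequisite graph and both function names are hypothetical, made up purely for illustration; the point is only that the lazy strategy touches a topic when a concrete goal demands it, while the eager strategy works through everything up front.

```python
# Hypothetical prerequisite graph: topic -> topics it directly depends on.
PREREQUISITES = {
    "set theory": [],
    "computability theory": ["set theory"],
    "type theory": ["set theory"],
    "linear algebra": [],
    "probability": [],
    "machine learning paper": ["linear algebra", "probability"],
}


def study_eagerly(topics):
    """Foundations model: study every known topic before reading anything."""
    return list(topics)


def study_lazily(goal, learned=None):
    """Lazy model: study a topic only when the current goal requires it."""
    if learned is None:
        learned = []
    for prerequisite in PREREQUISITES[goal]:
        if prerequisite not in learned:
            study_lazily(prerequisite, learned)
    if goal not in learned:
        learned.append(goal)
    return learned


eager_plan = study_eagerly(PREREQUISITES)
lazy_plan = study_lazily("machine learning paper")

print("eager:", eager_plan)
print("lazy: ", lazy_plan)
```

In this toy graph the lazy plan never visits set theory, computability theory, or type theory, because nothing on the path to the goal asks for them. Whether that is a feature or a blind spot is exactly the disagreement between the two models.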

Obviously neither model can be absolutely correct in an extreme sense. I don’t, as a silly example, think that people who can’t do basic arithmetic should go into AI safety before building a foundation in math. And on the other side of the spectrum, I think it would be absurd to think that one should become a world renowned mathematician before reading their first AI safety paper. That said, even though both models are wrong, I think my current preference is for the lazy model rather than the foundation model.

Here are some points in favor of both, informed by my first-person experience.

Points in favor of the foundations model:

• If you don’t have solid foundations in mathematics, you may not even be aware of things that you are missing.

• Having solid foundations in mathematics will help you to think rigorously about things rather than having a vague non-reductionistic view of AI concepts.

• Subpoint: MIRI work is motivated by coming up with new mathematics that can describe error-tolerant agents without relying on fuzzy statements like “machine learning relies on heuristics so we need to study heuristics rather than hard math to do alignment.”

• We should try to learn the math that will be useful for AI safety in the future, rather than what is being used for machine learning papers right now. If your view of AI is that it is at least a few decades away, then it’s possible that learning the foundations of mathematics will be more robustly useful no matter where the field shifts.

Points in favor of the lazy model:

• Time is limited and it usually takes several years to become proficient in the foundations of mathematics. This is time that could have been spent reading actual research directly related to AI safety.

• The lazy model is better for my motivation, since it makes me feel like I am actually learning about what’s important, rather than doing homework.

• Learning foundational math often looks a lot like just taking a shotgun and learning everything that seems vaguely relevant to agent foundations. Unless you have a very strong passion for this type of mathematics, it would seem outright strange that this type of learning is fun.

• It’s not clear that the MIRI approach is correct. I don’t have a strong opinion on this, however.

• Even if the MIRI approach was correct, I don’t think it’s my comparative advantage to do foundational mathematics.

• The lazy model will naturally force you to learn the things that are actually relevant, as measured by how much you come in contact with them. By contrast, the foundational model forces you to learn things which might not be relevant at all. Obviously, we won’t know what is and isn’t relevant beforehand, but I currently err on the side of saying that some things won’t be relevant if they don’t have a current direct input to machine learning.

• Even if AI is many decades away, machine learning has been around for a long time, and it seems like the math useful for machine learning hasn’t changed much. So, it seems like a safe bet that foundational math won’t be relevant for understanding normal machine learning research any time soon.

• I happened to be looking at something else and saw this comment thread from about a month ago that is relevant to your post.

• I’m somewhat sympathetic to this. You probably don’t need the ability, prior to working on AI safety, to already be familiar with a wide variety of mathematics used in ML, by MIRI, etc. To be specific, I wouldn’t be much concerned if you didn’t know category theory, more than basic linear algebra, how to solve differential equations, how to integrate together probability distributions, or even multivariate calculus prior to starting on AI safety work, but I would be concerned if you didn’t have deep experience with writing mathematical proofs beyond high school geometry (although I hear these days they teach geometry differently than I learned it, by re-deriving everything in Elements), say the kind of experience you would get from studying graduate level algebra, topology, measure theory, combinatorics, etc.

This might also be a bit of motivated reasoning on my part, to reflect Dagon’s comments, since I’ve not gone back to study category theory since I didn’t learn it in school and I haven’t had specific need for it, but my experience has been that having solid foundations in mathematical reasoning and proof writing is what’s most valuable. The rest can, as you say, be learned lazily, since your needs will become apparent and you’ll have enough mathematical fluency to find and pursue those fields of mathematics you may discover you need to know.

• Beware motivated reasoning. There’s a large risk that you have noticed that something is harder for you than it seems for others, and instead of taking that as evidence that you should find another avenue to contribute, you convince yourself that you can take the same path but do the hard part later (and maybe never).

But you may be on to something real—it’s possible that the math approach is flawed, and some less-formal modeling (or other domain of formality) can make good progress. If your goal is to learn and try stuff for your own amusement, pursuing that seems promising. If your goals include getting respect (and/​or payment) from current researchers, you’re probably stuck doing things their way, at least until you establish yourself.

• That’s a good point about motivated reasoning. I should distinguish arguments that the lazy approach is better for people and arguments that it’s better for me. Whether it’s better for people more generally depends on the reference class we’re talking about. I will assume people who are interested in the foundations of mathematics as a hobby outside of AI safety should take my advice less seriously.

However, I still think that it’s not exactly clear that going the foundational route is actually that useful on a per-unit time basis. The model I proposed wasn’t as simple as “learn the formal math” versus “think more intuitively.” It was specifically a question of whether we should learn the math on an as-needed basis. For that reason, I’m still skeptical that going out and reading textbooks on subjects that are only vaguely related to current machine learning work is valuable for the vast majority of people who want to go into AI safety as quickly as possible.

Sidenote: I think there’s a failure mode of not adequately optimizing time, or being insensitive to time constraints. Learning an entire field of math from scratch takes a lot of time, even for the brightest people alive. I’m worried that, “Well, you never know if subject X might be useful” is sometimes used as a fully general counterargument. The question is not, “Might this be useful?” The question is, “Is this the most useful thing I could learn in the next time interval?”

• A lot depends on your model of progress, and whether you’ll be able to predict/​recognize what’s important to understand, and how deeply one must understand it for the project at hand.

Perhaps you shouldn’t frame it as “study early” vs “study late”, but “study X” vs “study Y”. If you don’t go deep on math foundations behind ML and decision theory, what are you going deep on instead? It seems very unlikely for you to have significant research impact without being near-expert in at least some relevant topic.

I don’t want to imply that this is the only route to impact, just the only route to impactful research.
You can have significant non-research impact by being good at almost anything—accounting, management, prototype construction, data handling, etc.

• I don’t want to imply that this is the only route to impact, just the only route to impactful research.

“Only” seems a little strong, no? To me, the argument seems to be better expressed as: if you want to build on existing work where there’s unlikely to be low-hanging fruit, you should be an expert. But what if there’s a new problem, or one that’s incorrectly framed? Why should we think there isn’t low-hanging conceptual fruit, or exploitable problems to those with moderate experience?

• Perhaps you shouldn’t frame it as “study early” vs “study late”, but “study X” vs “study Y”.

My point was that these are separate questions. If you begin to suspect that understanding ML research requires an understanding of type theory, then you can start learning type theory. Alternatively, you can learn type theory before researching machine learning (i.e., reading machine learning papers) in the hopes that it builds useful groundwork.

But what you can’t do is learn type theory and read machine learning research papers at the same time. You must make tradeoffs. Each minute you spend learning type theory is a minute you could have spent reading more machine learning research.

The model I was trying to draw was not one where I said, “Don’t learn math.” I explicitly said it was a model where you learn math as needed.

My point was not intended to be about my abilities. This is a valid concern, but I did not think that was my primary argument. Even conditioning on having outstanding abilities to learn every subject, I still think my argument (weakly) holds.

Note: I also want to say I’m kind of confused because I suspect that there’s an implicit assumption that reading machine learning research is inherently easier than learning math. I side with the intuition that math isn’t inherently difficult, it just requires memorizing a lot of things and practicing. The same is true for reading ML papers, which makes me confused why this is being framed as a debate over whether people have certain abilities to learn and do research.

• I’m trying to find a balance here. I think that there has to be a direct enough relation to a problem that you’re trying to solve to prevent the task expanding to the point where it takes forever, but you also have to be willing to engage in exploration.

• I think there is some serious low-hanging fruit for making people productive that I haven’t seen anyone write about (not that I’ve looked very hard). Let me just introduce a proof of concept:

Final exams in university are typically about 3 hours long. And many people are able to do multiple finals in a single day, performing well on all of them. During a final exam, I notice that I am substantially more productive than usual. I make sure that every minute counts: I double check everything and think deeply about each problem, making sure not to cut corners unless absolutely required because of time constraints. Also, if I start daydreaming, then I am able to immediately notice that I’m doing so and cut it out. I also believe that this is the experience of most other students in university who care even a little bit about their grade.

Therefore, it seems like we have an example of an activity that can just automatically produce deep work. I can think of a few reasons why final exams would bring out the best of our productivity:

1. We care about our grade in the course, and the few hours in that room are the most impactful to our grade.

2. We are in an environment where distractions are explicitly prohibited, so we can’t make excuses to ourselves about why we need to check Facebook or whatever.

3. There is a clock at the front of the room which makes us feel like time is limited. We can’t just sit there doing nothing because then time will just slip away.

4. Every problem you do well on benefits you by a little bit, meaning that there’s a gradient of success rather than a binary pass or fail (though sometimes it’s binary). This means that we care a lot about optimizing every second because we can always do slightly better.

If we wanted to do deep work for some other desired task, all four of these reasons seem like they could be replicable. Here is one idea (related to my own studying), although I’m sure I can come up with a better one if I thought deeply about this for longer:

Set up a room where you are given a limited amount of resources (say, a few academic papers, a computer without an internet connection, and a textbook). Set aside a four hour window where you’re not allowed to leave the room except to go to the bathroom (and some person explicitly checks in on you like twice to see whether you are doing what you say you are doing). Make it your goal to write a blog post explaining some technical concept. Afterwards, the blog post gets posted to Lesswrong (conditional on it being at least minimal quality). You set some goal, like it must achieve a score of 30 upvotes after 3 days. Commit to paying $1 to a friend for each upvote you score below the target. So, if your blog post is at +15, you must pay $15 to your friend.
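The payment rule in the paragraph above can be written out as a small calculation. This is only a sketch: the function name `commitment_penalty` and its parameters are hypothetical, and clamping the shortfall at zero (no payment when the post meets or beats the target) is an assumption the proposal implies but does not state.

```python
def commitment_penalty(score: int, target: int = 30, rate: float = 1.0) -> float:
    """Dollars owed to the friend: `rate` per upvote below `target`.

    Assumption: a post at or above the target costs nothing, so the
    shortfall is clamped at zero rather than becoming a negative payment.
    """
    shortfall = max(0, target - score)
    return shortfall * rate


# The example from the text: a post at +15 against a target of 30.
for score in (15, 30, 42):
    print(f"post at +{score}: pay ${commitment_penalty(score):.2f}")
```

Because the penalty scales with every missing upvote, it reproduces the "gradient of success" from reason 4 instead of a binary pass or fail.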

I can see a few problems with this design:

1. You are optimizing for upvotes, not clarity or understanding. The two might be correlated but at the very least there’s a Goodhart effect.

2. Your “friend” could downvote the post. It can easily be hacked by other people who are interested, and it encourages vote manipulation etc.

Still, I think that I might be on the right track towards something that boosts productivity by a lot.

• These seem like reasonable things to try, but I think this is making an assumption that you could take a final exam all the time and have it work out fine. I have some sense that people go through phases of “woah I could just force myself to work hard all the time” and then it totally doesn’t work that way.

• I agree that it is probably too hard to “take a final exam all the time.” On the other hand, I feel like I could make a much weaker claim that this is an improvement over a lot of productivity techniques, which often seem to more-or-less be dependent on just having enough willpower to actually learn.

At least in this case, each action you do can be informed directly by whether you actually succeed or fail at the goal (like getting upvotes on a post). Whether or not learning is a good instrumental proxy for getting upvotes in this setting is an open question.

• From my own experience going through a similar realization and trying to apply it to my own productivity, I found that certain things I tried actually helped me sustainably work more productively but others did not.

What has worked for me based on my experience with exam-like situations is having clear goals and time boxes for work sessions, e.g. the blog post example you described. What hasn’t worked for me is trying to impose aggressively short deadlines on myself all the time to incentivize myself to focus more intensely. Personally, the level of focus I have during exams is driven by an unsustainable level of stress, which, if applied continuously, would probably lead to burnout and/​or procrastination binging. That said, occasionally artificially imposing deadlines has helped me engage exam-style focus when I need to do something that might otherwise be boring because it mostly involves executing known strategies rather than doing more open, exploratory thinking. For hard thinking though, I’ve actually found that giving myself conservatively long time boxes helps me focus better by allowing me to relax and take my time. I saw you mentioned struggling with reading textbooks above, and while I still struggle trying to read them too, I have found that not expecting miraculous progress helps me get less frustrated when I read them.

Related to all this, you used the term “deep work” a few times so you may already be familiar with Cal Newport’s work. But, if you’re not I recommend a few of his relevant posts (1, 2) describing how he produces work artifacts that act as a forcing function for learning the right stuff and staying focused.

• This seems similar to “pomodoro”, except instead of using your willpower to keep working during the time period, you set up the environment in a way that doesn’t allow you to do anything else.

The only part that feels wrong is the commitment part. You should commit to work, not to achieve success, because the latter adds a lot of problems (not completely under your control, may discourage experimenting, a punishment creates aversion against the entire method, etc.).

• Yes, the difference is that you are creating an external environment which rewards you for success and punishes you for failure. This is similar to taking a final exam, which is my inspiration.

The problem with committing to work rather than success is that you can always just rationalize something as “Oh I worked hard” or “I put in my best effort.” However, just as with a final exam, the only thing that will matter in the end is if you actually do what it takes to get the high score. This incentivizes good consequentialist thinking and disincentivizes rationalization.

I agree there are things out of your control, but the same is true with final exams. For instance, the test-maker could have put something on the test that you didn’t study much for. This encourages people to put extra effort into their assigned task to ensure robustness to outside forces.

• I personally try to balance keeping myself honest by having some goal outside but also trusting myself enough to know when I should deprioritize the original goal in favor of something else.

For example, let’s say I set a goal to write a blog post about a topic I’m learning in 4 hours, and half-way through I realize I don’t understand one of the key underlying concepts related to the thing I intended to write about. During an actual test, the right thing to do would be to do my best given what I know already and finish as many questions as possible. But I’d argue that in the blog post case, I very well may be better off saying, “OK I’m going to go learn about this other thing until I understand it, even if I don’t end up finishing the post I wanted to write.”

The pithy way to say this is that tests are basically pure Goodhart, and it’s dangerous to turn every real life task into a game of maximizing legible metrics.

• For example, let’s say I set a goal to write a blog post about a topic I’m learning in 4 hours, and half-way through I realize I don’t understand one of the key underlying concepts related to the thing I intended to write about.

Interesting, this exact same thing just happened to me a few hours ago. I was testing my technique by writing a post on variational autoencoders. Halfway through I was very confused because I was trying to contrast them to GANs but didn’t have enough material or knowledge to know the advantages of either.

During an actual test, the right thing to do would be to do my best given what I know already and finish as many questions as possible. But I’d argue that in the blog post case, I very well may be better off saying, “OK I’m going to go learn about this other thing until I understand it, even if I don’t end up finishing the post I wanted to write.”

I agree that’s probably true. However, this creates a bad incentive where, at least in my case, I will slowly start making myself lazier during the testing phase because I know I can always just “give up” and learn the required concept afterwards.

At least in the case I described above I just moved onto a different topic, because I was kind of getting sick of variational autoencoders. However, I was able to do this because I didn’t have any external constraints, unlike the method I described in the parent comment.

The pithy way to say this is that tests are basically pure Goodhart, and it’s dangerous to turn every real life task into a game of maximizing legible metrics.

That’s true, although perhaps one could devise a sufficiently complex test such that it matches perfectly with what we really want… well, I’m not saying that’s a solved problem in any sense.

• Weirdly enough, I was doing something today that made me think about this comment. The thought I had is that you caught onto something good here which is separate from the pressure aspect. There seems to be a benefit to trying to separate different aspects of a task more than may feel natural. To use the final exam example, as someone mentioned before, part of the reason final exams feel productive is because you were forced to do so much prep beforehand to ensure you’d be able to finish the exam in a fixed amount of time.

Similarly, I’ve seen benefit when I (haphazardly since I only realized this recently) clearly segment different aspects of an activity and apply artificial constraints to ensure that they remain separate. To use your VAE blog post example, this would be like saying, “I’m only going to use a single page of notes to write the blog post” to force yourself to ensure you understand everything before trying to write.

YMMV warning: I’m especially bad about trying to produce outputs before fully understanding and therefore may get more bandwidth out of this than others.

• I think you might be Goodharting a bit (mistaking the measure for the goal) when you claim that final exam performance is productive. The actual product is the studying and prep for the exam, not the exam itself. The time limits and isolated environment are helpful in proctoring (they ensure the output is limited enough to be able to grade, and ensure that no outside sources are being used), not for productivity.

That’s not to say that these elements (isolation, concentration, time awareness, expectation of a grading/​scoring rubric) aren’t important, just that they’re not necessarily sufficient nor directly convertible from an exam setting.

• Related to: The Lottery of Fascinations, other posts probably

When you are older, you will learn that the first and foremost thing which any ordinary person does is nothing.

I will occasionally come across someone who I consider to be extraordinarily productive, and yet when I ask what they did on a particular day they will respond, “Oh I basically did nothing.” This is particularly frustrating. If they did nothing, then what was all that work that I saw!

I think this comes down to what we mean by doing nothing. There’s a literal meaning to doing nothing. It could mean sitting in a chair, staring blankly at a wall, without moving a muscle.

More practically, what people mean by doing nothing is that they are doing something unrelated to their stated task, such as checking Facebook, chatting with friends, browsing Reddit etc.

When productive people say that they are “doing nothing” it could just be that they are modest, and don’t want to signal how productive they really are. On the other hand, I think that there is a real sense in which these productive people truly believe that they are doing nothing. Even if their “doing nothing” was your “doing work”, to them it’s still a “doing nothing” because they weren’t doing the thing they explicitly set out to do.

I think, therefore, there is something of a “do nothing” differential, which helps explain why some people are more productive than others. For some people who are less productive than me, their “doing nothing” might just be playing video games. For me, my “doing nothing” is watching people debate the headline of a Reddit news article (and I’m not proud of this).

For those more productive than me, perhaps their “doing nothing” is reading blog posts that are tangentially related to what they are working on. For people more productive still, it might be obsessively re-reading articles directly applicable to their work. And for Terence Tao, his “doing nothing” might be reading math papers in fields other than the one he is supposed to be currently working in.

• So, in 2017 Eliezer Yudkowsky made a bet with Bryan Caplan that the world will end by January 1st, 2030, in order to save the world by taking advantage of Bryan Caplan’s perfect betting record — a record which, for example, includes a 2008 bet that the UK would not leave the European Union by January 1st 2020 (it left on January 31st 2020 after repeated delays).

What we need is a short story about people in 2029 realizing that a bunch of cataclysmic events are imminent, but all of them seem to be stalled, waiting for… something. And no one knows what to do. But by the end people realize that to keep the world alive they need to make more bets with Bryan Caplan.

• The case for studying mesa optimization

Early elucidations of the alignment problem focused heavily on value specification. That is, they focused on the idea that given a powerful optimizer, we need some way of specifying our values so that the powerful optimizer can create good outcomes.

Since then, researchers have identified a number of additional problems besides value specification. One of the biggest problems is that in a certain sense, we don’t even know how to optimize for anything, much less a perfect specification of human values.

Let’s assume we could get a utility function containing everything humanity cares about. How would we go about optimizing this utility function?

The default mode of thinking about AI right now is to train a deep learning model that performs well on some training set. But even if we were able to create a training environment for our model that reflected the world very well, and rewarded it each time it did something good, exactly in proportion to how good it really was in our perfect utility function… this still would not be guaranteed to yield a positive artificial intelligence.

This problem is not a superficial one either: it is intrinsic to the way that machine learning is currently accomplished. To be more specific, the way we constructed our AI was by searching over some class of models $\mathcal{M}$, and selecting those models which tended to do well on the training set. Crucially, we know almost nothing about the model which eventually gets selected. The most we can say is that our AI $\in \mathcal{M}$, but since $\mathcal{M}$ was such a broad class, this provides us very little information about what the model is actually doing.

This is similar to the mistake evolution made when designing us. Unlike evolution, we can at least put some hand-crafted constraints, like a regularization penalty, in order to guide our AI into safe regions of $\mathcal{M}$. We can also open up our models and see what’s inside, and in principle simulate every aspect of their internal operations.

But now this still isn’t looking very good, because we barely know anything about what type of computations are safe. What would we even look for? To make matters worse, our current methods for ML transparency are abysmally ill equipped to the task of telling us what is going on inside.

The default outcome of all of this is that eventually, as $\mathcal{M}$ grows larger with compute becoming cheaper and budgets getting bigger, gradient descent is bound to hit powerful optimizers who do not share our values.
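To make the selection argument above concrete, here is a toy sketch of "search a class of models and keep whatever does well on the training set." Everything in it is hypothetical and chosen only for illustration: the three hand-written models, the integer training inputs, and the squared-error loss. It is not a model of real deep learning; it only shows that training performance alone cannot tell apart two members of the class that behave very differently off the training set.

```python
# Toy "class of models": each model maps an integer input to an output.
# All three are made up for illustration.
MODEL_CLASS = {
    "intended": lambda x: 2 * x,
    "lookalike": lambda x: 2 * x if x < 10 else 0,  # diverges off-distribution
    "poor_fit": lambda x: x + 1,
}

TRAIN_INPUTS = [0, 1, 2, 3, 4]


def target(x):
    """Stands in for the perfect training signal assumed in the text."""
    return 2 * x


def training_loss(model, inputs=TRAIN_INPUTS):
    """Sum of squared errors on the training inputs only."""
    return sum((model(x) - target(x)) ** 2 for x in inputs)


def select_models(model_class):
    """Return the names of every model that attains the lowest training loss."""
    losses = {name: training_loss(m) for name, m in model_class.items()}
    best = min(losses.values())
    return sorted(name for name, loss in losses.items() if loss == best)


selected = select_models(MODEL_CLASS)
print("selected by training loss:", selected)
for name in selected:
    # Input 50 lies far outside the training inputs.
    print(name, "on input 50 ->", MODEL_CLASS[name](50))
```

Both "intended" and "lookalike" reach zero training loss, so the selection step has no basis for preferring one over the other, yet they disagree completely on an input far from the training data.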

• Signal boosting a Lesswrong-adjacent author from the late 1800s and early 1900s

Via a friend, I recently discovered the zoologist, animal rights advocate, and author J. Howard Moore. His attitudes towards the world reflect contemporary attitudes within effective altruism about science, the place of humanity in nature, animal welfare, and the future. Here are some quotes which readers may enjoy:

Oh, the hope of the centuries and the centuries and centuries to come! It seems sometimes that I can almost see the shining spires of that Celestial Civilisation that man is to build in the ages to come on this earth—that Civilisation that will jewel the land masses of this planet in that sublime time when Science has wrought the miracles of a million years, and Man, no longer the savage he now is, breathes Justice and Brotherhood to every being that feels.

But we are a part of Nature, we human beings, just as truly a part of the universe of things as the insect or the sea. And are we not as much entitled to be considered in the selection of a model as the part ‘red in tooth and claw’? At the feet of the tiger is a good place to study the dentition of the cat family, but it is a poor place to learn ethics.

Nature is the universe, including ourselves. And are we not all the time tinkering at the universe, especially the garden patch that is next to us—the earth? Every time we dig a ditch or plant a field, dam a river or build a town, form a government or gut a mountain, slay a forest or form a new resolution, or do anything else almost, do we not change and reform Nature, make it over again and make it more acceptable than it was before? Have we not been working hard for thousands of years, and do our poor hearts not almost faint sometimes when we think how far, far away the millennium still is after all our efforts, and how long our little graves will have been forgotten when that blessed time gets here?

The defect in this argument is that it assumes that the basis of ethics is life, whereas ethics is concerned, not with life, but with consciousness. The question ever asked by ethics is not, Does the thing live? but, Does it feel? It is impossible to do right and wrong to that which is incapable of sentient experience. Ethics arises with consciousness and is coextensive with it. We have no ethical relation to the clod, the molecule, or the scale sloughed off from our skin on the back of our hand, because the clod, the molecule, and the scale have no feeling, no soul, no anything rendering them capable of being affected by us [...] The fact that a thing is an organism, that it has organisation, has in itself no more ethical significance than the fact that it has symmetry, or redness, or weight.

In the ideal universe the life and happiness of no being are contingent on the suffering and death of any other, and the fact that in this world of ours life and happiness have been and are to-day so commonly maintained by the infliction of misery and death by some beings on others is the most painful fact that ever entered an enlightened mind.

• I agree with Wei Dai that we should use our real names for online forums, including Lesswrong. I want to briefly list some benefits of using my real name:

• It means that people can easily recognize me across websites, for example from Facebook and Lesswrong simultaneously.

• Over time my real name has been stable whereas my usernames have changed quite a bit over the years. For some very old accounts, such as those I created 10 years ago, this means that I can’t remember my account name. Using my real name would have averted this situation.

• It motivates me to put more effort into my posts, since I don’t have any disinhibition from being anonymous.

• It often looks more formal than a silly username, and that might make people take my posts more seriously than they otherwise would have.

• Similar to what Wei Dai said, it makes it easier for people to recognize me in person, since they don’t have to memorize a mapping from usernames to real names in their heads.

That said, there are some significant downsides, and I sympathize with people who don’t want to use their real names.

• It makes it much easier for people to dox you. There are some very bad ways that this can manifest.

• If you say something stupid, your reputation is now directly on the line. Some people change accounts every few years, as they don’t want to be associated with the stupid person they were a few years ago.

• Sometimes disinhibition from being anonymous is a good way to spur creativity. I know that I was a lot less careful in my previous non-real-name accounts, and my writing style was different—perhaps in a way that made my writing better.

• These days my reason for not using my full name is mostly this: I want to keep my professional and private lives separate. And I have to use my real name at my job, therefore I don’t use it online.

What I probably should have done many years ago is make up a new, plausible-sounding full name (perhaps keep my first name and just make up a new surname?), and use it consistently online. Maybe it’s still not too late; I just don’t have any surname ideas that feel right.

• Sometimes you need someone to give the naive view, but doing so hurts the reputation of the person stating it.

For example suppose X is the naive view and Y is a more sophisticated view of the same subject. For sake of argument suppose X is correct and contradicts Y.

Given 6 people, maybe 1 of them starts off believing Y, 2 people are uncertain, and 3 people think X. In the world where people have their usernames attached, the 3 people who believe X now have a coordination problem. They each face a local disincentive to state the case for X, although they definitely want _someone_ to say it. The equilibrium here is that no one makes the case for X and the two uncertain people get persuaded to view Y.

However if someone is anonymous and doesn’t care that much about their reputation, they may just go ahead and state the case for X, providing much better information to the undecided people.

This makes me happy there are some smart people posting under pseudonyms. I claim it is a positive factor for the epistemics of LessWrong.

• It makes it much easier for people to dox you. There are some very bad ways that this can manifest.

I agree with this, so my original advice was aimed at people who already made the decision to make their pseudonym easily linkable to their real name (e.g., their real name is easily Googleable from their pseudonym). I’m lucky in that there are lots of ethnic Chinese people with my name so it’s hard to dox me even knowing my real name, but my name isn’t so common that there’s more than one person with the same full name in the rationalist/​EA space. (Even then I do use alt accounts when saying especially risky things.)

On the topic of doxing, I was wondering if there’s a service that would “pen-test” how doxable you are, to give a better sense of how much risk one can take when saying things online. Have you heard of anything like that?

• Another issue I’d add is that real names are potentially too generic. Basically, if everyone used their real name, how many John Smiths would there be? Would it be confusing?

The rigidity around 1 username/​alias per person on most platforms forces people to adopt mostly memorable names that should distinguish them from the crowd.

• Bertrand Russell’s advice to future generations, from 1959

Interviewer: Suppose, Lord Russell, this film would be looked at by our descendants, like a Dead Sea scroll in a thousand years’ time. What would you think it’s worth telling that generation about the life you’ve lived and the lessons you’ve learned from it?
Russell: I should like to say two things, one intellectual and one moral. The intellectual thing I should want to say to them is this: When you are studying any matter or considering any philosophy, ask yourself only what are the facts and what is the truth that the facts bear out. Never let yourself be diverted either by what you wish to believe, or by what you think would have beneficent social effects if it were believed, but look only — and solely — at what are the facts. That is the intellectual thing that I should wish to say. The moral thing I should wish to say to them is very simple: I should say love is wise, hatred is foolish. In this world which is getting more and more closely interconnected, we have to learn to tolerate each other; we have to learn to put up with the fact that some people say things we don’t like. We can only live together in that way and if we are to live together and not die together, we must learn a kind of charity and a kind of tolerance, which is absolutely vital to the continuation of human life on this planet.
• When I look back at things I wrote a while ago, say months back, or years ago, I tend to cringe at how naive many of my views were. Faced with this inevitable progression, and the virtual certainty that I will continue to cringe at views I now hold, it is tempting to disconnect from social media and the internet and only comment when I am confident that something will look good in the future.

At the same time, I don’t really think this is a good attitude for several reasons:

• Writing things up forces my thoughts to be more explicit, improving my ability to think about things

• Allowing my ideas to be critiqued allows for a quicker transition towards correct beliefs

• I tend to learn a lot when writing things

• People who don’t understand the concept of “This person may have changed their mind in the intervening years” aren’t worth impressing. I can imagine scenarios where your economic and social circumstances are so precarious that the incentives leave you with no choice but to let your speech and your thought be ruled by unthinking mob social-punishment mechanisms. But you should at least check whether you actually live in that world before surrendering.

• In the real world, people usually forget what you said 10 years ago. And even if they don’t, saying “Matthew said this 10 years ago” doesn’t have the same power as you saying the thing now.

But the internet remembers forever, and your words from 10 years ago can be retweeted and become alive as if you said them now.

A possible solution would be to use a nickname… and whenever you notice you grew up so much that you no longer identify with the words of your nickname, pick up a new one. Also new accounts on social networks, and re-friend only those people you still consider worthy. Well, in this case the abrupt change would be the unnatural thing, but perhaps you could still keep using your previous account for some time, but mostly passively. As your real-life new self would have different opinions, different hobbies, and different friends than your self from 10 years ago, so would your online self.

• Related to: Realism about rationality

I have talked to some people who say that they value ethical reflection, and would prefer that humanity reflected for a very long time before colonizing the stars. In a sense I agree, but at the same time I can’t help but think that “reflection” is a vacuous feel-good word that has no shared common meaning.

Some forms of reflection are clearly good. Epistemic reflection is good if you are a consequentialist, since it can help you get what you want. I also agree that narrow forms of reflection can also be good. One example of a narrow form of reflection is philosophical reflection where we compare the details of two possible outcomes and then decide which one is better.

However, there are much broader forms of reflection which I’m more hesitant to endorse: namely, the vague types of reflection, such as reflecting on whether we really value happiness, or whether we should really truly be worried about animal suffering.

I can perhaps sympathize with the intuition that we should really try to make sure that what we put into an AI is what we really want, rather than just what we superficially want. But fundamentally, I have skepticism that there is any canonical way of doing this type of reflection that leads to non-arbitrariness.

I have heard something along the lines of “I would want a reflective procedure that extrapolates my values as long as the procedure wasn’t deceiving me or had some ulterior motive” but I just don’t see how this type of reflection corresponds to any natural class. At some point, we will just have to put some arbitrariness into the value system, and there won’t be any “right answer” about how the extrapolation is done.

• The vague reflections you are referring to are analogous to somebody saying “I should really exercise more” without ever doing it. I agree that the mere promise of reflection is useless.

But I do think that reflections about the vague topics are important and possible. Actively working through one’s experiences, reading relevant books, discussing questions with intelligent people can lead to epiphanies (and eventually life choices), that wouldn’t have occurred otherwise.

However, this is not done with a push of a button and these things don’t happen randomly—they will only emerge if you are prepared to invest a lot of time and energy.

All of this happens on a personal level. To use your example, somebody may conclude from his own life experience that living a life of purpose is more important to him than to live a life of happiness. How to formalize this process so that an AI could use a canonical way to achieve it (and infer somebody’s real values simply by observing) is beyond me. It would have to know a lot more about us than is comfortable for most of us.

• It’s now been about two years since I started seriously blogging. Most of my posts are on Lesswrong, and most of the rest are scattered about on my substack and the Effective Altruist Forum, or on Facebook. I like writing, but I have a problem which I feel impedes me greatly.

In short: I often post garbage.

Sometimes when I post garbage, it isn’t until way later that I learn that it was garbage. And when that happens, it’s not that bad, because at least I grew as a person since then.

But the usual case is that I realize that it’s garbage right after I’m done posting it, and then I keep thinking, “oh no, what have I done!” as the replies roll in, explaining to me that it’s garbage.

Most times when this happens, I just delete the post. I feel bad when this happens because I generally spend a lot of time writing and reviewing the posts. Some of the time, I don’t delete the post because I still stand by the main thesis, although the delivery or logical chain of reasoning was not very good and so I still feel bad about it.

I’m curious how other writers deal with this problem. I’m aware of “just stop caring” and “review your posts more.” But, I’m sometimes in awe of some people who seem to consistently never post garbage, and so maybe they’re doing something right that can be learned.

• I have a hope that with more practice, this gets better.

Not just practice, but also noticing what other people do differently. For example, I often write long texts, which some people say is already a mistake. But even a long text can be made more legible if it contains section headers and pictures. Both of them break the visual monotony of the text wall. This is why section headers are useful even if they are literally: “1”, “2”, “3”. In some sense, pictures are even better, because too many headers create another layer of monotony, which a few unique pictures do not. Which again suggests that having 1 photo, 1 graph, and 1 diagram is better than having 3 photos. I would say, write the text first, then think about which parts can be made clearer by adding a picture.

There is some advice on writing, by Stephen King, or by Scott Alexander.

If you post garbage, let it be. Write more articles, and perhaps at the end of a year (or a decade) make a list “my best posts” which will not include the garbage.

BTW, whatever you do, you will get some negative response. Your posts on LW are upvoted, so I assume they are not too bad.

Also, writing can be imbalanced. Even for people who only write great texts, some of them are more great and some of them are less great than the others. But if they deleted the worst one, guess what, now some other article is the worst one… and if you continue this way, you will end up with one or zero articles.

• Sometimes I send a draft to a couple people before posting it publicly.

Sometimes I sit on an idea for a while, then find an excuse to post it in a comment or bring it up in a conversation, get some feedback that way, and then post it properly.

I have several old posts I stopped endorsing, but I didn’t delete them; I put either an update comment at the top or a bunch of update comments throughout saying what I think now. (Last week I spent almost a whole day just putting corrections and retractions into my catalog of old posts.) I for one would have a very positive impression of a writer whose past writings were full of parenthetical comments that they were wrong about this or that. Even if the posts wind up unreadable as a consequence.

• Should effective altruists be praised for their motives, or their results?

It is sometimes claimed, perhaps by those who recently read The Elephant in the Brain, that effective altruists have not risen above the failures of traditional charity, and are every bit as mired in selfish motives as non-EA causes. From a consequentialist view, however, this critique is not by itself valid.

To a consequentialist, it doesn’t actually matter what one’s motives are as long as the actual effect of their action is to do as much good as possible. This is the primary difference between the standard way of viewing morality, and the way that consequentialists view it.

Now, if the critique was that by engaging in unconsciously selfish motives, we are systematically biasing ourselves away from recognizing the most important actions, then this critique becomes sound. Of course then the conversation shifts immediately towards what we can do to remedy the situation. In particular, it hints that we should set up a system which corrects our systematic biases.

Just as a prediction market corrects for systematic biases by rewarding those who predict well, and punishing those who don’t, there are similar ways to incentivize exact honesty in charity. One such method is to praise people in proportion to how much good they really achieve.

Previously, it has been argued in the philosophical literature that consequentialists should praise people for motives rather than results, because punishing someone for accidentally doing something bad when they legitimately meant to help people would do nothing but discourage people from trying to do good. While clearly containing a kernel of truth, this argument is nonetheless flawed.

Similar to how rewarding a student for their actual grades on a final exam will be more effective in getting them to learn the material than rewarding them merely for how hard they tried, rewarding effective altruists for the real results of their actions will incentivize honesty, humility, and effectiveness.

The obvious problem with the framework I have just proposed is that there is currently no way to praise effective altruists in exact proportion to how effective they are. However, there are ways to approach this ideal.

In the future, prediction markets could be set up to predict the counterfactual result of particular interventions. Effective altruists who are able to discover the most effective of these interventions, and act to create them, could be rewarded accordingly.

It is already the case that we can roughly estimate the near-term effects of anti-poverty charities, and thus get a sense as to how many lives people are saving by donating a certain amount of money. Giving people praise in proportion to how many lives they really save could be a valuable endeavor.

• Similar to how rewarding a student for their actual grades on a final exam will be more effective in getting them to learn the material than rewarding them merely for how hard they tried

Evidence for this?

• Hmm, I sort of assumed this was obvious. I suppose it depends greatly on how you can inspect whether they are actually trying, or whether they are just “trying.” It’s indeed probable that with sufficient supervision, you can actually do better by incentivizing effort. However, this method is expensive.

• Sometimes people will propose ideas, and then those ideas are met immediately after with harsh criticism. A very common tendency for humans is to defend our ideas and work against these criticisms, which often gets us into a state that people refer to as “defensive.”

According to common wisdom, being in a defensive state is a bad thing. The rationale here is that we shouldn’t get too attached to our own ideas. If we do get attached, we become liable to become crackpots who can’t give an idea up because doing so would make us look bad. Therefore, the common wisdom advocates treating ideas as being handed to us by a tablet from the clouds rather than a product of our brain’s thinking habits. Taking this advice allows us to detach ourselves from our ideas so that we don’t confuse criticism with insults.

However, I think the exact opposite failure mode is not often enough pointed out and guarded against. Specifically, the failure mode is being too willing to abandon beliefs based on surface level counterarguments. To alleviate this I suggest we shouldn’t be so ready to give up our ideas in the face of criticism.

This might sound irrational—why should we get attached to our beliefs? I’m certainly not advocating that we should actually associate criticism with insults to our character or intelligence. Instead, my argument is that the process of defensively defending against criticism generates a productive adversarial structure.

Consider two people. Person A desperately wants to believe proposition X, and person B desperately wants to believe not X. Suppose B comes up to A and says, “Your belief in X is unfounded. Here are the reasons...” Person A can either admit defeat, or fall into defensive mode. If A admits defeat, they might indeed get closer to the truth. On the other hand, if A gets into defensive mode, they might also get closer to the truth in the process of desperately searching for evidence of X.

My thesis is this: the human brain is very good at selective searching for evidence. In particular, given some belief that we want to hold onto, we will go to great lengths to justify it, searching for evidence that we otherwise would not have searched for if we were just detached from the debate. It’s sort of like the difference between a debate between two people who are assigned their roles by a coin toss, and a debate between people who have spent their entire lives justifying why they are on one side. The first debate is an interesting spectacle, but I expect the second debate to contain much deeper theoretical insight.

• A couple of relevant posts/​threads that come to mind:

• Just like an idea can be wrong, so can criticism. It is bad to give up an idea just because:

• someone rounded it up to the nearest cliche, and provided the standard cached answer;

• someone mentioned a scientific article (that failed to replicate) that disproves your idea (or something different, containing the same keywords);

• someone got angry because it seems to oppose their political beliefs;

• etc.

My “favorite” version of wrong criticism is when someone experimentally disproves a strawman version of your hypothesis. Suppose your hypothesis is “eating vegetables is good for health”, and someone makes an experiment where people are only allowed to eat carrots, nothing more. After a few months they get sick, and the author of the experiment publishes a study saying “science proves that vegetables are actually harmful for your health”. (Suppose, optimistically, that the author used sufficiently large N, and did the statistics properly, so there is nothing to attack from the methodological angle.) From now on, whenever you mention that perhaps a diet containing more vegetables could benefit someone, someone will send you a link to the article that “debunks the myth” and will consider the debate closed.

So, when I hear about research proving that parenting /​ education /​ exercise /​ whatever doesn’t cause this or that, my first reaction is to wonder how specifically did the researchers operationalize such a general word, and whether the thing they studied even resembles my case.

(And yes, I am aware that the same strategy could be used to refute any inconvenient statement, such as “astrology doesn’t work”—“well, I do astrology a bit differently than the people studied in that experiment, therefore the conclusion doesn’t apply to me”.)

• I keep wondering why many AI alignment researchers aren’t using the alignmentforum. I have met quite a few people who are working on alignment who I’ve never encountered online. I can think of a few reasons why this might be:

• People find it easier to iterate on their work without having to write things up

• People don’t want to share their work, potentially because they think a private-by-default policy is better.

• It is too cumbersome to interact with other researchers through the internet. In-person interactions are easier

• They just haven’t even considered from a first person perspective whether it would be worth it

• I’ve often wished that conversation norms shifted towards making things more consensual. The problem is that when two people are talking, it’s often the case that one party brings up a new topic without realizing that the other party didn’t want to talk about that, or doesn’t want to hear it.

Let me provide an example: Person A and person B are having a conversation about the exam that they just took. Person A bombed the exam, so they are pretty bummed. Person B, however, did great and wants to tell everyone. So then person B comes up to person A and asks “How did you do?” fully expecting to brag the second person A answers. On its own, this question is benign. This happens frequently without question. On the other hand, if person B had said, “Do you want to talk about the exam?” person A might have said “No.”

This problem can be alleviated by simply asking people whether they want to talk about certain things. For sensitive topics, like politics and religion, this is already the norm in some places. I think it can be taken further. I suggest the following boundaries, and could probably think of more if pressed:

• Ask someone before sharing something that puts you in a positive light. Make it explicit that you are bragging. For example, ask “Can I brag about something?” before doing so.

• Ask someone before talking about something that you know there’s a high variance of difficulty and success. This applies to a lot of things: school, jobs, marathon running times.

• Have you read the posts on ask, tell, and guess culture? They feel highly related to this idea.

• The problem is, if a conversational topic can be hurtful, the meta-topic can be too. “do you want to talk about the test” could be as bad or worse than talking about the test, if it’s taken as a reference to a judgement-worthy sensitivity to the topic. And “Can I ask you if you want to talk about whether you want to talk about the test” is just silly.

Mr-hire’s comment is spot-on—there are variant cultural expectations that may apply, and you can’t really unilaterally decide another norm is better (though you can have opinions and default stances).

The only way through is to be somewhat aware of the conversational signals about what topics are welcome and what should be deferred until another time. You don’t need prior agreement if you can take the hint when an unusually-brief non-response is given to your conversational bid. If you’re routinely missing hints (or seeing hints that aren’t), and the more direct discussions are ALSO uncomfortable for them or you, then you’ll probably have to give up on that level of connection with that person.

• “do you want to talk about the test” could be as bad or worse than talking about the test, if it’s taken as a reference to a judgement-worthy sensitivity to the topic

I agree. Although if you are known for asking those types of questions maybe people will learn to understand you never mean it as a judgement.

And “Can I ask you if you want to talk about whether you want to talk about the test” is just silly.

True, although I’ll usually take silly over judgement any day. :)

• Reading through the recent Discord discussions with Eliezer, and reading and replying to comments, has given me the following impression of a crux of the takeoff debate. It may not be the crux. But it seems like a crux nonetheless, unless I’m misreading a lot of people.

Let me try to state it clearly:

The foom theorists are saying something like, “Well, you can usually-in-hindsight say that things changed gradually, or continuously, along some measure. You can use these measures after-the-fact, but that won’t tell you about the actual gradual-ness of the development of AI itself, because you won’t know which measures are gradual in advance.”

And then this addendum is also added, “Furthermore, I expect that the quantities which will experience discontinuities from the past will be those that are qualitatively important, in a way that is hard to measure. For example, ‘ability to manufacture nanobots’ or ‘ability to hack into computers’ are qualitative powers that we can expect AIs will develop rather suddenly, rather than gradually from precursor states, in the way that, e.g. progress in image classification accuracy was gradual over time. This means you can’t easily falsify the position by just pointing to straight lines on a million graphs.”

If you agree that foom is somewhat likely, then I would greatly appreciate hearing whether you think this is your crux, or whether you think I’ve missed something.

If this indeed falls into one of your cruxes, then I feel like I’m in a position to say, “I kinda know what motivates your belief but I still think it’s probably wrong” at least in a weak sense, which seems important.

• I lean toward the foom side, and I think I agree with the first statement. The intuition for me is that it’s kinda like p-hacking (there are very many possible graphs, and some percentage of those will be gradual), or using a log-log plot (which makes everything look like a nice straight line, but actually corresponds to very broad predictions when properly accounting for uncertainty). Not sure if I agree with the addendum or not yet, and I’m not sure how much of a crux this is for me yet.
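To make the log-log point concrete, here is a minimal Python sketch. The function name and the numbers are hypothetical, chosen only for illustration: a band that looks narrow on a log-scaled axis can span an order of magnitude in linear terms.

```python
def linear_band(y_pred: float, log10_halfwidth: float) -> tuple[float, float]:
    """Convert a +/- band in log10 units around y_pred into linear units."""
    factor = 10 ** log10_halfwidth
    return y_pred / factor, y_pred * factor

# Hypothetical forecast of 1000 with a band of +/- 0.5 in log10 units.
low, high = linear_band(1000.0, 0.5)

# The band spans a multiplicative factor of 10 ** (2 * 0.5) between its ends.
ratio = high / low
```

Under these assumed numbers, a half-unit band on the log axis covers a tenfold range between its lower and upper ends, which is the sense in which a straight line on such a plot can hide a very broad prediction.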

• There have been a few posts about the obesity crisis here, and I’m honestly a bit confused about some theories that people are passing around. I’m one of those people who think that the “calories in, calories out” (CICO) theory is largely correct, relevant, and helpful for explaining our current crisis.

I’m not actually sure to what extent people here disagree with my basic premises, or whether they just think I’m missing a point. So let me be more clear.

As I understand, there are roughly three critiques you can have against the CICO theory. You can think it’s,

(1) largely incorrect
(2) largely irrelevant
(3) largely just smugness masquerading as a theory

I think that (1) is simply factually wrong. In order for the calorie intake minus expenditure theory to be factually incorrect, scientists would need to be wrong about not only minor details, but the basic picture concerning how our metabolism works. Therefore, I assume that the real meat of the debate is in (2) and (3).

Yet, I don’t see how (2) and (3) are defensible either. As a theory, CICO does what it needs to do: compellingly explains our observations. It provides an answer to the question, “Why are people obese at higher rates than before?”, namely, “They are eating more calories than before, or expending fewer calories, or both.”

I fully admit that CICO doesn’t provide an explanation for why we eat more calories than before, but it never needed to on its own. Theories don’t need to explain everything to be useful. And I don’t think many credible people are claiming that “calories in, calories out” was supposed to provide a complete picture of what’s happening (theories rarely explain what drives changes to inputs in the theory). Instead, it merely clarifies the mechanism of why we’re in the current situation, and that’s always important.
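As a rough sketch of the bookkeeping involved (the symbols are labels chosen here for illustration, not notation taken from any particular source), the energy balance over some period can be written as:

```latex
\Delta E_{\text{stored}} = E_{\text{in}} - E_{\text{out}}
```

Here $E_{\text{in}}$ is the energy taken in from food, $E_{\text{out}}$ is the energy expended, and $\Delta E_{\text{stored}}$ is the change in energy stored in the body over that period. The identity says nothing about what sets either term on the right-hand side; it only says that any proposed cause has to act through at least one of them.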

It’s also not about moral smugness, any more than any other epistemic theory. The theory that quitting smoking improves one’s health does not imply that people who don’t quit are unvirtuous, or that the speaker is automatically assuming that you simply lack willpower. Why? Because is and ought are two separate things.

CICO is about how obesity comes about. It’s not about who to blame. It’s not about shaming people for not having willpower. It’s not about saying that you have sinned. It’s not about saying that we ought to individually voluntarily reduce our consumption. For crying out loud, it’s an epistemic theory not a moral one!

To state the obvious, without clarifying the basic mechanism of how a phenomenon works in the world, you’ll just remain needlessly confused.

Imagine if people all around the world were getting richer (as measured in net worth), and we didn’t know why. To be more specific, suppose we didn’t understand the “income minus expenses” theory of wealth, so instead we went around saying things like, “it could be the guns”, “it could be factories”, “it could be that we have more computers.” Now, of course, all of these explanations could play a role in why we’re getting richer over time, but none of them make any sense without connecting them to the “income minus expenses” theory.

To state “wealth is income minus expenses” does not in any way mean that you are denying how guns, factories, and computers might play a role in wealth accumulation. It simply focuses the discussion on ways that those things could act through the basic mechanism of how wealth operates.
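The analogy can be put as a minimal Python sketch. The class, the helper function, and all of the numbers are hypothetical and exist only to illustrate the argument: an upstream factor can change the outcome only by shifting the inflow or the outflow.

```python
from dataclasses import dataclass

@dataclass
class Flows:
    inflow: float   # e.g. income per year, or calories eaten per day
    outflow: float  # e.g. expenses per year, or calories expended per day

    def net_change(self) -> float:
        # Accounting identity: change in the stock = inflow - outflow.
        return self.inflow - self.outflow

def with_factor(flows: Flows, d_inflow: float = 0.0, d_outflow: float = 0.0) -> Flows:
    # A hypothetical upstream cause (factories, computers, processed food).
    # It can only matter by shifting one of the two flows.
    return Flows(flows.inflow + d_inflow, flows.outflow + d_outflow)

# Made-up baseline: income exactly covers expenses, so wealth is flat.
baseline = Flows(inflow=50_000.0, outflow=50_000.0)

# A factor that raises income by a made-up 5,000 per year.
after_factories = with_factor(baseline, d_inflow=5_000.0)

# A factor that touches neither flow leaves the net change where it was.
no_effect = with_factor(baseline)
```

In this toy model, saying “it could be factories” is only a complete explanation once it is stated as a shift in `inflow` or `outflow`, which is the sense in which the identity focuses the discussion rather than competing with the other explanations.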

If your audience already understands that this is how wealth works, then sure, you don’t need to mention it. But in the case of the obesity debate, there are a ton of people who don’t actually believe in CICO; in other words, there are a considerable number of people who firmly believe critique (1). Therefore, refusing to clarify how your proposed explanation connects to calories, in my opinion, generates a lot of unnecessary confusion.

As usual, the territory is never mysterious. There are only brains who are confused. If you are perpetually confused by a phenomenon, that is a fact about you, and not the phenomenon. There does not in fact need to be a complicated, clever mechanism that explains obesity that all researchers have thus far missed. It could simply be that the current consensus is correct, and we’re eating too many calories. The right question to ask is what we can do to address that.

• As it seems to be typically used, literal CICO as an observation is the motte, and the corresponding bailey is something like: “yes, it is simple to lose weight, you just need to stop eating all those cakes and start exercising, but this is the truth you don’t want to hear so you keep making excuses instead”.

How do you feel about the following theory: “atoms in, atoms out”? I mean, this one should be scientifically even less controversial. So why do you prefer the version with calories over the version with atoms? From the perspective of “I am just saying it, because it is factually true, there is no judgment or whatever involved”, both theories are equal. What specifically is the advantage of the version with calories?

(My guess is that the obvious problem with the “atoms in, atoms out” theory is that the only actionable advice it hints towards is to poop more, or perhaps exhale more CO2… but the obvious problem with such advice is that the fat people do not have conscious control over extracting fat from their fat cells and converting it to waste. Otherwise, many would willingly convert and poop it out in one afternoon and have their problem solved. Well, guess what, the “calories in, calories out” has exactly the same problem, only in less obvious form: if your metabolism decides that it is not going to extract fat from your fat cells and convert it to useful energy which could be burned in muscles, there is little you can consciously do about it; you will spend the energy outside of your fat cells, then you are out of useful energy, end of story, some guy on internet unhelpfully reminding you that you didn’t spend enough calories.)

• What specifically is the advantage of the version with calories?

Well, let me consider a recent, highly upvoted post on here: A Contamination Theory of the Obesity Epidemic. In it, the author says that the explanation for the obesity crisis can’t be CICO,

“It’s from overeating!”, they cry. But controlled overfeeding studies (from the 1970s, pre-explosion) struggle to make people gain weight and they lose it quickly once the overfeeding stops. (Which is evidence against a hysteresis theory.)

“It’s lack of exercise”, they yell. But making people exercise doesn’t seem to produce significant weight loss, and obesity is still spreading despite lots of money and effort being put into exercise.

If CICO is literally true, in the same way that the “atoms in, atoms out” theory is true, then this debunking is very weak. The obesity epidemic must be due to either overeating or lack of exercise, or both.

The real debate is, of course, over which environmental factors caused us to eat more, or exercise less. But if you don’t even recognize that the cause must act through this mechanism, then you’re not going to get very far in your explanation. That’s how you end up proposing that it must be some hidden environmental factor, as this post does, rather than more relevant things related to the modern diet.

My own view is that the most likely cause of our current crisis is that modern folk have access to more and a greater variety of addicting processed food, so we end up consistently overeating. I don’t think this theory is obviously correct, and of course it could be wrong. However, in light of the true mechanism behind obesity, it makes a lot more sense to me than many other theories that people have proposed, especially any that deny we’re overeating from the outset.

• Well, here is the point where we disagree. My opinion is that CICO, despite being technically true, focuses your attention on eating and exercise as the most relevant causes of obesity. I agree with the statement “calories in = calories out” as observation. I disagree with the conclusion that the most relevant things for obesity are how much you eat and how much you exercise. And my aversion against CICO is that it predictably leads people to this conclusion. As you have demonstrated right now.

I am not an expert, but here are a few questions that I think need to be answered in order to get a “gears model” of obesity. See how none of them contradicts CICO, but they all cast doubt on the simplistic advice to “just eat less and exercise more”.

• when you put food in your mouth, what mechanism decides which nutrients enter the bloodstream and which merely pass the digestive system and get out of the body?

• when the nutrients are in the bloodstream, what mechanism decides which of them are used to build/repair cells, which are stored as energy sources in muscles, and which are stored as energy reserves in fat cells?

• when the energy reserves are in the fat cells, what mechanism decides whether they get released into the bloodstream again?

• (probably some more important questions I forgot now)

When people talk about “metabolic privilege”, they roughly mean that some people are lucky that for some reason, even if they eat a lot, it does not result in storing fat in fat cells. I am not sure what exactly happens instead; whether the nutrients get expelled from the body, or whether the metabolism stubbornly stores them in muscles and refuses to store them in the fat cells, so that the person feels full of energy all day long. Those people can overeat as much as they can, and yet they don’t gain weight.

Then you have the opposite type of people, whose metabolism stubbornly refuses to release the fat from fat cells, no matter how much they starve or how much they try to exercise. Eating just slightly more than appropriate results immediately in weight gain. (In extreme cases, if they try to starve, they will just get weak and maybe fall into a coma, but they still won’t lose a single kilogram.)

The obvious question is what separates these two groups of people, and what can be done if you happen to be in the latter? The simplistic response “calories in, calories out” provides absolutely no answer to this; it is just a smug way to avoid the question and pretend that it does not matter.

Sometimes this changes with age. In my 20s, I could eat as much as I wanted, and I barely ever exercised, yet my body somehow handled the situation without getting much overweight. In my 40s, I can do cardio and weightlifting every day, and barely eat anything other than fresh vegetables, and the weight only goes down at a microscopic speed, and if I ever eat a big lunch again (not a cake, just a normal lunch) the weight immediately jumps back. The “calories in, calories out” model neither predicts this, nor offers a solution. It doesn’t even predict that when I try some new diet, sometimes I lose a bit of weight during the first week, but then I get it back the next week, despite doing the same thing both weeks. I do eat less and exercise more than I did in the past, yet I keep gaining weight.

Now, it is generally known that age makes weight loss way more difficult. But the specific mechanism is something more than just eating more and exercising less, because it happens even if you eat less and exercise more. And if this works differently for the same person at a different age, it seems plausible that it can also work differently for two different people at the same age. In the search for the specific mechanism, the answer “calories in, calories out” is an active distraction.

• To clarify, there are two related but separate questions about obesity that are worth distinguishing:

1. What explains why people are more obese than 50 years ago? And what can we do about it?

2. What explains why some people are more obese than others, at a given point of time? And what can we do about it?

In my argument, I was primarily saying that CICO was important for explaining (1). For instance, I do not think that the concept of metabolic privilege can explain much of (1), since 50 years is far too little time for our metabolisms to evolve in such a rapid and widespread manner. So, from that perspective, I really do think that overconsumption and/or lack of exercise are the important and relevant mechanisms driving our current crisis. And further, I think that our overconsumption is probably related to processed food.

I did not say much about (2), but I can say a little about my thoughts now. I agree that people vary in how “fast” their metabolisms expend calories. The most obvious variation is, as you mentioned, the difference between the youthful metabolism and the metabolism found in older people.

However...

Then you have the opposite type of people, whose metabolism stubbornly refuses to release the fat from fat cells, no matter how much they starve or how much they try to exercise… (In extreme cases, if they try to starve, they will just get weak and maybe fall into a coma, but they still won’t lose a single kilogram.)

I don’t think these people are common, at least in a literal sense. Obesity is very uncommon in pre-industrialized cultures, and in hunter-gatherer settings. I think this is very strong evidence that it is feasible for the vast majority of people to be non-obese under the right environmental circumstances (though feasible does not mean easy, or that it can be done voluntarily in our current world). I also don’t find personal anecdotes from people about the intractability of losing weight compelling, given this strong evidence.

Furthermore, in addition to the role of metabolism, I would also point to the role of cognitive factors like delayed gratification in explaining obesity. You can say that this is me just being “smug” or “blaming fat people for their own problems” but this would be an overly moral interpretation of what I view as simply an honest causal explanation. A utilitarian might say that we should only blame people for things that they have voluntary control over. So in light of the fact that cognitive abilities are largely outside of our control, I would never blame an obese person for their own condition.

Instead of being moralistic, I am trying to be honest. And being honest about the cause of a phenomenon allows us to invent better solutions than the ones that exist. Indeed, if weight loss is a simple matter of overconsumption, and we also admit that people often suffer from problems of delayed gratification, then I think this naturally leads us to propose medical interventions like bariatric surgery or weight loss medication—both of which have a much higher chance of working than solutions rooted in a misunderstanding of the real issue.

• Just briefly, because I am really not an expert on this, and debating at length feels inappropriate (it feels like suggesting that I know more than I actually do).

What explains why people are more obese than 50 years ago?

I still feel like there are at least two explanations here. Maybe it is more food and less hard work, in general. Or maybe it is something in the food that screws up many (but not all) people’s metabolism.

Like, maybe some food additive that we use because it improves the taste, also has an unknown side effect of telling people’s bodies to prioritize storing energy in fat cells over delivering it to muscles. And if the food additive is only added to some type of foods, or affects only people with certain genes, that might hypothetically explain why some people get fat and some don’t.

Now, I am probably not the first person to think about this—if it is about lifestyle, then perhaps we should see a clear connection between obesity and profession. To put it bluntly, are people working in offices fatter than people doing hard physical work? I admit I never actually paid attention to this.

• Maybe it is more food and less hard work, in general. Or maybe it is something in the food that screws up many (but not all) people’s metabolism.

I’m with you that it probably has to do with what’s in our food. Unlike some, however, I’m skeptical that we can nail it down to “one thing”, like a simple additive, or ingredient. It seems most likely to me that companies have simply done a very good job optimizing processed food to be addicting, in the last 50 years. That’s their job, anyway.

Scott Alexander reviewed a book from Stephan Guyenet about this hypothesis, and I find it quite compelling.

Now, I am probably not the first person to think about this—if it is about lifestyle, then perhaps we should see a clear connection between obesity and profession. To put it bluntly, are people working in offices fatter than people doing hard physical work? I admit I never actually paid attention to this.

That’s a good question. I haven’t looked into this, and may soon. My guess is that you’d probably have to adjust for cognitive confounders, but after doing so I’d predict that people in highly physically demanding professions tend to be thinner and more fit (in the sense of body fat percentage, not necessarily BMI). However, I’d also suspect that the causality may run in the reverse direction; it’s a lot easier to exercise if you’re thin.

• There are viruses that get people to gain weight. They might do that by getting people to eat more. They might also do that by getting people to burn fewer calories.

The hypothesis that viruses are responsible for the obesity epidemic is a possible one. If it were the main cause, literal CICO (or mass-in-mass-out) would still be correct, but not very useful when thinking about how to combat the epidemic.

The virus hypothesis has for example the advantage that it explains why the lab animals with controlled diets also gained weight and not just the humans who have a free choice about what to eat in a world with more processed food.

Overeating due to addicting processed food also doesn’t explain why people fail so often at diets and regain their weight. In that model it would be easier to lose weight long-term by avoiding processed food.

The obesity epidemic must be due to either overeating or lack of exercise, or both.

No, the healthy body has plenty of ways to burn calories other than exercise, and is willing to use them to stay at a constant weight.

• A lot of processes in the body are cybernetic in nature. There’s a target value and then the body tries to maintain that target. The body has indirect ways to maintain the target, such as setting hunger or adrenaline levels, or up/down-regulating a variety of metabolic processes.

See Herman Pontzer’s work on how exercising more often doesn’t result in net calorie burn, because the body downregulates metabolic processes to save energy.

Calorie-in-calorie-out also isn’t great at explaining the weight gain in lab animals with a controlled diet.

To state “wealth is income minus expenses” does not in any way mean that you are denying how guns, factories, and computers might play a role in wealth accumulation.

That model doesn’t explain why Jeff Bezos or Elon Musk are so rich because both have very little income compared to the wealth they have.

• On the one hand, CICO is obviously true, and any explanation of obesity that doesn’t contain CICO somewhere is missing an important dynamic.

But the reason why I think CICO is getting grilled so much lately, is that it’s far from the most important piece of the puzzle, and people often cite CICO as if it were the main factor. Biological and psychological explanations for why CI > CO at healthy BMIs (thereby leading BMI to increase until it becomes unhealthy) are more important than simply observing that weight will increase when CI > CO. Note that this can be formulated without any reference to CICO, although I used a formulation here that did use CICO.
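
As a rough sketch of the distinction (the notation here is an assumption introduced for illustration, not something used in the thread), CICO can be written as an accounting identity over stored energy $E$:

$$\frac{dE}{dt} = CI(t) - CO(t)$$

The identity holds whatever determines the two terms. The explanatory work lies in how they are set, for example $CI = f(\text{appetite}, \text{food environment}, \dots)$ and $CO = g(\text{exercise}, \text{metabolic regulation}, \dots)$, where $f$ and $g$ are placeholder functions. The identity alone says nothing about $f$ or $g$, which is where the biological and psychological explanations would enter.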

• A common heuristic argument I’ve seen recently in the effective altruism community is the idea that existential risks are low probability because of what you could call the “People really don’t want to die” (PRDWTD) hypothesis. For example, see here,

People in general really want to avoid dying, so there’s a huge incentive (a willingness-to-pay measured in the trillions of dollars for the USA alone) to ensure that AI doesn’t kill everyone.

(Note that I hardly mean to strawman MacAskill here. I’m not arguing against him per se)

According to the PRDWTD hypothesis, existential risks shouldn’t be anything like war because in war you only kill your enemies, not yourself. Existential risks are rare events that should only happen if all parties made a mistake despite really really not wanting to. However, as plainly stated, it’s not clear to me whether this hypothesis really stands up to the evidence.

Strictly speaking, the thesis is obviously false. For example, how does the theory explain the following facts:

• When you tell most people about life extension, even probably billionaires who could do something about it, they don’t really care and come up with excuses about why life extension wouldn’t be that good anyway. Same with cryonics, and note I’m not just talking about people who think that cryonics is low probability: there are many people who think that it’s a significant probability but still don’t care.

• The base rate of a leader dying is higher if they enter a war, yet historically leaders have been quite willing to join many conflicts. By this theory, Benito Mussolini, Hideki Tojo and Hitler apparently really really wanted to live, but entered a global conflict anyway that could have very reasonably (and in fact did) end in all of their deaths. I don’t think this is a one-off thing either.

• I have met very few people who have researched micromorts before and purposely used them to reduce the risk of their own deaths from activities. When you ask people to estimate the risks of certain activities, they will often be orders of magnitude off, indicating that they don’t really care that much about accurately estimating these facts.

• As I said two days ago, few people seemed concerned by the coronavirus. Now I get it: there’s not much you can do to personally reduce your own risk of death, and so actually stressing about it is pointless. But there also wasn’t much you could do to reduce your risk of death after 9/11 and that didn’t stop people from freaking out. Therefore, if the theory you appeal to is that people don’t care about things they have no control over then your theory is false.

• Obesity is a common concern in America, with 39.8% of adults here being obese, despite the fact that obesity is probably the number one contributor to death besides aging, and it’s much more controllable. I understand that it’s really hard for people to lose weight, and I don’t mean to diminish people’s struggles. There are solid reasons why it’s hard to avoid being obese for many people, but the same could also be true of existential risks.

I understand that you can clarify the hypothesis by talking about “artificially induced deaths” or some other reference class of events that fits the evidence I have above better. My point is just that you shouldn’t state “people really don’t want to die” without that big clarification, because otherwise I think it’s just false.

• People clearly DO want to die - 2.2 billion dollars of actual spending (not theoretical “willingness to pay”) on alcohol in the US in 2018.

• Yeah, similar to obesity, people seem quite willing to cave in to their desires. I’d be interested in knowing what the long-term effects of daily alcohol consumption are, though, because some sources have told me that it isn’t that bad for longevity. [ETA: The Wikipedia page is either very biased, or strongly rejects my prior sources!]

• After writing the post on using transparency regularization to help make neural networks more interpretable, I have become even more optimistic that this is a potentially promising line of research for alignment. This is because I have noticed that there are a few properties about transparency regularization which may allow it to avoid some pitfalls of bad alignment proposals. To be more specific, in order for a line of research to be useful for alignment, it helps if

• The line of research doesn’t require unnecessarily large amounts of computation to perform. This would allow the technique to stay competitive, reducing the incentive to skip safety protocols.

• It doesn’t require human models to work. This is useful because

  • Human models are blackboxes and are themselves mesa-optimizers

  • We would be limited primarily to theoretical work in the present, since human cognition is expensive to obtain.

• Each part of the line of research is recursively legible. That is, if we use the technique on our ML model, we should expect that the technique itself can be explained without appealing to some other black box.

Transparency regularization meets these three criteria respectively, because

• It doesn’t need to be astronomically more expensive than more typical forms of regularization

• It doesn’t necessarily require human-level cognitive parts to get working.

• It is potentially quite simple mathematically, and so definitely meets the recursively legible criterion.

• Forgive me for cliche scientism, but I recently realized that I can’t think of any major philosophical developments in the last two centuries that occurred within academic philosophy. If I were to try to list major philosophical achievements since 1819, these would likely appear on my list, but none of them were from those trained in philosophy:

• A convincing, simple explanation for the apparent design we find in the living world (Darwin and Wallace).

• The unification of time and space into one fabric (Einstein)

• A solid foundation for axiomatic mathematics (Zermelo and Fraenkel).

• A model of computation, and a plausible framework for explaining mental activity (Turing and Church).

By contrast, if we go back to previous centuries, I don’t have much of an issue citing philosophical achievements from philosophers:

• The identification of the pain-pleasure axis as the primary source of value (Bentham).

• Advanced notions of causality, reductionism, scientific skepticism (Hume)

• Extension of moral sympathies to those in the animal kingdom (too many philosophers to name)

• A highlight of the value of wisdom and learned debate (Socrates, and others)

Of course, this is probably caused by my bias towards Lesswrong-adjacent philosophy. If I had to pick philosophers who have made major contributions, these people would be on my shortlist: John Stuart Mill, Karl Marx, Thomas Nagel, Derek Parfit, Bertrand Russell, Arthur Schopenhauer.

• I would name the following:

• My impression is that academic philosophy has historically produced a lot of good deconfusion work in metaethics (e.g. this and this), as well as some really neat negative results like the logical empiricists’ failed attempt to construct a language in which verbal propositions could be cashed out/analyzed in terms of logic or set theory in a way similar to how one can cash out/analyze Python in terms of machine code.
In recent times there’s been a lot of (in my opinion) great academic philosophy done at FHI.

• Those are all pretty good. :)

• Wow! You left out the whole of analytical philosophy!

• I’m not saying that I’m proud of this fact. It is mostly that I’m ignorant of it. :)

• The development of modern formal logic (predicate logic, modal logic, the equivalence of higher-order logics and set-theory, etc.), which is of course deeply related to Zermelo, Fraenkel, Turing and Church, but which involved philosophers like Quine, Putnam, Russell, Kripke, Lewis and others.

• The model of scientific progress as proceeding via pre-paradigmatic, paradigmatic, and revolutionary stages (from Kuhn, who wrote as a philosopher, though trained as a physicist)

• The identification of the pain-pleasure axis as the primary source of value (Bentham).

I will mark that I think this is wrong, and if anything I would describe it as a philosophical dead-end. Complexity of value and all of that. So listing it as a philosophical achievement seems backwards to me.

• I might add that I also consider the development of ethical anti-realism to be another, perhaps more insightful, achievement. But this development is, from what I understand, usually attributed to Hume.

Depending on what you mean by “pleasure” and “pain” it is possible that you merely have a simple conception of the two words which makes this identification incompatible with complexity of value. The robust form of this distinction was provided by John Stuart Mill who identified that some forms of pleasure can be more valuable than others (which is honestly quite similar to what we might find in the fun theory sequence...).

In its modern formulation, I would say that Bentham’s contribution was identifying conscious states as being the primary theater for which value can exist. I can hardly disagree, as I struggle to imagine things in this world which could possibly have value outside of conscious experience.
Still, I think there are perhaps some, which is why I conceded by using the words “primary source of value” rather than “sole source of value.” To the extent that complexity of value disagrees with what I have written above, I incline to disagree with complexity of value :).

• (I think you and habryka in fact disagree pretty deeply here)

• Then I will assert that I would in fact appreciate seeing the reasons for disagreement, even as the case may be that it comes down to axiomatic intuitions.

• Rationalists are fond of saying that the problems of the world are not from people being evil, but instead a result of the incentives of our system, which are such that this bad outcome is an equilibrium. There’s a weaker thesis here that I agree with, but otherwise I don’t think this argument actually follows.

In game theory, an equilibrium is determined by both the setup of the game, and by the payoffs for each player. The payoffs are basically the values of the players in the game—their utility functions. In other words, you get different equilibria if players adopt different values.

Problems like homelessness are caused by zoning laws, yes, but they’re also caused by people being selfish. Why? Because lots of people could just voluntarily donate their wealth to help homeless people. Anyone with a second house could decide to give it away. Those with spare rooms could simply rent them out for free. There are no laws saying you must spend your money on yourself.

A simple economic model would predict that if we redistributed everyone’s extra housing, then this would reduce the incentive to create new housing. But look closer at the assumptions in that economic model. We say that the incentives to build new housing are reduced because few people will pay to build a house if they don’t get to live in it or sell it to someone else. That’s another way of assuming that people value their own consumption more than that of others—another way of saying that people are selfish.
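
The game-theory claim in this post (the same game yields different equilibria when players hold different values) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the Prisoner's Dilemma payoff numbers, the single `altruism` weight applied to the other player's payoff, and the hypothetical helper names `utilities` and `pure_nash_equilibria`. It only searches for pure-strategy equilibria of a two-player, two-action game.

```python
from itertools import product

# Illustrative material payoffs for a one-shot Prisoner's Dilemma.
# Actions: "C" (cooperate) or "D" (defect).
# Each entry maps (row action, column action) -> (row payoff, column payoff).
MATERIAL = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}
ACTIONS = ("C", "D")


def utilities(a, b, altruism):
    """Return (row utility, column utility) for the action pair (a, b).

    Assumed model: each player's utility is their own material payoff
    plus `altruism` times the other player's material payoff.
    altruism = 0 is pure selfishness; altruism = 1 weights the other
    player's payoff as much as one's own.
    """
    row, col = MATERIAL[(a, b)]
    return row + altruism * col, col + altruism * row


def pure_nash_equilibria(altruism):
    """List action pairs where neither player gains by deviating alone."""
    equilibria = []
    for a, b in product(ACTIONS, repeat=2):
        u_row, u_col = utilities(a, b, altruism)
        # Row player compares against switching their own action only.
        row_ok = all(u_row >= utilities(alt, b, altruism)[0] for alt in ACTIONS)
        # Column player compares against switching their own action only.
        col_ok = all(u_col >= utilities(a, alt, altruism)[1] for alt in ACTIONS)
        if row_ok and col_ok:
            equilibria.append((a, b))
    return equilibria


if __name__ == "__main__":
    for w in (0.0, 1.0):
        print(f"altruism={w}: {pure_nash_equilibria(w)}")
```

With these particular numbers, fully selfish players (`altruism = 0`) end up at mutual defection, while players who weight each other's payoff equally with their own (`altruism = 1`) end up at mutual cooperation. The rules of the game are identical in both cases; only the values differ.
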
More fundamentally, what it means for something to be an incentive is that it helps people get what they want. Incentives, therefore, are determined by people’s values; they are not separate from them. A society of saints would have different equilibria than a society of sinners, even if both are playing the same game. So, it really is true that lots of problems are caused by people being bad.

Of course, there’s an important sense in which rationalists are probably right. Assume that we can change the system but we can’t change people’s values. Then, pragmatically, the best thing would be to change the system, rather than fruitlessly try to change people’s values. Yet it must be emphasized that this hypothesis is contingent on the relative tractability of either intervention. If it becomes clear that we can genuinely make people less selfish, then that might be a good thing to try.

My main issue with attempts to redesign society in order to make people less selfish or more cooperative is that you can’t actually change people’s innate preferences by very much. The most we can reasonably hope for is to create a system in which people’s selfish values are channeled to produce social good. That’s not to say it wouldn’t be nice if we could change people’s innate preferences. But we can’t (yet).

(Note that I wrote this as a partial response to jimrandomh’s shortform post, but the sentiment I’m responding to is more general than his exact claim.)

• The connection between “doing good” and “making a sacrifice” is so strong that people need to be reminded that “win/win” is also a thing. The bad guys typically do whatever is best for them, which often involves hurting others (because some resources are limited). The good guys exercise restraint.

This is complicated because there is also the issue of short-term and long-term thinking.
Sometimes the bad guys do things that benefit them in the short term, but contribute to their fall in the long term; while the good guys increase their long-term gains by strategically giving up on some short-term temptations. But it is a just-world fallacy to assume that things always end up this way. Sometimes the bad guys murder millions, and then they live happily to old age. Sometimes the good guys get punished and laughed at, and then they die in despair.

How could “good” even have evolved, given that “sacrifice” seems by definition incompatible with “maximizing fitness”?

• being good to your relatives promotes your genes.

• reciprocal goodness can be an advantage to both players.

• doing good—precisely because it is a sacrifice—can become a signal of abundance, which makes other humans want to be my allies or mates.

• people reward good and punish evil in others, because it is in their selfish interest to live among good people.

The problems caused by the evolutionary origin of goodness are also well-known: people are more likely to be good towards their neighbors who can reciprocate or towards potential sexual partners, and they are more likely to do good when they have an audience who approves of it… and less likely to do good to low-status people who can’t reciprocate, or when their activities are anonymous. (Steals money from pension funds, pollutes the environment, then donates millions to a prestigious university.)

I assume that most people are “instinctively good”, that is that they kinda want to be good, but they simply follow their instincts, and don’t reflect much on them (other than rationalizing that following their instinct was good, or at least a necessary evil). Their behavior can be changed by things that affect their instincts—the archetypal example is the belief in an omniscient judging God, i.e.
a powerful audience who sees all behavior, and rewards/punishes according to social norms (so now the only problem is how to make those social norms actually good). I am afraid that this ship has sailed, and that we do not really have a good replacement—any non-omniscient judge can be deceived, and any reward mechanism will be Goodharted.

Another problem is that by trying to make society more tolerant and more governed by law, we also take away people’s ability to punish evil… as long as the evil takes care to only do evil acts that are technically legal, or when there is not enough legal evidence of wrongdoing.

Assuming we have a group of saints (who have the same values, and who trust each other to be saints), I am not even sure what would be the best strategy for them. Probably to cooperate with each other a lot, because there is no risk of being stabbed in the back. Try to find other saints, test them, and then admit them to the group. Notice good acts among non-saints and reward them somehow—maybe in the form of a lottery, where most good acts only get a “thank you”, but one in a million gets a million-dollar reward. (People overestimate their chances in a lottery. This would lead them to overestimate how likely a good act is to be rewarded, which would make them do more good.) The obvious problem with rewarding good acts is that it rewards visibility; perhaps there should be a special reward for good acts that were unlikely to get noticed. The good acts should get a social reward, i.e. telling other people about the good act and how someone was impressed.

(The sad thing is that given that we live in a clickbait society, it would not take much time until someone would publish an article about how X-ist the saints are, because the proportion of Y’s they rewarded for good deeds is not the same as the proportion of Y’s in the society.
Also, this specific person rewarded for this specific good deed also happens to hold some problematic opinions, does this mean that the saints secretly support the opinion, too?)

I sometimes like to imagine a soft version of karma, like if people would be free to associate with people who are like them, then the good people would associate with other good people, the bad people would associate with other bad people, and then the bad people would suffer (because surrounded by bad people), and the good people would live nice lives (because surrounded by good people). The problem with this vision is that people are not so free to choose their neighbors (coordination is hard, moving is expensive), and also that the good people who suck at judging other people’s goodness would suffer.

Not sure what is the right approach here, other than perhaps we should become a bit more judgmental, because it seems the pendulum has swung too much in the direction that you are not even allowed to criticize [an obviously horrible thing] out of concern that some culture might routinely [do the horrible thing], which would get you called out as intolerant, which is a sin much worse than [doing the horrible thing]. I’d like people to get some self-respect and say “hey, these are my values, if you disagree, fuck off”. But this of course assumes that the people who disagree actually have a place to go. Another problem is that you cannot build an archipelago, if the land is scarce, and your solution to conflicts is to walk away.

(Also, a fraction of people are literally psychopaths, so even if we devised a set of nudges to make most people behave well, it would not apply to everyone. To make someone behave well out of mere rational self-interest, they would have to believe that almost all evil deeds get detected and punished, which is very difficult to achieve.)

• I usually associate things like “being evil” more with something like “part of my payoff matrix has a negative coefficient on your payoff matrix”. I.e. actively wanting to hurt people and taking inherent interest in making them worse off. Selfishness feels pretty different from being evil emotionally, at least to me.

• Judgement of evil follows the same pressures as evil itself. Selfishness feels different from sadism to you, at least in part because it’s easier to find cooperative paths with selfishness. And this question really does come down to “when should I cooperate vs defect”.

• If your well-being has exactly zero value in my preference function, that literally means that I would kill you in a dark alley if I believed there was zero chance of being punished, because there is a chance you might have some money that I could take. I would call that “evil”, too.

• You can’t hypothesize zeros and get anywhere. MANY MANY psychopaths exist, and very few of them find it more effective to murder people for spare change than to further their ends in other ways. They may not care about you, but your atoms are useful to them in their current configuration.

• They may not care about you, but your atoms are useful to them in their current configuration.

There are ways of hurting people other than stabbing them, I just used a simple example.

I think there is a confusion about what exactly “selfish” means, and I blame Ayn Rand for it. The heroes in her novels are given the label “selfish” because they do not care about possibilities to actively do something good for other people unless there is also some profit for them (which is what a person with zero value for others in their preference function would do), but at the same time they avoid actively harming other people in ways that could bring them some profit (which is not what a perfectly selfish person would do).
As a result, we get quite unrealistic characters who on one hand are described as rational profit maximizers who don’t care about others (except instrumentally), but on the other hand they follow an independently reinvented deontological framework that seems designed by someone who actually cares about other people but is in deep denial about it (i.e. Ayn Rand).

A truly selfish person (someone who truly does not care about others) would hurt others in situations where doing so is profitable (including second-order effects). A truly selfish person would not arbitrarily invent a deontological code against hurting other people, because such code is merely a rationalization invented by someone who already has an emotional reason not to hurt other people but wants to pretend that instead this is a logical conclusion derived from first principles.

Interacting with a psychopath will likely get you hurt. It will likely not get you killed, because some other way of hurting you has a better risk:benefit profile. Perhaps the most profitable way is to scam you out of some money and use you to get introduced to your friends. Only once in a while a situation will arise when raping someone is sufficiently safe, or killing someone is extremely profitable, e.g. because that person stands in the way of a grand business.

• I’m not sure what our disagreement actually is—I agree with your summary of Ayn Rand, I agree that there are lots of ways to hurt people without stabbing. I’m not sure you’re claiming this, but I think that failure to help is selfish too, though I’m not sure it’s comparable with active harm.

It may be that I’m reacting badly to the use of “truly selfish”—I fear a motte-and-bailey argument is coming, where we define it loosely, and then categorize actions inconsistently as “truly selfish” only in extremes, but then try to define policy to cover far more things.
I think we’re agreed that the world contains a range of motivated behaviors, from sadistic psychopaths (who have NEGATIVE nonzero terms for others’ happiness) to saints (whose utility functions weight very heavily toward others’ happiness over their own). I don’t know if we agree that “second-order effects” very often dominate the observed behaviors over most of this range. I hope we agree that almost everyone changes their behavior to some extent based on visible incentives.

I still disagree with your post that a coefficient of 0 for you in someone’s mind implies murder for pocket change. And I disagree with the implication that murder for pocket change is impossible even if the coefficient is above 0 - circumstances matter more than innate utility function.

To the OP’s point, it’s hard to know how to accomplish “make people less selfish”, but “make the environment more conducive to positive-sum choices so selfish people take cooperative actions” is quite feasible.

• I still disagree with your post that a coefficient of 0 for you in someone’s mind implies murder for pocket change.

I believe this is exactly what it means, unless there is a chance of punishment or being hurt by the victim’s self-defense or a chance of better alternative interaction with the given person. Do you assume that there is always a more profitable interaction? (What if the target says “hey, I just realized that you are a psychopath, and I do not want to interact with you anymore”, and they mean it.)

Could you please list the pros and cons of deciding whether to murder a stranger who refuses to interact with you, if there is zero risk of being punished, from the perspective of a psychopath? As I see it, the “might get some pocket change” in the pro column is the only nonzero item in this model.

• unless there is a chance of punishment or being hurt by the victim’s self-defense or a chance of better alternative interaction with the given person.

There always is that chance. That’s mostly our disagreement.
Using real-world illustrations (murder) for motivational models (utility) really needs to acknowledge the uncertainty and variability, which the vast majority of the time “adds up to normal”. There really aren’t that many murders among strangers. And there are a fair number of people who don’t value others very highly.

• Yes, I would make this distinction too. Yet, I submit that few people actually believe, or even say they believe, that the main problems in the world are caused by people being gratuitously or sadistically evil. There are some problems that people would explain this way: violent crime comes to mind. But I don’t think the evil hypothesis is the most common explanation given by non-rationalists for why we have, say, homelessness and poverty.

That is to say that, insofar as the common rationalist refrain of “problems are caused by incentives dammit, not evil people” refers to an actual argument people generally give, it’s probably referring to the argument that people are selfish and greedy. And in that sense, the rationalists and non-rationalists are right: it’s both the system and the actors within it.

• I’ve heard a surprising number of people criticize parenting recently using some pretty harsh labels. I’ve seen people call it a form of “Stockholm syndrome” and a breach of liberty, morally unnecessary etc. This seems kind of weird to me, because it doesn’t really match my experience as a child at all.

I do agree that parents can sometimes violate liberty, and so I’d prefer a world where children could break free from their parents without penalties. But I also think that most children genuinely love their parents and so wouldn’t want to do so. I think if you deride this as merely “Stockholm syndrome” then you are unfairly undervaluing the genuine nature of the relationship in most cases, and I disagree with you here.

As an individual, I would totally let an intent aligned AGI manage most of my life, and give me suggestions.
Of course, if I disagreed with a course of action it suggested, I would want it to give a non-manipulative argument to persuade me that it knows best, rather than simply forcing me into the alternative. In other words, I’d want some sort of weak paternalism on the part of an AGI.

So, as a person who wants this type of thing, I can really see the merits of having parents who care for children. In some ways they are intent aligned GIs. Now, some parents are much more strict, and freedom restricting, and less transparent than what we would want in a full blown guardian superintelligence, but this just seems like an argument that there exist bad parents, not that this type of paternalism is bad.

• Yeah, that’s one argument for tradition: it’s simply not the pit of misery that its detractors claim it to be. But for parenting in particular, I think I can give an even stronger argument. Children aren’t little seeds of goodness that just need to be set free. They are more like little seeds of anything. If you won’t shape their values, there’s no shortage of other forces in the world that would love to shape your children’s values, without having their interests at heart.

• Children aren’t little seeds of goodness that just need to be set free. They are more like little seeds of anything

Toddlers, yes. If we’re talking about people over the age of say, 8, then it becomes less true. By the time they are a teen, it becomes pretty false. And yet people still say that legal separation at 18 is good.

If you are merely making the argument that we should limit their exposure to things that could influence them in harmful directions, then I’d argue that this never stops being a powerful force, including for people well into adulthood and in old age.

• Huh? Most 8 year olds can’t even make themselves study instead of playing Fortnite, and certainly don’t understand the issues with unplanned pregnancies.
I’d say 16-18 is about the right age where people can start relying on internal structure instead of external. Many take even longer, and need to join the army or something.

• [ETA: Apparently this was misleading; I think it only applied to one company, Alienware, and it was because they didn’t get certification, unlike the other companies.]

In my post about long AI timelines, I predicted that we would see attempts to regulate AI. An easy path for regulators is to target power-hungry GPUs and distributed computing in an attempt to minimize carbon emissions and electricity costs. It seems regulators may be going even faster than I believed in this case, with new bans on high performance personal computers now taking effect in six US states. Are bans on individual GPUs next?

• Is it possible to simultaneously respect people’s wishes to live, and others’ wishes to die?

Transhumanists are fond of saying that they want to give everyone the choice of when and how they die. Giving people the choice to die is clearly preferable to our current situation, as it respects their autonomy, but it leads to the following moral dilemma.

Suppose someone loves essentially every moment of their life. For tens of thousands of years, they’ve never once wished that they did not exist. They’ve never had suicidal thoughts, and have always expressed a strong interest to live forever, until time ends and after that too. But on one very unusual day they feel bad for some random reason and now they want to die. It happens to the best of us every few eons or so. Should this person be allowed to commit suicide?

One answer is yes, because that answer favors their autonomy. But another answer says no, because this day is a fluke. In just one day they’ll recover from their depression. Why let them die when tomorrow they will see their error? Or, as some would put it, why give them a permanent solution to a temporary problem?

There are a few ways of resolving the dilemma.
First I’ll talk about a way that doesn’t resolve the dilemma. When I once told someone about this thought experiment, they proposed giving the person a waiting period. The idea was that if the person still wanted to die after the waiting period, then it was appropriate to respect their choice. This solution sounds fine, but there’s a flaw.

Say the probability that you are suicidal on any given day is one in a trillion, and each day is independent. Every normal day you love life and you want to live forever. However, even if we make the waiting period arbitrarily long, there’s a one hundred percent chance that you will die one day, even given your strong preference not to. It is guaranteed that eventually you will express the desire to commit suicide, and then independently during each day of the waiting period continue wanting to commit suicide, until you’ve waited out every day. Depending on the size of your waiting period, it may take googols of years for this to happen, but it will happen eventually.

So what’s a better way? Perhaps we could allow your current self to die but then after that, replace you with a backup copy from a day ago when you didn’t want to die. We could achieve this outcome by uploading a copy of your brain onto a computer each day, keeping it just in case future-you wants to die. This would solve the problem of you-right-now dying one day, because even if you decided to one day die, there would be a line of succession from your current self to future-you stretching out into infinity.

Yet others still would reject this solution, either because they don’t believe that uploads are “really them” or because they think that this solution still disrespects your autonomy. I will focus on the second objection. Consider someone who says, “If I really, truly, wanted to die, I would not consider myself dead if a copy from a day ago was animated and given existence. They are too close to me, and if you animated them, I would no longer be dead.
Therefore you would not be respecting my wish to die.” Is there a way to satisfy this person?

Alternatively, we could imagine setting up the following system: if someone wants to die, they are able to, but they must be uploaded and kept on file the moment before they die. Then, if at some point in the distant future, we predict that the world is such that they would have counterfactually wished to have been around rather than not existing, we reanimate them. Therefore, we fully respect their interests. If such a future never comes, then they will remain dead. But if a future comes that they would have wanted to be around to see, then they will be able to see it.

In this way, we are maximizing not only their autonomy, but also their hypothetical autonomy. For those who wished they had never been born, we can allow those people to commit suicide, and for those who do not exist but would have preferred existence if they did exist, we bring those people into existence. No one is dissatisfied with their state of affairs.

There are still a number of challenges to this view. We could first ask what mechanism we are using to predict whether someone would have wanted to exist, if they did exist. One obvious way is to simulate them, and then ask them “Do you prefer existing, or do you prefer not to exist?” But by simulating them, we are bringing them into existence, and therefore violating their autonomy if they say “I do not want to exist.”

There could be ways of prediction that do not rely on total simulation. But it is probably impossible to predict their answer perfectly if we did not perform a simulation. At best, we could be highly confident. But if we were wrong, and someone did want to come into existence, but we failed to predict that and so never did, this would violate their autonomy.

Another issue arises when we consider that there might always be a future that the person would prefer to exist.
Perhaps, in the eternity of all existence, there will always eventually come a time where even the death-inclined would have preferred to exist. Are we then disrespecting their ancient choice to remain nonexistent forever? There seem to be no easy answers.

We have arrived at an Arrow’s impossibility theorem of sorts. Is there a way to simultaneously respect people’s wishes to live forever and respect people’s wishes to die, in a way that matches all of our intuitions? Perhaps not perfectly, but we could come close.

• However, even if we make the waiting period arbitrarily long, there’s a one hundred percent chance that you will die one day, even given your strong preference not to.

Not if the waiting period gets longer over time (e.g. proportional to lifespan).

• Good point. Although, there’s still a nonzero chance that they will die, if we continually extend the waiting period in some manner. And perhaps given their strong preference not to die, this is still violating their autonomy?

• A person could be split into two parts: one that wants to die and another that wants to live. Then the first part is turned off.

• You don’t need anywhere near as stark a contrast as this. In fact, it’s even harder if the agent (like many actual humans) has previously considered suicide, and has experienced joy that they didn’t do so, followed by periods of reconsideration.

Intertemporal preference inconsistency is one effect of the fact that we’re not actually rational agents. Your question boils down to “when an agent has inconsistent preferences, how do we choose which to support?” My answer is “support the versions that seem to make my future universe better”.

If someone wants to die, and I think the rest of us would be better off if that someone lives, I’ll oppose their death, regardless of what they “really” want.
I’ll likely frame it as convincing them they don’t really want to die, and use the fact that they didn’t want that in the past as “evidence”, but really it’s mostly me imposing my preferences.

There are some with whom I can have the altruistic conversation: future-you AND future-me both prefer you stick around. Do it for us? Even then, you can’t support any real person’s actual preferences, because they don’t exist. You can only support your current vision of their preferred-by-you preferences.

• I think that human level capabilities in natural language processing (something like GPT-2 but much more powerful) are likely to occur in some software system within 20 years.

Since human level natural language processing is a very rich real-world task, I would consider a system with those capabilities to be adequately described as a general intelligence, though it would likely not be very dangerous due to its lack of world-optimization capabilities.

This belief of mine is based on a few heuristics. Below I have collected a few claims which I consider to be relatively conservative, and which collectively combine to weakly imply my thesis. Since this is a short-form post I will not provide very specific lines of evidence. Still, I think that each of my claims could be substantially expanded upon and/or steelmanned by adding detail from historical trends and evidence from current ML research.

Claim 1: Current techniques, given enough compute, are sufficient to perform par-human at natural language processing tasks.

This is in some sense trivially true since sufficiently complicated RNNs are Turing complete.
In a more practical sense, I think that there is enough evidence that current techniques are sufficient to perform rudimentary

• Summarization of text

• Auto-completion of paragraphs

• Q&A

• Natural conversation

Given more compute and more data, I don’t see why there would be a fundamental stumbling block for current ML models to scale to human level on the above tasks. Therefore, I think that human level natural language processing systems could be created today with enough funding.

Claim 2: Given historical data and assumptions about future progress, it is quite likely that the cost for training ML systems will continue to go down in the next decades by significant amounts (more specifically: an order of magnitude).

I don’t have much more to add to this other than the fact that I have personally followed hardware trends on websites like videocardbenchmark.net and my guess is that creating neural-network specific hardware will continue this trend in ML.

Claim 3: Creating a system with human level capabilities in natural language processing will require a modest amount of funding, relative to the amount of money large corporations and governments have at their disposal.

To be more specific, I estimate that it would cost less than five billion dollars in hardware costs in 2019 inflation adjusted dollars, and perhaps even less than one billion dollars. Here’s a rough sketch for an argument for this proposition:

• The cost of replicating GPT-2 was \$50k. This is likely to be a large overestimate, given that the post noted that intrinsic costs are much lower.

• Given claim 2, this cost can be predicted to go down to about \$5k within 20 years.

• While the cost for ML systems does not scale linearly in the number of parameters, the parallelizability of architectures like the Transformer allows for near-linear scaling. This is my impression from reading posts like this one.

• Given the above three statements, a Transformer with the same number of parameters as the high estimate for the number of synapses in a human brain would naively cost about one billion dollars.
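
The arithmetic behind this estimate can be spelled out as a rough sketch. The \$50k starting cost, the order-of-magnitude cost drop, and the linear-scaling assumption come from the bullets above; the 1.5 billion parameter count for GPT-2 and the 300 trillion figure for the high synapse estimate are filled in here as assumptions (the latter figure is the one used later in this thread), and all names are purely illustrative.

```python
# Back-of-the-envelope version of the Claim 3 estimate.
# Every input is a rough assumption, not a measured value.

GPT2_PARAMS = 1.5e9            # parameters in the largest GPT-2 model
GPT2_COST_TODAY = 50_000       # replication cost in dollars, from the first bullet
COST_DROP_FACTOR = 10          # Claim 2: roughly one order of magnitude cheaper in ~20 years
BRAIN_SYNAPSES_HIGH = 300e12   # assumed high estimate for synapses in a human brain


def projected_cost(target_params: float) -> float:
    """Scale the projected future GPT-2 cost linearly with parameter count."""
    future_gpt2_cost = GPT2_COST_TODAY / COST_DROP_FACTOR
    scale = target_params / GPT2_PARAMS
    return future_gpt2_cost * scale


if __name__ == "__main__":
    cost = projected_cost(BRAIN_SYNAPSES_HIGH)
    print(f"Projected cost: ${cost:,.0f}")
```

With these inputs the result comes out to about one billion dollars. Because the estimate is linear in every input, being off by a factor of five in any one of them only moves the answer by a factor of five, which is why the five-billion-dollar upper bound is not very sensitive to the exact numbers.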

Claim 4: There is sufficient economic incentive such that producing a human-level system in the domain of natural language is worth a multi-billion dollar investment. To me this seems quite plausible, given just how many jobs require writing papers, memos, or summarizing text. Compare this to a space-race type scenario where there becomes enough public hype surrounding AI such that governments are throwing around one hundred fifty billion dollars, which is what they did for the ISS. And relative to space, AI at least has very direct real world benefits!

I understand there’s a lot to justify these claims. And I haven’t done much work to justify them. But, I’m not presently interested in justifying these claims to a bunch of judges intent on finding flaws. My main concern is that they all seem likely to me, and there’s also a lot of current work in out-competing companies to be first on the natural language benchmarks. It just adds up to me.

Am I missing something? If not, then this argument at least pushes back on claims that there is a negligible chance of general intelligence emerging within the next few decades.

• I expect that human-level language processing is enough to construct human-level programming and mathematical research ability. Aka, complete a research diary the way a human would, by matching with patterns it has previously seen, just as human mathematicians do. That should be capability enough to go as foom as possible.

• If AI is limited by hardware rather than insight, I find it unlikely that a 300 trillion parameter Transformer trained to reproduce math/​CS papers would be able to “go foom.” In other words, while I agree that the system I have described would likely be able to do human-level programming (though it would still make mistakes, just like human programmers!) I doubt that this would necessarily cause it to enter a quick transition to superintelligence of any sort.

I suspect the system that I have described above would be well suited for automating some types of jobs, but would not necessarily alter the structure of the economy by a radical degree.

• It wouldn’t necessarily cause such a quick transition, but it could easily be made to. A human with access to this tool could iterate designs very quickly, and he could take himself out of the loop by letting the tool predict and execute his actions as well, or by piping its code ideas directly into a compiler, or some other way the tool thinks up.

• My skepticism is mainly that this would be quicker than normal human iteration, or that this would substantially improve upon the strategy of simply buying more hardware. However, as we see in the recent case of e.g. RoBERTa, there are a few insights which substantially improve upon a single AI system. I just remain skeptical that a single human-level AI system would produce these insights faster than a regular human team of experts.

In other words, my opinion of recursive self improvement in this narrow case is that it isn’t a fundamentally different strategy from human oversight and iteration. It can be used to automate some parts of the process, but I don’t think that foom is necessarily implied in any strong sense.

• The default argument that such a development would lead to a foom is that an insight-based regular doubling of speed mathematically reaches a singularity in finite time when the speed increases pay insight dividends. You can’t reach that singularity with a fleshbag in the loop (though it may be unlikely to matter if with him in the loop, you merely double every day).

For certain shapes of how speed increases depend on insight and oversight, there may be a perverse incentive to cut yourself out of your loop before the other guy cuts himself out.
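
The finite-time claim can be illustrated with a toy model, offered only as a sketch. It assumes (hypothetically) that the first doubling of speed takes one unit of time and that each later doubling takes half as long as the one before, because the system doing the research is itself twice as fast. The function names and the one-unit starting interval are assumptions for illustration, not anything specified in the comment above.

```python
def time_for_doublings(n: int, first_interval: float = 1.0) -> float:
    """Total time for n speed doublings when each takes half as long as the last."""
    total = 0.0
    interval = first_interval
    for _ in range(n):
        total += interval
        interval /= 2  # the sped-up system finds the next insight twice as fast
    return total


def time_with_human_in_loop(n: int, interval: float = 1.0) -> float:
    """Total time when a fixed-speed human step makes every doubling take equally long."""
    return n * interval


if __name__ == "__main__":
    for n in (1, 5, 10, 50):
        print(n, time_for_doublings(n), time_with_human_in_loop(n))
```

In the first model the total time stays below two units no matter how many doublings occur, so unbounded speed is reached in finite time. In the second, speed still grows exponentially, but it never diverges at any finite time.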

• I generally agree with the heuristic that we should “live on the mainline”, meaning that we should mostly plan for events which capture the dominant share of our probability. This heuristic causes me to have a tendency to do some of the following things

• Work on projects that I think have a medium-to-high chance of succeeding and quickly abandon things that seem like they are failing.

• Plan my career trajectory based on where I think I can plausibly maximize my long term values.

• Study subjects only if I think that I will need to understand them at some point in order to grasp an important concept. See more details here.

• Avoid doing work that leverages small probabilities of exceptionally bad outcomes. For example, I don’t focus my studying on worst-case AI safety risk (although I do think that analyzing worst-case failure modes is useful from the standpoint of a security mindset).

I see a few problems with this heuristic, however, and I’m not sure quite how to resolve them. More specifically, I tend to float freely between different projects because I am quick to abandon things if I feel like they aren’t working out (compare this to the mindset that some game developers have when they realize their latest game idea isn’t very good).

One case where this shows up is when I change my beliefs about the most effective ways to spend my time, as far as long-term future scenarios are concerned. I will sometimes read an argument about how some line of inquiry is promising and for an entire day believe that this would be a good thing to work on, only for the next day to bring another argument.

And things like my AI timeline predictions vary erratically, much more than I expect most people’s: I sometimes wake up and think that AI might be just 10 years away and other days I wake up and wonder if most of this stuff is more like a century away.

This general behavior makes me into someone who doesn’t stay consistent on what I try to do. My life therefore resembles a battle between two competing heuristics: on one side there’s the heuristic of planning for the mainline, and on the other there’s the heuristic of committing to things even if they aren’t panning out. I am unsure of the best way to resolve this conflict.

• Some random thoughts:

• Startups and pivots. Startups require lots of commitment even when things feel like they’re collapsing – only by persevering through those times can you possibly make it. Still, startups are willing to pivot – take their existing infrastructure but change key strategic approaches.

• Escalating commitment. Early on (in most domains), you should pick shorter term projects, because the focus is on learning. Code a website in a week. Code another website in 2 months. Don’t stress too much on multi-year plans until you’re reasonably confident you sorta know what you’re doing. (Relatedly, relationships: early on it makes sense to date a lot to get some sense of who/​what you’re looking for in a romantic partner. But eventually, a lot of the good stuff comes when you actually commit to longterm relationships that are capable of weathering periods of strife and doubt)

• Alternately: Givewell (or maybe OpenPhil?) did mixtures of shallow dives, deep dives and medium dives into cause areas because they learned different sorts of things from each kind of research.

• Commitment mindset. Sort of how Nate Soares recommends separating the feeling of conviction from the epistemic belief of high-success… you can separate “I’m going to stick with this project for a year or two because it’s likely to work” from “I’m going to stick to this project for a year or two because sticking to projects for a year or two is how you learn how projects work on the 1-2 year timescale, including the part where you shift gears and learn from mistakes and become more robust about them.”

• Mathematically, it seems like you should just give your heuristic the better data you already consciously have: If your untrustworthy senses say you aren’t on the mainline, the correct move isn’t necessarily to believe them, but rather to decide to put effort into figuring it out, because it’s important.

It’s clear how your heuristic would evolve. To embrace it correctly, you should make sure that your entire life lives in the mainline. If there’s a game with negative expected value, where the worst outcome has chance 10%, and you play it 20 times, that’s stupid. Budget the probability you are willing to throw away for the rest of your life now.

If you don’t think you can stick to your budget, if you know that tomorrow you will always play another round of that game by the same reasoning as today, then realize that today’s reasoning decides both today and tomorrow. Realize that the mainline of giving in to the heuristic is losing eventually, and let the heuristic destroy itself immediately.
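
The arithmetic behind the 20-round example can be checked directly. This sketch assumes the rounds are independent, which the comment does not state explicitly, and the function name is made up for illustration.

```python
def chance_of_worst_outcome(p_worst: float, rounds: int) -> float:
    """Probability of hitting the worst outcome at least once in independent rounds."""
    return 1 - (1 - p_worst) ** rounds


if __name__ == "__main__":
    for rounds in (1, 5, 20):
        print(rounds, round(chance_of_worst_outcome(0.10, rounds), 3))
```

At 20 rounds the chance is roughly 88%, so an outcome that is off the mainline for a single round becomes the mainline for the repeated game. That is the sense in which the probability budget has to be set for the whole sequence of plays rather than one round at a time.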

• I see a few problems with this heuristic, however, and I’m not sure quite how to resolve them. More specifically, I tend to float freely between different projects because I am quick to abandon things if I feel like they aren’t working out (compare this to the mindset that some game developers have when they realize their latest game idea isn’t very good).

There are two big issues with the “living in the mainline” strategy:

1. Most of the highest EV activities are those that have low chance of success but big rewards. I suspect much of your volatile behavior is bouncing between chasing opportunities you see as high value, and then realizing you’re not on the mainline and correcting, then realizing there are higher EV opportunities and correcting again.

2. Strategies that work well on the mainline often fail spectacularly in the face of black swans. So they have a high probability of working but also very negative EV in unlikely situations (which you ignore if you’re only thinking about the mainline).

Two alternatives to the “living on the mainline” heuristic:

1. The Anti-fragility heuristic:

• Use the barbell strategy, to split your activities between surefire wins with low upsides and certainty, and risky moonshots with low downsides but lots of uncertainty around upsides.

• Notice the reasons that things fail, and make them robust to that class of failure in the future.

• Try lots of things, and stick with the ones that work over time.

2. The Effectuation Heuristic:

• Go into areas where you have unfair advantages.

• Spread your downside risk to people or organizations who can handle it.

• In general, work to CREATE the mainline where you have an unfair advantage and high upside.

You might get some mileage out of reading the effectuation and anti-fragility sections of this post.

• In discussions about consciousness I find myself repeating the same basic argument against the existence of qualia constantly. I don’t do this just to be annoying: It is just my experience that

1. People find consciousness really hard to think about, and it has been known to cause a lot of disagreements.

2. Personally I think that this particular argument dissolved perhaps 50% of all my confusion about the topic, and was one of the simplest, clearest arguments that I’ve ever seen.

I am not being original either. The argument is the same one that has been used in various forms across Illusionist/​Eliminativist literature that I can find on the internet. Eliezer Yudkowsky used a version of it many years ago. Even David Chalmers, who is quite the formidable consciousness realist, admits in The Meta-Problem of Consciousness that the argument is the best one he can find against his position.

The argument is simply this:

If we are able to explain why you believe in, and talk about qualia without referring to qualia whatsoever in our explanation, then we should reject the existence of qualia as a hypothesis.

This is the standard debunking argument. It has a more general form which can be used to deny the existence of a lot of other non-reductive things: distinct personal identities, gods, spirits, libertarian free will, a mind-independent morality etc. In some sense it’s just an extended version of Occam’s razor, showing us that qualia don’t do anything in our physical theories, and thus can be rejected as things that actually exist out there in any sense.

To me this argument is very clear, and yet I find myself arguing it a lot. I am not sure how else to get people to see my side of it other than sending them a bunch of articles which more-or-less make the exact same argument but from different perspectives.

I think the human brain is built to have a blind spot on a lot of things, and consciousness is perhaps one of them. I think quite a bit about how, if humanity is not able to think clearly about this thing which we have spent many research years on, then there might be some other low-hanging philosophical fruit still remaining.

Addendum: I am not saying I have consciousness figured out. However, I think it’s analogous to how atheists haven’t “got religion figured out” yet they have at the very least taken their first steps by actually rejecting religion. It’s not a full theory of religious belief, or even a theory at all. It’s just the first thing you do if you want to understand the subject. I roughly agree with Keith Frankish’s take on the matter.

• If we are able to explain why you believe in, and talk about qualia without referring to qualia whatsoever in our explanation, then we should reject the existence of qualia as a hypothesis.

And I assume your claim is that we can explain why I believe in Qualia without referring to qualia?

Edit: to be clear, I don’t really care much why other people talk about qualia. I care why I perceive myself to experience things. If it’s an illusion, cool, but then why do I experience the illusion?

• If belief is construed as some sort of representation which stands for external reality (as in the case of some correspondence theories of truth), then we can take the claim to be a strong prediction of contemporary neuroscience. Ditto for whether we can explain why we talk about qualia.

It’s not that I could explain exactly why you in particular talk about qualia. It’s that we have an established paradigm for explaining it.

It’s similar in the respect that we have an established paradigm for explaining why people report being able to see color. We can model the eye, and the visual cortex, and we have some idea of what neurons do even though we lack the specific information about how the whole thing fits together. And we could imagine that in the limit of perfect neuroscience, we could synthesize this information to trace back the reason why you said a particular thing.

Since we do not have perfect neuroscience, the best analogy would be analyzing the ‘beliefs’ and predictions of an artificial neural network. If you asked me, “Why does this ANN predict that this image is a 5 with 98% probability” it would be difficult to say exactly why, even with full access to the neural network parameters.

However, we know that unless our conception of neural networks is completely incorrect, in principle we could trace exactly why the neural network made that judgement, including the exact steps that caused the neural network to have the parameters that it has in the first place. And we know that such an explanation requires only the components which make up the ANN, and not any conscious or phenomenal properties.
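A minimal sketch of the tracing claim, assuming a toy one-layer classifier rather than a real image model. The weights, the two labels, and the four-"pixel" input are all made up for illustration. The point it shows is that the output probability can be decomposed into per-input contributions using nothing but the parameters and the input:

```python
import math

# Hypothetical toy "network": one linear layer plus softmax over two classes.
# The weights are invented for illustration; nothing here comes from a real model.
WEIGHTS = {
    "five": [2.0, -1.0, 0.5, 1.5],
    "not_five": [-1.0, 1.0, 0.0, -0.5],
}
BIASES = {"five": 0.1, "not_five": 0.0}


def logits(pixels):
    """Weighted sum of the inputs for each class, plus that class's bias."""
    return {
        label: sum(w * x for w, x in zip(WEIGHTS[label], pixels)) + BIASES[label]
        for label in WEIGHTS
    }


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - peak) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def trace(pixels, label):
    """Split one class score into the contribution of each input pixel."""
    return [w * x for w, x in zip(WEIGHTS[label], pixels)]


image = [1.0, 0.0, 1.0, 1.0]  # hypothetical 4-pixel "image"
probs = softmax(logits(image))
parts = trace(image, "five")
print(probs)
print(parts, "+ bias", BIASES["five"])
```

In a real network the decomposition is far messier because of nonlinear layers, but the ingredients are the same: inputs, parameters, and arithmetic.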

• I can’t tell whether we’re arguing about the same thing.

Like, I assume that I am a neural net predicting things and deciding things and if you had full access to my brain you could (in principle, given sufficient time) understand everything that was going on in there. But, like, one way or another I experience the perception of perceiving things.

(I’d prefer to taboo ‘Qualia’ in case it has particular connotations I don’t share. Just ‘that thing where Ray perceives himself perceiving things, and perhaps the part where sometimes Ray has preferences about those perceptions of perceiving because the perceptions have valence.’ If that’s what Qualia means, cool, and if it means some other thing I’m not sure I care)

My current working model of “how this aspect of my perception works” is described in this comment, I guess easy enough to quote in full:

“Human brains contain two forms of knowledge: explicit knowledge, and weights that are used in implicit knowledge (admittedly the former is hacked on top of the latter, but that isn’t relevant here). Mary doesn’t gain any extra explicit knowledge from seeing blue, but her brain changes some of her implicit weights so that when a blue object activates in her vision a sub-neural network can connect this to the label “blue”.”

The reason I care about any of this is that I believe that a “perceptions-having-valence” is probably morally relevant. (or, put in usual terms: suffering and pleasure seem morally relevant).

(I think it’s quite possible that future-me will decide I was confused about this part, but it’s the part I care about anyhow)

Are you saying that my perceiving-that-I-perceive-things-with-valence is an illusion, and that I am in fact not doing that? Or some other thing?

(To be clear, I AM open to ‘actually Ray yes, the counterintuitive answer is that no, you’re not actually perceiving-that-you-perceive-things-and-some-of-the-perceptions-have-valence.’ The topic is clearly confusing and behind the veil of epistemic-ignorance it seems quite plausible I’m the confused one here. Just noting that so far, from the way you’re phrasing things, I can’t tell whether your claims map onto the things I care about.)

• Like, I assume that I am a neural net predicting things and deciding things and if you had full access to my brain you could (in principle, given sufficient time) understand everything that was going on in there. But, like, one way or another I experience the perception of perceiving things.

To me this is a bit like the claim of someone who claimed psychic powers but still wanted to believe in physics who would say, “I assume you could perfectly well understand what was going on at a behavioral level within my brain, but there is still a datum left unexplained: the datum of me having psychic powers.”

There are a number of ways to respond to the claim:

• We could redefine psychic powers to include mere physical properties. This has the problem that psychics insist that psychic power is entirely separate from physical properties. Simple re-definition doesn’t make the intuition go away and doesn’t explain anything.

• We could alternatively posit new physics which incorporates psychic powers. This has the occasional problem that it violates Occam’s razor, since the old physics was completely adequate. Hence the debunking argument I presented above.

• Or, we could incorporate the phenomenon within a physical model by first denying that it exists and then explaining the mechanism which caused you to believe in it, and talk about it.

In the case of consciousness, the third response amounts to Illusionism, which is the view that I am defending. It has the advantage that it conservatively doesn’t promise to contradict known physics, and it also does justice to the intuition that consciousness really exists.

I’d prefer to taboo ‘Qualia’ in case it has particular connotations I don’t share. Just ‘that thing where Ray perceives himself perceiving things, and perhaps the part where sometimes Ray has preferences about those perceptions of perceiving because the perceptions have valence.’

To most philosophers who write about it, qualia is defined as the experience of what it’s like. Roughly speaking, I agree with thinking of it as a particular form of perception that we experience.

However, it’s not just any perception, since some perceptions can be unconscious perceptions. Qualia specifically refer to the qualitative aspects of our experience of the world: the taste of wine, the touch of fabric, the feeling of seeing blue, the suffering associated with physical pain etc. These are said to be directly apprehensible to our ‘internal movie’ that is playing inside our head. It is this type of property which I am applying the framework of illusionism to.

The reason I care about any of this is that I believe that a “perceptions-having-valence” is probably morally relevant.

I agree. That’s why I typically take the view that consciousness is a powerful illusion, and that we should take it seriously. Those who simply re-define consciousness as essentially a synonym for “perception” or “observation” or “information” are not doing justice to the fact that it’s the thing I care about in this world. I have a strong intuition that consciousness is what is valuable even despite the fact that I hold an illusionist view. To put it another way, I would care much less if you told me a computer was receiving a pain-signal (labeled in the code as some variable with suffering set to maximum), compared to the claim that a computer was actually suffering in the same way a human does.

Are you saying that my perceiving-that-I-perceive-things-with-valence is an illusion, and that I am in fact not doing that? Or some other thing?

Roughly speaking, yes. I am denying that that type of thing actually exists, including the valence claim.

• Or, we could incorporate the phenomenon within a physical model by first denying that it exists and then explaining the mechanism which caused you to believe in it, and talk about it.

It still feels very important that you haven’t actually explained this.

In the case of psychic powers, I (think?) we actually have pretty good explanations for where perceptions of psychic powers come from, which makes the perception of psychic powers non-mysterious. (i.e. we know how cold reading works, and how various kinds of confirmation bias play into divination). But, that was something that actually had to be explained.

It feels like you’re just changing the name of the confusing thing from ‘the fact that I seem conscious to myself’ to ‘the fact that I’m experiencing an illusion of consciousness.’ Cool, but, like, there’s still a mysterious thing that seems quite important to actually explain.

• Also just in general, I disagree that skepticism is not progress. If I said, “I don’t believe in God because there’s nothing in the universe with those properties...” I don’t think it’s fair to say, “Cool, but like, I’m still praying to something right, and that needs to be explained” because I don’t think that speaks fully to what I just denied.

In the case of religion, many people have a very strong intuition that God exists. So, is the atheist position not progress because we have not explained this intuition?

• I agree that skepticism generally can be important progress (I recently stumbled upon this old comment making a similar argument about how saying “not X” can be useful)

The difference between God and consciousness is that the interesting bit about consciousness *is* my perception of it, full stop. Unlike God or psychic powers, there is no separate thing from my perception of it that I’m interested in.

• The difference between God and consciousness is that the interesting bit about consciousness *is* my perception of it, full stop.

If by perception you simply mean “You are an information processing device that takes signals in and outputs things” then this is entirely explicable on our current physical models, and I could dissolve the confusion fairly easily.

However, I think you have something else in mind which is that there is somehow something left out when I explain it by simply appealing to signal processing. In that sense, I think you are falling right into the trap! You would be doing something similar to the person who said, “But I am still praying to God!”

• However, I think you have something else in mind which is that there is somehow something left out when I explain it by simply appealing to signal processing. In that sense,

I don’t have anything else in mind that I know of. “Explained via signal processing” seems basically sufficient. The interesting part is “how can you look at a given signal-processing-system, and predict in advance whether that system is the sort of thing that would talk* about Qualia, if it could talk?”

(I feel like this was all covered in the sequences, basically?)

*where “talk about qualia” is shorthand for ‘would consider the concept of qualia important enough to have a concept for.’

• I mean, I agree that this was mostly covered in the sequences. But I also think that I disagree with the way that most people frame the debate. At least personally I have seen people who I know have read the sequences still make basic errors. So I’m just leaving this here to explain my point of view.

Intuition: On a first approximation, there is something that it is like to be us. In other words, we are beings who have qualia.

Counterintuition: In order for qualia to exist, there would need to exist entities which are private, ineffable, intrinsic, and subjective, and this can’t be, since physics is public, effable, and objective and therefore contradicts the existence of qualia.

Intuition: But even if I agree with you that qualia don’t exist, there still seems to be something left unexplained.

Counterintuition: We can explain why you think there’s something unexplained because we can explain the cause of your belief in qualia, and why you think they have these properties. By explaining why you believe it we have explained all there is to explain.

Intuition: But you have merely said that we could explain it. You have not actually explained it.

Counterintuition: Even without the precise explanation, we now have a paradigm for explaining consciousness, so it is not mysterious anymore.

This is essentially the point where I leave.

• physics is public, effable, and objective and therefore contradicts the existence of qualia.

Physics as map is. Note that we can’t compare the map directly to the territory.

• We do not telepathically receive experiment results when they are performed. In reality you need to take in the measurement results from your first-person point of view (use eyes to read an LED screen, or use ears to hear stories of experiments performed). It seems to be that experiments are intersubjective, in that other observers will report having experiences that resemble my first-hand experiences. For most purposes, shorthanding this to “public” is adequate. But your point of view is “unpublishable” in that even if you really tried, there is no way to provide your private experience to the public knowledge pool (“directly”). “I know how you feel” is a fiction; it doesn’t actually happen.

Skepticism about the experiences of others is easier, but being skeptical about your own experiences would seem to be ludicrous.

• I am not denying that humans take in sensory input and process it using their internal neural networks. I am denying that process has any of the properties associated with consciousness in the philosophical sense. And I am making an additional claim which is that if you merely redefine consciousness so that it lacks these philosophical properties, you have not actually explained anything or dissolved any confusion.

The illusionist approach is the best approach because it simultaneously takes consciousness seriously and doesn’t contradict physics. By taking this approach we also have an understood paradigm for solving the hard problem of consciousness: namely, the hard problem is reduced to the meta-problem (see Chalmers).

• It feels like you’re just changing the name of the confusing thing from ‘the fact that I seem conscious to myself’ to ‘the fact that I’m experiencing an illusion of consciousness.’ Cool, but, like, there’s still a mysterious thing that seems quite important to actually explain.

I don’t actually agree. Although I have not fully explained consciousness, I think that I have shown a lot.

In particular, I have shown us what the solution to the hard problem of consciousness would plausibly look like if we had unlimited funding and time. And to me, that’s important.

And under my view, it’s not going to look anything like, “Hey we discovered this mechanism in the brain that gives rise to consciousness.” No, it’s going to look more like, “Look at this mechanism in the brain that makes humans talk about things even though the things they are talking about have no real world referent.”

You might think that this is a useless achievement. I claim the contrary. As Chalmers points out, pretty much all the leading theories of consciousness fail the basic test of looking like an explanation rather than just sounding confused. Don’t believe me? Read Section 3 in this paper.

In short, Chalmers reviews the current state of the art in consciousness explanations. He first goes into Integrated Information Theory (IIT), but then convincingly shows that IIT fails to explain why we would talk about consciousness and believe in consciousness. He does the same for global workspace theories, first order representational theories, higher order theories, consciousness-causes-collapse theories, and panpsychism. Simply put, none of them even approach an adequate baseline of looking like an explanation.

I also believe that if you follow my view carefully you might stop being confused about a lot of things. Like, do animals feel pain? Well it depends on your definition of pain—consciousness is not real in any objective sense so this is a definition dispute. Same with asking whether person A is happier than person B, or asking whether computers will ever be conscious.

Perhaps this isn’t an achievement strictly speaking relative to the standard Lesswrong points of view. But that’s only because I think the standard Lesswrong point of view is correct. Yet even so, I still see people around me making fundamentally basic mistakes about consciousness. For instance, I see people treating consciousness as intrinsic, ineffable, private—or they think there’s an objectively right answer to whether animals feel pain and argue over this as if it’s not the same as a tree falling in a forest.

• And we know that such an explanation requires only the components which make up the ANN, and not any conscious or phenomenal properties.

That’s an argument against dualism not an argument against qualia. If mind brain identity is true, neural activity is causing reports, and qualia, along with the rest of consciousness are identical to neural activity, so qualia are also causing reports.

• If you identify qualia as behavioral parts of our physical models, then are you also willing to discard the properties philosophers have associated with qualia, such as

• Ineffable, as they can’t be explained using just words or mathematical sentences

• Private, as they are inaccessible to outside third-person observers

• Intrinsic, as they are fundamental to the way we experience the world

If you are willing to discard these properties, then I suggest we stop using the word “qualia” since you have simply taken all the meaning away once you have identified them with things that actually exist. This is what I mean when I say that I am denying qualia.

It is analogous to someone who denies that souls exist by first conceding that we could identify certain physical configurations as examples of souls, but then explaining that this would be confusing to anyone who talks about souls in the traditional sense. Far better in my view to discard the idea altogether.

• My orientation to this conversation seems more like “hmm, I’m learning that it is possible the word qualia has a bunch of connotations that I didn’t know it had”, as opposed to “hmm, I was wrong to believe in the-thing-I-was-calling-qualia.”

But I’m not yet sure that these connotations are actually universal – the wikipedia article opens with:

In philosophy and certain models of psychology, qualia (/ˈkwɑːliə/ or /ˈkweɪliə/; singular form: quale) are defined as individual instances of subjective, conscious experience. The term qualia derives from the Latin neuter plural form (qualia) of the Latin adjective quālis (Latin pronunciation: [ˈkʷaːlɪs]) meaning “of what sort” or “of what kind” in a specific instance, like “what it is like to taste a specific apple, this particular apple now”.
Examples of qualia include the perceived sensation of pain of a headache, the taste of wine, as well as the redness of an evening sky. As qualitative characters of sensation, qualia stand in contrast to “propositional attitudes”,[1] where the focus is on beliefs about experience rather than what it is directly like to be experiencing.
Philosopher and cognitive scientist Daniel Dennett once suggested that qualia was “an unfamiliar term for something that could not be more familiar to each of us: the ways things seem to us”.[2]
Much of the debate over their importance hinges on the definition of the term, and various philosophers emphasize or deny the existence of certain features of qualia. Consequently, the nature and existence of various definitions of qualia remain controversial because they are not verifiable.

Later on, it notes the three characteristics (ineffable/​private/​intrinsic) that Dennett listed.

But this looks more like an accident of history than something intrinsic to the term. The opening paragraphs defined qualia the way I naively expected it to be defined.

My impression looking at the various definitions and discussion is not that qualia was defined in this specific fashion, so much as various people trying to grapple with a confusing problem generated various possible definitions and rules for it, and some of those turned out to be false once we came up with better understanding.

I can see where you’re coming from with the soul analogy, but I’m not sure if it’s more like the soul analogy, or more like “One early philosopher defined ‘a human’ as a featherless biped, and then a later one said ‘dude, look at this featherless chicken I just made’ and they realized the definition was silly.”

I guess my question here is – do you have a suggestion for a replacement word for “the particular kind of observation that gets made by an entity that actually gets to experience the perception”? This still seems importantly different from “just a perception”, since very simple robots and thermostats or whatever can be said to have those. I don’t really care whether they are inherently private, ineffable or intrinsic, and whether Daniel Dennett was able to eff them seems more like a historical curiosity to me.

The wikipedia article specifically says that people argue a lot over the definitions:

There are many definitions of qualia, which have changed over time. One of the simpler, broader definitions is: “The ‘what it is like’ character of mental states. The way it feels to have mental states such as pain, seeing red, smelling a rose, etc.”

That definition there is the one I’m generally using, and the one which seems important to have a word for. This seems more like a political/coordination question of “is it easier to invent a new word and gain traction for it, or get everyone on the same page about ‘actually, they’re totally in principle effable, you just might need to be a kind of mind different than a current-generation-human to properly eff them.’”

• It does seem to me something like “I expect the sort of mind that is capable of viewing qualia of other people would be sufficiently different from a human mind that it may still be fair to call them ‘private/​ineffable among humans.’”

• Thanks for engaging with me on this thing. :)

I know I’m not being as clear as I could possibly be, and at some points I sort of feel like just throwing “Quining Qualia” or Keith Frankish’s articles or a whole bunch of other blog posts at people and saying, “Please just read this and re-read it until you have a very distinct intuition about what I am saying.” But I know that that type of debate is not helpful.

I think I have an OK-to-good understanding of what you are saying. My model of your reply is something like this,

“Your claim is that qualia don’t exist because nothing with these three properties exists (ineffability/​private/​intrinsic), but it’s not clear to me that these three properties are universally identified with qualia. When I go to Wikipedia or other sources, they usually identify qualia with ‘what it’s like’ rather than these three very specific things that Daniel Dennett happened to list once. So, I still think that I am pointing to something real when I talk about ‘what it’s like’ and you are only disputing a perhaps-strawman version of qualia.”

Please correct me if this model of you is inaccurate.

I recognize what you are saying, and I agree with the place you are coming from. I really do. And furthermore, I really really agree with the idea that we should go further than skepticism and we should always ask more questions even after we have concluded that something doesn’t exist.

However, the place I get off the boat is where you keep talking about how this ‘what it’s like’ thing is actually referring to something coherent in the real world that has a crisp, natural boundary around it. That’s the disagreement.

I don’t think it’s an accident of history either that those properties are identified with qualia. The whole reason Daniel Dennett identified them was because he showed that they were the necessary conclusion of the sort of thought experiments people use for qualia. He spends the whole first several paragraphs justifying them using various intuition pumps in his essay on the matter.

Point being, when you are asked to clarify what ‘what it’s like’ means, you’ll probably start pointing to examples. Like, you might say, “Well, I know what it’s like to see the color green, so that’s an example of a quale.” And Daniel Dennett would then press the person further and go, “OK, could you clarify what you mean when you say you ‘know what it’s like to see green’?” and the person would say, “No, I can’t describe it using words. And it’s not clear to me it’s even in the same category of things that can be, either, since I can’t possibly conceive of an English sentence that would describe the color green to a blind person.” And then Daniel Dennett would shout, “Aha! So you do believe in ineffability!”

The point of those three properties (actually he lists 4, I think), is not that they are inherently tied to the definition. It’s that the definition is vague, and every time people are pressed to be more clear on what they mean, they start spouting nonsense. Dennett did valid and good deconfusion work where he showed that people go wrong in these four places, and then showed how there’s no physical thing that could possibly allow those four things.

These properties also show up all over the various thought experiments that people use when talking about qualia. For example, Nagel uses the private property in his essay “What Is it Like to Be a Bat?” Chalmers uses the intrinsic property when he talks about p-zombies being physically identical to humans in every respect except for qualia. Frank Jackson used the ineffability property when he talked about how Mary the neuroscientist had something missing when she was in the black and white room.

All of this is important to recognize. Because if you still want to say, “But I’m still pointing to something valid and real even if you want to reject this other strawman-entity” then I’m going to treat you like the person who wants to believe in souls even after they’ve been shown that nothing soul-like exists in this universe.

• Spouting nonsense is different from being wrong. If I say that there are no rectangles with 5 angles, that can be processed pretty straightforwardly, because the concept of a rectangle is unproblematic. But if you seek why that statement was made and the person points to a pentagon, you will find 5 angles. Now, there are polygons with 5 angles. If you give a short word for “5-angle rectangle”, it’s correct to say those don’t exist. But if you give an ostensive definition of the shape, then it does exist, and it’s more to the point to say that it’s not a rectangle rather than that it doesn’t exist.

In the details, when persons say “what it is like to see green” one could fail to get what they mean or point to. If someone says “look, a unicorn” and one has proof that unicorns don’t exist, that doesn’t mean that the unicorn reference is not referencing something or that the reference target does not exist. If you end up in a situation where you point at a horse and say “those things do not exist. Look, no horn, doesn’t exist”, you are not being helpful. If somebody is pointing to a horse and says “look, a unicorn!” and you go “where? I see only horses”, you are also not being helpful. Being “motivatedly uncooperative in ostension receiving” is not cool. Say that you made a deal to sell a gold bar in exchange for a unicorn. Then refusing to accept any object as a unicorn would let you keep your gold bar, and you might be tempted to play dumb.

When people are saying “what it feels like to see green” they are trying to communicate something and failing their assertion by sabotaging their communication doesn’t prove anything. Communication is hard yes but doing too much semantics substitution means you start talking past each other.

• I am not suggesting that qualia should be identified with neural activity in a way that loses any aspects of the philosophical definition… bearing in mind that the philosophical definition does not assert that qualia are non physical.

• What are you experiencing right now? (E.g. what do you see in front of you? In what sense does it seem to be there?)

• I won’t lie—I have a very strong intuition that there’s this visual field in front of me, and that I can hear sounds that have distinct qualities, and simultaneously I can feel thoughts rush into my head as if there is an internal speaker and listener. And when I reflect on some visual in the distance, it seems as though the colors are very crisp and exist in some way independent of simple information processing in a computer-type device. It all seems very real to me.

I think the main claim of the illusionist is that these intuitions (at least insofar as the intuitions are making claims about the properties of qualia) are just radically incorrect. It’s as if our brains have an internal error in them, not allowing us to understand the true nature of these entities. It’s not that we can’t see or something like that. It’s just that the quality of perceiving the world has essentially an identical structure to what one might imagine a computer with a camera would “see.”

Analogy: Some people who claim to have experienced heaven aren’t just making stuff up. In some sense, their perception is real. It just doesn’t have the properties we would expect it to have at face value. And if we actually tried looking for heaven in the physical world we would find it to be little else than an illusion.

• What’s the difference between making claims about nearby objects and making claims about qualia (if there is one)? If I say there’s a book to my left, is that saying something about qualia? If I say I dreamt about a rabbit last night, is that saying something about qualia?

(Are claims of the form “there is a book to my left” radically incorrect?)

That is, is there a way to distinguish claims about qualia from claims about local stuff/​phenomena/​etc?

• Sure. There are a number of properties usually associated with qualia which are the things I deny. If we strip these properties away (something Keith Frankish refers to as zero qualia) then we can still say that they exist. But it’s confusing to say that something exists when its properties are so minimal. Daniel Dennett listed a number of properties that philosophers have assigned to qualia and conscious experience more generally:

(1) ineffable (2) intrinsic (3) private (4) directly or immediately apprehensible

Ineffable because there’s something Mary the neuroscientist is missing when she is in the black and white room. And someone who tried explaining color to her would not be able to fully.

Intrinsic because it cannot be reduced to bare physical entities, like electrons (think: could you construct a quale if you had the right set of particles?).

Private because they are accessible to us and not globally available. In this sense, if you tried to find out the qualia that a mouse was experiencing as it fell victim to a trap, you would come up fundamentally short because it was specific to the mouse mind and not yours. Or as Nagel put it, there’s no way that third person science could discover what it’s like to be a bat.

Directly apprehensible because they are the elementary things that make up our experience of the world. Look around and qualia are just what you find. They are the building blocks of our perception of the world.

It’s not necessarily that none of these properties could be steelmanned. It is just that they are so far from being steelmannable that it is better to deny their existence entirely. It is the same as my analogy with a person who claims to have visited heaven. We could either talk about it as illusory or non-illusory. But for practical purposes, if we chose the non-illusory route we would probably be quite confused. That is, if we tried finding heaven inside the physical world, with the same properties as the claimant had proposed, then we would come up short. Far better then, to treat it as a mistake inside of our cognitive hardware.

• Thanks for the elaboration. It seems to me that experiences are:

1. Hard-to-eff, as a good-enough theory of what physical structures have which experiences has not yet been discovered, and would take philosophical work to discover.

2. Hard to reduce to physics, for the same reason.

3. In practice private due to mind-reading technology not having been developed, and due to bandwidth and memory limitations in human communication. (It’s also hard to imagine what sort of technology would allow replicating the experience of being a mouse)

4. Pretty directly apprehensible (what else would be? If nothing is, what do we build theories out of?)

It seems natural to conclude from this that:

1. Physical things exist.

2. Experiences exist.

3. Experiences probably supervene on physical things, but the supervenience relation is not yet determined, and determining it requires philosophical work.

4. Given that we don’t know the supervenience relation yet, we need to at least provisionally have experiences in our ontology distinct from physical entities. (It is, after all, impossible to do physics without making observations and reporting them to others)

Is there something I’m missing here?

• Here’s a thought experiment which helped me lose my ‘belief’ in qualia: would a robot scientist, who was only designed to study physics and make predictions about the world, ever invent qualia as a hypothesis?

Assuming the actual mouth movements we make when we say things like, “Qualia exist” are explainable via the scientific method, the robot scientist could still predict that we would talk and write about consciousness. But would it posit consciousness as a separate entity altogether? Would it treat consciousness as a deep mystery, even after peering into our brains and finding nothing but electrical impulses?

• Robots take in observations. They make theories that explain their observations. Different robots will make different observations and communicate them to each other. Thus, they will talk about observations.

After making enough observations they make theories of physics. (They had to talk about observations before they made low-level physics theories, though; after all, they came to theorize about physics through their observations). They also make bridge laws explaining how their observations are related to physics. But, they have uncertainty about these bridge laws for a significant time period.

The robots theorize that humans are similar to them, based on the fact that they have functionally similar cognitive architecture; thus, they theorize that humans have observations as well. (The bridge laws they posit are symmetric that way, rather than being silicon-chauvinist)

• I think you are using the word “observation” to refer to consciousness. If this is true, then I do not deny that humans take in observations and process them.

However, I think the issue is that you have simply re-defined consciousness into something which would be unrecognizable to the philosopher. To that extent, I don’t say you are wrong, but I will allege that you have not done enough to respond to the consciousness-realist’s intuition that consciousness is different from physical properties. Let me explain:

If qualia are just observations, then it seems obvious that Mary is not missing any information in her room, since she can perfectly well understand and model the process by which people receive color observations.

Likewise, if qualia are merely observations, then the Zombie argument amounts to saying that p-Zombies are beings which can’t observe anything. This seems patently absurd to me, and doesn’t seem like it’s what Chalmers meant at all when he came up with the thought experiment.

Likewise, if we were to ask, “Is a bat conscious?” then the answer would be a vacuous “yes” under your view, since bats have echolocators which take in observations and process information.

In this view even my computer is conscious since it has a camera on it. For this reason, I suggest we are talking about two different things.

• Mary’s room seems uninteresting, in that robot-Mary can predict pretty well what bit-pattern she’s going to get upon seeing color. (To the extent that the human case is different, it’s because of cognitive architecture constraints)

Regarding the zombie argument: The robots have uncertainty over the bridge laws. Under this uncertainty, they may believe it is possible that humans don’t have experiences, due to the bridge laws only identifying silicon brains as conscious. Then humans would be zombies. (They may have other theories saying this is pretty unlikely /​ logically incoherent /​ etc)

Basically, the robots have a primitive entity “my observations” that they explain using their theories. They have to reconcile this with the eventual conclusion they reach that their observations are those of a physically instantiated mind like other minds, and they have degrees of freedom in which things they consider “observations” of the same type as “my observations” (things that could have been observed).

• As a qualia denier, I sometimes feel like I side more with the Chalmers side of the argument, which at least admits that there’s a strong intuition for consciousness. It’s not that I think that the realist side is right, but it’s that I see the naive physicalists making statements that seem to completely misinterpret the realist’s argument.

I don’t mean to single you out in particular. However, you state that Mary’s room seems uninteresting because Mary is able to predict the “bit pattern” of color qualia. This seems to me to completely miss the point. When you look at the sky and see blue, is it immediately apprehensible as a simple bit pattern? Or does it at least seem to have qualitative properties too?

I’m not sure how to import my argument onto your brain without you at least seeing this intuition, which is something I considered obvious for many years.

• There is a qualitative redness to red. I get that intuition.

I think “Mary’s room is uninteresting” is wrong; it’s uninteresting in the case of robot scientists, but interesting in the case of humans, in part because of what it reveals about human cognitive architecture.

I think in the human case, I would see Mary seeing a red apple as gaining in expressive vocabulary rather than information. She can then describe future things as “like what I saw when I saw that first red apple”. But, in the case of first seeing the apple, the redness quale is essentially an arbitrary gensym.

I suppose I might end up agreeing with the illusionist view on some aspects of color perception, then, in that I predict color quales might feel like new information when they actually aren’t. Thanks for explaining.

• I predict color quales might feel like new information when they actually aren’t.

I am curious if you disagree with the claim that (human) Mary is gaining implicit information, in that (despite already knowing many facts about red-ness), her (human) optic system wouldn’t have successfully been able to predict the incoming visual data from the apple before seeing it, but afterwards can?

• That does seem right, actually.

Now that I think about it, due to this cognitive architecture issue, she actually does gain new information. If she sees a red apple in the future, she can know that it’s red (because it produces the same qualia as the first red apple), whereas she might be confused about the color if she hadn’t seen the first apple.

I think I got confused because, while she does learn something upon seeing the first red apple, it isn’t the naive “red wavelengths are red-quale”, it’s more like “the neurons that detect red wavelengths got wired and associated with the abstract concept of red wavelengths.” Which is still, effectively, new information to Mary-the-cognitive-system, given limitations in human mental architecture.

• A physicist might discover that you can make computers out of matter. You can make such computers produce sounds. In processing sounds, “homonym” is a perfectly legitimate and useful concept. Even if two words are stored in far-apart hardware locations, knowing that they will “sound detection clash” is important information. Even if you slice it a little differently and use different kinds of computer architectures, it would still be a real phenomenon.

In technical terms there might be the issue of whether it’s meaningful to differentiate between founded concepts and hypotheses. If hypotheses are required then you could have a physicist that didn’t ever talk about temperature.

• It seems to me that you are trying to recover the properties of conscious experience in a way that can be reduced to physics. Ultimately, I just feel that this approach is not likely to succeed without radical revisions to what you consider to be conscious experience. :)

Generally speaking, I agree with the dualists who argue that physics is incompatible with the claimed properties of qualia. Unlike the dualists, I see this as a strike against qualia rather than a strike against physics. David Chalmers does a great job in his articles outlining why conscious properties don’t fit nicely in our normal physical models.

It’s not simply that we are awaiting more data to fill in the details: it’s that there seems to be no way even in principle to incorporate conscious experience into physics. Physics is just a different type of beast: it has no mental core, it is entirely made up of mathematical relations, and is completely global. Consciousness as it’s described seems entirely inexplicable in that respect, and I don’t see how it could possibly supervene on the physical.

One could imagine a hypothetical heaven-believer (someone who claimed to have gone to heaven and back) listing possible ways to incorporate their experience into physics. They could say,

Hard-to-eff, as it’s not clear how physics interacts with the heavenly realm. We must do more work to find out where the entry points of heaven and earth are.
In practice private due to the fact that technology hasn’t been developed yet that can allow me to send messages back from heaven while I’m there.
Pretty directly apprehensible because how would it even be possible for me to have experienced that without heaven literally being real!

On the other hand, a skeptic could reply that:

Even if mind reading technology isn’t good enough yet, our best models say that humans can be described as complicated computers with a particular neural network architecture. And we know that computers can have bugs in them causing them to say things when there is no logical justification.

Also, we know that computers can lack perfect introspection so we know that even if it is utterly convinced that heaven is real, this could just be due to the fact that the computer is following its programming and is exceptionally stubborn.

Heaven has no clear interpretation in our physical models. Yes, we could see that a supervenience is possible. But why rely on that hope? Isn’t it better to say that the belief is caused by some sort of internal illusion? The latter hypothesis is at least explicable within our models and doesn’t require us to make new fundamental philosophical advances.

• It seems that doubting that we have observations would cause us to doubt physics, wouldn’t it? Since physics-the-discipline is about making, recording, communicating, and explaining observations.

Why think we’re in a physical world if our observations that seem to suggest we are are illusory?

This is kind of like if the people saying we live in a material world arrived at these theories through their heaven-revelations, and can only explain the epistemic justification for belief in a material world by positing heaven. Seems odd to think heaven doesn’t exist in this circumstance.

(Note, personally I lean towards supervenient neutral monism: direct observation and physical theorizing are different modalities for interacting with the same substance, and mental properties supervene on physical ones in a currently-unknown way. Physics doesn’t rule out observation, in fact it depends on it, while itself being a limited modality, such that it is unsurprising if you couldn’t get all modalities through the physical-theorizing modality. This view seems non-contradictory, though incomplete.)

• Your beliefs seem to share a characteristic that I have encountered on Less Wrong before.

https://www.lesswrong.com/posts/TniCuWCDxQeqFSxut/arguments-for-the-existence-of-qualia-1?commentId=Zwyh8Xt5uaZ4ZBYbP

There is the phenomenon of qualia and then there is the ontological extension. The word does not refer to the ontological extension.

It would be like explaining lightning with lightning. Sure when we dig down there are non-lightning parts. But lightning still zaps people.

Or it would be a category error, like saying that if you can explain physics without coordinates by only positing that energy exists, you should drop coordinates from your concepts. But coordinates are not a thing to believe in; they are a conceptual tool to specify claims, not a hypothesis in themselves. When physicists believe in a particular field theory they are not agreeing with the Greek philosophers who thought that the world is made of a type of number.

• There is the phenomenon of qualia and then there is the ontological extension. The word does not refer to the ontological extension.

My basic claim is that the way that people use the word qualia implicitly implies the ontological extensions. By using the term, you are either smuggling these extensions in, or you are using the term in a way that no philosopher uses it. Here are some intuitions:

Qualia are private entities which occur to us and can’t be inspected via third person science.

Qualia are ineffable; you can’t explain them using a sufficiently complex English or mathematical sentence.

Qualia are intrinsic; you can’t construct a quale if you had the right set of particles.

etc.

Now, that’s not to say that you can’t define qualia in such a way that these ontological extensions are avoided. But why do so? If you are simply re-defining the phenomenon, then you have not explained anything. The intuitions above still remain, and there is something still unexplained: namely, why people think that there are entities with the above properties.

That’s why I think that instead, the illusionist approach is the correct one. Let me quote Keith Frankish, who I think does a good job explaining this point of view,

Suppose we encounter something that seems anomalous, in the sense of being radically inexplicable within our established scientific worldview. Psychokinesis is an example. We would have, broadly speaking, three options.
First, we could accept that the phenomenon is real and explore the implications of its existence, proposing major revisions or extensions to our science, perhaps amounting to a paradigm shift. In the case of psychokinesis, we might posit previously unknown psychic forces and embark on a major revision of physics to accommodate them.
Second, we could argue that, although the phenomenon is real, it is not in fact anomalous and can be explained within current science. Thus, we would accept that people really can move things with their unaided minds but argue that this ability depends on known forces, such as electromagnetism.
Third, we could argue that the phenomenon is illusory and set about investigating how the illusion is produced. Thus, we might argue that people who seem to have psychokinetic powers are employing some trick to make it seem as if they are mentally influencing objects.

In the case of lightning, I think that the first approach would be correct, since lightning forms a valid physical category under which we can cast our scientific predictions of the world. In the case of the orbit of Uranus, the second approach is correct, since it was adequately explained by appealing to understood Newtonian physics. However, the third approach is most apt for bizarre phenomena that seem at first glance to be entirely incompatible with our physics. And qualia certainly fit the bill in that respect.

• When I say “qualia” I mean individual instances of subjective, conscious experience full stop. These three extensions are not what I mean when I say “qualia”.

Qualia are private entities which occur to us and can’t be inspected via third person science.

Not convinced of this. There are known neural correlates of consciousness. That our current brain scanners lack the required resolution to make them inspectable does not prove that they are not inspectable in principle.

Qualia are ineffable; you can’t explain them using a sufficiently complex English or mathematical sentence.

This seems to be a limitation of human language bandwidth/imagination, but not fundamental to what qualia are. Consider the case of the conjoined twins Krista and Tatiana, who share some brain structure and seem to be able to “hear” each other’s thoughts and see through each other’s eyes.

Suppose we set up a thought experiment. Suppose that they grow up in a room without color, like Mary’s room. Now knock out Krista and show Tatiana something red. Remove the red thing before Krista wakes up. Wouldn’t Tatiana be able to communicate the experience of red to her sister? That’s an effable quale!

And if they can do it, then in principle, so could you, with a future brain-computer interface.

Really, communicating at all is a transfer of experience. We’re limited by common ground, sure. We both have to be speaking the same language, and have to have enough experience to be able to imagine the other’s mental state.

Qualia are intrinsic; you can’t construct a quale if you had the right set of particles.

Again, not convinced. Isn’t your brain made of particles? I construct qualia all the time just by thinking about it. (It’s called “imagination”.) I don’t see any reason in principle why this could not be done externally to the brain either.

• The Tatiana and Krista experiment is quite interesting but stretches the concept of communication to its limits. I am inclined to say that having a shared part of your consciousness is not communication, in the same way that sharing a house is not traffic. It does strike me that communication involves directed construction of thoughts, and it’s easy to imagine that the scope of what this construction is capable of would be vastly smaller than what goes on in the brain in other processes. Extending the construction to new types of thoughts might be a soft border rather than a hard one. With enough verbal sentences it should in principle be possible to reconstruct an actual graphical image, but even with overtly descriptive prose this level is not really reached (I presume); the result remains within the realm of sentence-like data structures.

In the example Tatiana directs the visual cortex and Krista can just recall the representation later. But in a single-consciousness brain nothing can be delivered “ready”; it must be assembled by the brain itself from sensory inputs. That is, cognitive space probably has small funnels, and significant objects can’t travel through them as themselves but must be chopped into pieces and reassembled after passing through the tube.

• Let’s extend the thought experiment a bit. Suppose technology is developed to separate the twins. They rely on their shared brain parts for vital functions, so where we cut nerve connections we replace them with a radio transceiver and electrode array in each twin.

Now they are communicating thoughts via a prosthesis. Is that not communication?

Maybe you already know what it is like to be a hive mind with a shared consciousness, because you are one: cutting the corpus callosum creates a split-brained patient that seems to have two different personalities that don’t always agree with each other. Maybe there are some connections left, but the bandwidth has been drastically reduced. And even within hemispheres, the brain seems to be composed of yet smaller modules. Your mind is made of parts that communicate with each other and share experience, and some of it is conscious.

I think the line dividing individual persons is a soft one. A sufficiently high-bandwidth communication interface can blur that boundary, even to the point of fusing consciousness like brain hemispheres. Shared consciousness means shared qualia; even if that connection is later severed, you might still remember what it was like to be the other person. And in that way, qualia could hypothetically be communicated between individuals, or even species.

• If you would copy my brain but make it twice as large that copy would be as “lonely” as I would be and this would remain after arbitrary doublings. Single individuals can be extended in space without communicating with other individuals.

The “extended wire” thought experiment doesn’t specify enough how that physical communication line is used. It’s plausible that there is no “verbalization” process analogous to the step of writing an email when one replaces sonic communication with IP-packet communication. With huge relative distance would come speed-of-light delays: if one twin was on Earth and another on the Moon there would be a round-trip latency of seconds, which probably would distort how the combined brain works. (And I guess doubling in size would need to come with proportionate slowing to have the same function.)
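As a rough check on the “latency of seconds” figure, here is a minimal sketch of the arithmetic. It assumes an average Earth-Moon distance of about 384,400 km and a signal travelling at the vacuum speed of light with no processing delay at either end; the function name is made up for illustration.

```python
# Speed of light in vacuum, in km/s (exact by definition of the metre).
SPEED_OF_LIGHT_KM_S = 299_792.458

# Approximate average Earth-Moon distance in km (an assumed round figure;
# the real distance varies over the lunar orbit).
EARTH_MOON_KM = 384_400


def round_trip_seconds(distance_km: float) -> float:
    """Time for a light-speed signal to go out and come back, in seconds."""
    return 2 * distance_km / SPEED_OF_LIGHT_KM_S


if __name__ == "__main__":
    print(f"{round_trip_seconds(EARTH_MOON_KM):.2f} s")
```

Under these assumptions the round trip comes out at roughly two and a half seconds, which is very slow compared with signalling inside a single brain.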

I think there is a difference between an information system being spatially extended and having two information systems interface with each other. Say that you have 2 routers or 10 routers on the same length of line. It makes sense to make the distinction that each router functions “independently” even if they have to be able to signal each other enough that packets flow through. To the first router the world “downline” seems very similar whether or not intermediate routers exist. I don’t count an information system’s internal processing as communicating, thus I don’t count “thinking” as communicating. Thus the 10-router version does more communicating than the 2-router version.

I think the “verbalization” step does mean that even a high-bandwidth connection doesn’t automatically mean qualia sharing. I am thinking of plugins that allow programming languages to share code. Even if there is a perfect 1-to-1 compatibility between the abstractions of the languages, I think each language still only ever manipulates its own version of that representation. Cross-using without translation would make it ill-defined what correct function would be, but if you do translate then it loses the qualities of the originating programming language. A C# integer variable will never contain a Haskell integer even if a C# integer is constructed to represent the Haskell integer. (I guess it would be possible to make a super-language that has integer variables that can contain Haskell integers and C# integers, but that language would not be C# or Haskell.) By being a specific kind of cognitive architecture you are locked into certain representation types which are inescapable short of turning into another kind of architecture.

• I am assuming that the twins communicating thoughts requires an act of will like speaking does. I do have reasons for this. Watching their faces when they communicate thoughts makes it seem voluntary.

But most of what you are doing when speaking is already subconscious: One can “understand” the rules of grammar well enough to form correct sentences on nearly all attempts, and yet be unable to explain the rules to a computer program (or to a child or ESL student). There is an element of will, but it’s only an element.

It may be the case that even with a high-bandwidth direct-brain interface it would take a lot of time and practice to understand another’s thoughts. Humans have a common cognitive architecture by virtue of shared genes, but most of our individual connectomes are randomized and shaped by individual experience. Our internal representations may thus be highly idiosyncratic, meaning a direct interface would be ad-hoc and only work on one person. How true this is, I can only speculate without more data.

In your programming language analogy, these data types are only abstractions built on top of a more fundamental CPU architecture where the only data types are bytes. Maybe an implementation of C# could be made that uses exactly the same bit pattern for an int as Haskell does. Human neurons work pretty much the same way across individuals, and even cortical columns seem to use the same architecture.
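To make the “only bytes underneath” point concrete, here is a small sketch in Python (not C# or Haskell, and not a claim about how either runtime actually lays out its integers). It reads the same four bytes under three different type interpretations using the standard `struct` module. The byte string and the function name are arbitrary choices for illustration.

```python
import struct

# Four arbitrary bytes chosen for illustration (little-endian order).
RAW = bytes.fromhex("db0f4940")


def interpretations(raw: bytes) -> dict:
    """Read the same 4 bytes as three different 'data types'."""
    return {
        "signed_int": struct.unpack("<i", raw)[0],    # 32-bit signed integer
        "unsigned_int": struct.unpack("<I", raw)[0],  # 32-bit unsigned integer
        "float": struct.unpack("<f", raw)[0],         # IEEE 754 single precision
    }


if __name__ == "__main__":
    for name, value in interpretations(RAW).items():
        print(name, value)
```

The bytes never change; only the reading of them does. Whether two languages could agree on one reading for their integers is a design decision layered on top of that.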

I don’t think the inability to communicate qualia is primarily due to the limitation of language, but due to the limitation of imagination. I can explain what a tesseract is, but that doesn’t mean you can visualize it. I could give you analogies with lower dimensions. Maybe you could understand well enough to make a mental model that gives you good predictions, but you still can’t visualize it. Similarly, I could explain what it’s like to be a tetrachromat, how septarine and octarine are colors distinct from the others, and maybe you can develop a model good enough to make good predictions about how it would work, but again you can’t visualize these colors. This failing is not on English.

• Sure the difference between hearing about a tesseract and being able to visualise it is significant but I think the difference might not be an impossibility barrier but just skill level of imagination.

Having learned some echolocation, the qualia involved in my hearing have changed, and it makes it seem possible to make a similar transition from a trichromat visual space into a tetrachromat visual space. The weird thing about it is that my ear receives as much information as it did before, but I just pay attention to it differently. Deficient understanding in the sense of getting things wrong is an easy line to draw. But it seems that at some point the understanding becomes vivid instead of theoretical.

• Qualia are intrinsic; you can’t construct a quale if you had the right set of particles.

I’m pretty sure that’s not what “intrinsic” is supposed to mean. From “The Qualities of Qualia” by David de Leon:

Within philosophy there is a distinction, albeit a contentious one, between intrinsic and extrinsic properties. Roughly speaking “extrinsic” seems to be synonymous with “relational.” The property of being an uncle, for example, is a property which depends on (and consists of) a relation to something else, namely a niece or a nephew. Intrinsic properties, then, are those which do not depend on this kind of relation. That qualia are intrinsic means that their qualitative character can be isolated from everything else going on in the brain (or elsewhere) and is not dependent on relations to other mental states, behaviour or what have you. The idea of the independence of qualia on any such relation may well stem from the conceivability of inverted qualia: we can imagine two physically identical brains having different qualia, or even that qualia are absent from one but not the other.

• I find it important in philosophy to be clear about what you mean. It is one thing to explain and another to define what you mean. You might point to a yellow object and say “yellow”, and somebody who misunderstood might think that you mean “roundness” by yellow. The accuracy is most important when the views are radical and talk in very different worlds. And “disproving” yellow by not being able to pick it out from ostensive differentiation is not an argumentative victory but a communicative failure.

Even if we use some other term I think that meaning is important to have. “Phlogiston” might sneak in claims, but that is all the more reason to have terms that have as little room for smuggling as possible. And we still need good terms to talk about burning. “Oxygen” literally means “acid maker”, but we nowadays understand it as a term that refers to an element which definitionally has very little to do with acids.

I think the starting point that generated the word refers to a genuine problem. Having qualia in category three would mean that you claim that I do not have experiences. And if qualia is a bad, loaded word to refer to the thing to be explained, it would be good to make up a new term that refers to that. But to me qualia was just that word. A word like “dark matter” might experience similar “hijack pressure” by having wild claims thrown around about it. And there, having things like “warm dark matter” and “wimpy dark matter” makes the classification finer, which lets the conceptual analysis proceed. But the requirements of clear thinking are different from the preservation of tradition. If you say that “warm dark matter” can’t be the answer, the question of dark matter still stands. Even if you successfully argue that “qualia” can’t be an attractive concept, the issue of me not being a p-zombie still remains, and it would be expected that some theoretical bending over backwards would happen.

• If we are able to explain why you believe in, and talk about qualia without referring to qualia whatsoever in our explanation, then we should reject the existence of qualia as a hypothesis

That argument has an inverse: “If we are able to explain why you believe in, and talk about an external world without referring to an external world whatsoever in our explanation, then we should reject the existence of an external world as a hypothesis”.

People want reductive explanation to be unidirectional, so that you have an A and a B, and clearly it is the B which is redundant and can be replaced with A. But not all explanations work in that convenient way... sometimes A and B are mutually redundant, in the sense that you don’t need both.

The moral of the story being to look for the overall best explanation, not just eliminate redundancy.

• It’s a strong argument, but there are strong arguments on the other side as well.

• “Immortality is cool and all, but our universe is going to run down from entropy eventually”

I consider this argument wrong for two reasons. The first is the obvious reason, which is that even if immortality is impossible, it’s still better to live for a long time.

The second reason why I think this argument is wrong is because I’m currently convinced that literal physical immortality is possible in our universe. Usually when I say this out loud I get an audible “what” or something to that effect, but I’m not kidding.

It’s going to be hard to explain my intuitions for why I think real immortality is possible, so bear with me. First, this is what I’m not saying:

• I’m not saying that we can outlast the heat death of the universe somehow

• I’m not saying that we just need to shift our conception of immortality to be something like, “We live in the hearts of our countrymen” or anything like that.

• I’m not saying that I have a specific plan for how to become immortal personally, and

• I’m not saying that my proposal has no flaws whatsoever and that this is a valid line of research to be conducting at the moment.

So what am I saying?

A typical model of our life as humans is that we are something like a worm in 4 dimensional space. On one side of the worm there’s our birth, and on the other side of the worm is our untimely death. We ‘live through’ this worm, and that is our life. The length of our life is measured by considering the length of the worm in 4 dimensional space, measured just like a yardstick.

Now just change the perspective a little bit. If we could somehow abandon our current way of living, then maybe we can alter the geometry of this worm so that we are immortal. Consider: a circle has no starting point and no end. If someone could somehow ‘live through’ a circle, then their life would consist of an eternal loop through experiences, repeating endlessly.

The idea is that we somehow construct a physical manifestation of this immortality circle. I think of it like an actual loop in 4 dimensional space because it’s difficult to visualize without an analogy. A superintelligence could perhaps predict what type of actions would be necessary to construct this immortal loop. And once it is constructed, it’ll be there forever.

From an outside view in our 3d mind’s eye, the construction of this loop would look very strange. It could look like something popping into existence suddenly and getting larger, and then suddenly popping out of existence. I don’t really know; that’s just the intuition.

What matters is that within this loop someone will be living their life on repeat. True Déjà vu. Each moment they live is in their future, and in their past. There are no new experiences and no novelty, but the superintelligence can construct it so that this part is not unenjoyable. There would be no right answer to the question “how old are you.” And in my view, it is perfectly valid to say that this person is truly, actually immortal.

Perhaps someone who valued immortality would want one of these loops to be constructed for themselves. Perhaps for some reason constructing one of these things is impossible in our universe (though I suspect that it’s not). There are anthropic reasons that I have considered for why constructing it might not be worth it… but that would be too much to go into for this shortform post.

To close, I currently see no knockdown reasons to believe that this sort of scheme is impossible.

• In one scene in Egan’s Permutation City, the Peer character experienced “infinity” when he set himself up in an infinite loop such that his later experience matched up perfectly with the start of the loop (walking down the side of an infinitely tall building, if I recall). But he also experienced the loop ending.

• I don’t know of any physics rules that rule this out. However, I suspect this doesn’t resolve the problems that the people I know who care most about immortality are worried about. (I’m not sure – I haven’t heard them express clear preferences about what exactly they prefer on the billions/trillions year timescale. But they seem more concerned about running out of ability to have new experiences than about not-wanting-to-die-in-particular.)

My impression is many of the people who care about this sort of thing also tend to think that if you have multiple instances of the exact same thing, it just counts as a single instance. (Or, something more complicated about many worlds and increasing your measure)

• I agree with the objection. :) Personally I’m not sure whether I’d want to be stuck in a loop of experiences repeating over and over forever.

However, even if we considered “true” immortality, repeat experiences are inevitable simply because there’s a finite number of possible experiences. So, we’d have to start repeating things eventually.

• Virtual particles “pop into existence” in matter/antimatter pairs and then “pop out” as they annihilate each other all the time. In one interpretation, an electron-positron pair (for example) can be thought of as one electron that loops around and goes back in time. Due to CPT symmetry, this backward path looks like a positron. https://www.youtube.com/watch?v=9dqtW9MslFk

• It sounds like you’re talking about time travel. These “worms” are called “worldlines”. Spacetime is not simply R^4. You can rotate in the fourth dimension—this is just acceleration. But you can’t accelerate enough to turn around and bite your own tail because rotations in the fourth dimension are hyperbolic rather than circular. You can’t exceed or even reach light speed. There are solutions to General Relativity that contain closed timelike curves, but it’s not clear if they correspond to anything physically realizable.
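To make “hyperbolic rather than circular” concrete, here is a minimal sketch in one spatial dimension, assuming units where c = 1. The function names `boost` and `velocity` are illustrative, not from any library. A boost of rapidity φ looks like a rotation with cosh and sinh in place of cos and sin, and the resulting speed is tanh φ, which stays below 1 for any finite φ.

```python
import math

def boost(t, x, phi):
    """Apply a Lorentz boost of rapidity phi to the vector (t, x), with c = 1."""
    return (t * math.cosh(phi) + x * math.sinh(phi),
            t * math.sinh(phi) + x * math.cosh(phi))

def velocity(phi):
    """Speed (as a fraction of c) corresponding to rapidity phi."""
    return math.tanh(phi)

# Start at rest: the worldline direction points purely along the time axis.
t, x = 1.0, 0.0
for _ in range(5):
    # Rapidities add, so after the loop the total rapidity is 5.0.
    t, x = boost(t, x, 1.0)
    print(f"t={t:.3f}  x={x:.3f}  speed={x / t:.6f}  interval={t * t - x * x:.6f}")
```

Repeating the boost never brings the vector back around: the time component keeps growing and stays positive, and t² − x² is preserved (up to floating-point error), which is the sense in which the worldline cannot turn to bite its own tail in flat spacetime.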

• I have a previous high-implication uncertainty about this (that would be a crux?). “You can’t accelerate enough to turn around” seems false to me. The mathematical rotation seems like it ought to exist. I now have significantly less faith in the previous reasons I had for thinking such a mathematical rotation would be impossible. If I draw a unit-sphere analog in spacetime, a visual observation from the spacetime diagram drawn on Euclidean paper is not sufficient to conclude that the future cone is far from the past cone. And thinking of a sphere as “everything within distance r”, it would seem it should be continuous and simply connected in most instances. I think there should also exist a transformation that, when repeated enough times, returns to the original configuration. And I find it surprising that a boost-like transformation would fail to be like that if it is a rotation analog.

I have started to believe that the standard reasoning for why you can’t go faster than light relies on a kind of faulty logic. With normal Euclidean geometry it would go like this: there is a maximum angle you can reach by increasing the y-coordinate, and slope is just the ratio of x to y, so at that maximum y the maximum slope is reached, so the maximum angle you can have is 90 degrees. So if you try to go at 100 degrees you have a lesser y and are actually going slower. And in a way 90 degrees is kind of the maximum amount you can point in another direction. But normally degrees go up to 180 or 360.

On the relativity side, c is the maximum ratio, but that is for coordinate time. If somebody’s proper time were to start pointing in a direction that projects negatively onto the coordinate time axis, the comparison between x per coordinate time and x per proper time would become significant.

There is also a trajectory which seems to be timelike in all segments: A=(0,0,0,0), (2,1,0,0), B=(4,2,0,0), (2,3,0,0), C=(0,4,0,0), (2,5,0,0), D=(4,6,0,0). It would seem an awful lot like the “corner” A B C would be of equal magnitude but opposite sign from B C D. Now, I get why physically such a trajectory would be challenging. But from a mathematical point of view it is hard to understand why it would be ill-defined. It would also be very strange if there were no boost you could make at B to go from direction AB to direction BC. I get why you can’t rotate from AB to BD (you can’t rotate a timelike distance into a spacelike distance if rotation preserves length).
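The claim that every segment is timelike can be checked directly. This is a sketch under assumptions not stated in the comment: that the first coordinate is time, that c = 1, and that the interval uses the (+, −, −, −) sign convention. The helper names are made up for illustration.

```python
# Points of the trajectory, read as (t, x, y, z) with c = 1 (an assumption).
points = [
    (0, 0, 0, 0),  # A
    (2, 1, 0, 0),
    (4, 2, 0, 0),  # B
    (2, 3, 0, 0),
    (0, 4, 0, 0),  # C
    (2, 5, 0, 0),
    (4, 6, 0, 0),  # D
]

def interval_squared(p, q):
    """Squared Minkowski interval between events p and q, signature (+, -, -, -)."""
    dt, dx, dy, dz = (b - a for a, b in zip(p, q))
    return dt * dt - dx * dx - dy * dy - dz * dz

def classify(p, q):
    """Label the separation between two events by the sign of the interval."""
    s2 = interval_squared(p, q)
    if s2 > 0:
        return "timelike"
    if s2 < 0:
        return "spacelike"
    return "lightlike"

# Consecutive pairs of points are the segments of the trajectory.
segments = list(zip(points, points[1:]))
for p, q in segments:
    direction = "future" if q[0] > p[0] else "past"
    print(p, "->", q, classify(p, q), f"({direction}-directed)")

# The direct step from B to D, which the comment says rotation cannot reach.
print("B -> D:", classify(points[2], points[6]))
```

Under these assumptions every segment is timelike, but the two segments between B and C are past-directed, since the time coordinate decreases along them. That sign flip is the part an ordinary boost does not produce: a boost keeps a future-directed timelike vector future-directed.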

I also kind of get why you would need infinite energy to make such “impossibly sharp” turns. But as energy is the conserved charge of time translation, the definition of time might depend on which time you choose to derive it from. If you were to gain energy from an external source, it would have to be a tachyon or something going backwards in time (which are either impossible or hard to produce). But if you had a thruster with you, with fuel, the “proper time energy” might behave differently. That is, if you are going at a significant fraction of c and the whole universe is frozen and whizzing by, you should still be able to fire your rockets according to your own time (1 second of your engines might take the entire age of the universe for external observers, but does that prevent things happening from your perspective?). If acceleration “turns your time direction” rather than “increases displacement per spent second”, then at some finite amount of experienced acceleration you would come full circle, or at least go far enough around that you are now heading in the negative of the direction you started in.

• I agree I would not be able to actually accomplish time travel. The point is whether we could construct some object in Minkowski space (or whatever General Relativity uses, I’m not a physicist) that we considered to be loop-like. I don’t think it’s worth my time to figure out whether this is really possible, but I suspect that something like it may be.

Edit: I want to say that I do not have an intuition for physics or spacetime at all. My main reason for thinking this is possible is that my idea is fairly minimal: I think you might be able to do this even in R^3.

• I now have a Twitter account that tweets my predictions.

I don’t think I’m willing to bet on every prediction that I make. However, I pledge the following: if, after updating on the fact that you want to bet me, I still disagree with you, then I will bet. The disagreement must be non-trivial though.

For obvious reasons, I also won’t bet on predictions that are old, and have already been replaced by newer predictions. I also may not be willing to bet on predictions that have unclear resolution criteria, or are about human extinction.

• I have discovered recently that while I am generally tired and groggy in the morning, I am well rested and happy after a nap. I am unsure if this matches other people’s experiences, and haven’t explored much research. Still, I think this is interesting to think about fully.

What is the best way to apply this knowledge? I am considering purposely sabotaging my sleep so that I am tired enough to take a nap by noon, which would refresh me for the entire day. But this plan may have some significant drawbacks, including being excessively tired for a few hours in the morning.

• I’m assuming from context you’re universally groggy in the morning no matter how much sleep you get? (i.e. you’ve tried the obvious thing of just ‘sleep more’?)

• Pretty much, yes. Even with 10+ hours of sleep I am not as refreshed as I am after a nap. It’s weird, but I think it’s a real effect.

• Two easy things you can try to feel less groggy in the morning are:

• Drinking a full glass of water as soon as you wake up.

• Listening to music or a podcast (bluetooth earphones work great here!). Music does the trick for me, although I’m usually not in the mood and I prefer a podcast.

About taking naps: while it seems to work for some people, I’m generally against it since it usually impairs my circadian clock greatly (I cannot keep consistent times, and it meddles with my schedule too much).

At nights, I take melatonin and it seems to have been of great help to keep consistent times at which I go to sleep (taking it with L-Theanine seems to be better for me somehow). Besides that, I do pay a lot of attention to other zeitgebers such as exercise, eating behavior, light exposure, and coffee. This is to say—regulating your circadian clock may be what you’re looking for.

Links of interest: gwern’s post about his vitamin D experiment, and his other posts about sleep.