I’d contend that a post can be “in good faith” in the sense of being a sincere attempt to communicate your actual beliefs and your actual reasons for them, while nonetheless containing harmful patterns such as logical fallacies, misleading rhetorical tricks, excessive verbosity, and low effort to understand your conversational partner. Accusing someone of perpetuating harmful dynamics doesn’t necessarily imply bad faith.

In fact, I see this distinction as being central to the OP. Duncan talks about how his brain does bad things on autopilot when his focus slips, and he wants to be called on them so that he can get better at avoiding them.

• And why would a good and sane person ever want to impose costs on third parties ever except like in… revenge because we live in an anarchic horror world, or (better) as punishment after a wise and just proceeding where rehabilitation would probably fail but deterrence might work?

This paragraph sounds to me like when you say “costs” you are actually thinking of “punishments”, with an implication of moral wrongdoing. I’m uncertain that Duncan intended that implication (and if he did, I’d like to request that both of you use the more specific term).

If you continue to endorse the quoted paragraph as applied to “costs” that are not necessarily punishments, then I further contend that costs are useful in several scenarios where there is no implication of wrongdoing:

The SUPER obvious example is markets, where costs are used as a mechanism to control resource allocation.

Another common scenario is as a filter for seriousness. Suppose you are holding a contest where you will award a prize to the best foozle. If entry in the contest is free of charge, you might get a lot of entries from amateur foozle artists who know that they have negligible chance of winning, but who see no downside to entering; you will then be forced to spend resources judging those entries. If you instead charge a $10 entrance fee, then most of those people will not bother to enter, and you’ll only have to judge the foozles from artists who self-evaluate as having a real shot.

There is no implication that entering the contest is an evil thing that no one should ever do. It’s just a mechanism for transferring some of the transactional costs onto the party that is best able to judge the value of the transaction, so that we end up with better decisions about which transactions to perform.

Another use for costs is for creating precommitments. If you want to avoid a certain future action, attaching a cost to that action can be helpful both for generating the willpower to stick to your decision and also for convincing other people that you will stick to it. There exist services people voluntarily sign up for where you agree to pay money if you break a precommitment.

Additionally, I feel you are unfairly maligning deterrence. You imply it should only be used where rehabilitation would probably fail, but rehabilitation only prevents that offender from repeating the offense, whereas deterrence discourages anyone from repeating the offense; this creates many scenarios where deterrence might be desirable in addition to rehabilitation (or where rehabilitation is irrelevant, e.g. because that particular offender will never have a similar opportunity again).

You also imply deterrence should only be used after meeting an extremely high standard of evidence; most people only consider this necessary for extreme forms of deterrence (e.g. jail) but permit a much weaker standard of evidence for mild forms (e.g. verbal chastisement; leaving a negative review). I think this common view is probably correct on cost/benefit grounds (less caution is required in situations where a mistake causes less harm).

• Please don’t make this place worse again by caring about points for reasons other than making comments occur in the right order on the page.

I wish this statement explaining what goal your advice is designed to optimize had appeared at the top of the advice, rather than the bottom.

My current world-model predicts that this is not what most people believe points are for, and that getting people to treat points in this way would require a high-effort coordinated push, probably involving radical changes to the UI to create cognitive distance from how points are used on other sites.

Specifically, I think the way most people actually use points is as a prosthetic for nonverbal politics; they are the digital equivalent of shooting someone a smile or a glower. Smiles/glowers, in turn, are a way of informing the speaker that they are gaining/losing social capital, and informing bystanders that they could potentially gain/lose social capital depending on which side of the issue they support.

My model says this is a low-level human instinctive social behavior, with the result that it is very easy to make people behave this way, but simultaneously very hard for most people to explain exactly what they are doing or why.

This self-opacity, combined with a common slightly-negative valence attached to the idea of using social disapproval as a way to sculpt the behavior of others, results in many people leaving out the “social capital” part of this explanation and describing upvotes/downvotes as meaning “I want to see more/fewer posts like this”.
Which I think is still importantly different from “I want this exact comment to appear higher/lower on this page,” in that it implies the primary purpose is about sculpting future content rather than organizing existing content.

(Note that all of the above is an attempt at description, not prescription.)

You’ve framed this issue as one of educating people about points. I think a better framing would be that the Internet already has an established norm, and that norm is a natural attractor due to deep human instincts, and you are proposing a new, incompatible, and significantly less-stable norm to replace it. I would be willing to entertain arguments that this is somehow a worthwhile switch, but my prior is against it.

Also, I find it mildly alarming that your personal strategy for reinforcing this norm involves explicitly refusing the benefits that it could have provided to you (by starting at the bottom of the page, reading comments in the inverse of the order you want the point system to recommend). Norms that do not benefit their own defenders are less stable, and the fact that you are discarding at least some of the potential value makes it harder to argue that the whole thing is net-positive.

• In the second-to-last paragraph, you have an “either” with no “or”.

• This comment makes me think of the novel Twig, which is set in a world that I’ve described as “there was a biotech revolution instead of an infotech revolution”. (Though the story simply takes this as a given and does not explore the question of how that might have happened.)

• That particular problem appears to apply only to auctions where a single party is both placing a bid and also receiving (at least some of) the price that the winner pays?

(Note: I believe I understand how second-price auctions work in simple cases, but I’m not clear exactly how that’s being applied to the rent-splitting scenario in the OP, so you may be assuming some context that I haven’t picked up.)

• No offense taken.
It appeared to me that your post was at least partially motivated by questioning the philosophical robustness of the first song, and that the song could plausibly be criticized for implying that bigger circles are inherently better, even if the people singing it don’t reflectively endorse that position.

(I admit that songs need to choose a trade-off between brevity and nuance that is further towards the “brevity” side than most philosophy, but I don’t feel like “bigger is better” is much good even as an approximation.)

• Second-price auctions are brilliant for one-shot transactions, and for cases where nobody is likely to use the knowledge of your true price against you. But imagine if it wasn’t among friends, but strangers, and if it were repeated every month. Now strategy again matters.

How so?

• For context, (a) I have read the sequences (b) The reason I found this post is that your coordination frontier intro got me interested enough that I looked for a way to be notified of future posts in that sequence, and the best option I could find was subscribing to all your posts

• Even when people talk as if “growing the circle” is a good thing, I have the impression that they really mean “drawing the boundary correctly is a good thing, and the correct place to draw the boundary happens to be bigger than people drew it in the past”.

If someone said we should extend rights to rocks and sand, and we need to stop using beaches recreationally because it’s unfairly exploiting the sand...I think you’d probably think that was a bad idea, even if it “grows the circle”.

If you think that growing the circle to include foreigners, animals, and AI is good, but that growing the circle to include rocks and sand is bad, that sounds to me like your personal circle already includes foreigners, animals, and AI but not rocks and sand, and you’re wishing everyone else would change their circle to match yours.

Not saying you’d be wrong to want that.
But I think it requires a more sophisticated justification than “because bigger circles are better”.

• Kinda sounds like the important part is not the blankets themselves, but the relationships between them? That is, a Markov blanket is just any partition of the graph, but it’s important that you can assert that is “separating” and . (Whereas if you just took 3 random partitions, none of them would necessarily separate the other 2.)

Or is it more like: we don’t actually have any explicit representation of the entire causal model, so we can’t necessarily use a partition to calculate all the edges that cross that partition, and the Markov blanket is like a list of the edges, rather than a list of the nodes? Every partition describes a Markov blanket, but not every set of edges does, so saying that this particular set of edges forms a Markov blanket is a non-trivial statement about those edges?

• Is every line you can draw through the causal model a Markov blanket?

It seems like you’re interested in Markov blankets because the information on one side is independent from the other side given the values of the edges that pass through. But it also looks like the edges in the original graph represent “has any effect on”. Which makes it sound like you’re saying one side is independent from the other except for all of the ways in which it’s not, which seems trivial.

What am I missing?

• Minor editing error:

new positive skills we and habits we can gain

Either an editing error, or I’m just confused by what it means:

Bob is having a similar experience re: Theft.xx TODO

• I attended a university where the median entering student had a perfect score on the math SAT and all students were required to take several calculus classes regardless of major (the first class involved epsilon-delta proofs). I never felt I had any particular problem with calculus; mostly got As. Majored in a math-adjacent field (though not one that uses calculus) and graduated with honors.
I’m not sure I could answer ANY of your 3 questions. Possibly if I spent a considerable time carefully thinking about them.

It’s possible that I could have answered them at the time and have since forgotten, but I don’t feel like that’s the case.

• Yet another counter-response is that even if the response were true, the false model could be much too high, but it can only be slightly too low, since 1-10^-9 is quite close to 1.

This is contingent upon the scale you have chosen for representing the answer. If you measure chances in log odds, they range from negative infinity to positive infinity, so any answer you come up with could have an unbounded error in either direction.

See https://www.lesswrong.com/posts/QGkYCwyC7wTDyt3yT/0-and-1-are-not-probabilities

But I’m uncertain why this would be significant anyway? An asymmetry of maximum error does not necessarily imply an asymmetry of expected error.

But then, that’s before you have looked at the number.

Why does looking at the number matter?

If you have a prior expectation about what the number is likely to be, then you might reason that the true answer is likely to be closer to your prior than farther from it. But that’s essentially the answer Scott already gave in the essay: that any argument is pushing us away from our prior, and our confidence in the argument determines how far it is able to push us.

Your phrasing seems to imply you believe you are giving a different reason for thinking that the expected error is asymmetrical than the one Scott gave. If that is the case, then I don’t understand your implied reasoning.

• Thanks. After thinking about your explanation for a while, I have made a small update in the direction of FDT. This example makes FDT seem parsimonious to me, because it makes a simpler precommitment.

I almost made a large update in the direction of FDT, but when I imagined explaining the reason for that update I ran into a snag.
I imagined someone saying “OK, you’ve decided to precommit to one-boxing. Do you want to precommit to one-boxing when (a) Omega knows about this precommitment, or (b) Omega knows about this precommitment, AND the entangled evidence that Omega relied upon is ‘downstream’ of the precommitment itself? For example, in case (b), you would one-box if Omega read a transcript of this conversation, but not if Omega only read a meeting agenda that described how I planned to persuade you of option (a).”

But when phrased that way, it suddenly seems reasonable to reply: “I’m not sure what Omega would predict that I do if he could only see the meeting agenda. But I am sure that the meeting agenda isn’t going to change based on whether I pick (a) or (b) right now, so my choice can’t possibly alter what Omega puts into the box in that case. Thus, I see no advantage to precommitting to one-boxing in that situation.”

If Omega really did base its prediction just on the agenda (and not on, say, a scan of the source code of every living human), this reply seems correct to me. The story’s only interesting because Omega has god-like predictive abilities.

Which I guess shouldn’t be surprising, because if there were a version of Newcomb’s problem that cleanly split FDT from CDT without invoking extreme abilities on Omega’s part, I would expect that to be the standard version.

I’m left with a vague impression that FDT and CDT mostly disagree about “what rigorous mathematical model should we take this informal story-problem to be describing?” rather than “what strategy wins, given a certain rigorous mathematical model of the game?” CDT thinks you are choosing between $1K and $0, while FDT thinks you are choosing between $1K and $1M. If we could actually run the experiment, even in simulation, then that disagreement seems like it should have a simple empirical resolution; but I don’t think anyone knows how to do that. (Please correct me if I’m wrong!)
• I agree that figuring out what you “should have” precommitted can be fraught.

One possible response to that problem is to set aside some time to think about hypotheticals and figure out now what precommitments you would like to make, instead of waiting for those scenarios to actually happen. So the perspective is “actual you, at this exact moment”.

I sometimes suspect you could view MIRI’s decision theories as an example of this strategy.

Alice: Hey, Bob, have you seen this “Newcomb’s problem” thing?

Bob: Fascinating. As we both have unshakable faith in CDT, we can easily agree that two-boxing is correct if you are surprised by this problem, but that you should precommit to one-boxing if you have the opportunity.

Alice: I was thinking: now that we’ve realized this, why not precommit to one-boxing right now? You know, just in case. The premise of the problem is that Omega has some sort of access to our actual decision-making algorithm, so in principle we can precommit just by deciding to precommit.

Bob: That seems unobjectionable, but not very useful in expectation; we’re very unlikely to encounter this exact scenario. It seems like what we really ought to do is make a precommitment for the whole class of problems of which Newcomb’s problem is just one example.

Alice: Hm, that seems tricky to formally define. I’m not sure I can stick to the precommitment unless I understand it rigorously. Maybe if...

--Alice & Bob do a bunch of math, and eventually come up with a decision strategy that looks a lot like MIRI’s decision theory, all without ever questioning that CDT is absolutely philosophically correct?--

Possibly it’s not that simple; I’m not confident that I appreciate all the nuances of MIRI’s reasoning.

• Suppose you run your twins scenario, and the twins both defect. You visit one of the twins to discuss the outcome.
Consider the statement: “If you had cooperated, your twin would also have cooperated, and you would have received $1M instead of $1K.” I think this is formally provable, given the premises.

Now consider the statement: “If you had cooperated, your twin would still have defected, and you would have received $0 instead of $1K.” I think this is also formally provable, given the premises. Because we have assumed a deterministic AI that we already know will defect given this particular set of inputs! Any statement that begins “if you had cooperated...” is assuming a contradiction, from which literally anything is formally provable.

You say in the post that only the cooperate-cooperate and defect-defect outcomes are on the table, because cooperate-defect is impossible by the scenario’s construction. I think that cooperate-cooperate and defect-defect aren’t both on the table, either. Only one of those outcomes is consistent with the AI program that you already copied. If we can say you don’t need to worry about cooperate-defect because it’s impossible by construction, then in precisely what sense are cooperate-cooperate and defect-defect both still “possible”?
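To make the "only one outcome is consistent with the program you already copied" point concrete, here is a toy sketch. It is not a faithful model of the twins scenario; the function names and the trivial policies are hypothetical stand-ins I made up for illustration. It only shows that once a deterministic policy is fixed, running two copies on the same input yields exactly one outcome, no matter how many times you run it.

```python
from typing import Callable, Set, Tuple

# A policy maps an observation to "C" (cooperate) or "D" (defect).
Policy = Callable[[str], str]


def play_twins(policy: Policy, observation: str) -> Tuple[str, str]:
    """Run two copies of the same deterministic policy on the same input."""
    return policy(observation), policy(observation)


def reachable_outcomes(policy: Policy, observation: str, trials: int = 100) -> Set[Tuple[str, str]]:
    """Collect every distinct outcome seen across repeated runs."""
    return {play_twins(policy, observation) for _ in range(trials)}


# Hypothetical stand-ins for "the AI program that you already copied".
def always_defect(observation: str) -> str:
    return "D"


def always_cooperate(observation: str) -> str:
    return "C"


print(reachable_outcomes(always_defect, "twin dilemma"))     # {('D', 'D')}
print(reachable_outcomes(always_cooperate, "twin dilemma"))  # {('C', 'C')}
```

For any single fixed policy the set has one element. Both cooperate-cooperate and defect-defect show up only when you vary which program was copied, which is a different question from what this particular copy "could have" done.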

I feel like most people have a mental model for deterministic systems (billiard balls bouncing off each other, etc.) and a separate mental model for agents. If you can get your audience to invoke both of these models at once, you have probably instantiated in their minds a combined model with some latent contradiction in it. Then, by leading your audience down a specific path of reasoning, you can use that latent contradiction to prove essentially whatever you want.

(To give a simple example, I’ve often seen people ask variations of “does (some combinatorial game) have a 50/50 win rate if both sides play optimally?” A combinatorial game, played optimally, has only one outcome, which must occur 100% of the time; but non-mathematicians often fail to notice this, and apply their usual model of “agents playing a game” even though the question constrained the “agents” to optimal play.)
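As a small illustration of the combinatorial-game point, here is a sketch that solves a toy game of my own choosing (not one from the post): a single pile of stones, each player removes 1 to 3 stones on their turn, and whoever takes the last stone wins. The solver returns a definite win or loss for each starting position; there is no win rate to speak of.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def first_player_wins(stones: int) -> bool:
    """True if the player to move wins under optimal play by both sides.

    Toy rules (an assumption for illustration): remove 1, 2, or 3 stones
    per turn; taking the last stone wins.
    """
    # The mover wins if some legal move leaves the opponent in a losing
    # position. With zero stones there are no legal moves, so the mover loses.
    return any(
        not first_player_wins(stones - take)
        for take in (1, 2, 3)
        if take <= stones
    )


for n in range(1, 9):
    print(n, "first player wins" if first_player_wins(n) else "second player wins")
```

In this toy game the player to move loses exactly when the pile size is a multiple of 4, and wins otherwise. Each starting position has one outcome under optimal play, so "50/50" is not a meaningful answer for any of them.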

I notice this post uses a lot of phrases like “it actually works” and “try it yourself” when talking about the twins example. Unless there’s been a recent breakthrough in mind uploading that I haven’t heard about, this wording implies empirical confirmation that I’m pretty confident you don’t have (and can’t get).

If you were forced to express your hypothetical scenarios in computer source code, instead of informal English descriptions, I think it would probably be pretty easy to run some empirical tests and see which strategies actually get better outcomes. But I don’t know, and I suspect you don’t know, how to “faithfully” represent any of these examples as source code. This leaves me suspicious that perhaps all the interesting results are just confusions, rather than facts about the universe.

• In your “active termite blackmail” example, you say that the “it’s too late” objection still applies. That might be true as regards this year’s termites, but you specified this recurs every year. It seems to me there’s plenty of room for this year’s decision to (causally) influence next year’s chances of termites; whether you pay this year seems like strong evidence about whether you will pay next year.

(EDIT: Fixed typo.)