Truthseeking, EA, Simulacra levels, and other stuff


I’ve been on this big kick talking about truthseeking in effective altruism. I started with vegan advocacy because it was the most legible, but now need to move on to the deeper problems. Unfortunately those problems are still not that legible, and I end up having to justify a lot of what I previously took as basic premises, and it’s all kind of stuck. I’m going to lay out a few threads in my mind, in the hopes that you will be curious about some of them and we can untangle from there.

Some threads:

  • There is a subtle distinction between “when should an individual operate on object level vs social reality?” and “when should a group of people invest together in facilitating operating in object level over social reality?” I’m way more interested in the latter, although obviously it intersects with the former.

  • Up until now my posts have assumed that truthseeking is good without particularly justifying it. Seems like if I’m so excited about asking inconvenient questions I should be willing to turn that on itself.

  • I think the better version of that question is “when is truthseeking the most important thing to invest in, and when can you stop?” Because obviously truthseeking is useful and important, the interesting question is how that trades off against other capacities a group could invest in, like fundraising or operational competence.

  • Part of my argument here is “do you know how to solve AI alignment? because that sounds like a knowledge problem to me, and knowledge problems are solved in object level reality not social reality”. But it’s possible to be operating entirely in object level reality but just be kind of bad at it. The word “truthseeking” maybe combines “object level reality” with “good at it” in confusing ways, and maybe I should separate those.

  • I think EA would benefit from understanding simulacra levels and aiming for level 1, but I think rationalists would benefit from more understanding of what purpose higher levels serve. I think moving down simulacra levels requires replacing the function that higher levels serve (e.g. social support) in ways that don’t require so much sacrifice of truthseeking.


I end up having to justify a lot of what I previously took as basic premises, and it’s all kind of stuck

My initial attention is mostly on this point, which is maybe also your second point (of turning truthseeking on the question of how good truthseeking is). [The main alternative to justification, to me, is something like ‘conditional posts’; a post that has “truthseeking is good” as its premise and then people can just reject that premise to reject the post.]

The thread that I don’t see yet that I’m interested in is “what’s the best medium for truthseeking?”. Like, it seems to me like the higher simulacra levels are responses to different situations, and it may be the case that trying to get truthseeking thru collective virtue is worse than trying to get truthseeking thru good design of how people interact with each other. (Community Notes, for example, seem to be a somewhat recent success here.)

Actually, that gives me an idea of something object-level we could look at. Prediction markets are already pretty popular around here, but they rely on lots of other layers to get things right. If there were a market on vegan nutrition, it would need someone to resolve it, and for people to trust that resolution process, and for that market to have become the consensus market instead of others, and so on.


what’s the best medium for truthseeking

One popular answer is “public writing”, which has some obvious advantages. But I think forcing things to be too public degrades truthseeking where it matters.

For example: the discourse on EAG admissions. I’m confident that even a perfectly virtuous org would be very constrained in what they could say in public. They can’t announce red flags in applications because then people know to hide them. People feel free to publicly complain about rejections, but it would be bullying for CEA to release the specific reason that person was rejected. If they make a strong case for merit in the abstract, they’ll lose donors.

I think you can make the case that we shouldn’t view sharing rejection reasons for someone who raised the issue in public as bullying, and that CEA should be willing to let donors who can’t handle the truth go. I think it’s true on the margin that CEA should be more open about its models even if it costs them money, and I’d love to see EA go full “you can’t handle the truth” and drive away everyone for whom “we’re extremely picky and you didn’t make the cut” isn’t actively reassuring. But I think that even if you did that the environment would still excessively penalize certain true, virtuous statements, and if you tank that cost you still have the thing where applications are partially adversarial. If you say “donating to the Make A Wish Foundation is disqualifying” people will stop telling you they’ve donated to Make A Wish.

So the discussion has to be private, and small enough that it doesn’t leak too badly. But drawing the circle of trust is naturally going to bias you towards protecting yourself and the circle at the expense of people outside the circle.


I did make a small attempt to get a market going on vegan nutrition, in a way that could have been objectively resolved on a timescale of weeks. I didn’t get any takers and my sample size ended up being small enough that the bet could never have been resolved, but ignoring that… It took some cleverness to figure out a metric I could imagine both sides agreeing on. The obvious bets wouldn’t have worked.


I think we basically agree on the merits of public vs. private discussion /​ writing. It might be interesting to look at mechanisms besides writing/​reading.

In particular, I’m reminded of Bryan Caplan’s focus on Social Desirability Bias (like here); he argues that we should move more things out of the venue of politics /​ discussion because people are rewarded for saying things that sound nice but aren’t actually nice and not (directly) penalized, whereas decision-makers in other contexts both 1) don’t have to justify themselves as much, and so can care less about sounding nice, and 2) have to deal with the other variables, and so are penalized for being delusional.

Thinking about veganism in particular, I think it’s much easier /​ more rewarding to talk about things like animal suffering than things like convenience or taste, and so we should expect different things in what people say and what they actually end up eating (or what would be best for them to end up eating).

Having said all that, I feel uncertain whether or not looking at other mechanisms is a distraction. The case for it being a distraction is that we started off talking about talking about things; how to move the local conversation in a truthseeking direction is perhaps more on-topic than how to move ‘human behavior as a whole’ in a more truthseeking direction (or whatever the target is). I still have some lingering sense that there’s some way to tie the conversation more directly to the world in a way that is both good for the conversation and good for the world.

[That is, I think if we want people to spend more time at lower simulacra levels, they need to be incentivized to do that, and those incentives need to come from somewhere. We could maybe construct them out of the higher simulacra levels (“I’m object-oriented, like the cool kids!”), but I think this is a recipe for ending up with the presentation of object-orientation instead of the content.]


I think people ignore higher simulacra levels at their peril (humans have emotional needs), but agree that doubling down on them is probably not going to solve the problem.

The favored solutions to this right now are betting markets, retroactive funding, and impact certificates. I love these for the reasons you’d expect, but they’ve been everyone’s favored solutions for years and haven’t solved anything yet.

OTOH they have made some progress, and maybe this was the most progress that could realistically be made in the time available. Maybe rewiring an egregore to change its values and processes at a fundamental level takes more than 18 months. It seems plausible.

So we could talk about ways to improve the existing tech for solving the Caplan problem, or go looking for new ideas. I’m at a Manifold hack-a-thon right now, so there’s some poetry in digging into doing existing solutions better.


On a super practical level: what if every time someone encouraged someone to apply for a grant or job, they had to pay $1? Would cut down on a lot of hot air, and you could adjust the fee over time as you calibrated.


That seems good (like the dating site where message-senders paid message-recipients); here’s another practical idea for a Manifold hack-a-thon: Bayesian Truth Serum. It’s a way of eliciting answers on questions where we don’t have ground-truth and don’t want to just do a Keynesian Beauty Contest.

In situations where you have a debate, you might want to have a market over who “won” the debate; right now on Manifold it would have to be something like Vaniver running a market on who Vaniver thought won the debate, and Elizabeth running another market about who Elizabeth thought won it, and then some sort of aggregation of those markets; or you could have a poll over who won, and a market on the poll results. Both of these methods seem like they have serious defects but there might be simple approaches that have fewer (or subtler) defects.

[Bayesian Truth Serum assumes that you ask everyone once, and loses some of its power if respondents see answers-so-far before they give theirs; it’s not obvious that there’s a good way to market-ize it, such that people can trade repeatedly and update. My guess is the thing you want is a market on how a BTS poll will resolve, and the BTS poll run on Manifold also so that people can be incentivized with mana to be honest.]
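For anyone who hasn’t seen the mechanism, here is a minimal sketch of one-shot BTS scoring, assuming Prelec’s original formulation: each respondent gives an answer plus a prediction of how everyone else will answer, and is scored on how “surprisingly common” their answer is plus how accurate their prediction was. The function name `bts_scores` and the parameters `alpha` and `eps` are illustrative choices, not anything Manifold provides, and this covers only the ask-everyone-once poll, not a market built on top of it.

```python
import math

def bts_scores(answers, predictions, alpha=1.0, eps=1e-9):
    """Score respondents in a one-shot Bayesian Truth Serum poll (sketch).

    answers[r]     -- index of the option respondent r endorsed
    predictions[r] -- respondent r's predicted share of people choosing each option
    alpha          -- weight on the prediction-accuracy term (assumed default)
    eps            -- floor that keeps log() defined for predictions of zero
    """
    n = len(answers)
    k = len(predictions[0])

    # Actual share of respondents endorsing each option.
    x_bar = [sum(1 for a in answers if a == j) / n for j in range(k)]

    # Log of the geometric mean of everyone's predicted share for each option.
    log_y_bar = [
        sum(math.log(max(p[j], eps)) for p in predictions) / n
        for j in range(k)
    ]

    scores = []
    for a, p in zip(answers, predictions):
        # Information score: positive when the chosen answer is more common
        # than the crowd collectively predicted.
        info = math.log(x_bar[a]) - log_y_bar[a]
        # Prediction score: never above zero, and zero only when the
        # prediction matches the actual shares.
        pred = sum(
            x_bar[j] * (math.log(max(p[j], eps)) - math.log(x_bar[j]))
            for j in range(k)
            if x_bar[j] > 0
        )
        scores.append(info + alpha * pred)
    return scores
```

With two options split evenly, respondents whose answer turned out more common than the crowd predicted score higher than those whose answer was over-predicted, which is the property that separates this from a Keynesian Beauty Contest.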


Another super practical suggestion: should we strongly encourage people to have positions in markets related to their posts? (Like, the way to do this for scientists is for journal editors to read papers, come up with some price for how much replication-yes the scientist has to buy in the replication prediction market, and then if the scientist puts up the stake the paper gets published. Replication is less obvious for blog posts or positions on topics like vegan nutrition.)

More broadly, I guess the intuition behind this is “insurance everywhere”; if I suggest to you a movie you might want to watch, that could be recast as a bet where you get paid if you don’t like it and I get paid if you do like it.
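To make the recast concrete, here is a small worked sketch of the movie bet from the recommender’s side. The function names and the payout amounts are made up for illustration; the only claim is the arithmetic of a two-outcome bet.

```python
def recommender_expected_value(p_like, payout_if_liked, payout_if_disliked):
    """Expected gain for the recommender.

    p_like             -- recommender's probability that the friend likes the movie
    payout_if_liked    -- amount the friend pays the recommender if they like it
    payout_if_disliked -- amount the recommender pays the friend if they don't
    """
    return p_like * payout_if_liked - (1 - p_like) * payout_if_disliked

def break_even_probability(payout_if_liked, payout_if_disliked):
    """Smallest p_like at which making the recommendation is not a losing bet."""
    return payout_if_disliked / (payout_if_liked + payout_if_disliked)
```

If I stand to win 1 unit when you like the movie and lose 3 when you don’t, I should only recommend movies I think you have at least a 75% chance of liking, so the ratio of the two payouts sets how confident a recommendation has to be.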

This raises the obvious practical counterpoint: why do this legibly with money instead of illegibly with intuitive models? (Presumably if I recommend enough bad movies, you’ll stop watching movies because I recommend them, and by halo/​horns this will probably bleed over into other parts of our relationship, and so I’m incentivized to guess well.)

Eliezer has (I think in Facebook comments somewhere, tragically) a claim that you need a system with enough meta-levels of criticism that the ground-level discussion will be kept honest. [If I say something dumb, criticism of my statements needs to matter in order to punish that; but if criticism of my statements is itself not criticized, then dumb criticisms will be indistinguishable from smart ones, and so it won’t actually just punish dumb statements; and this problem is saved from an infinite regress by an identification between levels, where at some point “Contra contra contra Vaniver” (or w/​e) becomes its own object-level take.] I wonder if there’s something that’s the equivalent of this for the truthtrackingness of discussions.

[That is, we still need mechanism design, but the mechanism design can be of the social interactions that people have / what social roles are viewed as needed for conversations, instead of just financial transactions that we could add to a situation.]


I don’t want to overweight my personal pain points, but I would love if commenters on LW argued more amongst themselves instead of having positive and negative comments in separate threads.

Betting/insurance everywhere sounds really clever, but in practice people mostly don’t do it. I tried to bet someone a particular medical test would find something, and that still wasn’t enough to get them to do it (test was conclusive and treatment was easy). Admittedly I wanted good odds, since if the test+treatment worked it would have had huge benefits for him, but I don’t think that was the hold-up (he didn’t dispute that if test+treatment worked the benefit would be enormous). People are just pretty effort constrained.

Back to debates: I think even if the truth serum works as defined, it fails to contain the power of defining the question. Selecting the question and defining the sides bakes in a lot of assumptions, and there’s no way for the betting system to capture that.


Selecting the question and defining the sides bakes in a lot of assumptions, and there’s no way for the betting system to capture that.

Incidentally, this is one of my other big complaints about science-as-done-now; with the standard being null hypothesis significance testing, you don’t have to argue that your interpretation of observations is different than any real person’s, just that it’s different from what you think the ‘baseline’ view is. (I wrote a little about this a while ago.)

I get that there’s a bunch of frame warfare where side A wants to cast things as ‘good vs. evil’ and side B wants to cast them as ‘order vs. chaos’, but it seems like often the result is you get ‘good vs. order’, since groups of people can mostly determine their own labels. Are there examples you’re thinking of where that doesn’t happen in smaller discussions?

[I notice I’m not thinking very much about ‘person vs. egregore’ fights; the motivating conflict isn’t that you and a specific vegan disagree about nutrition, it’s that you’re interested in lots of nutrition issues and when that touches on veganism it’s you against an egregore of lots of people who are very interested in veganism but only slightly if at all interested in nutrition. That said, it’s not obvious to me which side has more ‘frame control’ here; presumably the discussions are happening on your posts and so you get to set the frame locally, but also presumably this doesn’t really work because people are coming in with preconceptions set by their egregore and so if your post can’t be readily interpreted in that frame it will be misinterpreted instead.]


Yeah I think the debate framing is high-coupling and prone to bucket errors, and thus misses any gains from trade.

E.g. a neighborhood debating new housing. The anti-building side has multiple concerns, one of which is parking. The pro-building side would only be too delighted to create more housing without a right to street parking. Framing the debate as “build this apartment complex yes or no”, at a minimum, pushes people towards the standard cases for and against building, rather than collaborating to find the “yes housing no parking” solution. And if they do somehow discover it, how should someone vote? You could vote for the person who came up with the solution, even if you don’t like the proposition they’re officially advocating for. But moving towards voting for people instead of positions seems unlikely to advance our goal of operating mostly on the object level.

You could break the vote up into smaller propositions and let people add their own, but my guess is naive implementations of that go pretty poorly.


I keep rewriting a comment about the vegan nutrition argument, because finding a legibly fair framing is hard. But you know what I would have paid thousands of dollars for in that argument? For every commenter to have to fill out a five question survey on my cruxes, saying whether they agreed, disagreed, or something more complicated (with an explanation). Because people are allowed to disagree with me on facts, and on frames, and I would have happily double cruxed with almost anyone who agreed to[1]. But we couldn’t really make progress when people were only arguing for their conclusion, and not where they disagreed with me.

That description is of course subjective; I’m sure they feel like they stated their disagreement very clearly, and “double crux is the best way to resolve disagreements” is itself a claim that can be debated. But I feel better about imposing double cruxing via quiz than “I will delete comments that don’t accept my frame”.

@habryka ^ new feature idea.

  1. I made three offers to Dialogue. One is in progress but may not publish, one is agreed to but we’re trying to define the topic, and the third was turned down.


Somehow the vegan nutrition argument is reminding me of a recurring issue with the Alignment Forum (as of a few years ago—things are a bit different now). There were several different camps of thinking about alignment, and each roughly thought that the others were deeply mistaken. How do you have a discussion site that can both talk about broader frame questions and the narrower detailed questions that you can get into when everyone agrees about the big picture?

Here I have some suspicion that there’s some statement in your post near the beginning, where you say something like “vegan advocacy has a responsibility to be upfront about nutritional costs even if that reduces the effectiveness of the advocacy”, which people will predictably want to object to (because they care strongly about the effectiveness of the advocacy, for example, or have a model of their audience as willing to grasp at straws when it comes to excuses to not do the thing, or so on). There was an Arbital feature that I think was on LW but never saw that much use, whose name I don’t actually remember, which was the ability to add statements that people could assign probabilities to (and then readers could hover over and see the distribution of community probabilities and who believed what).

My guess is that this sort of polling could serve a similar role to agree/​disagree-votes, which is that they both 1) are a source of info that people want to provide and 2) reduce the amount of that info that bleeds into other channels. (Someone writing a comment that’s not very good but everyone agrees with used to get lots of karma, and now will get little karma but lots of agreement.) Maybe a lot of the annoying comments would have just turned into low probabilities /​ agreements on your axiom (and the same for annoying comments questioning the whole premise of an alignment agenda).

For every commenter to have to fill out a five question survey on my cruxes, saying whether they agreed, disagreed, or something more complicated (with an explanation).

I worry about how the third option will go in practice. It does seem good to have the three-valued answer of “yes/no/mu”, so that people can object to the frame of a question (“have you stopped beating your wife?”), and mu is a special case of your third more-complicated option. [It’s also the easiest such explanation to write, and so I think there will be a temptation to just answer ‘mu’ to all of your quiz questions in order to state that their conclusion differs from yours. Of course, if they do this, it’s probably easier to write them off / more visible to everyone else and them that they’re being annoying about it.]

It’s also not obvious you want this for generic comments (should I have to share whether or not I agree with your premises to point out a typo, or locally invalid reasoning step, or so on?), but as a thing that the author can turn on for specific posts it seems worth thinking about more.
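As a rough sketch of what the per-post crux quiz could look like as a data structure (every name here is hypothetical, and nothing in it reflects how LW actually stores comments): each answer is agree, disagree, mu, or something more complicated, the last two require an explanation, and a separate check makes the “mu to everything” pattern visible.

```python
from dataclasses import dataclass
from enum import Enum

class Stance(Enum):
    AGREE = "agree"
    DISAGREE = "disagree"
    MU = "mu"        # objects to the frame of the question
    OTHER = "other"  # "something more complicated"

@dataclass
class CruxAnswer:
    stance: Stance
    explanation: str = ""

def validate_survey(cruxes, answers):
    """Return a list of problems; an empty list means the comment may be posted."""
    if len(answers) != len(cruxes):
        return [f"Expected {len(cruxes)} answers, got {len(answers)}"]
    problems = []
    for crux, answer in zip(cruxes, answers):
        needs_explanation = answer.stance in (Stance.MU, Stance.OTHER)
        if needs_explanation and not answer.explanation.strip():
            problems.append(f"Explanation required for: {crux}")
    return problems

def rejects_every_frame(answers):
    """True when a commenter answered 'mu' to every crux."""
    return bool(answers) and all(a.stance is Stance.MU for a in answers)
```

The all-mu check only labels the pattern rather than blocking the comment, which matches the thought that doing this should be visible to everyone rather than forbidden.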