My takes on the questions, in order:
Good idea: Probably yes, if done well.
Existing list: I suspect none are really good right now.
How much: No strong opinion; perhaps ask each candidate: “How much of the total shared budget do you need to work unimpeded? Prefer less, but pick enough to work well. We hope to fund 3 to 6 people for two years, and have selected the following list of budget-gated grantees: [list goes here].” This could have weird social effects, but the positive ones seem worth it to me.
Voting/other approach: I’d worry that a plain popularity contest on the Alignment Forum gives voters no tools to explicitly hedge across opinion clusters.
I’d suggest taking the fixed point of a naturally transitive person-taste-rating question, alongside a non-transitive research-taste-rating question:
> Q1 (transitive): “Hello A. Please rate B on peer taste by giving upper and lower bounds on P(C has [good taste in doom-reducing][1] people, strategies, and research | B rates C as a 1 on this question), as well as any freeform commentary you have on the bounds you gave”.
> Q2 (terminal): “Hello A. Please rate B on agenda taste by giving upper and lower bounds on P(doom by 2200|B’s research agenda continues)/P(doom by 2200|B’s research agenda stops), as well as any freeform commentary you have on the bounds you gave.”
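To make the shape of the responses concrete, here is one way the raw answers could be stored. This is a minimal sketch; the class and field names are mine rather than anything specified above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PeerTasteRating:
    """One answer to Q1 (the transitive question)."""
    rater: str            # "A" in the prompt
    ratee: str            # "B" in the prompt
    p_lower: float        # lower bound on P(C has good taste | B rates C as a 1)
    p_upper: float        # upper bound on the same probability
    comment: Optional[str] = None   # freeform commentary on the bounds

@dataclass
class AgendaRating:
    """One answer to Q2 (the terminal question)."""
    rater: str
    ratee: str
    ratio_lower: float    # lower bound on P(doom by 2200 | agenda continues) / P(doom by 2200 | agenda stops)
    ratio_upper: float    # upper bound on the same ratio
    comment: Optional[str] = None
```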
Propagating ratings like Q1 is also known as Eigentrust[2] (or perhaps here we could call it Eigentaste); doing this allows focusing on all-factors doom, makes cliques visible but doesn’t favor them, and easily puts significant weight on newbies who someone well-known is impressed by.
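For concreteness, a minimal sketch of the propagation itself. It assumes each Q1 response has already been collapsed to a single nonnegative weight (for instance the midpoint of the stated bounds, which is my simplification, not part of the proposal), and the damping parameter, names, and toy data are all illustrative.

```python
from collections import defaultdict

def eigentrust(ratings, pretrust, alpha=0.15, iters=200, tol=1e-10):
    """Fixed point of transitive trust propagation (Eigentrust-style).

    ratings:  {(rater, ratee): weight} with nonnegative weights, e.g. the
              midpoint of a Q1 response's bounds (an illustrative choice).
    pretrust: {person: weight} for the invited seed set; normalized below.
    alpha:    fraction of trust that flows back to the seed set each step.
    """
    people = set(pretrust)
    for a, b in ratings:
        people.update((a, b))
    people = sorted(people)

    # Normalize the pre-trust (seed) distribution.
    p_total = sum(pretrust.values()) or 1.0
    p = {x: pretrust.get(x, 0.0) / p_total for x in people}

    # Row-normalize each rater's outgoing ratings into a distribution.
    out = defaultdict(float)
    for (a, _), w in ratings.items():
        out[a] += w
    C = {(a, b): w / out[a] for (a, b), w in ratings.items() if out[a] > 0}

    # Power iteration: t <- (1 - alpha) * C^T t + alpha * p.
    # People who rated nobody pass their trust back to the seed distribution.
    t = dict(p)
    for _ in range(iters):
        dangling = sum(t[x] for x in people if out[x] == 0)
        nxt = {x: (alpha + (1 - alpha) * dangling) * p[x] for x in people}
        for (a, b), c in C.items():
            nxt[b] += (1 - alpha) * c * t[a]
        if sum(abs(nxt[x] - t[x]) for x in people) < tol:
            return nxt
        t = nxt
    return t

# Toy usage: one seed rater, three participants.
ratings = {("alice", "bob"): 0.9, ("alice", "carol"): 0.4,
           ("bob", "carol"): 0.8, ("carol", "alice"): 0.7}
print(eigentrust(ratings, pretrust={"alice": 1.0}))
```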
I’d make it invite-based: starting ratings strongly influence the transitive ratings, and I want to trace the trust graph outward from well-known researchers to lesser-known ones. Start from an initial set of highly skilled people spanning a range of opinions on alignment difficulty and intervention approach; ask them to invite the people they consider highly competent to come rate and be rated in this poll.
I’d also want to make it public, so that others can use the results. E.g., ask participants to select from a dropdown: “are you ok with this review being made available to {the internet | other participants in this review process | just the person I’m reviewing and the review admins | just the review admins | nobody, please delete my review}”.
I’ve prototyped this with Airtable + Fillout forms + Google Cloud Run. That stack doesn’t work great: Fillout’s UI is clunky for many-to-many ratings, and Airtable forms couldn’t do many-to-many at all. I’d be open to building a better UI as a volunteer and then either running this or handing over the code for you to run it. Claude is now able to spit out a working Eigentrust implementation zero-shot, so anything beyond that is mostly a setup question; it should be a few hours of Claude Code. I’ll see what happens this weekend.
Original motivation: A few months ago, I was drawn to this as a way of making common knowledge of which groups of people cross-rate highly; I suspect most people give pretty low scores to most people, and there will be a few peaks of actual relevance. I’d guess a force-directed layout of the rating graph would show those clusters, for example. Ideally it would also let people select their own taste ratings as a starting set and see the resulting transitive distribution. That said, maybe making it public is less important now: with high-quality disagreement-bridging posts like Steven Byrnes’ 6 reasons why “alignment-is-hard” discourse seems alien to human intuitions, and vice-versa, the urgency of achieving common knowledge of who’s competent according to which camp seems slightly lower. Still significant, though, and I think the value of making it public is in general pretty high.
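A sketch of the “use your own ratings as a starting set” view, reusing the eigentrust() function and the ratings dict from the sketch above; the choice to seed with a viewer’s raw Q1 weights is mine, not something specified in the proposal.

```python
def personalized_trust(ratings, viewer, alpha=0.15):
    """Recompute the transitive distribution with one participant's own
    ratings as the pre-trust (seed) set, instead of the invite-based seed.
    Reuses eigentrust() from the earlier sketch."""
    own = {b: w for (a, b), w in ratings.items() if a == viewer}
    if not own:
        raise ValueError(f"{viewer} has not rated anyone yet")
    return eigentrust(ratings, pretrust=own, alpha=alpha)

# The distribution as seen from bob's own ratings.
print(personalized_trust(ratings, "bob"))
```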
Possible drawbacks: It would take some effort for voters to fill this out even with a nice UI, but my hunch is that the output would be enough better than single-round unnormalized voting to be worth it. It also might not be much better than just asking the Alignment Forum, in which case it’d be effort for not much value.
[1] The bracketed part needs refinement; I’d want it to be precise enough that god could resolve a prediction market about it. It needs to connect to both the transitive Q1 and to Q2, and to be a clear enough question to be answerable by a skilled alignment researcher.
[2] Claude intro and critique of my proposal, with your post and my comment as context; Eigentrust (Wikipedia); Eigentrust paper.
Eigentrust sounds awesome. I was really excited about it when I first discovered it, but thought that people wouldn’t have a good reason to fill out their trust weights. Grant money allocation could be a perfect motivator.
I wonder if it would be good to pre-fill the trust weights (e.g. based on AF upvote history), to make it easier for users (and to motivate those who strongly disagree with their defaults).
Thank you for offering to volunteer with this; I’ll definitely reach out if I decide to run the fellowship with Eigentrust.
I’ve made solid progress on putting together a site I’d be happy with; not quite there yet, ETA another 24 to 48 hours. Let me know if you end up not wanting to couple your donations to eigentaste; there’s still clear value in running it, so I’m going to get it done regardless.
Sent you a demo. It’s reasonably close to ready for real use by motivated users, but the human-facing prompting still needs refining in order to be properly meaningful.
> but thought that people wouldn’t have a good reason to fill out their trust weights
Yeah, I notice that using a transitive quality as the endorsement criterion, and making votes public, produces an incentive to give useful endorsements: failing to issue informative endorsements would mark a person as lacking this transitive quality, and so as not worthy of endorsement themselves.
We can also make it prominent on a person’s profile if, for instance, they’ve strongly endorsed themselves, or if they’ve only endorsed a few people without also making any abstention endorsements (which redistribute trust back to the current distribution). Some will have an excuse for doing this; most will be able to do better.
> I wonder if it would be good to pre-fill the trust weights (e.g. based on AF upvote history), to make it easier for users (and to motivate those who strongly disagree with their defaults).
True. Doing that by default, and also making some of the aforementioned abstention endorsements by default, would address accidental overconfident votes pretty well.
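A sketch of how a default abstention endorsement could work, as I understand the idea: whatever share of a rater’s voice is left unallocated simply tracks the current global distribution. The function and parameter names are mine, and how this would be folded into the fixed-point computation is left open.

```python
def effective_row(explicit, abstain_weight, current_global):
    """Turn one rater's endorsements into a full trust distribution.

    explicit:        {ratee: weight} the rater assigned directly.
    abstain_weight:  how much of their voice the rater left unallocated.
    current_global:  current global trust distribution, assumed to sum to 1;
                     the abstained share is spread in proportion to it, so
                     abstaining neither boosts nor penalizes anyone in particular.
    """
    total = sum(explicit.values()) + abstain_weight
    if total == 0:
        return dict(current_global)  # nothing specified at all: mirror the status quo
    row = {person: (abstain_weight / total) * score
           for person, score in current_global.items()}
    for person, w in explicit.items():
        row[person] = row.get(person, 0.0) + w / total
    return row

# A rater who endorses only carol, leaving half their weight unallocated:
global_trust = {"alice": 0.5, "bob": 0.3, "carol": 0.2}
print(effective_row({"carol": 1.0}, abstain_weight=1.0, current_global=global_trust))
```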
(Also, howdy, I should probably help with this. I was R&Ding web-of-trust systems for a while before realising there didn’t seem to be healthy enough hosts for them (they can misbehave if placed in the wrong situations), so I switched to working on extensible social software/forums, to build better hosts. It wasn’t clear to me that the alignment community needed this kind of thing, but I guess it probably does at this point.)