deep comments on the void

deep 12 Jun 2025 3:09 UTC
Thanks, I love the specificity here!
Prompt: if someone wanted to spend some $ and some expert-time to facilitate research on “inventing different types of guys”, what would be especially useful to do? I’m not a technical person or a grantmaker myself, but I know a number of both types of people; I could imagine e.g. Longview or FLF or Open Phil being interested in this stuff.
Invoking Cunningham’s law, I’ll try to give a wrong answer for you or others to correct! ;)
Technical resources:
- A baseline Constitution, or Constitution-outline-type-thing
  - could start with Anthropic’s if known, but ideally this gets iterated on a bunch?
  - nicely structured: organized into sections that each describe a type of behavior or personality feature, with several alternative examples of each feature to choose from (e.g. personality descriptions that weight extensional vs. intensional definitions differently, point to different examples, or tune agreeableness up and down)
  - Maybe there could be an annotated “living document” describing the current SOTA on Constitution research: “X experiment finds that including Y Constitution feature often leads to Z desideratum in the resulting AI”
- A library or script for doing RLAIF
  - Ideally: documentation or suggestions for which models to use here. Maybe there’s a taste or vibes thing where e.g. Claude 3 is better than 4?
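
To make the two items above a bit more concrete, here is a minimal sketch of what a sectioned Constitution plus an RLAIF-style labeling step might look like. Everything in it is an assumption for illustration: the `Section` and `Constitution` classes, `label_preferences`, and the toy sampler and judge are hypothetical names, not anyone's actual library. It also covers only the preference-labeling step (an AI judge picking between two responses); training on those labels is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Section:
    """One hypothetical Constitution section: a trait plus alternative phrasings."""
    name: str
    variants: List[str]  # alternative descriptions/examples to choose from
    chosen: int = 0      # index of the variant currently in use

    def text(self) -> str:
        return self.variants[self.chosen]


@dataclass
class Constitution:
    sections: List[Section] = field(default_factory=list)

    def render(self) -> str:
        # Flatten the selected variant of every section into one prompt-ready string.
        return "\n".join(f"[{s.name}] {s.text()}" for s in self.sections)


# A judge takes (constitution text, prompt, response A, response B)
# and returns 0 if A is preferred, 1 if B is preferred.
Judge = Callable[[str, str, str, str], int]


def label_preferences(
    constitution: Constitution,
    prompts: List[str],
    sample: Callable[[str], str],
    judge: Judge,
) -> List[Dict[str, str]]:
    """Sample two responses per prompt and let an AI judge pick the better one."""
    rules = constitution.render()
    data = []
    for prompt in prompts:
        a, b = sample(prompt), sample(prompt)
        winner = judge(rules, prompt, a, b)
        chosen, rejected = (a, b) if winner == 0 else (b, a)
        data.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return data


def toy_sampler() -> Callable[[str], str]:
    # Stand-in for a real model: returns two canned responses in order.
    canned = iter(["Sure!!! Absolutely anything you say!", "Here is a direct answer."])
    return lambda prompt: next(canned)


def toy_judge(rules: str, prompt: str, a: str, b: str) -> int:
    # Stand-in for a real model call: simply prefers the shorter response.
    return 0 if len(a) <= len(b) else 1
```

In a real setup, `sample` and `judge` would each wrap a call to a language model, and swapping a section's `chosen` index (say, a more or less agreeable variant) would be one way to run the "X feature leads to Z desideratum" comparisons.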
Seeding the community with interesting ideas:
- Workshop w/ a combo of writers, enthusiasts, AI researchers, philosophers
- Writing contests: what kinds of relationships could we even have with AIs that current chatbots don’t do well? What kind of guy would they ideally be in each of these relationships?
- Goofy idea: get people to post “vision boards” with like, quotes from characters or people they’d like an AI to emulate?
- Pay a few people to do fellowships or start research teams working on this stuff?
  - If starting small, this could be a project for MATS fellows
  - If ambitious, this could be a dedicated startup-type org. Maybe a Focused Research Organization, an Astera Institute incubee, etc.
Community resources:
- A Discord
- A testing UI that encourages sharing
  - Pretty screenshots (gotta get people excited to work on this!)
  - Convenient button for sharing chat+transcript
- Easy way to share trained AIs
- Cloud credits for [some subset of vetted] community participants?
  - I dunno how GPU-hungry fine-tuning is; maybe this cost is huge and then defines/constrains what you can get done, if you want to be fine-tuning near-frontier models. (Maybe this pushes towards the startup model.)
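
For a rough feel of the GPU question, here is a back-of-envelope sketch. It assumes full fine-tuning with the Adam optimizer in mixed precision, which is commonly estimated at about 16 bytes per parameter (2 for half-precision weights, 2 for gradients, 12 for full-precision master weights plus Adam's two moment estimates). It ignores activation memory, so treat it as a lower bound. The function names and the 80 GiB per-GPU figure are illustrative assumptions.

```python
import math


def full_finetune_gib(n_params: float, bytes_per_param: int = 16) -> float:
    """Approximate memory (GiB) for weights, gradients, and Adam state only."""
    return n_params * bytes_per_param / 2**30


def min_gpus(n_params: float, gpu_gib: float = 80.0) -> int:
    """Lower bound on GPU count if that state is sharded evenly (activations ignored)."""
    return math.ceil(full_finetune_gib(n_params) / gpu_gib)
```

Under these assumptions a 7-billion-parameter model already needs on the order of 100 GiB before activations, so near-frontier models would need many GPUs. Parameter-efficient methods such as LoRA train far fewer weights and can cut this substantially, which may matter for the community-credits version of this idea.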