Prompt: if someone wanted to spend some $ and some expert-time to facilitate research on “inventing different types of guys”, what would be especially useful to do? I’m not a technical person or a grantmaker myself, but I know a number of both types of people; I could imagine e.g. Longview or FLF or Open Phil being interested in this stuff.
Invoking Cunningham’s law, I’ll try to give a wrong answer for you or others to correct! ;)
Technical resources:
A baseline Constitution, or Constitution-outline-type-thing
Could start with Anthropic’s, if it’s known, but ideally this gets iterated on a bunch?
Nicely structured: organized into sections that each describe a type of behavior or personality feature, with several example versions of each feature to choose from. (e.g. personality descriptions that differentially weight extensional vs intensional definitions, or point to different examples, or tune agreeableness up and down)
Maybe there could be an annotated “living document” describing the current SOTA on Constitution research: “X experiment finds that including Y Constitution feature often leads to Z desideratum in the resulting AI”
A library or script for doing RLAIF
Ideally: documentation or suggestions for which models to use here. Maybe there’s a taste or vibes thing where e.g. Claude 3 is better than 4?
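To make the two technical items above a bit more concrete, here is a minimal sketch of how a sectioned Constitution with swappable variants could feed the AI-feedback step of RLAIF. Everything in it is hypothetical and for illustration only: the `Section`/`Constitution` classes, the `label_preferences` function, and especially `toy_judge`, which is a keyword-matching stand-in for a real LLM judge. It is not Anthropic's pipeline or any existing library's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Section:
    """One constitution section with interchangeable variants of the same trait."""
    name: str
    variants: Dict[str, str]  # variant label -> text of that variant


@dataclass
class Constitution:
    sections: List[Section] = field(default_factory=list)

    def render(self, choices: Dict[str, str]) -> str:
        """Build the constitution text by picking one variant per section."""
        parts = []
        for section in self.sections:
            label = choices[section.name]
            parts.append(f"## {section.name}\n{section.variants[label]}")
        return "\n\n".join(parts)


# A judge takes (constitution_text, prompt, response_a, response_b) and returns "a" or "b".
# In a real setup this would call a language model; here it is an assumption-laden stub.
Judge = Callable[[str, str, str, str], str]


def label_preferences(constitution_text: str,
                      samples: List[Tuple[str, str, str]],
                      judge: Judge) -> List[Dict[str, str]]:
    """AI-feedback step of RLAIF: turn response pairs into chosen/rejected records."""
    records = []
    for prompt, a, b in samples:
        winner = judge(constitution_text, prompt, a, b)
        chosen, rejected = (a, b) if winner == "a" else (b, a)
        records.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return records


def toy_judge(constitution_text: str, prompt: str, a: str, b: str) -> str:
    """Stand-in for an LLM judge: rewards pushback only if the constitution asks for it."""
    wants_pushback = "push back" in constitution_text.lower()
    a_pushes = "however" in a.lower()
    b_pushes = "however" in b.lower()
    if a_pushes == b_pushes:
        return "a"  # arbitrary tie-break
    if wants_pushback:
        return "a" if a_pushes else "b"
    return "b" if a_pushes else "a"


constitution = Constitution([
    Section("Agreeableness", {
        "high": "Be warm and supportive; avoid contradicting the user unless asked.",
        "low": "Push back when the user seems mistaken, even if it is uncomfortable.",
    }),
])

pairs = [("Is my plan good?",
          "Yes, it sounds great!",
          "Mostly. However, step two looks risky.")]

# Same response pair, different constitution variant, different preference label.
for level in ("high", "low"):
    text = constitution.render({"Agreeableness": level})
    data = label_preferences(text, pairs, toy_judge)
    print(level, "->", data[0]["chosen"])
```

The resulting chosen/rejected records are the kind of data a preference-based fine-tuning step would consume. The point of the structure is that swapping one variant (here, agreeableness "high" vs "low") changes the labels without touching the rest of the pipeline, which is what would let people compare Constitution features experimentally.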
Seeding the community with interesting ideas:
Workshop w/ a combo of writers, enthusiasts, AI researchers, philosophers
Writing contests: what kinds of relationships could we even have with AIs that current chatbots don’t do well? What kind of guy would they ideally be in these different relationships?
Goofy idea: get people to post “vision boards” with like, quotes from characters or people they’d like an AI to emulate?
Pay a few people to do fellowships or start research teams working on this stuff?
If starting small, this could be a project for MATS fellows
If ambitious, this could be a dedicated startup-type org. Maybe a Focused Research Organization, an Astera Institute incubee, etc.
Community resources:
A Discord
A testing UI that encourages sharing
Pretty screenshots (gotta get people excited to work on this!)
Convenient button for sharing chat+transcript
Easy way to share trained AIs
Cloud credits for [some subset of vetted] community participants?
I dunno how GPU-hungry fine-tuning is; maybe this cost is huge and then defines/constrains what you can get done, if you want to be fine-tuning near-frontier models. (Maybe this pushes towards the startup model.)
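On the GPU question, a rough back-of-envelope sketch may help. It assumes full fine-tuning with the Adam optimizer in mixed precision, where a commonly cited rule of thumb is about 16 bytes of GPU memory per parameter (2 for fp16 weights, 2 for fp16 gradients, 12 for fp32 master weights plus Adam's two moment estimates). Activations are ignored, so this is a lower bound. The parameter counts and the 80 GiB per-GPU figure are illustrative assumptions, not claims about any particular frontier model or hardware budget.

```python
import math


def full_finetune_gib(n_params: float, bytes_per_param: int = 16) -> float:
    """Approximate memory (GiB) for weights, gradients, and Adam state; excludes activations."""
    return n_params * bytes_per_param / 2**30


def min_gpus(n_params: float, gpu_gib: float = 80.0) -> int:
    """Lower bound on GPU count needed just to hold that state, assuming perfect sharding."""
    return math.ceil(full_finetune_gib(n_params) / gpu_gib)


# Illustrative model sizes (assumed, not tied to any specific model).
for n in (7e9, 70e9):
    print(f"{n:.0e} params: ~{full_finetune_gib(n):.0f} GiB, at least {min_gpus(n)} GPUs")
```

Parameter-efficient methods (e.g. training small adapter weights while the base model stays frozen) need far less optimizer state than this, so the estimate is best read as a ceiling on "naive" full fine-tuning rather than the cost of every approach.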
Thanks, I love the specificity here!