I have a weird AI-related idea that might be relevant to capabilities, alignment, or both. It has to do with how to get current AI systems to interact with the world in a more humanlike way, without novel AI architectures. I’m not inclined to post it publicly, because it might actually be a capabilities advancement. But I’m skeptical of the thought that I could have actually come up with a capabilities advancement. I’m aware of the crank attractor. My prior is that if I described this idea to someone who actually works in the field, they would say “oh yeah, we tried that, it didn’t do anything interesting.” But maybe not.
Should I —
1. Post it publicly here
2. Tell a friend who is closer to AI research than I am
3. Email it to MIRI with a lot of exclamation marks and caps lock
4. Ask Claude about it, and do whatever Claude says to do
5. Spend a week+ coming up with ways to test this idea myself
6. Do nothing and forget about it
7. Something else
5 is obviously the ‘best’ answer, but is also a pretty big imposition on you, especially for something this speculative. 6 is a valid and blameless—if not actively praiseworthy—default. 2 is good if you have a friend like that and are reasonably confident they’d memoryhole it if it’s dangerous and expect them to be able to help (though fwiw I’d wager you’d get less helpful input this way than you’d expect: no one person knows everything about the field so you can’t guarantee they’d know if/how it’s been done, and inferential gaps are always larger than you expect so explaining it right might be surprisingly difficult/impossible).
I think the best algorithm would be along the lines of:
5 iff you feel like being nice and find yourself with enough spare time and energy
. . . and if you don’t . . .
7, where the ‘something else’ is posting the exact thing you just posted and seeing if any trustworthy AI scientists DM you about it
. . . and if they don’t . . .
6
I’m curious to see what other people say.
The answer I followed ended up being 2, and then 6.
6 isn’t always the best answer, but it is sometimes the best answer, and we are sorely lacking an emotional toolkit for feeling good about picking 6 intentionally when it is the best answer. In particular, we have no way of measuring how often the world has been saved by quiet, siloed coordination around 6; probably even the people, if they exist, who saved the world via 6 don’t know that they did. Part of the price of 6 is never knowing. You don’t get to be a lone hero either: many people will have any given idea, and all of them have to dismiss it, or the one defector gets the money and the praise. But many is smaller than infinity. Maybe thirty people in the ’80s spotted the same brilliant trick with nukes or bioweapons with concerning sequelae, none of them defected, and life continued. We got through a lot of crazy discoveries in the Cold War pretty much unscathed, which is a point of ongoing confusion.