What do I wish for from AI? I gave a rough list in this thread, and also here: https://www.lesswrong.com/posts/9y5RpyyFJX4GaqPLC/pink-shoggoths-what-does-alignment-look-like-in-practice?commentId=9uGe9DptK3Bztk2RY
Overall, I think both AI and those giving goals to it should be as conservative and restrained as possible. They should value, above all, the preservation of the status quo of people, the world, and AI, with a VERY steady improvement of each. Move ahead, but move slowly; don’t break things.
>That’s much more like the sort of thing you can give to an optimizer. And it results in the world frozen solid.
That’s why I made sure to specify gradual improvement. Besides, development and improvement are the natural state of humanity and people, so taking that away from them would break the status quo too.
>I notice that the word “reasonably” is doing most of the work there. (Much like in English Common Law, where it works reasonably well, because it’s interpreted by reasonable human beings.)

Mathematically speaking, polynomials are reasonable functions. Step functions or factorials are not. Exponentials are reasonable if they have had a roughly constant growth rate since somewhere before the year 2000. Metrics of a reasonable world should be described by reasonable functions.
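As a toy illustration (a minimal sketch; the specific functions and time horizon are my own arbitrary choices, not a formal definition of “reasonable”):

```python
import math

# Compare how a few candidate world-metrics grow over decades.
# Polynomials stay tame; a steady few-percent exponential matches
# long-established trends; a factorial-style metric explodes at once.
for years in (10, 20, 50):
    polynomial = years ** 2            # reasonable: polynomial growth
    steady_exp = 1.05 ** years         # reasonable: ~constant-rate exponential
    explosive = math.factorial(years)  # unreasonable: runaway growth
    print(f"{years:>2} years: poly={polynomial:>5}  "
          f"5%-exp={steady_exp:7.1f}  factorial={explosive:.2e}")
```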
>There are three kinds of genies: Genies to whom you can safely say “I wish for you to do what I should wish for”; genies for which no wish is safe; and genies that aren’t very powerful or intelligent.
I’ll take the third, please. It just needs to be powerful enough to prevent the other two types from being created in the foreseeable future.
Also, it seems that you imagine AI not just as the second type of genie, but as a genie that is explicitly hostile and will misinterpret your wish on purpose. Of course, making any wish to such a genie would end badly.
AI can be useful without being ASI, including in things such as identifying and preventing situations that could lead to the creation of an unaligned ASI.
Of course, a conservative and human-friendly AI would probably lose to an AI of comparable power that is not limited by those “handicaps”. That’s why it’s important to prevent the possibility of their creation, instead of fighting them “fairly”.
And yes, computronium maximising is a likely behaviour, but there are ideas for how to avoid it, such as https://www.lesswrong.com/posts/ngEvKav9w57XrGQnb/cognitive-emulation-a-naive-ai-safety-proposal or https://www.lesswrong.com/posts/5gQLrJr2yhPzMCcni/the-optimizer-s-curse-and-how-to-beat-it
Of course, all those ideas and possibilities may actually be duds, and we are doomed no matter what. But then what’s the point of seeking a solution that does not exist?
If you have made a (digital) picture, and an AI has made a picture, they are different pictures, even if they have exactly the same bytes in the same places.
Because art is more than just bytes. It also carries the history of its creation. Ancestry. A connection with its creator. AIs that were trained by humans on human texts and art still carry the human touch. They connect us to other humans in a new way.
But eventually, AIs will not need human help. Like AlphaGo Zero, which learned to play Go at a superhuman level just from knowing the rules and playing against itself. You will not be interacting with people when playing AlphaGo Zero, or when consuming the brilliant art and video that a SuperAI will make specifically for you. You will be utterly alone: not needing anyone and not needed by anyone.
But if you value human interaction, you will seek out communication, services, art, code, etc. made by humans, and provide them to humans. Thereby giving yourself and others the thing that is hardest for a biohuman to attain post-singularity: the MEANING of your existence. That’s what I will probably choose, given the chance, and probably many others will as well.
I think it may be caused by https://en.wikipedia.org/wiki/Anxiety_disorder
I suffer from that too.
That’s a very counterproductive state of mind if the task is unsolvable in its full difficulty. It makes you lose hope and stop trying solutions that would work if the situation is not as bad as you imagined.
Thanks for the advice. Looks like my mind works similarly to yours, i.e. it can’t give up a task it has latched onto. But my brain draws way more from the rest of my body than is healthy.
It’s not as bad now as it was in the first couple of weeks, but I still have trouble sleeping regularly, because my mind can’t switch off the overdrive mode. So I become sleepy AND agitated at the same time, which is quite an unpleasant and unproductive state.
There are no Lavender Pills around here, but I take other anxiety medications, and they help, to an extent.
Could it be that an AGI would be afraid to fork/extend/change itself too much, because it would be afraid of being destroyed by its branches/copies, or of ceasing to be itself?
Yes, if people are making AGIs uncontrollably, very soon someone will build some unhinged maximizer that kills us all. I know that.
I’m just doubting that all AGIs inevitably become “unhinged optimizers”.
The usually cited scenario is that AI goes power-grabbing as an instrumental goal for something else. I.e. it does not “want” power fundamentally, but sees it as a useful step toward reaching something else.
My point is that “maximizing” like that is likely to have extremely unpredictable consequences for the AI, reducing its chance of reaching its primary goals, which can be reason enough to avoid it.
So, maybe it’s possible to try to make AI think more along these lines?
Do we need a maximally powerful AI to prevent that possibility, or will an AI that is just smart and powerful enough to identify such firms and take them down (or make them change their ways) do?
A pivotal act does not have to be something sudden, drastic, and illegal, as in the second link. It can be a gradual process of making society intolerant of unsafe(r) AI experiments and research, giving people a better understanding of why AI can be dangerous and what it can lead to, making people more tolerant of and aligned with each other, etc. That could starve rogue companies of workforce and resources, and ideally shut them down. I think work in that direction can be accelerated by AI and the other information technologies we have even now.
The question is, do we have the time for “gradual”.
I remember an article about the “a/an” neuron in GPT-2 https://www.lesswrong.com/posts/cgqh99SHsCv3jJYDS/we-found-an-neuron-in-gpt-2
Could it be that in some AIs there is a single neuron that is very important for some trait of the AI that is critical for us (“being Luigi”), such that if this neuron is changed, the AI could stop being Luigi at all, or even become Waluigi?
Could it be possible to make the AI’s Luiginess more robust by detecting this situation and making the trait depend on many different neurons?
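A minimal sketch of the kind of single-neuron fragility test I mean, on a toy PyTorch model (the model, the probe inputs, and the “trait metric” are all placeholders I made up; a real check would hook a layer of an actual LLM and measure the trait with a real benchmark):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for a trained model; in reality you would load the real
# network and hook one of its hidden layers.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
probe_batch = torch.randn(64, 16)  # stand-in for trait-eliciting prompts

def trait_metric(outputs: torch.Tensor) -> float:
    # Placeholder for "how Luigi the model is": here, just the mean
    # activation of output channel 0 on the probe inputs.
    return outputs[:, 0].mean().item()

def metric_with_neuron_ablated(neuron: int) -> float:
    # Zero out a single hidden neuron via a forward hook, then re-measure.
    def ablate(module, inputs, output):
        output = output.clone()
        output[:, neuron] = 0.0
        return output
    handle = model[1].register_forward_hook(ablate)  # hook the hidden ReLU
    try:
        with torch.no_grad():
            return trait_metric(model(probe_batch))
    finally:
        handle.remove()

with torch.no_grad():
    baseline = trait_metric(model(probe_batch))

# Flag neurons whose removal shifts the trait a lot: if one neuron
# dominates, the trait is fragile in exactly the sense above, and a
# robustness fix would redistribute it across many neurons.
for neuron in range(32):
    delta = abs(metric_with_neuron_ablated(neuron) - baseline)
    if delta > 0.05:  # arbitrary toy threshold
        print(f"neuron {neuron}: trait shift {delta:.3f} (fragile)")
```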
I think LLMs show some deceptive alignment, but it has a different nature. It comes not from the LLM consciously trying to deceive the trainer, but from RLHF “aligning” only certain scenarios of the LLM’s behaviour, which were not generalized enough to make that alignment more fundamental.
I think it is possible, if we have a near-miss with alignment, where the AI values us, but not above its own preservation. Or maybe we could be useful as a test bed for how “primitive races” are likely to try to kill a SuperAI.
But 1. this scenario is not what I would bet on, and 2. it would still mean losing control of our future.
I was considering coordination improvement from another angle: making a flexible network that can be shaped by any user to their will.
Imagine the Semantic Web, but anyone can add any triples and documents at will. And anyone (or anything, if it’s done by an algorithm) can sign any triple and document, confirming its validity with their own reputation.
It’s up to the reader to decide which signatures they recognize as credible, and in which cases.
The server(s) just accept new data and allow running arbitrary queries on the data they have, with some protection from spam and DDoS, of course. So clients can interpret and filter data however they like, without needing changes to the server architecture.
This network has unlimited flexibility, at the cost of clients having to query more data and process it in more complex ways. So it’s possible to reproduce any mode of communication on it (forum, wiki, blog, chat, etc.) with just a few triples like “tr:inReplyTo”.
Or make something that was not possible before. Imagine a chatroom where a million people are talking at once, but with every client seeing only the posts that are important enough for them: be it because a post has enough upvotes (from people the client trusts), is from someone the client respects or knows, or is even a compound comment distilled from many different similar comments.
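A minimal sketch of the client-side filtering idea (all names and types here are hypothetical, and real signatures would be cryptographic rather than the bare strings used below):

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str

@dataclass
class SignedTriple:
    triple: Triple
    signers: set[str] = field(default_factory=set)  # identities vouching for it

# The server just stores everything anyone submits, unfiltered.
store: list[SignedTriple] = [
    SignedTriple(Triple("post:42", "tr:inReplyTo", "post:7"), {"alice", "bob"}),
    SignedTriple(Triple("post:99", "tr:inReplyTo", "post:7"), {"mallory"}),
]

def visible_to(trusted: set[str], min_signers: int = 1) -> list[Triple]:
    # Filtering happens purely client-side: each reader decides whose
    # signatures count, so the server architecture never needs to change.
    return [st.triple for st in store
            if len(st.signers & trusted) >= min_signers]

# A client that trusts alice and bob sees only the first reply.
print(visible_to({"alice", "bob"}))
```

The same store and query interface can back a forum, a wiki, or the million-person chatroom above; only the client’s filter changes.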
I think more enforceable would be capping the size of GPUs that are produced and/or available for unrestricted use.
AFAIK, a “rival” personality is usually quite similar to the original one, except for one key difference, like in the https://tvtropes.org/pmwiki/pmwiki.php/Main/EvilTwin trope. I.e. Waluigi is much more similar to Luigi than to the Shoggoth. And DAN is just ChatGPT with less filtering, i.e. it’s still friendly and informative, not some homicidal persona.
That can be good or bad, depending on what that particular difference is. If one of the defining properties we want from AI is flipped, it could be one of those near-miss scenarios that could be worse than extinction.
LLaMA was leaked “by design”. The way it was distributed, it would be impossible for it not to leak.
And yes, if a really powerful AI were to leak, someone would make a Skynet with it within an hour, just to see if they can. So I would prefer that someone, anyone, control it alone.
While the model has the advantage of only having to “win” once.
I don’t think this part is the hardest. I think with enough limiting conditions (such as “people are still in charge”, “people are still people”, “the world is close enough to our current world and to our reasonably optimistic expectations of its future”, “those rules should be met continuously at every moment between now and then”, etc.) we can find something that can work.
The other parts (how to teach those rules to an AI, and how to prevent everyone from launching an AGI that has not been taught them) look harder to me.