I am hearing something related to decoupling my self-worth from choosing to act in the face of x-risk (or any other moral action). Does that sound right?
I feel like this pairs pretty well with the concept of the inner child in psychology, where you basically give your own “inner child”, which represents your emotions and very basic needs, a voice and try to take care of it. But on a higher level you still make rational decisions. In this context it would basically be “be your own god” I suppose? Accept that your inner child is scared of x-risk, and then treat yourself like you would a child that is scared like that.
I think this is one of those weird things where social pressure can direct you towards the right thing but corrupt your internal prioritization process in ways that kind of ruin it.
It's kind of interesting how you focus on the difference between inner needs and societal needs. Personally, I have never felt a big incentive to follow societal needs, and while I can't recommend that (it does not help mental health), I also do not feel the x-risk as much as others do. I know it's there, I know we should work against it, and I try to dedicate my work to fighting it, but I don't really think about it emotionally?
I personally think a bit along the lines of “whatever happens happens; I will do my best and not care much about the rest”. And for that it's important to properly internalize the goals you have. Most humans' main goal is somehow a happy life. Lowering x-risk is important for that, but so is maintaining a healthy work-life balance, mental health, physical health… They all work towards the big goals. I think that's important to realize on a basic level.
And lastly, two more small questions: what are wave and planecrash? And how do you define “normie”? I feel like that's kind of a tough term.
Nice overview of the different takeoff scenarios!
I am no researcher in this area, and I know I might be wrong about many things in the following, but I have doubts about the two statements above.
Evaluating alignment is still manageable right now; we are still smarter than the AI, at least somewhat. However, I do not see a viable path to evaluating the true capability level of an AI once it is smarter than us. Past that point, we can only ask it questions we do not know the answers to, but by definition we also do not know how smart you have to be to answer those questions. Is solving the Riemann hypothesis something that is just outside our grasp, or is 1000x our intelligence needed? We can't reliably say.
I might be wrong and there is some science or theory that does exactly that, but I do not know of one.
And the same is true for alignment. Once the AI is smarter than us, we cannot assume that our tests of the model's output still work. Considering that even right now our tests are seemingly not very good (at least according to the YouTube video from AI Explained) and we did not notice for this long, I do not think we will be able to rely on the questionnaires we use today, as the model might behave differently once it notices it is being tested. And it might notice we are testing it from the very first question we ask.
This means evaluating alignment research is in fact also incredibly hard. We either need to outwit a smarter entity or directly interpret what happens inside the model. And knowing that we missed no cases during that process is harder than devising a test that covers most cases.
The second part I personally wonder a bit about. On the one hand, it might be possible to use many different highly specialized AIs, one for every field. But those would struggle with the connections between fields: with a chemistry AI and a biology AI we might not fully cover biochemistry; with a biochemistry AI we might not fully cover medicine. Then there is food. And once we get to food, we also need to watch physics, like radiation, or materials science.
And in all of that we might still want the model to do certain things, like write a book about Oppenheimer and how he built a nuclear bomb, so the evaluator also needs to be able to judge context from an artistic standpoint. It needs to be able to tell whether the model is “wrong” on purpose because of the context, such as fiction in a book, or whether it seriously attempted the task and simply got it wrong this time.
I feel like this can only be achieved with another general AI, not a narrow AI that is much narrower than the tested AI. Otherwise the tested AI might simply get away with a wider “scheme”.
Another slight note about the claim that lower models will evaluate higher models: if the current trend of aligned AIs being less capable than unaligned AIs continues, this is a bad idea. You showed lots of linear curves here, but the y-axis should be logarithmic in terms of capabilities. This means the distance between GPT-5 and GPT-6 might be 10x or in a similar region, especially if the smarter model is not yet aligned while the other model is already reined in.
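To make the linear-vs-logarithmic point concrete, here is a tiny sketch with made-up numbers (the “gen” labels and the 10x-per-generation factor are purely illustrative assumptions, not real benchmark data): on a linear axis the gap between successive generations explodes, while on a log axis the same generations are evenly spaced.

```python
import math

# Hypothetical capability scores, assuming each generation is ~10x
# the previous one (illustrative numbers only, not measurements).
capabilities = [1, 10, 100, 1000]

# Linear-axis gaps between consecutive generations grow 10x each step:
linear_gaps = [b - a for a, b in zip(capabilities, capabilities[1:])]
# -> [9, 90, 900]

# Log-axis gaps are constant, so equal spacing on a log plot
# actually means a 10x capability jump each time:
log_gaps = [math.log10(b) - math.log10(a)
            for a, b in zip(capabilities, capabilities[1:])]
# -> [1.0, 1.0, 1.0]
```

So a plot that looks like steady linear progress can hide the fact that each new model is an order of magnitude more capable than the evaluator below it.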
As explained earlier, external testing of the model by a less intelligent entity becomes almost impossible, in my opinion. I am unsure how much a finetuned version might be able to close the gap, but my other point also shows that finetuning will only get us so far, since we can't narrow it down too much. For anything better than AGI (with AGI being as smart as the best human experts at every task), we very likely need to fully understand what happens inside the model in order to align it. But I really hope this is not the case, as I do not see people pausing long enough to seriously put in that effort.