This is an important question. To what degree are both of these (naturally conflicting) goals important to you? How important is making money? How important is increasing AI-safety?
Max TK
I think that’s not an implausible assumption.
However, this could mean that some of the things I described might still be too difficult for it to pull off successfully, so in the case of an early breakout, dealing with it might be slightly less hopeless.
Good addition! I even know a few of those “AI rights activists” myself.
Since this here is my first post—would it be considered bad practice to edit my post to include it?
One very problematic aspect of this view that I would like to point out is that in a sense, most ‘more aligned’ AGIs of otherwise equal capability level seem to be effectively ‘more tied down’ versions, so we should assume them to have a lower effective power level than a less aligned AGI that has a shorter list of priorities.
If we imagine both as competing players in a strategy game, it seems that the latter has to follow fewer rules.
Maybe if it happens early there is a chance that it manages to become an intelligent computer virus but is not intelligent enough to further scale its capabilities or produce effective schemes likely to result in our complete destruction. I know I am grasping at straws at this point, but maybe it’s not absolutely hopeless.
The result could be a corrupted infrastructure and a cultural shock strong enough for the people to burn down OpenAI’s headquarters (metaphorically speaking) and AI-accelerating research to be internationally sanctioned.
In the past I have thought a lot about “early catastrophe scenarios”, and while I am not convinced, it seemed to me that these might be the most survivable ones.
weakly suggested that more dimensions do reduce demon formation
This also makes a lot of sense intuitively, as it should become more difficult in higher dimensions to construct walls (hills / barriers without holes).
Interesting insight. Sadly there isn’t much to be done against the beliefs of someone who is certain that god will save us.
Maybe the following: Assuming the frame of a believer, the signs of AGI being a dangerous technology seem obvious on closer inspection. If god exists, then we should therefore assume that this is an intentional test he has placed in front of us. God has given us all the signs. God helps those who help themselves.
Isn’t that a response to a completely different kind of argument? I am probably not going to discuss this here, since it seems very off-topic, but if you want I can consider putting it on my list for arguments I might discuss in this form in a future article.
About point 1: I think you are right with that assumption, though I believe that many people repeat this argument without really having a stance on (or awareness of) brain physicalism. That’s why I didn’t hesitate to include it. Still, if you have a decent idea of how to improve this article for people who are sceptical of physicalism, I would like to add it.
About point 2: Yeah you might be right … a reference to OthelloGPT would make it more convincing—I will add it later!
Edit: Still, I believe that “mashup” isn’t even a strictly false characterization of concept composition. I think I might add a paragraph explicitly explaining that and how I think about it.
Good point. I think I will add it later.
I don’t really know what to make of this objection, because I have never seen the stochastic parrot argument applied to a specific, limited architecture as opposed to the general category.
Edit: Maybe you could suggest how to rephrase it to improve my argument.
The delta for power efficiency is currently ~1000 times in favor of brains ⇒ brain: ~20 W, AGI: ~20 kW; price per kWh in Germany: 0.33 Euro; 20 kWh: ~6.60 Euro ⇒ running our AGI would, if we are assuming that your description of the situation is correct, cost around 6 to 7 Euros in energy per hour, which is cheaper than a human worker.
So … while I don’t assume that such estimates need to be correct or apply to an AGI (that doesn’t exist yet) I don’t think you are making a very convincing point so far.
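The back-of-the-envelope estimate above can be reproduced in a few lines. This is only a sketch of the arithmetic: the 20 W brain figure, the ~1000x efficiency gap, and the 0.33 Euro/kWh price are the rough numbers from the comment, and the constant and function names are made up for illustration.

```python
# Rough figures taken from the comment above; all of them are assumptions.
BRAIN_POWER_W = 20.0        # approximate power draw of a human brain
EFFICIENCY_GAP = 1000.0     # assumed factor by which brains are more efficient
PRICE_EUR_PER_KWH = 0.33    # assumed electricity price in Germany


def hourly_energy_cost_eur(power_w: float, price_eur_per_kwh: float) -> float:
    """Cost of running a device with the given power draw for one hour."""
    energy_kwh = power_w / 1000.0  # watts for one hour -> kilowatt-hours
    return energy_kwh * price_eur_per_kwh


# Power draw of the hypothetical AGI under the assumed efficiency gap.
agi_power_w = BRAIN_POWER_W * EFFICIENCY_GAP
agi_cost_eur = hourly_energy_cost_eur(agi_power_w, PRICE_EUR_PER_KWH)

print(f"AGI power draw: {agi_power_w / 1000:.0f} kW")
print(f"Energy cost per hour: {agi_cost_eur:.2f} Euro")
```

Under these assumptions the hourly cost lands between 6 and 7 Euros, in line with the rough figure quoted above; changing any of the three constants shows how sensitive the conclusion is to them.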
LLMs use 1 or more inner layers, so shouldn’t the proof apply to them?
Based on your phrasing I sense you are trying to object to something here, but it doesn’t seem to have much to do with my article. Is this correct or am I just misunderstanding your point?
Usually between people in international forums, there is a gentlemen’s agreement to not be condescending over things like language comprehension or spelling errors, and I would like to continue this tradition, even though your own paragraphs would offer wide opportunities for me to do the same.
Of the universal approximation theorem
You were the one who made that argument, not me. 🙄
My argument does not depend on the AI being able to survive inside a bot net. I mentioned several alternatives.
#parrotGang
I would be the last person to dismiss the potential relevance that understanding value formation and management in the human brain might have for AI alignment research, but I think there are good reasons to assume that the solutions evolution has arrived at would be complex and not sufficiently robust.
Humans are [Mesa-Optimizers](https://www.alignmentforum.org/tag/mesa-optimization) and the evidence is solid that, as a consequence, our alignment with the implicit underlying utility function (reproductive fitness) is rather brittle (sex with contraceptives, opiate abuse, etc. are examples of such “failure points”).
Like others have expressed here before me, I would also argue that human alignment has to perform in a very narrow environment which is shared with many very similar agents that are all on (roughly) the same power level. The solutions human evolution has produced to ensure human semi-alignment are therefore, to a significant degree, not purely neurological but also social.
Whatever these solutions are, we should not expect that they will generalize well or that they would be reliable in a very different environment, like that of an intelligent actor who has an absolute power monopoly.
This suggests that researching the human mind alone would not yield a technology that is robust enough to use when we have only exactly one shot at getting it right. We need solutions to the aforementioned abstractions and toy models because we probably should try to find a way to build a system that is theoretically safe and not just “probably safe in a narrow environment”.