The statements “LLMs are a normal technology” and “Advanced AI is a normal technology” are completely different. If you think LLMs are not very advanced, it is perfectly valid to believe both that LLMs are a normal technology and that advanced AI is not.
I’ve found a lighting solution that gets you 24,000 lumens, requires no installation, can be placed anywhere with an outlet, and looks tolerable. It relies on a temporary sale, though.
Just combine this sale of 2x 12,000-lumen lightbulbs for $30: https://www.amazon.com/dp/B0D7VKXF4R
with 2x of these floor stands for $35: https://www.walmart.com/ip/Mainstays-71-Black-Floor-Lamp-Modern-Design/12173437
Hopefully others find this useful.
Upon reflection, I agree that my previous comment describes fragility of value.
My mental model is that the standard MIRI position[1] claims the following[2] (writing $U_{\mathrm{AI}}$ for the utility function the AI actually ends up with, $U_{\mathrm{human}}$ for humanity’s collective utility function, and $d(\cdot,\cdot)$ for some distance between utility functions):
1. Because of the way AI systems are trained, $d(U_{\mathrm{AI}}, U_{\mathrm{human}})$ will be large even if we knew humanity’s collective utility function and could target it directly (this is inner misalignment).
2. Even if $d(U_{\mathrm{AI}}, U_{\mathrm{human}})$ were fairly small, this would still result in catastrophic outcomes if the AI is an extremely powerful optimizer (this is fragility of value).
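Here is roughly how I would write (1) and (2) in this notation; the explicit quantification over $\epsilon$ and the “sufficiently powerful optimizer” wording are my own reading of the position, not a quote from anywhere:

```latex
\begin{align*}
\text{(1)}\quad & d(U_{\mathrm{AI}},\, U_{\mathrm{human}}) \gg 0
  \ \text{ even when } U_{\mathrm{human}} \text{ is the explicit training target.}\\[4pt]
\text{(2)}\quad & \big(\, d(U_{\mathrm{AI}},\, U_{\mathrm{human}}) \le \epsilon
  \ \wedge\ \text{the AI is a sufficiently powerful optimizer} \,\big)\\
  & \qquad \Longrightarrow\ \text{catastrophic outcome as judged by } U_{\mathrm{human}},
  \ \text{ even for fairly small } \epsilon > 0.
\end{align*}
```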
A few questions:
3. Are the claims (1) and (2) accurate representations of inner misalignment and fragility of value?
4. Is the “misgeneralization” claim just “$d(U_{\mathrm{AI}}, U_{\mathrm{human}})$ off the training distribution will be much larger than $d(U_{\mathrm{AI}}, U_{\mathrm{human}})$ on it”?
If the answer to (4) is yes, I am confused as to why the misgeneralization claim is brought up. It seems that (1) and (2) are sufficient to argue for AI risk. By contrast, it seems that the misgeneralization claim is neither sufficient nor necessary to make a case for AI risk. Furthermore, the misgeneralization claim seems less likely to be true than (1) and (2).
Also, let me know if I am thinking about things in a completely wrong framework and should scrap my made-up notation.
Here’s my attempted phrasing, which I think avoids some of the common confusions:
Suppose we have a model $M_1$ with utility function $U_1$, where $M_1$ is not capable of taking over the world. Assume that, thanks to a bunch of alignment work, $U_1$ is within $\epsilon_1$ (by some metric) of humanity’s collective utility function. Then in the process of maximizing $U_1$, $M_1$ ends up doing a bunch of vaguely helpful stuff.
Then someone releases a model $M_2$ with utility function $U_2$, where $M_2$ is capable of taking over the world. Suppose that our alignment techniques generalize perfectly. That is, $U_2$ is also within $\epsilon_2$ of humanity’s collective utility function, where $\epsilon_2 \le \epsilon_1$. Then in the process of maximizing $U_2$, $M_2$ gets rid of humans and rearranges their molecules to satisfy $U_2$ better.
Does this phrasing seem accurate and helpful?
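To gesture at why I expect the $M_1$/$M_2$ difference, here is a deliberately crude numerical sketch. Everything in it (the 100 value dimensions, the log-sum utility, the resource budget, and the metric under which the model’s utility counts as “close”) is made up by me for illustration; it is not anyone’s actual model of the situation.

```python
import numpy as np

rng = np.random.default_rng(0)

N_VALUES = 100      # stand-ins for the many dimensions of human value
BUDGET = 100.0      # total resources to allocate across them

# "Humanity's" utility: strongly complementary values -- every dimension
# needs at least some resources (sum of logs).
def u_human(alloc):
    return float(np.sum(np.log(alloc + 1e-9)))

# The model's utility: identical except it silently drops one dimension.
# By the crude metric "fraction of value dimensions represented", it is
# within epsilon = 1/100 of u_human.
def u_model(alloc):
    return float(np.sum(np.log(alloc[:-1] + 1e-9)))

# M1, a weak optimizer: can only try small tweaks around an even split.
even = np.full(N_VALUES, BUDGET / N_VALUES)
tweaks = (np.clip(even + rng.normal(0, 0.05, N_VALUES), 1e-6, None)
          for _ in range(1000))
m1_choice = max(tweaks, key=u_model)

# M2, a strong optimizer: goes straight to the argmax of u_model under the
# budget, pouring everything into the 99 dimensions it knows about and
# starving the one it doesn't.
m2_choice = np.zeros(N_VALUES)
m2_choice[:-1] = BUDGET / (N_VALUES - 1)

# M2 scores higher than M1 by its own utility, while u_human collapses.
print(f"M1 (weak):   u_model={u_model(m1_choice):6.2f}  u_human={u_human(m1_choice):6.2f}")
print(f"M2 (strong): u_model={u_model(m2_choice):6.2f}  u_human={u_human(m2_choice):6.2f}")
```

The log-sum (complementary) utility is doing the work here: it encodes the “fragility” intuition that every value dimension needs at least some resources, so a proxy that is right about 99 of 100 dimensions can still have a near-worthless optimum once something optimizes it hard.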
I found this essay very useful, thank you for writing it!
However, the CPU/GPU analogy doesn’t make sense to me. There are two main points of confusion:
- I don’t know what you mean by “training your GPU software”. What is GPU software? What does it mean to train software?
- If I understood correctly, you think logic is useful for verifying and communicating intuitions. However, in analogizing logic to a CPU, you say that logic is useful for facilitating the training of your intuition. It doesn’t sound like verification and communication are part of facilitating training. Am I missing something?
Here’s my attempt at an alternate analogy: “Your intuition is the giant, high-performing neural net, while your logic is the interpretable version of it that is derived from the original and sometimes performs worse, but is more communicable and can be inspected more easily.” Maybe this is closer to what you meant? Let me know.
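In case it helps pin down what I mean, here is a rough sketch of that picture in code: a neural net plays the role of “intuition”, and a small decision tree fitted to the net’s own predictions plays the role of “logic”. The dataset, model sizes, and the choice of a tree as the surrogate are all just my own picks for illustration.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.metrics import accuracy_score

X, y = make_moons(n_samples=2000, noise=0.25, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# "Intuition": opaque but (usually) accurate.
intuition = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=2000,
                          random_state=0).fit(X_train, y_train)

# "Logic": a small tree trained to imitate the net's judgments.
logic = DecisionTreeClassifier(max_depth=3, random_state=0)
logic.fit(X_train, intuition.predict(X_train))

print("intuition accuracy:", accuracy_score(y_test, intuition.predict(X_test)))
print("logic accuracy:    ", accuracy_score(y_test, logic.predict(X_test)))
print(export_text(logic, feature_names=["x1", "x2"]))  # human-readable rules
```

The tree typically gives up a little accuracy relative to the net, but its rules can be printed, inspected, and argued about, which is the property I’m trying to point at.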