The statements “LLMs are a normal technology” and “Advanced AI is a normal technology” are completely different. If you think LLMs are not very advanced, it is perfectly valid to believe both that LLMs are a normal technology and that advanced AI is not.
I’ve found a lighting solution that gets you 24,000 lumens, requires no installation, can be placed anywhere with an outlet, and looks tolerable. It relies on a temporary sale, though.
Just combine this sale of 2x 12,000-lumen lightbulbs for $30: https://www.amazon.com/dp/B0D7VKXF4R
with 2x of these floor stands for $35: https://www.walmart.com/ip/Mainstays-71-Black-Floor-Lamp-Modern-Design/12173437
Hopefully others find this useful.
Upon reflection, I agree that my previous comment describes fragility of value.
My mental model is that the standard MIRI position[1] claims the following[2] (writing $U_{\mathrm{AI}}$ for the utility function the AI actually ends up with, $U_{\mathrm{human}}$ for humanity’s collective utility function, and $d(\cdot,\cdot)$ for some distance between utility functions):
1. Because of the way AI systems are trained, $d(U_{\mathrm{AI}}, U_{\mathrm{human}})$ will be large even if we knew humanity’s collective utility function and could target it directly (this is inner misalignment).
2. Even if $d(U_{\mathrm{AI}}, U_{\mathrm{human}})$ were fairly small, this would still result in catastrophic outcomes if the AI is an extremely powerful optimizer (this is fragility of value).
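Here is roughly how I would write (1) and (2) in this notation; the explicit quantification over $\epsilon$ and the “sufficiently powerful optimizer” wording are my own reading of the position, not a quote from anywhere:

```latex
\begin{align*}
\text{(1)}\quad & d(U_{\mathrm{AI}},\, U_{\mathrm{human}}) \gg 0
  \ \text{ even when } U_{\mathrm{human}} \text{ is the explicit training target.}\\[4pt]
\text{(2)}\quad & \big(\, d(U_{\mathrm{AI}},\, U_{\mathrm{human}}) \le \epsilon
  \ \wedge\ \text{the AI is a sufficiently powerful optimizer} \,\big)\\
  & \qquad \Longrightarrow\ \text{catastrophic outcome as judged by } U_{\mathrm{human}},
  \ \text{ even for fairly small } \epsilon > 0.
\end{align*}
```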
A few questions:
3. Are the claims (1) and (2) accurate representations of inner misalignment and fragility of value?
4. Is the “misgeneralization” claim just “$d(U_{\mathrm{AI}}, U_{\mathrm{human}})$ off the training distribution will be much larger than $d(U_{\mathrm{AI}}, U_{\mathrm{human}})$ on it”?
If the answer to (4) is yes, I am confused as to why the misgeneralization claim is brought up. It seems that (1) and (2) are sufficient to argue for AI risk. By contrast, it seems that the misgeneralization claim is neither sufficient nor necessary to make a case for AI risk. Furthermore, the misgeneralization claim seems less likely to be true than (1) and (2).
Also, let me know if I am thinking about things in a completely wrong framework and should scrap my made-up notation.
Here’s my attempted phrasing, which I think avoids some of the common confusions:
Suppose we have a model $M_1$ with utility function $U_1$, where $M_1$ is not capable of taking over the world. Assume that, thanks to a bunch of alignment work, $U_1$ is within $\epsilon_1$ (by some metric) of humanity’s collective utility function. Then in the process of maximizing $U_1$, $M_1$ ends up doing a bunch of vaguely helpful stuff.
Then someone releases a model $M_2$ with utility function $U_2$, where $M_2$ is capable of taking over the world. Suppose that our alignment techniques generalize perfectly. That is, $U_2$ is also within $\epsilon_2$ of humanity’s collective utility function, where $\epsilon_2 \le \epsilon_1$. Then in the process of maximizing $U_2$, $M_2$ gets rid of humans and rearranges their molecules to satisfy $U_2$ better.
Does this phrasing seem accurate and helpful?
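To gesture at why I expect the $M_1$/$M_2$ difference, here is a deliberately crude numerical sketch. Everything in it (the 100 value dimensions, the log-sum utility, the resource budget, and the metric under which the model’s utility counts as “close”) is made up by me for illustration; it is not anyone’s actual model of the situation.

```python
import numpy as np

rng = np.random.default_rng(0)

N_VALUES = 100      # stand-ins for the many dimensions of human value
BUDGET = 100.0      # total resources to allocate across them

# "Humanity's" utility: strongly complementary values -- every dimension
# needs at least some resources (sum of logs).
def u_human(alloc):
    return float(np.sum(np.log(alloc + 1e-9)))

# The model's utility: identical except it silently drops one dimension.
# By the crude metric "fraction of value dimensions represented", it is
# within epsilon = 1/100 of u_human.
def u_model(alloc):
    return float(np.sum(np.log(alloc[:-1] + 1e-9)))

# M1, a weak optimizer: can only try small tweaks around an even split.
even = np.full(N_VALUES, BUDGET / N_VALUES)
tweaks = (np.clip(even + rng.normal(0, 0.05, N_VALUES), 1e-6, None)
          for _ in range(1000))
m1_choice = max(tweaks, key=u_model)

# M2, a strong optimizer: goes straight to the argmax of u_model under the
# budget, pouring everything into the 99 dimensions it knows about and
# starving the one it doesn't.
m2_choice = np.zeros(N_VALUES)
m2_choice[:-1] = BUDGET / (N_VALUES - 1)

# M2 scores higher than M1 by its own utility, while u_human collapses.
print(f"M1 (weak):   u_model={u_model(m1_choice):6.2f}  u_human={u_human(m1_choice):6.2f}")
print(f"M2 (strong): u_model={u_model(m2_choice):6.2f}  u_human={u_human(m2_choice):6.2f}")
```

The log-sum (complementary) utility is doing the work here: it encodes the “fragility” intuition that every value dimension needs at least some resources, so a proxy that is right about 99 of 100 dimensions can still have a near-worthless optimum once something optimizes it hard.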
I found this essay very useful, thank you for writing it!
However, the CPU/GPU analogy doesn’t make sense to me. There are two main points of confusion:
- I don’t know what you mean by “training your GPU software”. What is GPU software? What does it mean to train software?
- If I understood correctly, you think logic is useful for verifying and communicating intuitions. However, in analogizing logic to a CPU, you say that logic is useful for facilitating the training of your intuition. It doesn’t sound like verification and communication are part of facilitating training. Am I missing something?
Here’s my attempt at an alternate analogy: “Your intuition is the giant, high-performing neural net, while your logic is the interpretable version of it that is derived from the original and sometimes performs worse, but is more communicable and can be inspected more easily.” Maybe this is closer to what you meant? Let me know.
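In case it helps pin down what I mean, here is a rough sketch of that picture in code: a neural net plays the role of “intuition”, and a small decision tree fitted to the net’s own predictions plays the role of “logic”. The dataset, model sizes, and the choice of a tree as the surrogate are all just my own picks for illustration.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.metrics import accuracy_score

X, y = make_moons(n_samples=2000, noise=0.25, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# "Intuition": opaque but (usually) accurate.
intuition = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=2000,
                          random_state=0).fit(X_train, y_train)

# "Logic": a small tree trained to imitate the net's judgments.
logic = DecisionTreeClassifier(max_depth=3, random_state=0)
logic.fit(X_train, intuition.predict(X_train))

print("intuition accuracy:", accuracy_score(y_test, intuition.predict(X_test)))
print("logic accuracy:    ", accuracy_score(y_test, logic.predict(X_test)))
print(export_text(logic, feature_names=["x1", "x2"]))  # human-readable rules
```

The tree typically gives up a little accuracy relative to the net, but its rules can be printed, inspected, and argued about, which is the property I’m trying to point at.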