But it is important, and this post just isn’t going to get done any other way.
Speaking about streetlighting...
What makes it rational is that there is an actual underlying hypothesis about how weather works, instead of a vague “LLMs are a lot like human uploads”. And weather prediction outputs numbers connected to a reality we actually care about. And there is no credible alternative hypothesis that implies weather prediction not working.
I don’t want to totally dismiss empirical extrapolations, but given the stakes, I would personally prefer for all sides to actually state their model of reality and how they think the evidence changed its plausibility, as formally as possible.
There is no such disagreement, you just can’t test all inputs. And without knowledge of how the internals work, you may be wrong when extrapolating alignment to future systems.
Yes, except I would object to phrasing this anthropic stuff as “we should expect ourselves to be agents that exist in a universe that abstracts well” instead of “we should value a universe that abstracts well (or other universes that contain many instances of us)”: there are no coherence theorems that force summation of your copies, right? And so it becomes apparent that we can value some other thing.
Also, even if you consider some memories a part of your identity, you can value yourself slightly less after forgetting them, instead of only having a threshold at death.
It doesn’t matter whether you call your multiplier “probability” or “value” if it results in your decision not to care about the low-measure branch. The only difference is that probability is supposed to be about knowledge, and since Wallace’s argument involves an arbitrary assumption, not just physics, the multiplier is not probability but value: there is no reason to value knowledge of your low-measure instances less.
this makes decision theory and probably consequentialist ethics impossible in your framework
It doesn’t? Nothing stops you from making decisions in a world where you are constantly splitting. You can try to maximize splits of good experiences or something. It just wouldn’t be the same decisions you would make without knowledge of splits; but why shouldn’t new physical knowledge change your decisions?
Things like lions and chairs are other examples.
And counted branches.
This is how Wallace defines it (he in turn defines macroscopically indistinguishable in terms of providing the same rewards). It’s his term in the axiomatic system he uses to get decision theory to work. There’s not much to argue about here?
His definition contradicts the informal intuition that motivates considering macroscopic indistinguishability in the first place.
We should care about low-measure instances in proportion to the measure, just as in classical decision theory we care about low-probability instances in proportion to the probability.
Why? Wallace’s argument is just “you don’t care about some irrelevant microscopic differences, so let me write down an assumption that is superficially related to that preference, and look: it implies the Born rule”. Given MWI, there is nothing wrong, physically or rationally, with valuing your instances equally whatever their measure is. Their thoughts and experiences don’t depend on measure, the same way they don’t depend on the thickness or mass of a computer implementing them. You can rationally not care about irrelevant microscopic differences and still care about the number of your thin instances.
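To make the disagreement concrete, here is a toy calculation (my own illustration, with made-up numbers): the same pair of branches gets opposite valuations depending on whether you weight your instances by measure or count each copy equally.

```python
# Two branches after a quantum event (hypothetical numbers):
branches = [
    {"measure": 0.999999, "utility": 10.0},     # mundane outcome
    {"measure": 0.000001, "utility": -1000.0},  # drastic low-measure outcome
]

# Born-weighted valuation (classical expected utility over measure):
born_value = sum(b["measure"] * b["utility"] for b in branches)

# Equal weighting of instances (caring about every copy the same):
equal_value = sum(b["utility"] for b in branches) / len(branches)

print(born_value)   # ≈ 9.999: the low-measure branch is almost ignored
print(equal_value)  # -495.0: the low-measure branch dominates
```

Both valuations are internally consistent decision rules; the physics alone doesn’t pick one.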
How many notions of consciousness do you think are implementable by a short Python program?
Because scale doesn’t matter: it doesn’t matter whether you are implemented on a thick or a thin computer.
First of all, macroscopic indistinguishability is not a fundamental physical property; branching indifference is an additional assumption, so I don’t see how it’s any less arbitrary than branch counting.
But more importantly, the branching indifference assumption is not the same as the informal “not caring about macroscopically indistinguishable differences”! As Wallace showed, branching indifference implies the Born rule, which implies you should care almost not at all about a version of you in a branch with a measure of 0.000001, even though that branch may involve a drastic macroscopic difference for you. Your being macroscopic doesn’t imply you shouldn’t care about your low-measure instances.
But why would you want to remove this arbitrariness? Your preferences are fine-grained anyway, so why retain classical counting but deny counting in the space of the wavefunction? It’s like saying “dividing the world into people and their welfare is arbitrary, so let’s focus on measuring the mass of a space region”. The point is you can’t remove all decision-theoretic arbitrariness from MWI: “branching indifference” is just an arbitrary ethical constraint, equivalent to valuing measure for no reason, and without it fundamental physics that works like MWI does not prevent you from making decisions as if quantum immortality works.
“Decoherence causes the Universe to develop an emergent branching structure. The existence of this branching is a robust (albeit emergent) feature of reality; so is the mod-squared amplitude for any macroscopically described history. But there is no non-arbitrary decomposition of macroscopically-described histories into ‘finest-grained’ histories, and no non-arbitrary way of counting those histories.”
Importantly though, on this approach it is still possible to quantify the combined weight (mod-squared amplitude) of all branches that share a certain macroscopic property, e.g. by saying:
“Tomorrow, the branches in which it is sunny will have combined weight 0.7”
There is no non-arbitrary definition of “sunny”. If you are fine with approximations, then you can also decide on a decomposition of the wavefunction into some number of observers; it’s the same problem as decomposing a classical world that allows physical splitting of thick computers according to the macroscopic property “number of people”.
Even if we can’t currently prove certain axioms, doesn’t this just reflect our epistemological limitations rather than implying all axioms are equally “true”?
It doesn’t, and they are fundamentally equal. The only reality is the physical one; there is no reason to complicate your ontology with platonically existing math. Math is just a collection of useful templates that may help you predict reality, and the fact that it works is always just a physical fact. The best case is that we learn the true laws of physics, they turn out to work like some subset of math, and then the axioms of that physics would actually be true. You can make a guess about which axioms are compatible with true physics.
Also there is Shoenfield’s absoluteness theorem, which I don’t understand, but which maybe prevents empirical grounding of CH?
It sure doesn’t seem to generalize in the GPT-4o case. But what’s the hypothesis for Sonnet 3.5 refusing in 85% of cases? And CoT improving the score, and o1 doing better in the browser, suggest the problem is models not understanding consequences, not models not trying to be good. What’s the rate of capability generalization to the agent environment? Are we going to conclude that Sonnet is just demonstrating reasoning, instead of doing it for real, if it solves only 85% of the tasks it correctly talks about?
Also, what’s the rate of generalization of unprompted problematic-behaviour avoidance? It’s much less of a problem if your AI does what you tell it to do: you can just not give it to users, tell it to invent nanotechnology, and win.
GPT-4 is insufficiently capable, even if it were given an agent structure, memory and goal set to match, to pull off a treacherous turn. The whole point of the treacherous turn argument is that the AI will wait until it can win to turn against you, and until then play along.
I don’t get why actual ability matters. It’s sufficiently capable to pull it off in some simulated environments. Are you claiming that we can’t deceive GPT-4, and that it is actually waiting and playing along just because it can’t really win?
Whack-A-Mole fixes, from RLHF to finetuning, are about teaching the system to not demonstrate problematic behavior, not about fundamentally fixing that behavior.
Based on what? Problematic behavior avoidance does actually generalize in practice, right?
Not at all. The problem is that their observations would mostly not be in a classical basis.
I phrased it badly, but what I mean is that there is a simulation of Hilbert space, where some regions contain patterns that can be interpreted as observers observing something, and if you count them by similarity, you won’t get counts consistent with the Born measure of these patterns. I don’t think the basis matters in this model, if you change the basis for the observer, the observations, and the similarity threshold simultaneously? A change of basis would just rotate or scale the patterns, without changing how many distinct observers you can interpret them as, right?
??
Collapse or reality fluid. The point of mangled worlds or some other modification is to avoid postulating probabilities at the level of physics.
https://mason.gmu.edu/~rhanson/mangledworlds.html
I mean that if a Turing machine is computing the universe according to the laws of quantum mechanics, observers in such a universe would be distributed uniformly, not by Born probability. So you either need some modification of current physics, such as mangled worlds, or you can postulate that the Born probabilities are truly random.
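A toy sketch of that claim (my own illustration, not anyone’s model of real physics): enumerate every outcome sequence of a repeated biased “measurement” as one branch, and compare what a uniformly counted observer typically sees with the Born-weighted statistics.

```python
from itertools import product
from math import prod

# Toy model (made-up numbers): a binary "measurement" with Born weight
# 0.1 for outcome 1 is repeated n times; every outcome sequence is a branch.
p1, n = 0.1, 12

branches = list(product([0, 1], repeat=n))

# Uniform counting: every branch is one observer, counted equally.
mean_freq_counting = sum(sum(b) / n for b in branches) / len(branches)

# Born weighting: each branch weighted by its squared amplitude.
def weight(b):
    return prod(p1 if x == 1 else 1 - p1 for x in b)

mean_freq_born = sum(weight(b) * sum(b) / n for b in branches)

print(mean_freq_counting)  # ≈ 0.5: the typical counted observer sees 1 half the time
print(mean_freq_born)      # ≈ 0.1: the Born-weighted frequency
```

So an observer picked by counting branches typically records frequencies near 0.5 rather than the Born value 0.1, which is why you need collapse, mangled worlds, or genuine randomness to recover Born statistics.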
Our observations are compatible with a world that is generated by a Turing machine with just a couple thousand bits.
Yes, but this is kinda incompatible with QM without mangled worlds.
Imagining two apples is a different thought from imagining one apple, right?
I mean, is it? Different states of the whole cortex are different. And the cortex can’t be in a state of imagining only one apple and, simultaneously, be in a state of imagining two apples, obviously. But it’s tautological. What are we gaining from thinking about it in such terms? You can say the same thing about the whole brain itself, that it can only have one brain-state in a moment.
I guess there is a sense in which other parts of the brain have a greater variety of thoughts relative to what the cortex can handle, but, like you said, you can use half of the cortex’s capacity, so why not define a song and a legal document as different thoughts?
As abstract elements of a provisional framework, cortex-level thoughts are fine; I just wonder what you are claiming about real constraints, aside from “there are limits on thoughts”. Because, for example, you need other limits anyway: you can’t think an arbitrarily complex thought even if it is intuitively cohesive. But yeah, enough gory details.
On the other hand, I can’t have two songs playing in my head simultaneously, nor can I be thinking about two unrelated legal documents simultaneously.
I can’t either, but I don’t see just from the architecture why it would be impossible in principle.
Again, I think autoassociative memory / attractor dynamics is a helpful analogy here. If I have a physical instantiation of a Hopfield network, I can’t query 100 of its stored patterns in parallel, right? I have to do it serially.
Yes, but you can theoretically encode many things in each pattern? Although if your parallel processes need different data, one of them will have to skip some responses… It would be better to have different networks, but I don’t see the brain providing much isolation. Well, it seems to illustrate the complications of parallel processing that may have played a role in humans usually staying serial.
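For what it’s worth, the analogy is easy to play with directly. Here is a minimal Hopfield sketch (my own toy code, assuming standard Hebbian outer-product weights and synchronous updates): each query settles into one attractor, so retrieving several stored patterns has to happen as serial queries.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 64, 3                              # neurons, stored patterns
patterns = rng.choice([-1, 1], size=(k, n))

# Hebbian outer-product weights with zeroed diagonal.
W = (patterns.T @ patterns) / n
np.fill_diagonal(W, 0)

def recall(cue, steps=20):
    # Synchronous updates; the state settles into one attractor per query.
    s = cue.copy()
    for _ in range(steps):
        s = np.sign(W @ s)
        s[s == 0] = 1
    return s

# Corrupt one stored pattern and recall it. A single query retrieves a
# single pattern, so recalling all k patterns takes k serial queries.
noisy = patterns[0].copy()
noisy[:8] *= -1                           # flip 8 of the 64 bits
recovered = recall(noisy)
print((recovered == patterns[0]).mean())  # fraction of bits recovered
```

With far fewer patterns than the network’s capacity, the corrupted cue falls back into the stored attractor; querying two patterns at once would just superpose the cues and degrade both recalls.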
I still don’t get this “only one thing in awareness” thing. There are multiple neurons in the cortex and I can imagine two apples; in what sense can there be only one thing in awareness?
Or equivalently, it corresponds equally well to two different questions about the territory, with two different answers, and there’s just no fact of the matter about which is the real answer.
Obviously the real answer is the model which is more veridical^^. The latter hindsight model is right not about the state of the world at t=0.1, but about what you later thought the world at t=0.1 was like.
Best by what metric? And I don’t think it was shown that complex strategies won’t work; learning to change behaviour from training to deployment is not even that complex.