Abhimanyu Pallavi Sudhir
conditionalization is not the probabilistic version of implies
P   Q   Q|P   P → Q
T   T   T     T
T   F   F     F
F   T   N/A   T
F   F   N/A   T

Resolution logic for conditionalization:
Q if P, else None
Resolution logic for implies:
Q if P, else True
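The same distinction as a minimal Python sketch (the function names are mine, purely illustrative): a conditional bet on Q given P is called off when P fails, whereas material implication resolves vacuously true.

```python
from typing import Optional

def resolve_conditional(p: bool, q: bool) -> Optional[bool]:
    # Q|P as a conditional bet: pays out on Q only if P happens; otherwise it is called off.
    return q if p else None

def resolve_implies(p: bool, q: bool) -> bool:
    # P → Q as material implication: vacuously True whenever P is false.
    return q if p else True

# Reproduce the resolution table above.
for p in (True, False):
    for q in (True, False):
        print(p, q, resolve_conditional(p, q), resolve_implies(p, q))
```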
Abhimanyu Pallavi Sudhir’s Shortform
Betting on what is un-falsifiable and un-verifiable
I think that the philosophical questions you’re describing actually evaporate and turn out to be meaningless once you think enough about them, because they have a very anthropic flavour.
I don’t think that’s exactly true. But why do you think that follows from what I wrote?
That’s syntax, not semantics.
It’s really not; that’s the point I made about semantics.

Eh, that’s kind of right, my original comment there was dumb.
You overstate your case. The universe contains a finite amount of incompressible information, which is strictly less than the information contained in. That self-reference applies to the universe is obvious, because the universe contains computer programs.

The point is the universe is certainly a computer program, and that incompleteness applies to all computer programs (to all things with only finite incompressible information). In any case, I explained Gödel with an explicitly empirical example, so I’m not sure what your point is.
I agree, and one could think of this in terms of markets: a market cannot capture all information about the world, because it is part of the world.
But I disagree that this is fundamentally unrelated—here too the issue is that it would need to represent states of the world corresponding to what belief it expresses. Ultimately mathematics is supposed to represent the real world.
Meaningful things are those the universe possesses a semantics for
No, it doesn’t. There is no 1⁄4 chance of anything once you’ve found yourself in Room A1.
You do acknowledge that the payout for the agent in room B (if it exists) from your actions is the same as the payout for you from your own actions, which if the coin came up tails is $3, yes?
I don’t understand what you are saying. If you find yourself in Room A1, you simply eliminate the last two possibilities so the total payout of Tails becomes 6.
If you find yourself in Room A1, you do find yourself in a world where you are allowed to bet. It doesn’t make sense to consider the counterfactual, because you already have gotten new information.
That’s not important at all. The agents in rooms A1 and A2 themselves would do better to choose tails than to choose heads. They really are being harmed by the information.
I see, that is indeed the same principle (and also simpler, since we don’t need to worry about whether we “control” symmetric situations).
I don’t think this is right. A superrational agent exploits the symmetry between A1 and A2, correct? So it must reason that an identical agent in A2 will reason the same way as it does, and if it bets heads, so will the other agent. That’s the point of bringing up EDT.
[Question] A way to beat superrational/EDT agents?
Wait, but can’t the AI also choose to adopt the strategy “build another computer with a larger largest computable number”?
I don’t understand the significance of using a TM—is this any different from just applying some probability distribution over the set of actions?
Suppose the function U(t) is increasing fast enough: e.g. if the probability of reaching t is exp(-t), then let U(t) be exp(2t), or whatever, so that the expected payoff exp(-t)·exp(2t) = exp(t) still grows without bound.
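A quick numeric check of that point (values taken straight from the example above, nothing else assumed):

```python
import math

# Example values from the comment: P(reach t) = exp(-t), U(t) = exp(2t).
def expected_payoff(t: float) -> float:
    # exp(-t) * exp(2t) = exp(t): unbounded in t, so no finite stopping time is optimal.
    return math.exp(-t) * math.exp(2 * t)

for t in range(6):
    print(t, round(expected_payoff(t), 2))
```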
I don’t think the question can be dismissed that easily.
current LLMs vs dangerous AIs
Most current “alignment research” with LLMs seems indistinguishable from “capabilities research”. Both are just “getting the AI to be better at what we want it to do”, and there isn’t really a critical difference between the two.
Alignment in the original sense was defined oppositionally to the AI’s own nefarious objectives, which LLMs don’t have; so alignment research with LLMs is probably moot.
something related I wrote in my MATS application:
I think the most important alignment failure modes occur when deploying an LLM as part of an agent, i.e. a program that autonomously runs a limited-context chain of thought from LLM predictions, maintains long-term storage, and calls functions (such as search over storage, self-prompting, and habit modification) either based on LLM-generated function calls or as cron jobs/hooks.
These kinds of alignment failures (1) are only truly serious when the agent is somehow objective-driven, or equivalently has feelings, which current LLMs have not been trained to be (I think that would need some kind of online learning, or learning to self-modify), and (2) can only be solved when the agent is objective-driven.
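A minimal sketch of the kind of agent loop described above, assuming only a generic llm(prompt) -> completion callable; the function names and the "CALL NAME: arg" convention are mine, purely for illustration, not any real framework:

```python
from typing import Callable, Dict, List

def run_agent(llm: Callable[[str], str], goal: str, max_steps: int = 10) -> None:
    """Illustrative agent loop: limited-context chain of thought over LLM predictions,
    long-term storage, and LLM-triggered function calls (search over storage,
    self-prompting, habit modification). `llm` stands in for whatever completion API is used."""
    storage: List[str] = []                          # long-term storage
    habits: List[str] = ["Think step by step."]      # standing instructions, modifiable by the agent
    context = goal                                   # limited working context, rebuilt each step

    def search_storage(query: str) -> str:
        return "\n".join(s for s in storage if query.lower() in s.lower())

    functions: Dict[str, Callable[[str], str]] = {
        "SEARCH": search_storage,
        "SELF_PROMPT": lambda arg: arg,                        # replaces the working context
        "MODIFY_HABIT": lambda arg: habits.append(arg) or "",  # changes standing instructions
    }

    for _ in range(max_steps):
        prompt = "\n".join(habits) + "\n" + context
        thought = llm(prompt)            # one limited-context step of chain of thought
        storage.append(thought)          # everything is written to long-term storage

        # Dispatch LLM-generated function calls of the form "CALL <NAME>: <arg>".
        if thought.startswith("CALL "):
            name, _, arg = thought[len("CALL "):].partition(":")
            handler = functions.get(name.strip())
            if handler:
                context = handler(arg.strip()) or context
                continue
        context = thought                # otherwise, carry the thought forward

if __name__ == "__main__":
    # Dummy LLM that always asks to search its own storage, just to show the loop runs.
    run_agent(lambda prompt: "CALL SEARCH: goal", goal="Summarise my notes.", max_steps=3)
```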