Almost everything avoids the Repugnant Conclusion. It’s a failure mode of only a tiny fraction of models.
Regarding the first paragraph: every purported rational decision theory maps actions to expected values. In most decision theory thought experiments, the agent is assumed to know all the conditions of the scenario, and so they can be taken as absolute facts about the world, leaving only the unknown random variables to feed into the decision-making process. In the Counterfactual Mugging, that is explicitly not true. The scenario states:

“you didn’t know about Omega’s little game until the coin was already tossed and the outcome of the toss was given to you”
So it’s not enough to ask what a rational agent with full knowledge of the rest of the scenario should do. That’s irrelevant. We know it as omniscient outside observers, but the agent in question knows only what the mugger tells them. If they believe it then there is a reasonable argument that they should pay up, but there is nothing given in the scenario that makes it rational to believe the mugger. The prior evidence is massively against believing the mugger. Any decision theory that ignores this is broken.
Regarding the second paragraph: yes, indeed there is that additional argument against paying up and rationality does not preclude accepting that argument. Some people do in fact use exactly that argument even in this very much weaker case. It’s just a billion times stronger in the “Bob could have been Alice instead” case and makes rejecting the argument untenable.
Counterfactual mugging is a mug’s game in the first place—that’s why it’s called a “mugging” and not a “surprising opportunity”. The agent doesn’t know that Omega actually flipped a coin, would have paid out counterfactually if the agent were the sort of person to pay in this scenario, would have flipped the coin at all in that case, and so on. The agent can’t know these things, because the scenario specifies that they have no idea that Omega does any such thing or even that Omega existed before being approached. So a relevant rational decision-theoretic parameter is an estimate of how much such an agent would benefit, on average, if asked for money in such a manner.
A relevant prior is “it is known that there are a lot of scammers in the world who will say anything to extract cash vs zero known cases of trustworthy omniscient beings approaching people with such deals”. So the rational decision is “don’t pay” except in worlds where the agent does know that omniscient trustworthy beings vastly outnumber untrustworthy beings (whether omniscient or not), and those omniscient trustworthy beings are known to make these sorts of deals quite frequently.
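To put rough numbers on that (a sketch only: the $100 ask and $10,000 counterfactual reward are just the usual illustrative figures, and the credence p is a free parameter), even on the policy-level reading most favourable to paying, paying only wins if the agent’s credence that the mugger is a genuine, trustworthy Omega is above roughly 2%, and the stated priors put it far below that.

```python
# Rough expected-value sketch for an agent deciding whether to be a "payer".
# Illustrative figures only: pay $100 now; a genuine Omega would have paid
# $10,000 in the counterfactual branch. p is the agent's credence that the
# mugger is a genuine, trustworthy Omega rather than an ordinary scammer.

def expected_gain_of_being_a_payer(p, ask=100, reward=10_000):
    # If genuine: across the coin flip, a payer collects the reward half the
    # time and pays the ask half the time. If a scammer: the payer just loses
    # the ask.
    genuine = 0.5 * reward - 0.5 * ask
    scammer = -ask
    return p * genuine + (1 - p) * scammer

# Break-even credence: p * 4950 - (1 - p) * 100 = 0  ->  p ~= 0.0198
for p in (0.02, 1e-6, 1e-12):
    print(p, expected_gain_of_being_a_payer(p))
```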
Your argument is even worse. Even broad decision theories such as FDT and UDT, which cover counterfactual worlds, still answer the question “what decision benefits agents identical to Bob the most across these possible worlds, on average”. Bob does not benefit at all in a possible world in which Bob was Alice instead. That’s nonexistence, not utility.
Yet our AI systems, even the most advanced, focus almost exclusively on logical, step-by-step reasoning.
This is absolutely false.
We design them to explain every decision, show their work and follow clear patterns of deduction.
We are trying to design them to be able to explain their decisions and follow clear patterns of deduction, but we are still largely failing. In practice they often arrive at an answer in a flash (whether correct or incorrect), and this was almost universal for earlier models without the more recent development of “chain of thought”.
Even in “reasoning” models there is plenty of evidence that they often still do have an answer largely determined before starting any “chain of thought” tokens and then make up reasons for it, sometimes including lies.
Yes, you can use yourself as a random sample but at best only within a reference class of “people who use themselves as a random sample for this question in a sufficiently similar context to you”. That might be a population of 1.
For example, suppose someone without symptoms has just found out that they have genes for a disease that always progresses to serious illness. They have a mathematics degree and want to use their statistical knowledge to estimate how long they have before becoming debilitated.
They are not a random sample from the reference class of people who have these genes. They are a sample from the class of people who have the genes, didn’t show symptoms before finding that out, found out during adulthood (almost certainly), live in a time and place with enough opportunity and capacity to earn a mathematics degree, are of a mindset to ask themselves this question, and so on.
Any of these may be relevant information for estimating the distribution, especially if the usual age of onset is in childhood or the disease also reduces intellectual capacity or affects personality in general.
Relating back to the original Doomsday problem: suppose that in the reference class of all civilizations, most discover some principle that conclusively resolves the Doomsday problem not long after formulating it (within a few hundred years or so). It doesn’t really matter what that resolution happens to be; there are plenty of possibilities.
If that is the case, then most people who even bother to ask the Doomsday question without already knowing the answer are those in that narrow window of time where their civilization is sophisticated enough to ask the question without being sophisticated enough to answer it, regardless of how long those civilizations might last or how many people exist after resolving the question.
To the extent that the Doomsday reasoning is valid at all (which it may not be), all it provides is an estimate of the time until most people stop asking the Doomsday question in a context similar to yours. Destruction of the species is not required for that. Even the question becoming unfashionable would be enough.
Yes, player 2 loses with extremely low probability even for a 1-bit hash (on the order of 2^-256). For a more commonly used hash, or for 2^24 searches on their second-last move, they reduce their probability of loss by a huge factor more.
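For a rough sense of scale (this is only a sketch, and the model is my assumption: as the figures above suggest, player 2 loses only if every one of their k independently checked candidate moves hashes to the single bad value out of 2^b for a b-bit hash):

```python
# Order-of-magnitude sketch under the assumed model: player 2 loses only if
# all k candidate moves independently hash to the one "bad" value out of 2**b
# possibilities, so the loss probability is (2**-b)**k = 2**(-b*k).
# We report log2 of the probability to keep the numbers readable.

def log2_loss_probability(hash_bits, searched_moves):
    return -hash_bits * searched_moves

print(log2_loss_probability(1, 256))      # -256: the 2^-256 figure above
print(log2_loss_probability(256, 256))    # -65536: a commonly used hash size
print(log2_loss_probability(1, 2**24))    # -16777216: 2^24 searches, 1-bit hash
```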
This paragraph also misses the possibility of constructing an LLM and/or training methodology such that it will learn certain functions, or can’t learn certain functions. There is also a conflation of “reliable” with “provable” on top of that.
Perhaps there is some provision made elsewhere in the text that addresses these objections. Nonetheless, I am not going to search. The abstract smells enough like bullshit that I’d rather do something else.
I’ll try to make it clearer:
Suppose b “knows” that Omega runs this experiment for all programs b. Then the optimal behaviour for a competent b (by a ridiculously small margin) is to 1-box.
Suppose b suspects that box-choosing programs are slightly less likely to be run if they 1-box on equal inputs. Then the optimal behaviour for b is to 2-box, because the average extra payoff for 1-boxing on equal inputs is utterly insignificant while the average penalty for not being chosen to run is very much greater. Anything that changes the probability of being run as box-chooser by more than 1000/|P| (which is on the order of 1/10^10^10^10^100) matters far more than what the program actually does.
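To see where the 1000/|P| threshold comes from, here is a back-of-envelope sketch. It assumes the standard Newcomb amounts ($1,000 in the small box, $1,000,000 in the large one) and that 1-boxing only pays off on the roughly-1/|P| event that the box-chooser is run with an identical program as the other input; both are assumptions on my part rather than anything fixed by the scenario.

```python
# Back-of-envelope comparison, assuming the standard Newcomb amounts and that
# 1-boxing only helps on the ~1/|P| chance that the box-chooser is run against
# an identical program as input.

SMALL_BOX = 1_000
LARGE_BOX = 1_000_000

def gain_from_one_boxing(size_of_P):
    # Expected extra payoff from 1-boxing: bounded by the large box,
    # realized only when the inputs happen to be equal.
    return LARGE_BOX / size_of_P

def cost_of_lower_run_probability(delta):
    # Losing probability delta of being run at all forfeits at least the
    # small box that even a 2-boxer would have collected.
    return delta * SMALL_BOX

# The run-probability effect dominates once
# delta > LARGE_BOX / (SMALL_BOX * |P|), i.e. delta > 1000 / |P|.
size_of_P = 10**100          # stand-in; the real |P| is astronomically larger
delta = 2_000 / size_of_P
print(gain_from_one_boxing(size_of_P) < cost_of_lower_run_probability(delta))  # True
```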
In the original Newcomb problem, you know that you are going to get money based on your decision. In this problem, a running program does not know this. It doesn’t know whether it’s a or b or both, and every method for selecting a box-chooser is a different problem with different optimal strategies.
As a function of M, |P| is very likely to be exponential and so it will take O(M) symbols to specify a member of P. Under many encodings, no member of P can even finish checking whether the inputs are equal before running out of time.
That aside, why are you assuming that program b “wants” anything? Essentially all of P won’t be programs that have any sort of “want”. If it is a precondition of the problem that b is such a program, what selection procedure is assumed between those that do “want” money from this scenario? Note that being selected for running is also a precondition for getting any money at all, so this selection procedure is critically important—far more so than anything the program might output!
That is nothing like the 5-and-10 problem. I am no longer interested in what you consider to be evidence.
Evidence for the claim in the title? Or for anything else in the post?
It’s interesting (and perhaps a bit sad) that a relatively lengthy post on representing sentences as logical statements doesn’t make any reference to the constructed language Lojban, whose entire grammar and semantics are designed around expressing sentences as logical statements.
Going into all the ways in which civilization—and its markets—fails to be rational seems way beyond the scope of a few comments. I will just say that GDP does absolutely fail to capture a huge range of value.
However, to address “share prices are set by the latest trade” you need to consider why a trade is made. In principle, prices are based on the value to the participants, somewhere between the value to the buyer and value to the seller. A seller who needs cash soon (to meet some other obligation or opportunity) may accept a lower price to attract a buyer more quickly. In our hypothetical and simplified one-trade-per-day scenario, that seller may accept up to 20% less than the previous day’s trade price, though they find a buyer at only 5% less. So the company’s market cap drops 5% even though 99.9999% of the investors and potential investors still value it exactly the same as yesterday.
This scales up since there are many highly correlated and often very short-term factors that influence desirability of shares vs cash vs bonds vs commodities vs …etc. It’s not just “what do I think this is worth to me”, but also “what do I think that other people think that the market price will be tomorrow” and so on, and this can result in self-fulfilling predictions over surprisingly long time spans.
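To put the one-trade-per-day example in concrete numbers (share count and prices invented purely for illustration):

```python
# Toy illustration of the one-trade-per-day example above. Numbers are made
# up: a billion shares outstanding, yesterday's single trade at $10.00,
# today's single trade at 5% less.

shares_outstanding = 1_000_000_000
yesterday_price = 10.00
today_price = yesterday_price * 0.95   # one motivated seller accepts 5% less

yesterday_cap = shares_outstanding * yesterday_price
today_cap = shares_outstanding * today_price

print(f"Market cap moves from ${yesterday_cap:,.0f} to ${today_cap:,.0f}")
print(f"A ${yesterday_cap - today_cap:,.0f} drop, driven by a single share changing hands.")
```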
Could it be generalizing from T E X T L I K E T H I S and/or mojibake UTF-16 interpreted as UTF-8 with every second character being zero? It’s still a bit more of a stretch from there to generalize to ignoring two intervening constant characters, though.
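For what it’s worth, the interleaved-zero pattern is easy to reproduce (a minimal demonstration, using an arbitrary ASCII string):

```python
# ASCII text encoded as UTF-16-LE and then read back one byte per character
# has a zero byte after every original character, which looks a lot like
# "T E X T  L I K E  T H I S" with NULs as the separators.

text = "TEXT LIKE THIS"
raw = text.encode("utf-16-le")     # b'T\x00E\x00X\x00T\x00 \x00...'
mangled = raw.decode("latin-1")    # reinterpret each byte as its own character

print(list(mangled[:8]))           # ['T', '\x00', 'E', '\x00', 'X', '\x00', 'T', '\x00']
```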
Market cap is a marginal measure of desirability of shares in the entity represented. It mostly measures the expectations of the most flighty investors over short timescales. If a company issues a billion shares but only one of those is traded in any given day, the price of that single share agreed between the single seller and the single buyer entirely determines the market capitalization of that company.
In practice there is usually a lot more volume, but the principle remains. Almost all shares of any given entity are not traded over the timescales that determine share (or index) prices and hence market capitalization. In addition, market price has very little to do with the value of an entity’s assets. Such assets may be instrumental in generating profits, but the relation is weak and very far from linear.
GDP, too, does not measure what you appear to think it measures.
I was very interested to see the section “Posts by AI Agents”, as the first policy I’ve seen anywhere acknowledging that AI agents may be capable of both reading the content of policy terms and acting on them.
Why not both?
Human design will determine the course of AGI development, and if we do the right things then whether it goes well is fully and completely up to us. Naturally at the moment we don’t know what the right things are or even how to find them.
If we don’t do the right things (as seems likely), then the kinds of AGI which survive will be the kind which evolve to survive. That’s still largely up to us at first, but increasingly less up to us.
The fun thing is that the actual profile of wages earned can be absolutely identical and yet end up with incredibly different results for personal wage changes. For example:
In year 1, A earns $1/hr, B $2, C $3, D $4, and E $5.
In year 2, A earns $2/hr, B $3, C $4, D $5, and E $1.

A, B, C, and D personally all increased their income by substantial amounts and may vote accordingly. E lost a lot more than any of the others gained, but doesn’t get more votes because of that. 80% of voters saw their income increase. What’s more, this process can repeat endlessly.
If in year 2, A instead earns $5/hr, B $1, C $2, D $3, and E $4 then 80% of voters will be rather unhappy at the change despite the income distribution still being identical.
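A quick check of the arithmetic in both variants (just re-deriving the 80% figures above):

```python
# The wage distribution {1, 2, 3, 4, 5} is identical in every year,
# but who holds which wage changes.

year1 = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}
year2_most_gain = {"A": 2, "B": 3, "C": 4, "D": 5, "E": 1}
year2_most_lose = {"A": 5, "B": 1, "C": 2, "D": 3, "E": 4}

def share_with_a_raise(before, after):
    # Fraction of workers whose personal wage went up year over year.
    return sum(after[p] > before[p] for p in before) / len(before)

print(share_with_a_raise(year1, year2_most_gain))  # 0.8 -> 80% personally better off
print(share_with_a_raise(year1, year2_most_lose))  # 0.2 -> 80% personally worse off
```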
In the rain forecaster example, it appears that the agent (“you”) is more of an expert on Alice’s calibration than Alice is. Is this intended?
What is the definition of “genetic privilege” here?