I do agree that I can imagine a very intelligent machine that is completely unable to gather information about its own status. Edge case!
I think it’s less edge than it might at first seem. Even far more complex and powerful GPT-like models may be incapable of self-awareness, and we may deliberately create such systems for AI safety reasons.
Using the definition proposed in the article, I can easily imagine an entity with an “internal observer” which has full knowledge of its own conditions, yet still lacks the ability to feel pleasure or pain.
I think it would have to never have any preferences for its experiences (or anticipated future experiences), not just pleasure or pain. It is conceivable though, so I agree that my supposition misses the mark.
How would you define self-awareness?
As in the post, self-awareness to me seems vaguely defined. I tend to think of both self-awareness and sapience as being in principle more like scales of many factors than binary categories.
In practice we can easily identify capabilities that separate us from animals in reasoning, communicating, and modelling, and proudly proclaim ourselves to be sapient where all the rest aren’t. This seems pretty clear cut, but various computer systems seem to be crossing the animal-human gap in some aspects of “sapience” and already surpassed humans in some others.
Self-awareness seems harder to find a clear boundary for, and scales really do seem more appropriate here. There are various clues to “self-aware” behaviour, and they seem to cover a range of different capabilities. In general I’d probably call it being able to model one’s own actions, experiences, and capabilities in some manner, and to update those models.
Cats seem to be above whatever subjective “boundary” I have in mind, even though most of them fail the mirror test. I suspect the mirror test is further toward “sapience”, such as being able to form a new model that predicts the actions of the reflection perfectly in terms of their own actions as viewed from the outside. I suspect that cats do have a pretty decently detailed self-model, but mostly can’t do that last rather complex transformation to match up the outside view to their internal model.
I’m not sure why your Venn diagram doesn’t have “sentient” lying entirely within “conscious”, given that your definitions have the latter as a logical requirement for the former. Part of your pink impossible area seems to exclude that, but that pink area also includes other things that you say are impossible without giving any reason in the text.
Why is sapience without sentience or self-awareness marked “impossible”? It seems plausible for a computer program to be able to receive, process, and send complex world models without being able to model itself in them. GPT-like systems (now or in the future) may well fall into this category. Even if such a system can model “GPT-like systems”, there’s probably not yet any “self” concept to be able to apply models like “the output I am about to produce will be the output of a GPT-like system”.
The remaining “impossible” combination, sapience and sentience without self-awareness, does seem implausible, but I don’t know that it’s logically impossible. It would take a pretty major blind spot in what sorts of models you can form to be able to experience sensations and form complex models, yet not be able to model yourself in many ways.
It may be that sentience is literally just the intersection of self-awareness and consciousness, or that both of these collapse into just a scale of self-awareness. Consciousness may be nothing more or less than self-awareness. Maybe p-zombies can’t exist, even though we can imagine such things.
I suspect that it’s even worse: that even the concept of correlation of difficulty is irrelevant and misleading. Your illustrations show a range of values for “difficulty for humans” and “difficulty for computers” of around the same scale.
My thesis is that this is completely illusory. I suspect that problems are not 1-dimensional, that their (computational) difficulties can be measured on multiple scales. I further expect that these scales cover many orders of magnitude, and that the range of difficulty that humans find “easy” to “very difficult” covers in most cases one order of magnitude or less. On a logarithmic scale from 0 to 10 for “visual object recognition” capability for example, humans might find 8 easy and 9 difficult, while on a scale of 0 to 10 for “symbolic manipulation” humans might find 0.5 easy (3 items) and 1.5 (30 items) very difficult.
At any given time in our computing technology development, the same narrow-range effect occurs, but unlike the human ranges, the computer ranges change substantially over time. We build generally faster computers, and all the points move down. We build GPUs, and problems on the “embarrassingly parallel” scales move down even more. We invent better algorithms, and some other set of graphs has its points move down.
Note that the problems (points) in each scale can still have a near-perfect correlation between difficulty for computers and difficulty for humans. In any one scale, problems may lie on a nearly perfectly straight line. Unlike the diagrams in the post though, at any given time there is essentially nothing in the middle. For any given scale, virtually all the data points are off the charts in three of the four quadrants.
The bottom left off-the-scale quadrant is the Boring quadrant, trivial for both humans and computers and of interest to nobody. The top right is the Impossible quadrant, of interest only to science fiction writers. The other two are the Moravec quadrants, trivial for one and essentially impossible for the other.
Over time, the downward motion of points due to technological progress means that now and then the line for some small subclass of problems briefly overlaps the “human capability” range. Then we get some Interesting problems that are not Moravec! But even in this class, the vast majority of problems are still Boring or Impossible and so invisible. The other classes still have lots of Moravec problems so the “paradox” still holds.
Someone on the internet may also have fairly high confidence (>50%) of getting 5x returns on one of two or three bets on a roulette table, but that doesn’t make it a good idea.
Crypto markets are rather more opaque and subject to both mistakes and various forms of fraud than roulette tables. Even when certain bets might be fundamentally sound, the various steps in the Rube Goldberg financial contraptions you have to work through to carry them out each add their own risks, usually large enough to wipe out the expected value of the bet.
To put it another way: just as people sometimes say “if you’ve never missed a flight, you’re spending too much time in airports,” I think that if we never have a comment section that devolves into chaos and requires moderator intervention, we’re staying way too far away from a domain where it’s really important to be developing sanity-inducing social technology.
Wait, what? People actually say that first thing? The expected utility loss due to the consequences of missing a flight is usually vastly greater than the time wasted by aiming to get there earlier. If people do say that, I suspect they must be the jet-setting elites who fly more than a hundred times in their life.
Terrible analogy aside, your point is perhaps well made. The utility loss from (very occasional!) chaotic messes that need moderators to take action may well be outweighed by the benefits of examining the Sanity Devouring Pit more closely without falling in.
On the other hand, I see that quite a few of the comments to this post take issue with the specific somewhat-political examples given, and not with the concept they were intended to illustrate. I felt the pull of the Pit myself, before reminding myself that the examples are not themselves the concept and that refuting an example has very little weight on whether the concept is useful.
Is the concept useful? Well, the adjective doesn’t seem useful. All options are fabricated, in the sense that they have been created. The connotation is also that of “a fabrication”, meaning a deliberate untruth. The untruth aspect is fine: all options are untrue to some extent, in that they are based on flawed models that never correspond exactly with reality. Are “fabricated” options deliberately untrue? It seems more likely to me that the crucial distinction is just that the models behind them are more critically flawed, and whether that is accidental or deliberate is irrelevant.
With some reservations about the name, it does seem to be a useful concept, with the proviso that in practice these things seem to be on a scale rather than a dichotomy.
It is related in the sense that if your prior for sensitivity is uniform, then the posterior is that beta distribution.
In my case I did not have a uniform prior on sensitivity, and did have a rough prior distribution over a few other factors I thought relevant, because reality is messy. Certainly don’t take it as “this is the correct value”, and the approach I took almost certainly has some major holes in it even given the weasel-words I used.
(humans are much more likely than AIs to extrapolate into the “actively evil” zone, rather than the “lethally indifferent”)
It seems to me that use of the term “actively evil” is itself guided by being part of our training data.
Lots of things called “actively evil” possibly achieve that designation just because they’re things that humans have already done and that have been judged evil. Actions of this type are now well known to be evil, so a human choosing them can really only be making an active choice to do them anyway, presumably because they’re viewed as necessary to some goal that supersedes that socially cached judgement.
I don’t see why an AI couldn’t reason in the same way: knowing (in some sense) that humans judge certain actions and outcomes as evil, disregarding that judgement and doing it anyway due to being on a path to some instrumental or terminal goal. I think that would be actively evil in the same sense that many humans can be said to be actively evil.
Do you mean that the space of possible actions that an AI explores might be so much larger than those explored by all humans in history combined, that it just by chance doesn’t implement any of the ones similar enough to known evil? I think that’s implausible unless the AI was actively avoiding known evil, and therefore at least somewhat aligned already.
Apart from that, it’s possible we just differ on the use of the term “lethally indifferent”. I take it to mean “doesn’t know the consequences of its actions to other sentient beings”, like a tsunami or a narrowly focused paperclipper that doesn’t have a model of other agents. I suspect maybe you mean “knows but doesn’t care”, while I would describe that as “actively evil”.
I hadn’t actually read the review, but yes, I meant that the sample must have had 29 people who were known (through other means) to be positive for SARS-CoV-2, and all tested positive.
Can you say more about how you got 96%?
Educated guessing, really. I did a few simple models with a spreadsheet for various prior probabilities, including some at each end of what was (subjectively, to me) reasonable. Only the prior for “this study was fabricated from start to finish but got through peer review anyway” made very much difference in the final outcome. (If you put 10% or more weight on that, or on various other “their data can’t be trusted” priors, then you likely want to adjust the figure downward.)
So with a rough guess at a prior distribution, I can look at the outcomes from the point of view of “what single value has the same end effect on evidence weight as this distribution”. I make it sound fancy, but it’s really just “if there was a 30th really positive test subject in these dozen or so possible worlds that I’m treating as roughly equally likely, and I only include possible worlds where the validation detected all of the first 29 cases, how often does that 30th test come up positive?” That comes out at close to 96%.
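For what it’s worth, the simplest version of that calculation, taking the 29-of-29 result at face value and assuming a flat prior over sensitivity (an assumption for illustration, not the messier prior I actually used), lands in the same place:

```python
from scipy.stats import beta

# 29 known positives, all detected; assume a uniform Beta(1, 1) prior over sensitivity.
detected, missed = 29, 0
posterior = beta(1 + detected, 1 + missed)  # posterior is Beta(30, 1)

# Probability that a hypothetical 30th true positive would also test positive
# (Laplace's rule of succession): (29 + 1) / (29 + 2) = 30/31, about 0.968.
print(posterior.mean())
```

The uniform-prior number is a shade higher than the 96% I quoted, which came from the rougher prior distribution described above.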
Yes, and to go further: the value that humans bring to these economic activities, and on which they are bottlenecked, is almost entirely their mental capabilities.
There are a few jobs where people joke about just doing things that a trained monkey could do, but it’s only funny because a trained monkey couldn’t actually do them, especially when things don’t go entirely to plan. There are plenty of jobs that rely on social competence in dealing with other humans, but that’s still mostly mental capability.
There are also plenty of jobs that couldn’t (at least at first) be replaced by smart robots with AGI, but they’re not really so relevant to questions such as limits on economic growth.
Yes, there are whole constellations of frictionless, spherical cows in here.
I think at best you could say that accepting an offer of employment for $X/hr implies that you value your time less than $X/hr, but even this best-case interpretation is only true for the first few hours per week or whatever. Having enough income to avoid being homeless and worried about food is likely very much more valuable than whatever income comes in on top of that. What’s more, employment turns non-employment time into a scarcer resource.
Someone could work for $15/hr, but quite consistently value their remaining time at $100/hr due to a combination of decreasing utility of money and value of an increasingly scarce resource.
Likewise someone could work for $100/hr and it could still be consistent for them to value an hour of their time at $15/hr.
Real life is messy.
There are lots of assumptions in even assigning a monetary value to time-spent-now. However, I think you’re doing your analysis a huge disservice by explicitly ignoring discounting and investment. Over timescales of multiple decades, these dominate most other things!
I’m not referring only to financial investment and capital production, but also to things like time invested in family, discounting due to compounding uncertainty and risks over time, and other factors. If an hour spent now is only worth as much as an hour at the end of someone’s career, they’ve pretty much failed at all of these things, because an hour at the end of someone’s career really shouldn’t be worth as much to them as an hour now.
An end-of-career hour may be worth more to somebody else, but that’s highly variable across professions.
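To illustrate how strongly compounding discounting bites over a working lifetime, here’s a toy calculation; the 40-year horizon and the discount rates are purely hypothetical numbers chosen for illustration, not anything from the post:

```python
# Present value of one hour 40 years from now, under a few hypothetical
# annual discount rates (standing in for risk, uncertainty, and forgone investment).
years = 40
for rate in (0.02, 0.05, 0.08):
    present_value = 1 / (1 + rate) ** years
    print(f"{rate:.0%}/yr: an end-of-career hour is worth ~{present_value:.2f} hours now")
# Roughly: 2% -> 0.45, 5% -> 0.14, 8% -> 0.05
```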
You have to go a long way out from the concept of Boltzmann brains before “inability to have any coherent thoughts” ceases to be so overwhelmingly dominant in probability that you can’t even reasonably estimate how many orders of magnitude of zeroes precede the first nonzero digit in the probability of anything else.
Realistically, if this isn’t a fundamental property of whatever you’re talking about, then you’re not talking about Boltzmann brains at all.
The confidence interval in the Cepheid analysis does not inspire confidence.
Usually when a test claims “100% sensitivity”, it’s based on all members of some sample known to have the disease testing positive. The lower end of the 95% interval is the smallest true sensitivity at which there would still be at least a 5% chance of getting no false negatives in that sample.
That’s where it starts to look dodgy: Normally it would be 2.5% to cover upper and lower tails of the distribution, but there is no tail below zero false negatives. It looks like they used 2.5% anyway, incorrectly, so it’s really a 97.5% confidence interval. The other problem is that the positive sample size must have been only 29 people. That’s disturbingly small for a test that may be applied a billion times, and seriously makes me question their validation study that reported it.
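For concreteness, the arithmetic behind both readings of the interval, assuming the validation sample really was 29 true positives with zero false negatives (the 29 is inferred, not something I’ve confirmed against the study itself):

```python
# Exact lower bound: the sensitivity p at which the chance of seeing zero false
# negatives out of 29 positives, i.e. p**29, equals the chosen tail probability.
n = 29
for tail in (0.05, 0.025):
    print(f"tail {tail}: lower bound on sensitivity ~= {tail ** (1 / n):.3f}")
# 0.05  -> ~0.902 (a genuine one-sided 95% bound)
# 0.025 -> ~0.881 (what you get if 2.5% is put into the one existing tail)
```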
There are a number of assumptions you can use to turn this into an effective false negative rate for Bayesian update purposes. You may have priors on the distribution of true sensitivities, priors on study validity, and so on. They don’t matter very much, since they mostly yield a distribution with an odds ratio geometric mean around the 15-40 range anyway. If I had to pick a single number based only on seeing their end result, I’d go with 96% sensitivity under their study conditions, whatever those were.
I’d lower my estimate for real life tests, since real life testing isn’t usually nearly as carefully controlled as a validation study, but I don’t know how much to lower it.
The future very rarely goes the way we predict.
That said, I never thought that flying cars were a reasonable expectation for the near future, mainly because the failure modes are terribly bad and keeping them from happening at intolerable rates is incredibly expensive. Even if we were all very rich, the cars were powered by Mr Fusion, and they were piloted by infallible software, occasional mechanical defects alone would make them something I wouldn’t want to use every day.
We do have nanotech already, just not the “build everything for free” magical wish fulfillment nanotech. More energy wouldn’t have helped that much; there are lots of very real problems at that scale that we still know little about solving. We may get further toward magical wish fulfillment nanotech in time, but it will take a lot of brainpower, not horsepower.
Yes, the relative scale of future utility makes no difference in short-term decisions, though noting that short-term to an immortal here can still mean “in the next 10^50 years”!
It might make a difference in the case where someone who thought that they were immortal becomes uncertain of whether what they already experienced was real. That’s the sort of additional problem you get with uncertainty over risk though, not really a problem with bounded utility itself.
A large part of the stagnation in energy consumption seems to be that the cost of energy hasn’t gone down much. The sixties and seventies marked a point where identifying and exploiting natural energy sources (mostly coal and oil) started to increase more rapidly in capital cost per unit energy produced. It seems that the last fifty years have been the beginning of the end for that whole class of energy production, which fueled the past two hundred years of growth. We are only now seeing the results of investment into alternatives.
Solar power in particular has plummeted in cost by many orders of magnitude, which is truly amazing. There are signs that it may become cheaper still. Nuclear power may become both cheaper and safer, but regardless of whether regulation costs too much now, it might not have a very much lower cost floor. We don’t know, and it might be worth finding out.
The interesting thing about solar power in particular is that it is pretty much purely capital-based. Unlike oil wells and coal mines relying on specific deposits that become exhausted, the production of energy from sunlight increases almost linearly with the amount you invest in it up to a small but significant fraction of the planet’s surface, and will produce essentially forever. There’s an upper bound based on maintenance and replacement of the equipment, but that limit seems likely to be at least 10 times larger than our current total energy production.
That’s just one of multiple possible energy sources that we’re working on. There are certainly major transition problems, but in the long run I think we’re now starting to head out of energy stagnation.
There are lots of different ways to approach this question, but they do in fact end up in the same place: that it does make sense to assign blame for actions. This is not the same as saying that they should be punished for it, so let’s set aside punishment, rehabilitation, deterrence, and other surrounding factors.
How does it make sense to blame a person for not doing what they are incapable of doing?
If I have an oven with a thermostat that stops working without warning one day, I can certainly blame the oven for burning my meal. Ovens of its make, model, and age are in general capable of maintaining the correct temperature, but this one was faulty. If it had functioned correctly, my meal would not have been burned. It had no choice to do otherwise, but that doesn’t stop it from being faulty. To a large extent, blaming people means much the same thing: proclaiming that their decision-making was faulty. If they did something immoral (or failed to even attempt to perform a moral duty), then it is their moral decision-making that has been demonstrated to be faulty.
If fixing this fault was as easy and without side effects as replacing the thermostat in an oven, then we probably would just do that. Failing that, being able to say “this person’s moral decision-making was faulty” and consequently treating them differently from those who have not demonstrated faulty moral decision-making makes sense.
Yes, splitting the confounding factors out does help. There still seem to be a few misconceptions and confounding things though.
One is that bounded doesn’t mean small. On a scale where the welfare of the entire civilization of Ancient Egypt counts for 1 point of utility, the bound might still be more than 10^100.
Yes, this does imply that after 10^70 years of civilizations covering 10^30 planet-equivalents, the importance to the immortal of the welfare of one particular region of any randomly selected planet of those 10^30 might be less than that of Ancient Egypt. Even if they’re very altruistic.
This whole idea seems to be utterly divorced from what utility means. Fundamentally, utility is based on an ordering of preferences over outcomes. It makes sense to say that you don’t know what the actual outcomes will be, that’s part of decision under risk. It even makes sense to say that you don’t know much about the distribution of outcomes, that’s decision under uncertainty.
The phrasing here seems to be a confused form of decision making under uncertainty. Instead of the agent saying “I don’t know what the distribution of outcomes will be”, it’s phrased as “I don’t know what my utility function is”.
I think things will be much clearer when phrased in terms of decision making under uncertainty: “I know what my utility function is, but I don’t know what the probability distribution of outcomes is”.
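To make that rephrasing concrete, a minimal formalization in my own notation (not anything from the post): in both cases the agent is evaluating

$$\arg\max_a \; \mathbb{E}\big[U(o)\mid a\big] \;=\; \arg\max_a \sum_o P(o \mid a)\,U(o),$$

with $U$ fixed and known throughout. Under risk, $P$ is known; under uncertainty, the agent also has to aggregate over which $P$ it is facing. Nothing in this requires pretending that $U$ itself is unknown.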
Okay, so you are claiming that these tiny fragments of a particular discussion within a particular subdivision of a geographical fragment of the rationalist community represent evidence for the much broader claim you made about the rationalist community as a whole.
Well okay, you provided evidence so weaksauce as to be nearly water. If this was the best evidence you could provide, then I should update away from your claim, because I would have expected you to be able to provide much stronger evidence than that if it were true.