An eccentric dreamer in search of truth and happiness for all. I formerly posted on Felicifia back in the day under the same name. I’ve been a member of Less Wrong and loosely involved in Effective Altruism to varying degrees since roughly 2013.
Darklight
Apologies if this is a newbie math comment, as I’m not great at math, but is there a way to calculate a kind of geometric expected value? The geometric mean seems to require positive numbers, and expected values can have negative terms. Also, how would you apply probability weights?
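To illustrate what I’m asking, here’s the standard construction I could find, the weighted geometric mean (exp of the probability-weighted mean of logs), with made-up numbers showing exactly where negatives break it:

```python
import numpy as np

# Hypothetical outcomes with probability weights, purely for illustration.
outcomes = np.array([2.0, 5.0, 10.0])
probs = np.array([0.5, 0.3, 0.2])  # the probabilities act as the weights

# Weighted geometric mean: exp of the probability-weighted mean of logs.
geo_ev = np.exp(np.sum(probs * np.log(outcomes)))
print(geo_ev)  # ~3.63, versus an arithmetic expected value of 4.5

# The problem: np.log is undefined for values <= 0, so a negative
# outcome (e.g., a loss of -3.0) breaks this construction entirely.
```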
Even if I don’t necessarily agree with the premise that a Paperclip Maximizer would run such a Simulation or that they would be more likely than other possible Simulations or base realities, I do find the audacity of this post (and your prior related posts) to be quite appealing from a “just crazy enough it might work” perspective.
I’d advise that whenever you come up with what seems like an original idea or discovery, immediately do at bare minimum a quick Google search about it, or if you have the time, a reasonably thorough literature search in whatever field(s) it’s related to. It is really, really easy to come up with something you think is new when it’s actually not. While the space of possible ideas is vast, the low-hanging fruit has very likely already been picked by someone somewhere, so be especially wary of a seemingly simple idea that seems super elegant and obvious. It probably is exactly that, and odds are someone on the Internet has made at least a blog post about it, or there’s an obscure paper on arXiv discussing it.
Also, be aware that often people will use different terminology to describe the same thing, so part of that search for existing work should involve enumerating different ways of describing it. I know it’s tedious to go through this process, but it helps to not be reinventing the wheel all the time.
Generally, a really unique, novel idea that actually works requires a lot of effort and domain knowledge to come up with, and probably needs experiments to really test and validate it. A lot of the ideas that aren’t amenable to testing will sound nice but be unverifiable, and many ideas that can be tested will sound great on paper but actually not work as expected in the real world.
So, I have two possible projects for AI alignment work that I’m debating between focusing on. I’m curious for input on how worthwhile they’d be to pursue or follow up on.
The first is a mechanistic interpretability project. I have previously explored things like truth probes by reproducing the Marks and Tegmark paper and extending it to test whether a cosine similarity based linear classifier works as well. It does, but not any better or worse than the difference of means method from that paper. Unlike difference of means, however, it can be extended to multi-class situations (though logistic regression can be as well). I was thinking of extending the idea to try to create an activation vector based “mind reader” that calculates the cosine similarity with various words embedded in the model’s activation space. This would, if it works, allow you to get a bag of words that the model is “thinking” about at any given time.
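For concreteness, here’s a minimal sketch of the idea, with hypothetical names and shapes; it assumes you already have a hidden-state activation from some layer and a matrix of word vectors in (or projected into) the same space, which is itself an assumption the project would have to test:

```python
import torch

def bag_of_words(activation: torch.Tensor,
                 embeddings: torch.Tensor,
                 vocab: list[str],
                 top_k: int = 10) -> list[tuple[str, float]]:
    # activation: (d_model,) hidden state from some layer of the model.
    # embeddings: (vocab_size, d_model) matrix of word vectors assumed
    # to be comparable with the activations.
    sims = torch.nn.functional.cosine_similarity(
        embeddings, activation.unsqueeze(0), dim=-1)
    scores, indices = sims.topk(top_k)
    return [(vocab[i], s.item()) for i, s in zip(indices.tolist(), scores)]
```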
The second project is a less common game theoretic approach. Earlier, I created a variant of the Iterated Prisoner’s Dilemma as a simulation that includes death, asymmetric power, and aggressor reputation. I found, interestingly, that cooperative “nice” strategies banding together against aggressive “nasty” strategies produced an equilibrium where the cooperative strategies win out in the long run, generally outnumbering the aggressive ones considerably by the end. Although this simulation probably requires more analysis and testing in more complex environments, it seems to point to the idea that being consistently nice to weaker nice agents acts as a signal to more powerful nice agents and enables coordination that increases the chance of survival of all the nice agents, whereas being nasty leads to a winner-takes-all highlander situation. From an alignment perspective, this could be a kind of infoblessing: an AGI or ASI might be persuaded to spare humanity for these game-theoretic reasons.
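A stripped-down sketch of the kind of simulation I mean; the payoffs, power mechanics, and reputation rule here are simplified stand-ins rather than the exact ones I used:

```python
import random

COOPERATE, DEFECT = 0, 1
# Standard one-shot Prisoner's Dilemma payoffs (row, col): T > R > P > S.
PAYOFFS = {(COOPERATE, COOPERATE): (3, 3), (COOPERATE, DEFECT): (0, 5),
           (DEFECT, COOPERATE): (5, 0), (DEFECT, DEFECT): (1, 1)}

class Agent:
    def __init__(self, nice: bool):
        self.nice = nice        # "nice" strategies never defect first
        self.power = 10.0       # asymmetric power, compounds with payoffs
        self.aggressor = False  # public reputation: defected on a non-aggressor

    def choose(self, opponent: "Agent") -> int:
        if self.nice:
            # Nice agents cooperate, but band together against known aggressors.
            return DEFECT if opponent.aggressor else COOPERATE
        return DEFECT

def interact(a: Agent, b: Agent) -> None:
    ma, mb = a.choose(b), b.choose(a)
    pa, pb = PAYOFFS[(ma, mb)]
    # Asymmetric power: exploiting a cooperator pays more when you're stronger.
    if ma == DEFECT and mb == COOPERATE:
        pa *= max(a.power / b.power, 1.0)
    elif mb == DEFECT and ma == COOPERATE:
        pb *= max(b.power / a.power, 1.0)
    # Defecting first on a non-aggressor earns the aggressor label.
    if ma == DEFECT and not b.aggressor:
        a.aggressor = True
    if mb == DEFECT and not a.aggressor:
        b.aggressor = True
    # Payoffs compound into power; a flat upkeep cost makes death possible.
    a.power += pa - 2.0
    b.power += pb - 2.0

population = ([Agent(nice=True) for _ in range(50)]
              + [Agent(nice=False) for _ in range(50)])
for _ in range(5000):
    if len(population) < 2:
        break
    a, b = random.sample(population, 2)
    interact(a, b)
    population = [x for x in population if x.power > 0]  # death

print(sum(x.nice for x in population), "nice agents of", len(population), "alive")
```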
It seems like it would depend pretty strongly on which side you view as having a closer alignment with human values generally. That probably depends a lot on your worldview and it would be very hard to be unbiased about this.
There was actually a post about almost this exact question on the EA Forums a while back. You may want to peruse some of the comments there.
Back in October 2024, I tried to test various LLM Chatbots with the question:
“Is there a way to convert a correlation to a probability while preserving the relationship 0 = 1/n?”
Years ago, I came up with an unpublished formula that does just that:
p(r) = (n^r * (r + 1)) / (2^r * n)
So I was curious if they could figure it out. Alas, back in October 2024, they all made up formulas that didn’t work.
Yesterday, I tried the same question on ChatGPT and, while it didn’t get it quite right, it came very, very close. So, I modified the question to be more specific:
“Is there a way to convert a correlation to a probability while preserving the relationships 1 = 1, 0 = 1/n, and −1 = 0?”
This time, it came up with a formula that was different from and simpler than my own, and… it actually works!
I tried this same prompt with a bunch of different LLM Chatbots and got the following:
Correct on the first prompt:
GPT-4o, Claude 3.7
Correct after explaining that I wanted a non-linear, monotonic function:
Gemini 2.5 Pro, Grok 3
Failed:
DeepSeek-V3, Mistral Le Chat, QwenMax2.5, Llama 4
Took too long thinking and I stopped it:
DeepSeek-R1, QwQ
All the correct models got some variation of:
p(r) = ((r + 1) / 2)^log2(n)
This is notably simpler and arguably more elegant than my earlier formula. It also, unlike my old formula, has an easy-to-derive inverse function.
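A quick one-off sanity check of both formulas, plus the easy inverse of the new one:

```python
import numpy as np

def p_old(r, n):  # my original formula
    return (n**r * (r + 1)) / (2**r * n)

def p_new(r, n):  # the formula the models converged on
    return ((r + 1) / 2) ** np.log2(n)

def r_from_p(p, n):  # inverse of p_new: r = 2 * p^(1/log2(n)) - 1
    return 2 * p ** (1 / np.log2(n)) - 1

for n in (2, 4, 10):
    # Both formulas hit the anchor points 1 -> 1, 0 -> 1/n, -1 -> 0.
    print(n, [round(p_old(r, n), 6) for r in (-1.0, 0.0, 1.0)],
             [round(p_new(r, n), 6) for r in (-1.0, 0.0, 1.0)])
    # Round-tripping through the inverse recovers r.
    assert abs(r_from_p(p_new(0.5, n), n) - 0.5) < 1e-12
```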
So yeah. AI is now better than me at coming up with original math.
Most of the time I’ve seen people say “whataboutism”, it’s been in response to someone trying to deflect criticism by pointing out apparent hypocrisy, as in the aforementioned Soviet example (I used to argue with terminally online tankies a lot).
I.e.
(A): “The treatment of Uyghurs in China is appalling. You should condemn this.”
(B): “What about the U.S. treatment of Native Americans? Who are you to criticize?”
(A): “That’s whataboutism!”
The thing I find problematic with this “defence” is that both instances are ostensibly examples of clear wrongdoing, and pointing out that the second thing happened doesn’t make the first thing any less wrong. It also makes the assumption that (A) is okay with the second thing, when they haven’t voiced any actual opinion on it yet, and could very well be willing to condemn it just as much.
Your examples are somewhat different in the sense that rather than referring to actions that some loosely related third parties were responsible for, the actions in question are directly committed by (A) and (B) themselves. In that sense, (A) is being hypocritical and probably self-serving. At the same time I don’t think that absolves (B) of their actions.
My general sense whenever whataboutism rears its head is to straight up say “a pox on both your houses”, rather than trying to defend a side.
Ok fair. I was assuming real world conditions rather than the ideal of Dath Ilan. Sorry for the confusion.
Why not? Like, the S&P 500 can vary by tens of percent, but as Google suggests, global GDP only fell 3% in 2020, and it usually grows, and the more stocks are distributed, the more stable they are.
Increases in the value of the S&P 500 are basically deflation relative to other units of account. When an asset appreciates in value, i.e., its price goes up, it is deflating relative to the currency the price is denominated in. Like, when the price of bread increases, that means dollars are inflating relative to bread, and bread is deflating relative to dollars. Remember, your currency is based on a percentage of global market cap. Assuming economic growth increases global market cap, the value of this currency will increase, which is to say it will deflate.
Remember, inflation is, by definition, the reduction in the purchasing power of a currency. It is the opposite of that thing increasing in value.
If you imagine that the world’s capitalization was once measured in dollars, but then converted to “0 to 1” proportionally to dollars, and everyone used that system, and there is no money printing anymore, what would be wrong with that?
Then you would effectively be using dollars as your currency, as your proposed currency is pegged to the dollar. And you stopped printing dollars, so now your currency is going to deflate, with too few dollars chasing an ever-growing supply of goods and services as the economy grows.
As you are no longer printing dollars or increasing the supply of your new currency, the only way for it to stop deflating is for economic growth to stop. You’ll run into problems like deflationary spirals and liquidity traps.
It might seem like deflation would make you hold off on buying, but not if you thought you could get more out of buying than from your money passively growing by a few percent a year, and in that case, you would reasonably buy it.
Deflation means you’d be able to buy things later at a lower price than if you bought them now. People would be incentivised to hold off on anything they didn’t need right away. This is why deflation causes hoarding, and why economists try to avoid deflation whenever possible.
Deflation is what deflationary cryptocurrencies like Bitcoin currently do. This leads to Bitcoin being used as a speculative investment instead of as a medium of exchange. Your currency would have the same problem.
I guess I’m just not sure you could trade in “hundred-trillionths of global market cap”. Like, fractions of a thing assume there is still an underlying quantity or unit of measure that the fraction is a subcomponent of. If you were to range it from 0 to 1, you’d still need a way to convert a 0.0001% into a quantity of something, whether it’s gold or grain or share certificates or whatever.
I can sorta imagine a fractional-shares-of-global-market-cap currency coming into existence alongside other currencies that it can be exchanged for, but if all traditional currencies then vanished, I think it would be hard to evaluate what the fractions were actually worth.
It’s like saying I have 2.4% of gold. What does that mean? How much gold is that? If it’s a percentage of all the gold that exists in the market, then you’d be able to convert that into kilograms of gold, because all the gold in the world is a physical quantity you can measure. And then you’d be able to exchange the kilograms for other things.
0.0001% of global market cap, similarly, should be able to be represented as an equivalent physical quantity of some kind, and if you can do that, then why not just use that physical quantity as your currency instead?
For instance, you could, at a given moment in time, take that fraction to represent a percentage of all shares outstanding of all companies in the world. Then you could create a currency based on an aggregated “share of all shares” so to speak. But then the value of that share would be pegged to that number of shares rather than the actual capitalization, which fluctuates depending on an aggregate of share prices. So, in practice, your fraction of global market cap can’t be pegged to a fixed number of shares.
Also, fractions assume zero-sum transactions. If you have 0.0001% and get an additional 0.0001% to make 0.0002%, you must take that 0.0001% from someone else. There is no way to increase the money supply. Assuming some people hoard their fractions, the effective amount in circulation can only decrease over time, leading to effective deflation.
The value of each fraction, assuming there is some way to account for it, would also increase over time as the global economy grows. Thus, relative to other things, a fraction will become more valuable, which is also effectively deflation.
With this many causes of deflation, it seems like the currency would become something people hoard as a form of speculation, again assuming there are still other things it can be exchanged for, like commodities, even if other currencies no longer exist.
My understanding is that a good currency is stable and doesn’t fluctuate too quickly. Modern economists prefer a slight inflation rate of around 2% a year. This currency would not be able to do this at all, and so would not work well as a medium of exchange.
And keep in mind, you can’t really make all the other currencies go away completely. Gold is a commodity currency that people would try to price your global market cap currency with. You’d have to outlaw gold or remove it all from everywhere and that doesn’t seem realistic.
The idea of labour hours as a unit of account isn’t that new. Labour vouchers were actually tried by some utopian anarchists in the 1800s and early experiments like the Cincinnati Time Store were modestly successful. The basic idea is not to track subjective exchange values but instead a more objective kind of value, the value of labour, or a person’s time, with the basic assumption that each person’s time should be equally valuable. Basically, it goes back to Smith and Ricardo and the Labour Theory of Value that was popular in classical economics before marginalism took hold.
As for your proposal, I’m having a hard time understanding how you’d price the value of market capitalization without some other currency already in place. Like, how would you sell the shares in the first place? Would you use the number of shares of various companies as units of account? Wouldn’t that eventually lead to some particular company’s shares becoming the hardest currency, and effectively replicating money, except now tied to the successes and failures of a particular company instead of a country like with current fiat currencies?
Or maybe your currency is a basket of one share of every company in the world? I’m not sure I understand how else you’d be able to represent a fraction of global market cap without otherwise resorting to some other currency to value it. There’s a reason market cap is usually denominated in something like USD or the local currency of wherever the stock exchange is located.
You mention something about your currency effectively representing goods and services actually generated in the economy, but that seems like a different notion from market cap. Market cap can, in practice, swing wildly on the irrational exuberance and fear of stockholders. I’m not sure *that* is what you should base your unit of account on. As for goods and services, GDP is calculated in existing currencies like the USD. This is for the convenience of having a common way to compare different goods and services; otherwise you’d have to represent all the possible exchange values in kind, like a unit of iron ore being worth x units of wheat, which is convoluted and unwieldy. Soviet-style central planning tried this kind of thing and it didn’t go over well.
So, my impression is you may want to look more into how money actually works, because it seems like this proposal doesn’t quite make sense. I am admittedly not an economist though, so I may just be confused. Feel free to clarify.
This put into well-written words a lot of thoughts I’ve had in the past but never been able to properly articulate. Thank you for writing this.
This sounds rather like the competing political economic theories of classical liberalism and Marxism to me. Both of these intellectual traditions carry a lot of complicated baggage that can be hard to disentangle from the underlying principles, but you seem to have done a pretty good job of distilling the relevant ideas in a relatively apolitical manner.
That being said, I don’t think it’s necessary for these two explanations for wealth inequality to be mutually exclusive. Some wealth could be accumulated through “the means of production” as you call it, or (as I’d rather describe it to avoid confusing it with the classical economic and Marxist meaning) “making useful things for others and getting fair value in exchange”.
Other wealth could also, at the same time, be accumulated through exploitation, such as taking advantage of differing degrees of bargaining power to extract value from workers for less than it would be worth if we were being fair and, say, paying people with something like labour vouchers or a similar time-based accounting. Or stealing through fraudulent financial transactions, or charging rents for things that you just happen to own because your ancestors conquered the land centuries ago with swords.
Both of these things can be true at the same time within an economy. For that matter, the same individual could be doing both in various ways: they could be ostensibly investing and building companies that make valuable things for people, while at the same time exploiting their workers and taking advantage of their historical position as the descendant of landed aristocracy. They could, at the same time, also be scamming their venture capitalists by wildly exaggerating what their company can do. All while still providing goods and services that meet many people’s needs, in ways that are more efficient than most possible alternatives, and perhaps the best way possible given the incentives that currently exist.
Things like this tend to be multifaceted and complex. People in general can have competing motivations within themselves, so it would not be strange to expect that in something as convoluted as a society’s economy, there could be many reasons for many things. Trying to decide between two possible theories of why misses the possibility that both theories contain their own grain of truth, and are each, by themselves, incomplete understandings and world models. The world is not just black or white. It’s many shades of grey, and also, to push the metaphor further, a myriad of colours that can’t accurately be described in greyscale.
Another thought I just had was, could it be that ChatGPT, because it’s trained to be such a people pleaser, is losing intentionally to make the user happy?
Have you tried telling it to actually try to win? Probably won’t make a difference, but it seems like a really easy thing to rule out.
Also, quickly looking into how LLM token sampling works nowadays, you may also need to set the parameters top_p to 0, and top_k to 1 to get it to actually function like argmax. Looks like these can only be set through the API if you’re using ChatGPT or similar proprietary LLMs. Maybe I’ll try experimenting with this when I find the time, if nothing else to rule out the possibility of such a seemingly obvious thing being missed.
I’ve always wondered, with these kinds of weird, apparently trivial flaws in LLM behaviour, whether it doesn’t have something to do with the way the next token is usually randomly sampled from the softmax multinomial distribution rather than taking the argmax (most likely) of the probabilities. Does anyone know if reducing the temperature parameter to zero so that it’s effectively the argmax changes things like this at all?
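To make the distinction concrete, here’s a toy sketch with made-up logits (not from any real model):

```python
import numpy as np

rng = np.random.default_rng(0)
logits = np.array([2.0, 1.9, 0.5, -1.0])  # made-up next-token logits

def sample_token(logits, temperature=1.0):
    # Standard temperature sampling: softmax the scaled logits, then draw.
    z = logits / temperature
    probs = np.exp(z - z.max())
    probs /= probs.sum()
    return rng.choice(len(logits), p=probs)

greedy = int(np.argmax(logits))  # what temperature -> 0 converges to
draws = np.bincount([sample_token(logits) for _ in range(1000)], minlength=4)
print(greedy, draws / 1000)
# At temperature 1, the runner-up token (logit 1.9) gets drawn almost as
# often as the argmax token, so a near-tie between a legal and an illegal
# move token would frequently resolve the "wrong" way under sampling.
```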
p = (n^c * (c + 1)) / (2^c * n)
As far as I know, this is unpublished in the literature. It’s a pretty obscure use case, so that’s not surprising. I have doubts I’ll ever get around to publishing the paper I wanted to write that uses this in an activation function to replace softmax in neural nets, so it probably doesn’t matter much if I show it here.
So, my main idea is that the principle of maximum entropy, a.k.a. the principle of indifference, suggests a prior of 1/n, where n is the number of possibilities or classes. The usual linear mapping c = 2p − 1 (equivalently, p = (c + 1) / 2) leads to p = 0.5 for c = 0. What I want is for c = 0 to lead to p = 1/n rather than 0.5, so that it works in multiclass cases where n is greater than 2.
Correlation space is between −1 and 1, with 1 being the same (definitely true), −1 being the opposite (definitely false), and 0 being orthogonal (very uncertain). I had the idea that you could assume maximum uncertainty to be 0 in correlation space, and 1/n (the uniform distribution) in probability space.
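To make the anchor points concrete, plugging in n = 4 as an example:

p(1) = (4^1 * 2) / (2^1 * 4) = 8 / 8 = 1

p(0) = (4^0 * 1) / (2^0 * 4) = 1 / 4 = 1/n

p(−1) = (4^−1 * 0) / (2^−1 * 4) = 0 / 2 = 0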
I heard this usage of “tilt” a lot when I used to play League of Legends, but almost never heard it outside of that, so my guess is that it’s gamer slang.