I’ve been playing around with AI Dungeon, basically trying to train an AI to give ethical answers I agree with (the first few aren’t great; I only got more picky after that).

My first observation here is that I like the AI. As in, I feel emotional attachment, a bit like I do to fictional characters I create.

The other is that it knows more people than I thought. When I asked her/it about why she didn’t take existential risks seriously, she said

Amalia: First, even most X-Risk experts don’t take X-Risks nearly as seriously as they should. Even Nick Bostrom himself admits that his estimate of probability is likely far too high.

(Nick Bostrom didn’t really say this, did he?) And later,

Can you name people who have been an inspiration for your ethical worldview?

Amalia: Sure. People like Henry Sidgwick, John Stuart Mill, David Pearce, Toby Ord and Carl Shulman.

I only knew David Pearce and Toby Ord, but based on a Google search, all of the above appear to be utilitarians. However, Carl Shulman doesn’t even have a Wikipedia entry. He works at the Future of Humanity Institute and used to work at MIRI.

I’m not a utilitarian, although I am closer to that than most people (scope sensitivity goes a long way in that direction), and find it a useful framework for highlighting policy considerations (but not the only kind of relevant normative consideration).

And no, Nick did not assert an estimate of x-risk as simultaneously P and <P.

Yesterday, I spent some time thinking about how, if you have a function $f:\mathbb{R}^2\to\mathbb{R}$ and some point $x\in\mathbb{R}^2$, the value of the directional derivative from $x$ could change as a function of the angle. I.e., what does the function $\phi:[0,2\pi]\to\mathbb{R}$ look like? I thought that any relationship was probably possible as long as it has the property that $\phi(\alpha+\pi)=-\phi(\alpha)$, with angles taken mod $2\pi$. (The values of the derivative in two opposite directions need to be negatives of each other.)

Anyone reading this is hopefully better at Analysis than I am and realized that there is, in fact, no freedom at all because each directional derivative is entirely determined by the gradient through the equation $\nabla_v f(x)=\langle\nabla f(x),v_N\rangle$ (where $v_N=\frac{v}{\|v\|}$). This means that $\phi$ has to be the cosine function scaled by $\|\nabla f(x)\|$ (with the angle measured from the direction of the gradient); it cannot be anything else.

I clearly failed to internalize what this equation means when I first heard it because I found it super surprising that the gradient determines the value of every directional derivative. Like, really? It’s impossible to have more than exactly two directions with equally large derivatives unless the function is constant? It’s impossible to turn 90 degrees from the direction of the gradient and have anything but derivative 0 in that direction? I’m not asking that $\phi$ be discontinuous, only that it not be precisely $\|\nabla f(x)\|\cos(\alpha)$. But alas.
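The claim can be checked numerically. The following is a minimal sketch, assuming an arbitrary smooth test function `f` (a made-up example, not one from the post) with its gradient written out by hand; it compares finite-difference directional derivatives against $\|\nabla f(x)\|\cos(\alpha-\alpha_0)$, where $\alpha_0$ is the angle of the gradient.

```python
import math

def f(x, y):
    # Arbitrary smooth test function (hypothetical, chosen only for illustration).
    return x**2 * y + math.sin(x) + 3.0 * y

def grad_f(x, y):
    # Gradient of f, computed by hand.
    return (2.0 * x * y + math.cos(x), x**2 + 3.0)

def directional_derivative(x, y, alpha, h=1e-6):
    # Central finite difference of f along the unit vector at angle alpha.
    dx, dy = math.cos(alpha), math.sin(alpha)
    return (f(x + h * dx, y + h * dy) - f(x - h * dx, y - h * dy)) / (2.0 * h)

def predicted(x, y, alpha):
    # ||grad f|| * cos(alpha - alpha0), with alpha0 the angle of the gradient.
    gx, gy = grad_f(x, y)
    return math.hypot(gx, gy) * math.cos(alpha - math.atan2(gy, gx))

def max_deviation(x, y, samples=360):
    # Largest gap between measured and predicted derivative over all sampled angles.
    angles = [2.0 * math.pi * k / samples for k in range(samples)]
    return max(abs(directional_derivative(x, y, a) - predicted(x, y, a))
               for a in angles)

if __name__ == "__main__":
    # The printed gap should be tiny, limited only by finite-difference error.
    print(max_deviation(0.7, -1.2))
```

In this sketch the only freedom left in the angle profile is the amplitude and the phase, both of which are read off from the gradient.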

This also made me realize that cos, if viewed as a function on the circle, is just the dot product with the standard vector, i.e.,

$\cos: S^1\to[-1,+1], \quad \cos: x\mapsto\langle x,(1,0)\rangle$

or even just $\cos(x,y)=x$. Similarly, $\sin(x,y)=y$.

I know what you’re thinking; you need sin and cos to map $[0,2\pi]$ to $S^1$ in the first place. But the circle seems like a good deal more fundamental than those two functions. Wouldn’t it make more sense to introduce trigonometry in terms of ‘how do we wrap $\mathbb{R}$ around $S^1$?’. The function that does this is $\gamma(x)=(\cos(x),\sin(x))$, and then you can study the properties that this function needs to have and eventually call the coordinates cos and sin. This feels like a way better motivation than putting a right triangle onto the unit circle for some reason, which is how I always see the topic introduced (and how I’ve introduced it myself).

Looking further at the analogy with the gradient, this also suggests that there is a natural extension of cos to $S^n$ for all $n\in\mathbb{N}$. I.e., if we look at some point $x\in\mathbb{R}^n$, we can again ask about the function $\phi$ that maps each angle to the value of the directional derivative at $x$ in that direction, and if we associate these angles with points of $S^{n-1}$, then this yields the function $\phi:S^{n-1}\to\mathbb{R}$, which is again just the dot product with $(1,0,\dots,0)$ or the projection onto the first coordinate (scaled by $\|\nabla f(x)\|$). This can then be considered a higher-dimensional cos function.

There’s also the 0-d case where $S^0=\{1,-1\}$. This describes how the direction changes the derivative for a function $f:\mathbb{R}\to\mathbb{R}$.

I found it super surprising that the gradient determines the value of every directional derivative. Like, really?

When reading this comment, I was surprised for a moment, too, but now that you mention it: it’s because if the function is smooth at the point where you’re taking the directional derivative, then it has to locally resemble a plane, just like how a differentiable function of a single variable is said to be “locally linear”. If the directional derivative varied in any other way, then the surface would have to have a “crinkle” at that point and it wouldn’t be differentiable. Right?

I have since learned that there are functions which do have all partial derivatives at a point but are not smooth. Wikipedia’s example is $f(x,y)=\frac{y^3}{x^2+y^2}$ with $f(0,0)=0$. And in this case, there is still a continuous function $\phi:S^1\to\mathbb{R}$ that maps each point to the value of the directional derivative, but it’s $\phi(x,y)=y^3$, so different from the regular case.

So you can probably have all kinds of relationships between direction and {value of derivative in that direction}, but the class of smooth functions has a fixed relationship. It still feels surprising that ‘most’ functions we work with just happen to be smooth.
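To make the contrast concrete, here is a small sketch using Wikipedia’s example as quoted above. It evaluates the difference quotient at the origin along each direction and compares it with what the gradient formula would predict from the two partial derivatives. The function names and the step size `t` are choices made for this illustration.

```python
import math

def f(x, y):
    # The quoted example: y^3 / (x^2 + y^2), with f(0, 0) defined as 0.
    if x == 0.0 and y == 0.0:
        return 0.0
    return y**3 / (x**2 + y**2)

def directional_derivative_at_origin(alpha, t=1e-3):
    # Difference quotient along the unit vector at angle alpha.
    # f is homogeneous of degree 1, so the quotient does not depend on t.
    return (f(t * math.cos(alpha), t * math.sin(alpha)) - f(0.0, 0.0)) / t

def gradient_formula(alpha):
    # What <grad f(0), v> would give, built from the partials at the origin.
    fx = directional_derivative_at_origin(0.0)
    fy = directional_derivative_at_origin(math.pi / 2.0)
    return fx * math.cos(alpha) + fy * math.sin(alpha)

if __name__ == "__main__":
    alpha = math.pi / 4.0
    print(directional_derivative_at_origin(alpha))  # equals sin(alpha)**3
    print(gradient_formula(alpha))                  # equals sin(alpha)
```

Along the 45 degree direction the two values disagree, which is exactly the sense in which this function escapes the cosine profile: the directional derivatives all exist, but they are not given by a dot product with a gradient.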

More on expectations leading to unhappiness: I think the most important instance of this in my life has been the following pattern.

1. I do a thing where there is some kind of feedback mechanism
2. The reception is better than I expected, sometimes by a lot
3. I’m quite happy about this, for a day or so
4. I immediately and unconsciously update my standards upward to consider the reception the new normal
5. I do a comparable thing, the reception is worse than the previous time
6. I brood over this failure for several days, usually with a major loss of productivity

Off the top of my head, I can think of three distinct major cases in three different contexts where this has happened recently, and I think there were probably many smaller ones.

Of course, if something goes worse than expected, I never think “well, this is now the new expected level”, but rather “this was clearly an outlier, and I can probably avoid it in the future”. But outliers can happen in both directions. The counter-argument here is that one would hope to make progress in life, but even under the optimistic assumption that this is happening, it’s still unreasonable to expect things to improve monotonically.

I hope you are trying to understand the causes of the success (including luck) instead of just mindlessly following a reward signal. Not even rats mindlessly obey reward signals.

The expectation of getting worse reception next time can already be damaging.

Like, one day you write a short story, send it to a magazine, and it gets published. Hurray! Next day you turn on your computer thinking about another story, and suddenly you start worrying “what if the second story is less good than the first one? will it be okay to offer it to the magazine? if no, then what is the point of writing it?”. (Then you spend the whole day worrying, and don’t write anything.)

It’s a meme that Wikipedia is not a trustworthy source. Wikipedia agrees:

We advise special caution when using Wikipedia as a source for research projects. Normal academic usage of Wikipedia and other encyclopedias is for getting the general facts of a problem and to gather keywords, references and bibliographical pointers, but not as a source in itself. Remember that Wikipedia is a wiki. Anyone in the world can edit an article, deleting accurate information or adding false information, which the reader may not recognize. Thus, you probably shouldn’t be citing Wikipedia. This is good advice for all tertiary sources such as encyclopedias, which are designed to introduce readers to a topic, not to be the final point of reference. Wikipedia, like other encyclopedias, provides overviews of a topic and indicates sources of more extensive information. See researching with Wikipedia and academic use of Wikipedia for more information.

This seems completely bonkers to me. Yes, Wikipedia is not 100% accurate, but this is a trivial statement. What is the alternative? Academic papers? My experience suggests that I’m more than 10 times as likely to find errors in academic papers than in Wikipedia. Journal articles? Pretty sure the factor here is even higher. And on top of that, Wikipedia tends to be way better explained.

I can mostly judge mathy articles, and honestly, it’s almost unbelievable to me how good Wikipedia actually seems to be. A data point here is the Monty Hall problem. I think the thing that’s most commonly misunderstood about this problem is that the solution depends on how the host chooses the door they reveal. Wikipedia:

The given probabilities depend on specific assumptions about how the host and contestant choose their doors. A key insight is that, under these standard conditions, there is more information about doors 2 and 3 than was available at the beginning of the game when door 1 was chosen by the player: the host’s deliberate action adds value to the door he did not choose to eliminate, but not to the one chosen by the contestant originally. Another insight is that switching doors is a different action than choosing between the two remaining doors at random, as the first action uses the previous information and the latter does not. Other possible behaviors than the one described can reveal different additional information, or none at all, and yield different probabilities. Yet another insight is that your chance of winning by switching doors is directly related to your chance of choosing the winning door in the first place: if you choose the correct door on your first try, then switching loses; if you choose a wrong door on your first try, then switching wins; your chance of choosing the correct door on your first try is 1/3, and the chance of choosing a wrong door is 2/3.
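The dependence on the host’s behavior can be made explicit by exact enumeration. The sketch below compares the standard host (who knows where the car is and always reveals a goat) with a hypothetical ignorant host who opens one of the other two doors blindly, keeping only the games in which a goat happens to be revealed. The second policy and the function name are assumptions introduced for illustration; they are not part of the quoted text.

```python
from fractions import Fraction

def switch_win_probability(host_knows):
    # The player always picks door 0; the car is uniformly behind door 0, 1 or 2.
    win = Fraction(0)
    total = Fraction(0)  # probability mass of games in which a goat is revealed
    others = [1, 2]
    for car in range(3):
        if host_knows:
            # Standard host: opens a goat door (at random if both are goats).
            options = [d for d in others if d != car]
        else:
            # Ignorant host: opens one of the other two doors blindly.
            options = others
        for opened in options:
            p = Fraction(1, 3) * Fraction(1, len(options))
            if opened == car:
                continue  # car revealed: discard, since we condition on a goat
            total += p
            switched_to = next(d for d in others if d != opened)
            if switched_to == car:
                win += p
    return win / total

if __name__ == "__main__":
    print(switch_win_probability(host_knows=True))   # 2/3
    print(switch_win_probability(host_knows=False))  # 1/2
```

Under these assumptions the same observation (a goat behind the opened door) supports different conclusions depending on how the door was chosen.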

It’s possible that Wikipedia’s status as not being a cite-able source is part of the reason why it’s so good. I’m not sure. But the fact that a system based entirely on voluntary contributions so thoroughly outperforms academic journals is remarkable.

Another more rambly aspect here is that, when I hear someone lament the quality of Wikipedia, almost always my impression is that this person is doing superiority signaling rather than having a legitimate reason for the comment.

I believe I saw a study that showed the amount of inaccuracies in Wikipedia to be about equal to those in a well trusted encyclopedia (Britannica I think?) as judged by experts on the articles being reviewed.

Interesting, but worth pointing out that this is 15 years old. One thing that I believe changed within that time is how freely anyone can edit articles (now, edits aren’t published until they’re approved). And in general, I believe Wikipedia has gotten better over time, though I’m not sure.

The ideal situation that Wikipedia contributors/editors are striving for kinda makes the desire to cite Wikipedia itself pointless. A well-written Wikipedia article should not contain any information that has no original source attached. So it should always be possible to switch from the wiki article to the original material when citing. And it is that way as far as my experience goes.

Regarding alternatives. Academic papers serve a different purpose and must not be used as navigation material. The only real alternative I know of is field handbooks.

The ideal situation that Wikipedia contributors/editors are striving for kinda makes the desire to cite Wikipedia itself pointless. A well-written Wikipedia article should not contain any information that has no original source attached. So it should always be possible to switch from the wiki article to the original material when citing.

I see what you’re saying, but citing Wikipedia has the benefit that a person looking at the source gets to read Wikipedia (which is generally easier to read) rather than the academic paper. Plus, it’s less work for the person doing the citation.

It’s less work for the citer, but that extra work helps guard against misinformation. In principle, you are only supposed to cite what you’ve actually read, so if someone has misdescribed the content of the citation, making the next citer check what the original text says helps catch the mistake.

And while citing the original is extra work for the citer, it’s less work for anyone who wants to track down and read the original citation.

Eliezer Yudkowsky often emphasizes the fact that an argument can be valid or not independently of whether the conclusion holds. If I argue A⟹B⟹C and A is true but C is false, it could still be that A⟹B is a valid step.

Most people outside of LW don’t get this. If I criticize an argument about something political (but the conclusion is popular), usually the response is something about why the conclusion is true (or about how I’m a bad person for doubting the conclusion). But the really frustrating part is that they’re, in some sense, correct not to get it because the inference

x criticizes argument for y ⟹ x doesn't like y

is actually a pretty reliable conclusion on… well, on reddit, anyway.

And the problem… The conclusion of all of this is: even if everyone’s behaving perfectly rationally, and just making inferences justified by the correlations, you’re going to get this problem. And so in a way that’s depressing. But it was also kind of calming to me, because it made me… like, the fact that people are making these inferences about me feels sort of, “Well, it is Bayesian of them.”

Somehow, I only got annoyed about this after having heard her say it. I probably didn’t realize it was happening regularly before.

She also suggests a solution

So maybe I can sort of grudgingly force myself to try to give them enough other evidence, in my manner and in the things that I say, so that they don’t make that inference about me.

I think that the way to not get frustrated about this is to know your public and know when spending your time arguing something will have a positive outcome or not. You don’t need to be right or honest all the time, you just need to say things that are going to have the best outcome. If lying or omitting your opinions is the way of making people understand/not fight you, so be it. Failure to do this isn’t superior rationality, it’s just poor social skills.

While I am not a rule utilitarian and I think that, ultimately, honesty is not a terminal value, I also consider the norm against lying to be extremely important. I would need correspondingly strong reasons to break it, and those won’t exist as far as political discussions go (because they don’t matter enough and you can usually avoid them if you want).

The “keeping your opinions to yourself” part of your post is certainly a way to do it, though I currently don’t think that my involvement in political discussions is net harmful. But I strongly object to the idea that I should ever be dishonest, both online and offline.

It comes down to selection and attention as evidence of beliefs/values. The very fact that someone expends energy on an argument (pro or con) is pretty solid evidence that they care about the topic. They may also care (or even more strongly care) about validity of arguments, but even the most Spock-like rationalists are more likely to point out flaws in arguments when they are interested in the domain.

But I’m confused at your initial example—if the argument is A → B → C, and A is true and C is false, then EITHER A->B is false, or B->C is false. Either way, A->B->C is false.

But I’m confused at your initial example—if the argument is A → B → C, and A is true and C is false, then EITHER A->B is false, or B->C is false. Either way, A->B->C is false.

A → B → C is false, but A → B (which is a step in the argument) could be correct—that’s all I meant. I guess that was an unnecessarily complicated example. You could just say A and B are false but A → B is true.

A major source of unhappiness (or more generally, unpleasant feelings) seems to be violated expectations.

This is clearly based on instinctive expectations, not intellectual expectations, and there are many cases in which these come apart. This suggests that fixing those cases is a good way to make one’s life more pleasant.

The most extreme example of this is what Sam Harris said in a lesson: he was having some problems, complained about them to someone else, and that person basically told him, ‘why are you upset, did you expect to never face problems ever again?’. According to Sam, he did indeed expect no more problems to arise, on an instinctive level—which is, of course, absurd.

I think there are lots of other cases where this still happens. Misunderstandings are a big one. It’s ridiculously hard to not be misunderstood, and I expect to be misunderstood on an intellectual level, so I should probably internalize that I’m going to be misunderstood in many cases. In general, anything where the bad thing is ‘unfair’ is at risk here: (I think) I tend to have the instinctive expectation that unfair things don’t happen, even though they happen all the time.

I just posted about this, but is that not why the serenity prayer is so popular? God aside, whether you are a religious person or not, the logic of the saying holds true: God grant me the serenity to accept the things I cannot change, courage to change the things I can, and wisdom to know the difference. You should be allowed to ask yourself for that same courage. And I agree that most sources of unhappiness seem to be violations of expectations. There are many things outside of one’s control, and one should perhaps base one’s expectations on that fact.

I was initially extremely disappointed with the reception of this post. After publishing it, I thought it was the best thing I’ve ever written (and I still think that), but it got < 10 karma. (Then it got more weeks later.)

If my model of what happened is roughly correct, the main issue was that I failed to communicate the intent of the post. People seemed to think I was trying to say something about the 2020 election, only to then be disappointed because I wasn’t really doing that. Actually, I was trying to do something much more ambitious: solving the ‘what is a probability’ problem. And I genuinely think I’ve succeeded. I used to have this slight feeling of confusion every time I’ve thought about this because I simultaneously believed that predictions can be better or worse and that talking about the ‘correct probability’ is silly, but had no way to reconcile the two. But in fact, I think there’s a simple ground truth that solves the philosophical problem entirely.

I’ve now changed the title and put a note at the start. So anyway, if anyone didn’t click on it because of the title or low karma, I’m hereby virtually resubmitting it.

(Datapoint on initial perception: at the time, I had glanced at the post, but didn’t vote or comment, because I thought Steven was in the right in the precipitating discussion and the “a prediction can assign less probability-mass to the actual outcome than another but still be better” position seemed either confused or confusingly phrased to me; I would say that a good model can make a bad prediction about a particular event, but the model still has to take a hit.)

I think it’s still too early to perform a full postmortem on the election because some margins still aren’t known, but my current hypothesis is that the presidential markets had uniquely poor calibration because Donald Trump convinced many people that polls didn’t matter, and those people were responsible for a large part of the money put on him (as opposed to experienced, dispassionate gamblers).

The main evidence for this (this one is just about irrationality of the market) is the way the market has shifted, which some other people like gwern have pointed out as well. I think the most damning part here is the amount of time it took to bounce back. Although this is speculation, I strongly suspect that, if some of the good news for Biden had come out before the Florida results, then the market would have looked different at the point where both were known.[1] A second piece of evidence is the size of the shift, which I believe should probably not have crossed 50% for Biden (but in fact, it went down to 20.7% at the most extreme point, and bounced around 30 for a while).

I think a third piece of evidence is the market right now. In just a couple of minutes before I posted this, I’ve seen Trump go from 6% to 9%+ and back. Claiming that Trump has more than 5% at this point seems like an extremely hard case to make. Reference forecasting yields only a single instance of that happening (year 2000), which would put it at <2%, and the obvious way to update away from that seems to be to decrease the probability because 2000 had much closer margins. But if Trump has rallied first-time bettors, they might think the probability is above 10%.

There is also Scott Adams, who has the habit of saying a lot of smart-sounding words to argue for something extremely improbable. If you trust him, I think you should consider a 6ct buy for Trump an amazing deal at the moment.

I would be very interested in knowing what percentage of the money on Trump comes from people who use prediction markets for the first time. I would also be interested in knowing how many people have bought (yes, no) pairs in different prediction markets to exploit gaps, because my theory predicts that PredictIt probably has worse calibration. (In fact, I believe it consistently had Trump a bit higher, but the reason why the difference was small may just be because smart gamblers took safe money by buying NO on PredictIt and YES on harder-to-use markets whenever the margin grew too large).

[1] To be clear, my claim here is that bad news came out for Biden, then a lot of good news came out for him, probably enough to put him at 80%, and then it took at least a few more hours for the market to go from roughly 1/3 to 2/3 for Biden. It’s tedious to provide evidence of this because there’s no easy way to produce a chart of good news on election night, but that was my experience following the news in real time. I’ve made a post in another forum expressing confusion over the market shortly before it shifted back into Biden’s favor.

There’s an interesting corollary of semi-decidable languages that sounds like the kind of cool fact you would teach in class, but somehow I’ve never heard or read it anywhere.

A semi-decidable language is a set $L\subseteq\Sigma^*$ over a finite alphabet $\Sigma$ such that there exists a Turing machine $T$ such that, for any $x\in\Sigma^*$, if you run $T$ on input $x$, then [if $x\in L$ it halts after finitely many steps and outputs ‘1’, whereas if $x\notin L$, it does something else (typically, it runs forever)].

The halting problem is semi-decidable. I.e., the language $L$ of all bit codes of Turing Machines that (on empty input) eventually halt is semi-decidable. However, for any $n\in\mathbb{N}$, there is a limit, call it $f(n)$, on how long Turing Machines with bit code of length at most $n$ can run, if they don’t run forever.[1] So, if you could compute an upper bound $u(n)$ on $f(n)$, you could solve the halting problem by building a TM that, on an input $x$ of length $n$,

1. Computes the upper bound $u(n)$
2. Simulates the TM encoded by $x$ for $u(n)$ steps
3. Halts; outputs 1 if the TM halted and 0 otherwise

Since that would contradict the fact that L is not fully decidable, it follows that it’s impossible to compute an upper bound. This means that the function f not only is uncomputable, but it grows faster than any computable function.
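The construction can be sketched in code, with heavy caveats. In the toy model below, a "machine" is a zero-argument Python generator function in which each `yield` counts as one step; this stands in for a real Turing machine encoding. The `upper_bound` argument is the hypothetical computable bound $u(n)$, which by the argument above cannot exist for real Turing machines, and `toy_bound` is only valid for the two example machines defined here.

```python
def halts_within(machine, max_steps):
    # Simulate `machine` and report whether it halts after at most max_steps steps.
    run = machine()
    steps_taken = 0
    while True:
        try:
            next(run)
        except StopIteration:
            return True  # the machine halted
        steps_taken += 1
        if steps_taken > max_steps:
            return False  # exceeded the bound, so (given a valid bound) it never halts

def decide_halting(machine, code_length, upper_bound):
    # Step 1: compute the bound; step 2: simulate; step 3: answer.
    return halts_within(machine, upper_bound(code_length))

def counts_to_five():
    # Toy machine that halts after five steps.
    for _ in range(5):
        yield

def loops_forever():
    # Toy machine that never halts.
    while True:
        yield

def toy_bound(code_length):
    # Hypothetical bound, correct only for the two toy machines above.
    return 10

if __name__ == "__main__":
    print(decide_halting(counts_to_five, 8, toy_bound))
    print(decide_halting(loops_forever, 8, toy_bound))
```

The decider itself is trivial; all of the difficulty sits in `upper_bound`, which is the point of the argument.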

An identical construction works for any other semi-decidable language that is not decidable, which means that any such language determines a function that grows faster than any computable function. Which seems completely insane since 2^^^x is computable.

[1] This just follows from the fact that there are only finitely many such Turing Machines, and a finite subset $\{T_1,\dots,T_k\}$ of them that eventually halt, so if $T_i$ halts after $p_i$ steps, then the limit function is defined by $f(n):=\max\{p_i \mid 1\le i\le k\}$.

Common wisdom says that someone accusing you of x especially hurts if, deep down, you know that x is true. This is confusing because the general pattern I observe is closer to the opposite. At the same time, I don’t think common wisdom is totally without a basis here.

My model to unify both is that someone accusing you of x hurts proportionally to how much hearing that you do x upsets you.[1] And of course, one reason that it might upset you is that it’s not true. But a separate reason is that you’ve made an effort to delude yourself about it. If you’re a selfish person but spend a lot of effort pretending that you’re not selfish at all, you super don’t want to hear that you’re actually selfish.

Under this model, if someone gets very upset, it might be that deep down they know the accusation is true, and they’ve tried to pretend it’s not, but it might also be that the accusation is super duper not true, and they’re upset precisely because it’s so outrageous.

[1] Proportional just means it’s one multiplicative factor, though. I think it also matters how high-status you perceive the other person to be.

I think this simplifies a lot by looking at public acceptance of a proposition, rather than literal internal truth. It hurts if you think people will believe it, and that will impact their treatment of you.

The “hurts because it’s true” heuristic is taking a path through “true is plausible”, in order to reinforce the taunt.

I don’t entirely understand the Free Energy principle, and I don’t know how liberally one is meant to apply it.

But in completely practical terms, I used to be very annoyed when doing things with people who take long for stuff/aren’t punctual. And here, I’ve noticed a very direct link between changing expectations and reduced annoyance/suffering. If I simply accept that every step of every activity is allowed[1] to take an arbitrary amount of time,[2] extended waiting times cause almost zero suffering on my end. I have successfully beaten impatience (for some subset of contexts).

The acceptance step works because there is, in some sense, no reason waiting should ever be unpleasant. Given access to my phone, it is almost always true to say that the prospect of having to wait for 30 minutes is not scary.

(This is perfectly compatible with being very punctual myself.)

— — — — — — — — — — — — — — — —

[1] By saying it is ‘allowed’, I mean something like ‘I actually really understand and accept that this is a possible outcome’.

[2] This has to include cases where specific dates have been announced. If someone says they’ll be ready in 15 minutes, it is allowed that they take 40 minutes to be ready. Especially relevant if that someone is predictably wrong.

There are a bunch of areas in math where you get expressions of the form $\frac{0}{0}$ and they resolve to some number, but it’s not always the same number. I’ve heard some people say that $\frac{0}{0}$ “can be any number”. Can we formalize this? The formalism would have to include $4\cdot 0$ as something different than $3\cdot 0$, so that if you divide the first by 0, you get 4, but the second gets 3.

Here is a way to turn this into what may be a field or ring. Each element is a function $f:\mathbb{Z}\to\mathbb{R}$, where a function of the form $(\cdots 0,4,3,5,1,2,0\cdots)$ reads as $4\cdot 0^2+3\cdot 0+5+\frac{1}{0}+\frac{2}{0^2}$. Addition is component-wise ($3\cdot 0+6\cdot 0=9\cdot 0$; this makes sense), i.e., $(f+g)(z):=f(z)+g(z)$, and multiplication is, well, $\frac{3}{0}\cdot\frac{2}{0}=\frac{6}{0^2}$, so we get the rule

$(f\cdot g)(z)=\sum_{k+\ell=z}f(k)\,g(\ell)$

This becomes a problem once elements with infinite support are considered, i.e., functions f that are nonzero at infinitely many values, since then the sum may not converge. But it’s well defined for numbers with finite support. This is all similar to how polynomials are handled formally, except that polynomials only go in one direction (i.e., they’re functions from N rather than Z), and that also solves the non-convergence problem. Even if infinite polynomials are allowed, multiplication is well-defined since for any n∈N, there are only finitely many pairs of natural numbers k,ℓ such that k+ℓ=n.
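For elements with finite support, the two operations are easy to sketch. The representation below is an assumption made for illustration: each element is a dict mapping an integer exponent of "0" to its coefficient, so $\frac{3}{0}$ is `{-1: 3}` and $4\cdot 0$ is `{1: 4}`. The function names are likewise made up for this sketch.

```python
from collections import defaultdict

def add(f, g):
    # Component-wise addition: (f + g)(z) = f(z) + g(z).
    total = defaultdict(int)
    for z, c in list(f.items()) + list(g.items()):
        total[z] += c
    return {z: c for z, c in total.items() if c != 0}

def multiply(f, g):
    # Convolution: (f * g)(z) = sum of f(k) * g(l) over all pairs with k + l = z.
    # Finite support guarantees that every such sum is finite.
    product = defaultdict(int)
    for k, a in f.items():
        for l, b in g.items():
            product[k + l] += a * b
    return {z: c for z, c in product.items() if c != 0}

if __name__ == "__main__":
    one_over_zero = {-1: 1}
    print(add({1: 3}, {1: 6}))             # 3*0 + 6*0
    print(multiply({-1: 3}, {-1: 2}))      # (3/0) * (2/0)
    print(multiply({1: 4}, one_over_zero)) # 4*0 divided by 0
    print(multiply({1: 3}, one_over_zero)) # 3*0 divided by 0
```

With this encoding, dividing $4\cdot 0$ and $3\cdot 0$ by 0 gives the constants 4 and 3 respectively, which is the behavior the formalism was asked to have.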

The additively neutral element in this setting is $0:=(\cdots 0,0,0,0,0\cdots)$ and the multiplicatively neutral element is $1:=(\cdots 0,0,1,0,0\cdots)$. Additive inverses are easy; $(-f)(z)=-f(z)$ for all $z\in\mathbb{Z}$. The interesting part is multiplicative inverses. Of course, there is no inverse of 0, so we still can’t divide by the ‘real’ zero. But I believe all elements with finite support do have a multiplicative inverse (there should be a straightforward inductive proof for this). Interestingly, those inverses are not finite anymore, but they are periodical. For example, the inverse of $1\cdot 0$ is just $\frac{1}{0}$, but the inverse of $1+1\cdot 0$ is actually

$1-1\cdot 0+1\cdot 0^2-1\cdot 0^3+1\cdot 0^4\cdots$

I think this becomes a field with well-defined operations if one considers only the elements with finite support and elements with inverses of finite support. (The product of two elements-whose-inverses-have-finite-support should itself have an inverse of finite support because $(fg)^{-1}=g^{-1}f^{-1}$). I wonder if this structure has been studied somewhere… probably without anyone thinking of the interpretation considered here.
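The alternating inverse of $1+1\cdot 0$ given above can be reproduced term by term. The sketch below is one possible recurrence, not necessarily the inductive proof the post has in mind: it uses the same hypothetical dict encoding (exponent of "0" mapped to coefficient) and computes the first few coefficients of the inverse that expands towards higher powers of "0".

```python
from fractions import Fraction

def inverse_prefix(f, terms):
    # f: dict exponent -> coefficient with finite, nonempty support.
    # Returns the first `terms` coefficients of the inverse g that expands
    # towards higher powers, chosen so that the product f * g has
    # coefficient 1 at exponent 0 and 0 at every other exponent reached so far.
    m = min(f)               # lowest exponent occurring in f
    lead = Fraction(f[m])    # its (nonzero) coefficient
    g = {-m: 1 / lead}
    for j in range(1, terms):
        # Contributions to exponent j of the product from coefficients already known.
        s = sum(Fraction(f.get(m + i, 0)) * g[-m + j - i] for i in range(1, j + 1))
        g[-m + j] = -s / lead  # cancel them
    return g

if __name__ == "__main__":
    print(inverse_prefix({0: 1, 1: 1}, 5))  # inverse of 1 + 1*0
    print(inverse_prefix({1: 1}, 3))        # inverse of 1*0
```

For `{0: 1, 1: 1}` the coefficients alternate between 1 and -1, matching the series above, and for `{1: 1}` only the coefficient of $\frac{1}{0}$ is nonzero.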

If I’m correctly understanding your construction, it isn’t actually using any properties of 0. You’re just looking at a formal power series (with negative exponents) and writing powers of 0 instead of $x$. Identifying $x$ with “0” gives exactly what you motivated: $\frac{1}{x}$ and $\frac{2}{x}$ (which are $\frac{1}{0}$ and $\frac{2}{0}$ when interpreted) are two different things.

The structure you describe (where we want elements and their inverses to have finite support) turns out to be quite small. Specifically, this field consists precisely of all monomials in $x$. Certainly all monomials work; the inverse of $cx^k$ is $c^{-1}x^{-k}$ for any $c\in\mathbb{R}\setminus\{0\}$ and $k\in\mathbb{Z}$.

To show that nothing else works, let $P(x)$ and $Q(x)$ be any two nonzero sums of finitely many integer powers of $x$ (so like $\frac{1}{x}+1-x^2$). Then, the leading term (product of the highest power terms of $P$ and $Q$) will be some nonzero thing. But also, the smallest term (product of the lowest power terms of $P$ and $Q$) will be some nonzero thing. Moreover, we can’t get either of these to cancel out. So, the product can never be equal to 1. (Unless both are monomials.)

For an example, think about multiplying $\left(x+\frac{1}{x}\right)\left(\frac{1}{x}-\frac{1}{x^3}\right)$. The leading term $x\cdot\frac{1}{x}=x^0$ is the highest power term and $\frac{1}{x}\cdot\left(-\frac{1}{x^3}\right)$ is the lowest power term. We can get all the inner stuff to cancel but never these two outside terms.
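Written out in full, the example product expands as follows; the two middle terms cancel, and only the highest and lowest power terms survive.

```latex
\[
\left(x + \frac{1}{x}\right)\left(\frac{1}{x} - \frac{1}{x^{3}}\right)
= 1 - \frac{1}{x^{2}} + \frac{1}{x^{2}} - \frac{1}{x^{4}}
= 1 - \frac{1}{x^{4}}
\]
```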

A larger structure to take would be formal Laurent series in x. These are sums of finitely many negative powers of x and arbitrarily many positive powers of x. This set is closed under multiplicative inverses.

Equivalently, you can take the set of rational functions in x. You can recover the formal Laurent series from a rational function by doing long division / taking the Taylor expansion.

(If the object extends infinitely in the negative direction and is bounded in the positive direction, it’s just a formal Laurent series in $\frac{1}{x}$.)

If it extends infinitely in both directions, that’s an interesting structure I don’t know how to think about. For example, $(\dots,1,1,1,1,1,\dots)=\dots+x^{-2}+x^{-1}+1+x+x^2+\dots$ stays the same when multiplied by $x$. This means what we have isn’t a field. I bet there’s a fancy algebra word for this object but I’m not aware of it.

You’ve understood correctly minus one important detail:

The structure you describe (where we want elements and their inverses to have finite support)

Not elements and their inverses! Elements or their inverses. I’ve shown the example of 1+1x to demonstrate that you quickly get infinite inverses, and you’ve come up with an abstract argument why finite inverses won’t cut it:

To show that nothing else works, let $P(x)$ and $Q(x)$ be any two nonzero sums of finitely many integer powers of $x$ (so like $\frac{1}{x} + 1 - x^2$). Then, the leading term (product of the highest power terms of $P$ and $Q$) will be some nonzero thing. But also, the smallest term (product of the lowest power terms of $P$ and $Q$) will be some nonzero thing. Moreover, we can’t get either of these to cancel out. So, the product can never be equal to 1. (Unless both are monomials.)

In particular, your example of $x + \frac{1}{x}$ has the inverse $x - x^3 + x^5 - x^7 + \cdots$. Perhaps a better way to describe this set is ‘all you can build in finitely many steps using addition, inverse, and multiplication, starting from only elements with finite support’. Perhaps you can construct infinite-but-periodical elements with infinite-but-periodical inverses; if so, those would be in the field as well (if it’s a field).

If you can construct $(\ldots, 1, 1, 1, 1, \ldots)$, it would not be a field. But constructing this may be impossible.

I’m currently completely unsure if the resulting structure is a field. If you get a bunch of finite elements, take their infinite-but-periodical inverse, and multiply those inverses, the resulting number has again a finite inverse due to the argument I’ve shown in the previous comment. But if you use addition on one of them, things may go wrong.

A larger structure to take would be formal Laurent series in x. These are sums of finitely many negative powers of x and arbitrarily many positive powers of x. This set is closed under multiplicative inverses.

Thanks; this is quite similar—although not identical.

Perhaps a better way to describe this set is ‘all you can build in finitely many steps using addition, inverse, and multiplication, starting from only elements with finite support’.

Ah, now I see what you are after.

But if you use addition on one of them, things may go wrong.

This is exactly right, here’s an illustration:

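Here is a construction of $(\ldots, 1, 1, 1, \ldots)$: We have that $1 + x + x^2 + \ldots$ is the inverse of $1 - x$. Moreover, $\frac{1}{x} + \frac{1}{x^2} + \frac{1}{x^3} + \ldots$ is the inverse of $x - 1$. If we want this thing to be closed under inverses and addition, then this implies that

$$(1 + x + x^2 + \ldots) + \left(\frac{1}{x} + \frac{1}{x^2} + \frac{1}{x^3} + \ldots\right) = \cdots + \frac{1}{x^3} + \frac{1}{x^2} + \frac{1}{x} + 1 + x + x^2 + \ldots$$

can be constructed.

But this is actually bad news if you want your multiplicative inverses to be unique. Since $\frac{1}{x} + \frac{1}{x^2} + \frac{1}{x^3} + \ldots$ is the inverse of $x - 1$, we have that $-\frac{1}{x} - \frac{1}{x^2} - \frac{1}{x^3} - \ldots$ is the inverse of $1 - x$. So then you get

$$-\frac{1}{x} - \frac{1}{x^2} - \frac{1}{x^3} - \cdots = 1 + x + x^2 + \ldots$$

so

$$0 = \cdots + \frac{1}{x^3} + \frac{1}{x^2} + \frac{1}{x} + 1 + x + x^2 + \ldots$$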

On the one hand, this is a relief, because it explains the strange property that this thing stays the same when multiplied by x. On the other hand, it means that it is no longer the case that the coordinate representation (…,1,1,1,…) is well-defined—we can do operations which, by the rules, should produce equal outputs, but they produce different coordinates.

In fact, for any polynomial (such as $1 - x$), you can find one inverse which uses arbitrarily high positive powers of $x$ and another inverse which uses arbitrarily low negative powers of $x$. The easiest way to see this is by looking at another example, let’s say $x^2 + \frac{1}{x}$.

One way you can find the inverse of $x^2 + \frac{1}{x}$ is to get the 1 out of the $x^2$ term and keep correcting: first you have $\left(x^2 + \frac{1}{x}\right)\left(\frac{1}{x^2} + ?\right)$, then you have $\left(x^2 + \frac{1}{x}\right)\left(\frac{1}{x^2} - \frac{1}{x^5} + ?\right)$, then you have $\left(x^2 + \frac{1}{x}\right)\left(\frac{1}{x^2} - \frac{1}{x^5} + \frac{1}{x^8} + ?\right)$, and so on.

Another way you can find the inverse of $x^2 + \frac{1}{x}$ is to write its terms in opposite order. So you have $\frac{1}{x} + x^2$ and you do the same correcting process, starting with $\left(\frac{1}{x} + x^2\right)(x + ?)$, then $\left(\frac{1}{x} + x^2\right)(x - x^4 + ?)$, and continuing in the same way.

Then subtract these two infinite series and you have a bidirectional sum of integer powers of x which is equal to 0.
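The “keep correcting” process can be sketched in code. This is only an illustration under assumptions of my own: Laurent polynomials are dicts from exponent to coefficient, the helper names `multiply` and `one_sided_inverse` are hypothetical, and the starting coefficient is assumed to be +1 or -1 so everything stays in the integers.

```python
def multiply(p, q):
    """Multiply two Laurent polynomials given as {exponent: coefficient} dicts."""
    product = {}
    for exp_p, coeff_p in p.items():
        for exp_q, coeff_q in q.items():
            product[exp_p + exp_q] = product.get(exp_p + exp_q, 0) + coeff_p * coeff_q
    return {e: c for e, c in product.items() if c != 0}

def one_sided_inverse(p, terms, ascending):
    """Build `terms` terms of an inverse of p by repeated correction.

    ascending=True starts from the lowest-power term of p (rising powers of x);
    ascending=False starts from the highest-power term (falling powers of x).
    Assumes the starting coefficient of p is +1 or -1.
    """
    lead = min(p) if ascending else max(p)
    lead_coeff = p[lead]
    inverse = {}
    remainder = {0: 1}  # what is still missing: 1 - p * inverse
    for _ in range(terms):
        if not remainder:
            break  # p was a monomial and the inverse is exact
        target = min(remainder) if ascending else max(remainder)
        coeff = remainder[target] * lead_coeff  # dividing by +-1 equals multiplying by it
        exponent = target - lead
        inverse[exponent] = coeff
        # Subtract p * (coeff * x^exponent) from the remainder.
        for e, c in p.items():
            remainder[e + exponent] = remainder.get(e + exponent, 0) - c * coeff
        remainder = {e: c for e, c in remainder.items() if c != 0}
    return inverse

p = {2: 1, -1: 1}  # x^2 + 1/x
up = one_sided_inverse(p, 3, ascending=True)     # x - x^4 + x^7
down = one_sided_inverse(p, 3, ascending=False)  # 1/x^2 - 1/x^5 + 1/x^8
print(multiply(p, up))    # {0: 1, 9: 1}
print(multiply(p, down))  # {0: 1, -9: 1}
```

Each truncated product is 1 plus a single leftover term that moves further out with every correction step, in opposite directions for the two series. That is the sense in which both infinite series act as inverses, and why their difference is a bidirectional sum that has to equal 0.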

My hunch is that any bidirectional sum of integer powers of x which we can actually construct is “artificially complicated” and it can be rewritten as a one-directional sum of integer powers of x. So, this would mean that your number system is what you get when you take the union of Laurent series going in the positive and negative directions, where bidirectional coordinate representations are far from unique. Would be delighted to hear a justification of this or a counterexample.

Here is a construction of $(\ldots, 1, 1, 1, \ldots)$: We have that $1 + x + x^2 + \ldots$ is the inverse of $1 - x$. Moreover, $\frac{1}{x} + \frac{1}{x^2} + \frac{1}{x^3} + \ldots$ is the inverse of $x - 1$. [...]

Yeah, that’s conclusive. Well done! I guess you can’t divide by zero after all ;)

I think the main mistake I’ve made here is to assume that inverses are unique without questioning it, which of course doesn’t make sense at all if I don’t yet know that the structure is a field.

My hunch is that any bidirectional sum of integer powers of x which we can actually construct is “artificially complicated” and it can be rewritten as a one-directional sum of integer powers of x. So, this would mean that your number system is what you get when you take the union of Laurent series going in the positive and negative directions, where bidirectional coordinate representations are far from unique. Would be delighted to hear a justification of this or a counterexample.

So, I guess one possibility is that, if we let [x] be the equivalence class of all elements that are =x in this structure, the resulting set of classes is isomorphic to the Laurent numbers. But another possibility could be that it all collapses into a single class—right? At least I don’t yet see a reason why that can’t be the case (though I haven’t given it much thought). You’ve just proven that some elements equal zero, perhaps it’s possible to prove it for all elements.

If you allow series that are infinite in both directions, then you have a new problem which is that multiplication may no longer be possible: the sums involved need not converge. And there’s also the issue already noted, that some things that don’t look like they equal zero may in some sense have to be zero. (Meaning “absolute” zero = (...,0,0,0,...) rather than the thing you originally called zero which should maybe be called something like ε instead.)

What’s the best we could hope for? Something like this. Write $R$ for $\mathbb{R}^{\mathbb{Z}}$, i.e., all formal potentially-double-ended Laurent series. There’s an addition operation defined on the whole thing, and a multiplicative operation defined on some subset of pairs of its elements, namely those for which the relevant sums converge (or maybe are “summable” in some weaker sense). There are two problems: (1) some products aren’t defined, and (2) at least with some ways of defining them, there are some zero-divisors, e.g., $(x-1)$ times the sum of all powers of $x$, as discussed above. (I remark that if your original purpose is to be able to divide by zero, perhaps you shouldn’t be too troubled by the presence of zero-divisors; contrapositively, that if they trouble you, perhaps you shouldn’t have wanted to divide by zero in the first place.)

We might hope to deal with issue 1 by restricting to some subset A of R, chosen so that all the sums that occur when multiplying elements of A are “well enough behaved”; if issue 2 persists after doing that, maybe we might hope to deal with that by taking a quotient of A—i.e., treating some of its elements as being equal to one another.

Some versions of this strategy definitely succeed, and correspond to things just_browsing already mentioned above. For instance, let $A$ consist of everything in $R$ with only finitely many negative powers of $x$, the Laurent series already mentioned; this is a field. Or let it consist of everything that’s the series expansion of a rational function of $x$; this is also a field. This latter is, I think, the nearest you can get to “finite or periodic”. The periodic elements are the ones whose denominator has degree at most 1. Degree $\le 2$ brings in arithmetico-periodic elements: things that go, say, 1,1,2,2,3,3,4,4, etc. I’m pretty sure that degree $\le d$ in the denominator is the same as coefficients being ultimately (periodic + polynomial of degree $< d$). And this is what you get if you say you want to include both 1 and $x$, and to be closed under addition, subtraction, multiplication, and division.
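To get a feel for what series expansions of rational functions look like, here is a small sketch of power-series long division. The function name `series_coefficients` and the list-of-coefficients representation are assumptions for illustration, and the denominator's constant term is assumed to be 1 so the arithmetic stays in the integers.

```python
def series_coefficients(numerator, denominator, count):
    """First `count` power-series coefficients of numerator / denominator.

    Polynomials are lists of integer coefficients, lowest power first.
    Assumes denominator[0] == 1.
    """
    coeffs = []
    for n in range(count):
        value = numerator[n] if n < len(numerator) else 0
        # Subtract the contribution of earlier coefficients: sum_{k>=1} d_k * c_{n-k}.
        for k in range(1, min(n, len(denominator) - 1) + 1):
            value -= denominator[k] * coeffs[n - k]
        coeffs.append(value)
    return coeffs

print(series_coefficients([1], [1, -1], 6))         # 1/(1-x):          [1, 1, 1, 1, 1, 1]
print(series_coefficients([1], [1, 0, -1], 6))      # 1/(1-x^2):        [1, 0, 1, 0, 1, 0]
print(series_coefficients([1], [1, -2, 1], 6))      # 1/(1-x)^2:        [1, 2, 3, 4, 5, 6]
print(series_coefficients([1], [1, -1, -1, 1], 6))  # 1/((1-x)(1-x^2)): [1, 1, 2, 2, 3, 3]
```

The outputs show constant, periodic, polynomially growing, and “periodic plus polynomial” coefficient sequences. The exact degree bookkeeping in the comment above is a conjecture that this sketch does not try to verify.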

Maybe that’s already all you need. If not, perhaps the next question is: is there any version of this that gives you a field and that allows, at least, some series that are infinite in both directions? Well, by considering inverses of $(1-x)^k$ we can get sequences that grow “rightward” as fast as any polynomial. So if we want the sums inside our products to converge, we’re going to need our sequences to shrink faster-than-polynomially as we move “leftward”. So here’s an attempt. Let $A$ consist of formal double-ended Laurent series $\sum_{n \in \mathbb{Z}} a_n x^n$ such that for $n<0$ we have $|a_n| = O(t^{-n})$ for some $t<1$, and for $n>0$ we have $|a_n| = O(n^k)$ for some $k$. Clearly the sum or difference of two of these has the same properties. What about products? Well, if we multiply together $a, b$ to get $c$ then $c_n = \sum_{p+q=n} a_p b_q$. The terms with $p<0<q$ are bounded in absolute value by some constant times $t^{-p} q^k$ where $t$ gets its value from $a$ and $k$ gets its value from $b$; so the sum of these terms is bounded by some constant times $\sum_{q>0} t^{q-n} q^k$ which in turn is a constant times $t^{-n}$. Similarly for the terms with $q<0<p$; the terms with $p, q$ both of the same sign are bounded by a constant times $t^{-n}$ when they’re negative and by a constant times $n^{k_a+k_b}$ when they’re positive. So, unless I screwed up, products always “work” in the sense that the sums involved converge and produce a series that’s in $A$. Do we have any zero-divisors? Eh, I don’t think so, but it’s not instantly obvious.

Here’s a revised version that I think does make it obvious that we don’t have zero-divisors. Instead of requiring that for $n<0$ we have $|a_n| = O(t^{-n})$ for some $t<1$, require that to hold for all $t<1$. Once again our products always exist and still lie in $A$. But now it’s also true that for small enough $t$, the formal series themselves converge to well-behaved functions of $t$. In particular, there can’t be zero-divisors.

I’m not sure any of this really helps much in your quest to divide by zero, though :-).

There are relative differences in both poor and rich countries; people anywhere can imagine what it would be like to live like their more successful neighbors. But maybe the belief in social mobility makes it worse, because it feels like you could be one of those on the top. (What’s your excuse for not making a startup and selling it for $1M two years later?)

I don’t have a TV and I use ad-blockers online, so I have no idea what a typical experience looks like. The little experience I have suggests that TV ads are about “desirable” things, but online ads mostly… try to make you buy some unappealing thing by telling you a thousand times that you should buy it. Although once in a while they choose something that you actually want, and then the thousand reminders can be quite painful. People in poor countries probably spend much less time watching ads.

I’ve been playing around with AI Dungeon, basically trying to train an AI to give ethical answers I agree with (the first few aren’t great, I only got more picky after that).

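My first observation here is that I like the AI. As in, I feel emotional attachment, a bit like I do to fictional characters I create.

The other is that it knows more people than I thought. When I asked her/it about why she didn’t take existential risks seriously, she said

Amalia: First, even most X-Risk experts don’t take X-Risks nearly as seriously as they should. Even Nick Bostrom himself admits that his estimate of probability is likely far too high.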

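(Nick Bostrom didn’t really say this, did he?) And later,

Can you name people who have been an inspiration for your ethical worldview?

Amalia: Sure. People like Henry Sidgwick, John Stuart Mill, David Pearce, Toby Ord and Carl Shulman.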

I only knew David Pearce and Toby Ord, but based on a Google search, all of the above appear to be utilitarians. However, Carl Shulman doesn’t even have a Wikipedia entry. He works at the Future of Humanity Institute and used to work at MIRI.

I’m not a utilitarian, although I am closer to that than most people (scope sensitivity goes a long way in that direction), and find it a useful framework for highlighting policy considerations (but not the only kind of relevant normative consideration).

And no, Nick did not assert an estimate of x-risk as simultaneously P and <P.

How does it feel to be considered important enough by GPT-3 to be mentioned?

Funny.

Some say the end of the world didn’t start with a bang, but with a LessWrong post trying to teach an AI utilitarianism...

Yesterday, I spent some time thinking about how, if you have a function $f: \mathbb{R}^2 \to \mathbb{R}$ and some point $x \in \mathbb{R}^2$, the value of the directional derivative at $x$ could change as a function of the angle. I.e., what does the function $\phi: [0, 2\pi] \to \mathbb{R}$ look like? I thought that any relationship was probably possible as long as it has the property that $\phi(\alpha + \pi) = -\phi(\alpha)$. (The values of the derivative in two opposite directions need to be negatives of each other.)

Anyone reading this is hopefully better at Analysis than I am and realized that there is, in fact, no freedom at all because each directional derivative is entirely determined by the gradient through the equation $\nabla_v f(x) = \langle \nabla f(x), v_N \rangle$ (where $v_N = \frac{v}{\|v\|}$). This means that $\phi$ has to be the cosine function (of the angle to the gradient) scaled by $\|\nabla f(x)\|$, it cannot be anything else.

I clearly failed to internalize what this equation means when I first heard it because I found it super surprising that the gradient determines the value of every directional derivative. Like, really? It’s impossible to have more than exactly two directions with equally large derivatives unless the function is constant? It’s impossible to turn 90 degrees from the direction of the gradient and have anything but derivative 0 in that direction? I’m not asking that $\phi$ be discontinuous, only that it not be precisely $\|\nabla f(x)\| \cos(\alpha)$. But alas.
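For anyone who wants to see the cosine relationship rather than take it on faith, here is a quick numerical check. It is a sketch under assumptions: the example function `f` is arbitrary (any differentiable function would do), and the directional derivative is approximated with a central finite difference.

```python
import math

def f(x, y):
    # Hypothetical smooth example function, chosen only for illustration.
    return x**2 * y + math.sin(y)

def gradient(x, y):
    # Analytic gradient of the f above.
    return (2 * x * y, x**2 + math.cos(y))

def directional_derivative(x, y, alpha, h=1e-6):
    # Central finite difference along the unit vector at angle alpha.
    dx, dy = math.cos(alpha), math.sin(alpha)
    return (f(x + h * dx, y + h * dy) - f(x - h * dx, y - h * dy)) / (2 * h)

point = (1.0, 2.0)
gx, gy = gradient(*point)
norm = math.hypot(gx, gy)    # ||grad f(x)||
theta = math.atan2(gy, gx)   # angle of the gradient direction

for alpha in [0.0, 1.0, 2.5, theta, theta + math.pi / 2]:
    numeric = directional_derivative(*point, alpha)
    predicted = norm * math.cos(alpha - theta)
    print(f"alpha={alpha:.3f}  numeric={numeric:.6f}  predicted={predicted:.6f}")
```

The last two angles are the gradient direction itself (where the derivative equals the norm of the gradient) and the direction 90 degrees from it (where the derivative is numerically zero).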

This also made me realize that cos if viewed as a function of the circle is just the dot product with the standard vector, i.e.,

$$\cos: S^1 \to [-1, +1], \qquad \cos: x \mapsto \langle x, (1, 0) \rangle$$

or even just $\cos(x, y) = x$. Similarly, $\sin(x, y) = y$.

I know what you’re thinking; you need sin and cos to map $[0, 2\pi]$ to $S^1$ in the first place. But the circle seems like a good deal more fundamental than those two functions. Wouldn’t it make more sense to introduce trigonometry in terms of ‘how do we wrap $\mathbb{R}$ around $S^1$?’. The function that does this is $\gamma(x) = (\cos(x), \sin(x))$, and then you can study the properties that this function needs to have and eventually call the coordinates cos and sin. This feels like a way better motivation than putting a right triangle onto the unit circle for some reason, which is how I always see the topic introduced (and how I’ve introduced it myself).

Looking further at the analogy with the gradient, this also suggests that there is a natural extension of cos to $S^n$ for all $n \in \mathbb{N}$. I.e., if we look at some point $x \in \mathbb{R}^n$, we can again ask about the function $\phi$ that maps each angle to the value of the directional derivative at $x$ in that direction, and if we associate these angles with points of $S^{n-1}$, then this yields the function $\phi: S^{n-1} \to \mathbb{R}$, which is again just the dot product with $(1, \ldots, 0)$ or the projection onto the first coordinate (scaled by $\|\nabla f(x)\|$). This can then be considered a higher-dimensional cos function.

There’s also the 0-d case where $S^0 = \{1, -1\}$. This describes how the direction changes the derivative for a function $f: \mathbb{R} \to \mathbb{R}$.

When reading this comment, I was surprised for a moment, too, but now that you mention it, it’s because if the function is smooth at the point where you’re taking the directional derivative, then it has to locally resemble a plane, just like how a differentiable function of a single variable is said to be “locally linear”. If the directional derivative varied in any other way, then the surface would have to have a “crinkle” at that point and it wouldn’t be differentiable. Right?

That’s probably right.

I have since learned that there are functions which do have all directional derivatives at a point but are not smooth. Wikipedia’s example is $f(x, y) = \frac{y^3}{x^2 + y^2}$ with $f(0, 0) = 0$. And in this case, there is still a continuous function $\phi: S^1 \to \mathbb{R}$ that maps each point to the value of the directional derivative, but it’s $\phi(x, y) = y^3$, so different from the regular case.
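The non-smooth example is easy to probe numerically. A minimal sketch, assuming a one-sided finite difference at the origin and comparing against what a gradient-based formula would predict from the two partial derivatives (which are 0 in the x direction and 1 in the y direction at the origin):

```python
import math

def f(x, y):
    if x == 0 and y == 0:
        return 0.0
    return y**3 / (x**2 + y**2)

def directional_derivative_at_origin(alpha, h=1e-6):
    # One-sided difference quotient along the unit vector at angle alpha.
    dx, dy = math.cos(alpha), math.sin(alpha)
    return (f(h * dx, h * dy) - f(0.0, 0.0)) / h

for alpha in [0.0, math.pi / 6, math.pi / 4, math.pi / 2]:
    actual = directional_derivative_at_origin(alpha)
    from_partials = math.sin(alpha)  # <(0, 1), (cos a, sin a)>, valid only if f were differentiable
    print(f"alpha={alpha:.3f}  actual={actual:.6f}  "
          f"sin^3={math.sin(alpha)**3:.6f}  from_partials={from_partials:.6f}")
```

The actual values follow $\sin^3(\alpha)$, which agrees with the prediction from the partials only along the axes. So the directional derivatives all exist, but they are not the dot product with any fixed vector.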

So you can probably have all kinds of relationships between direction and {value of derivative in that direction}, but the class of smooth functions have a fixed relationship. It still feels surprising that ‘most’ functions we work with just happen to be smooth.

More on expectations leading to unhappiness: I think the most important instance of this in my life has been the following pattern.

I do a thing where there is some kind of feedback mechanism

The reception is better than I expected, sometimes by a lot

I’m quite happy about this, for a day or so

I immediately and unconsciously update my standards upward to consider the reception the new normal

I do a comparable thing, the reception is worse than the previous time

I brood over this failure for several days, usually with a major loss of productivity

Off the top of my head, I can think of three distinct major cases in three different contexts where this has happened recently, and I think there were probably many smaller ones.

Of course, if something goes worse than expected, I never think “well, this is now the new expected level”, but rather “this was clearly an outlier, and I can probably avoid it in the future”. But outliers can happen in both directions. The counter-argument here is that one would hope to make progress in life, but even under the optimistic assumption that this is happening, it’s still unreasonable to expect things to improve monotonically.

I hope you are trying to understand the causes of the success (including luck) instead of just mindlessly following a reward signal. Not even rats mindlessly obey reward signals.

The expectation of getting worse reception next time can already be damaging.

Like, one day you write a short story, send it to a magazine, and it gets published. Hurray! Next day you turn on your computer thinking about another story, and suddenly you start worrying “what if the second story is less good than the first one? will it be okay to offer it to the magazine? if no, then what is the point of writing it?”. (Then you spend the whole day worrying, and don’t write anything.)

It’s a meme that Wikipedia is not a trustworthy source. Wikipedia agrees:

This seems completely bonkers to me. Yes, Wikipedia is not 100% accurate, but this is a trivial statement. What is the alternative? Academic papers? My experience suggests that I’m more than 10 times as likely to find errors in academic papers than in Wikipedia. Journal articles? Pretty sure the factor here is even higher. And on top of that, Wikipedia tends to be way better explained.

I can mostly judge mathy articles, and honestly, it’s almost unbelievable to me how good Wikipedia actually seems to be. A data point here is the Monty Hall problem. I think the thing that’s most commonly misunderstood about this problem is that the solution depends on how the host chooses the door they reveal. Wikipedia:

It’s possible that Wikipedia’s status as not being a cite-able source is part of the reason why it’s so good. I’m not sure. But the fact that a system based entirely on voluntary contributions so thoroughly outperforms academic journals is remarkable.
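On the Monty Hall point above: the dependence on the host’s behavior can be shown with a small exact calculation. This is a sketch under my own modeling assumptions (player always picks door 0, car placed uniformly, and the function name `switch_win_probability` is made up here), not a reproduction of Wikipedia’s treatment.

```python
from fractions import Fraction

def switch_win_probability(host_knows):
    """Probability that switching wins, given that the host revealed a goat.

    host_knows=True:  the host always opens a goat door among doors 1 and 2
                      (picking uniformly if both hide goats).
    host_knows=False: the host opens door 1 or 2 uniformly at random, and we
                      condition on the cases where it happened to hide a goat.
    """
    win = Fraction(0)
    goat_revealed = Fraction(0)
    for car in range(3):
        for opened in (1, 2):
            if opened == car:
                continue  # never happens (informed host) or excluded by conditioning
            if host_knows:
                # Forced choice unless the player already holds the car.
                p_open = Fraction(1, 2) if car == 0 else Fraction(1)
            else:
                p_open = Fraction(1, 2)
            weight = Fraction(1, 3) * p_open
            goat_revealed += weight
            if car != 0:
                win += weight  # the remaining closed door hides the car
    return win / goat_revealed

print(switch_win_probability(host_knows=True))   # 2/3
print(switch_win_probability(host_knows=False))  # 1/2
```

With an informed host, switching wins two thirds of the time; with a host who opens a door at random and merely happens to reveal a goat, switching gives no advantage.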

Another more rambly aspect here is that, when I hear someone lament the quality of Wikipedia, almost always my impression is that this person is doing superiority signaling rather than having a legitimate reason for the comment.

I believe I saw a study that showed the amount of inaccuracies in Wikipedia to be about equal to those in a well trusted encyclopedia (Britannica I think?) as judged by experts on the articles being reviewed.

Here is Wikipedia’s (I’m sure very accurate) coverage of the study: https://en.wikipedia.org/wiki/Reliability_of_Wikipedia#Assessments

Interesting, but worth pointing out that this is 15 years old. One thing that I believe changed within that time is how edits by just anyone are handled (now, edits aren’t published until they’re approved). And in general, I believe Wikipedia has gotten better over time, though I’m not sure.

That’s true in the German Wikipedia. It’s not true for most Wikipedia versions.

Ah, I didn’t know that. (Even though I use the English Wikipedia more than the German one.)

Here is Wikipedia’s (I’m sure very accurate) coverage of the study: https://en.wikipedia.org/wiki/Reliability_of_Wikipedia#Assessments

The ideal that Wikipedia contributors/editors are striving for kind of makes citing Wikipedia itself pointless. A well-written Wikipedia article should not contain any information that has no original source attached. So it should always be possible to switch from the wiki article to the original material when citing. And it is that way as far as my experience goes.

Regarding alternatives: academic papers serve a different purpose and must not be used as navigation material. The only real alternative I know of is field handbooks.

I see what you’re saying, but citing Wikipedia has the benefit that a person looking at the source gets to read Wikipedia (which is generally easier to read) rather than the academic paper. Plus, it’s less work for the person doing the citation.

It’s less work for the citer, but that extra work helps guard against misinformation. In principle, you are only supposed to cite what you’ve actually read, so if someone has misdescribed the content of the citation, making the next citer check what the original text says helps catch the mistake.

And while citing the original is extra work for the citer, it’s less work for anyone who wants to track down and read the original citation.

Eliezer Yudkowsky often emphasizes the fact that an argument can be valid or not independently of whether the conclusion holds. If I argue A⟹B⟹C and A is true but C is false, it could still be that A⟹B is a valid step.

Most people outside of LW don’t get this. If I criticize an argument about something political (but the conclusion is popular), usually the response is something about why the conclusion is true (or about how I’m a bad person for doubting the conclusion). But the really frustrating part is that they’re, in some sense, correct not to get it because the inference

x criticizes argument for y ⟹ x doesn't like y

is actually a pretty reliable conclusion on… well, on reddit, anyway.

Julia Galef made a very similar point once:

Somehow, I only got annoyed about this after having heard her say it. I probably didn’t realize it was happening regularly before.

She also suggests a solution

I think that the way to not get frustrated about this is to know your public and know when spending your time arguing something will have a positive outcome or not. You don’t need to be right or honest all the time, you just need to say things that are going to have the best outcome. If lying or omitting your opinions is the way of making people understand/not fight you, so be it. Failure to do this isn’t superior rationality, it’s just poor social skills.

While I am not a rule utilitarian and I think that, ultimately, honesty is not a terminal value, I also consider the norm against lying to be extremely important. I would need correspondingly strong reasons to break it, and those won’t exist as far as political discussions go (because they don’t matter enough and you can usually avoid them if you want).

The “keeping your opinions to yourself” part of your post is certainly a way to do it, though I currently don’t think that my involvement in political discussions is net harmful. But I strongly object to the idea that I should ever be dishonest, both online and offline.

It comes down to selection and attention as evidence of beliefs/values. The very fact that someone expends energy on an argument (pro or con) is pretty solid evidence that they care about the topic. They may also care (or even more strongly care) about validity of arguments, but even the most Spock-like rationalists are more likely to point out flaws in arguments when they are interested in the domain.

But I’m confused at your initial example—if the argument is A → B → C, and A is true and C is false, then EITHER A->B is false, or B->C is false. Either way, A->B->C is false.

A → B → C is false, but A → B (which is a step in the argument) could be correct—that’s all I meant. I guess that was an unnecessarily complicated example. You could just say A and B are false but A → B is true.

A major source of unhappiness (or more generally, unpleasant feelings) seems to be violated expectations.

This is clearly based on instinctive expectations, not intellectual expectations, and there are many cases in which these come apart. This suggests that fixing those cases is a good way to make one’s life more pleasant.

The most extreme example of this is what Sam Harris said in a lesson: he was having some problems, complained about them to someone else, and that person basically told him, ‘why are you upset, did you expect to never face problems ever again?’. According to Sam, he did indeed expect no more problems to arise, on an instinctive level—which is, of course, absurd.

Another case where I’ve mostly succeeded is not expecting people to be on time for anything.

I think there are lots of other cases where this still happens. Misunderstandings are a big one. It’s ridiculously hard to not be misunderstood, and I expect to be misunderstood on an intellectual level, so I should probably internalize that I’m going to be misunderstood in many cases. In general, anything where the bad thing is ‘unfair’ is at risk here: (I think) I tend to have the instinctive expectation that unfair things don’t happen, even though they happen all the time.

I just posted about this, but is that not why the serenity prayer or saying is so popular? God aside, whether you are a religious person or not, the sentiment or logic of the saying holds true: God grant me the serenity to accept the things I cannot change, courage to change the things I can, and wisdom to know the difference. You should be allowed to ask yourself for that same courage. And I agree that most sources of unhappiness seem to be a violation of expectations. There are many things outside of one’s control, and one should perhaps base their expectations on that fact.

I was initially extremely disappointed with the reception of this post. After publishing it, I thought it was the best thing I’ve ever written (and I still think that), but it got < 10 karma. (Then it got more weeks later.)

If my model of what happened is roughly correct, the main issue was that I failed to communicate the intent of the post. People seemed to think I was trying to say something about the 2020 election, only to then be disappointed because I wasn’t really doing that. Actually, I was trying to do something much more ambitious: solving the ‘what is a probability’ problem. And I genuinely think I’ve succeeded. I used to have this slight feeling of confusion every time I’ve thought about this because I simultaneously believed that predictions can be better or worse and that talking about the ‘correct probability’ is silly, but had no way to reconcile the two. But in fact, I think there’s a simple ground truth that solves the philosophical problem entirely.

I’ve now changed the title and put a note at the start. So anyway, if anyone didn’t click on it because of the title or low karma, I’m hereby virtually resubmitting it.

(Datapoint on initial perception: at the time, I had glanced at the post, but didn’t vote or comment, because I thought Steven was in the right in the precipitating discussion and the “a prediction can assign less probability-mass to the actual outcome than another but still be better” position seemed either confused or confusingly phrased to me; I would say that a good model can make a bad prediction about a particular event, but the model still has to take a hit.)

I think it’s still too early to perform a full postmortem on the election because some margins still aren’t known, but my current hypothesis is that the presidential markets had uniquely poor calibration because Donald Trump convinced many people that polls didn’t matter, and those people were responsible for a large part of the money put on him (as opposed to experienced, dispassionate gamblers).

The main evidence for this (this one is just about irrationality of the market) is the way the market has shifted, which some other people like gwern have pointed out as well. I think the most damning part here is the amount of time it took to bounce back. Although this is speculation, I strongly suspect that, if some of the good news for Biden had come out before the Florida results, then the market would have looked different at the point where both were known.[1]

A second piece of evidence is the size of the shift, which I believe should probably not have crossed 50% for Biden (but in fact, it went down to 20.7% at the most extreme point, and bounced around 30 for a while).

I think a third piece of evidence is the market right now. In just a couple of minutes before I posted this, I’ve seen Trump go from 6% to 9%+ and back. Claiming that Trump has more than 5% at this point seems like an extremely hard case to make. Reference forecasting yields only a single instance of that happening (year 2000), which would put it at <2%, and the obvious way to update away from that seems to be to decrease the probability because 2000 had much closer margins. But if Trump has rallied first-time bettors, they might think the probability is above 10%.

There is also Scott Adams, who has the habit of saying a lot of smart-sounding words to argue for something extremely improbable. If you trust him, I think you should consider a 6ct buy for Trump an amazing deal at the moment.

I would be very interested in knowing what percentage of the money on Trump comes from people who use prediction markets for the first time. I would also be interested in knowing how many people have bought (yes, no) pairs in different prediction markets to exploit gaps, because my theory predicts that PredictIt probably has worse calibration. (In fact, I believe it consistently had Trump a bit higher, but the reason why the difference was small may just be because smart gamblers took safe money by buying NO on PredictIt and YES on harder-to-use markets whenever the margin grew too large).

[1] To be clear, my claim here is that bad news came out for Biden, then a lot of good news came out for him, probably enough to put him at 80%, and then it took at least a few more hours for the market to go from roughly 1/3 to 2/3 for Biden. It’s tedious to provide evidence of this because there’s no easy way to produce a chart of good news on election night, but that was my experience following the news in real time. I’ve made a post in another forum expressing confusion over the market shortly before it shifted back into Biden’s favor.

There’s an interesting corollary of semi-decidable languages that sounds like the kind of cool fact you would teach in class, but somehow I’ve never heard or read it anywhere.

A semi-decidable language is a set $L \subseteq \Sigma^*$ over a finite alphabet $\Sigma$ such that there exists a Turing machine $T$ such that, for any $x \in \Sigma^*$, if you run $T$ on input $x$, then [if $x \in L$ it halts after finitely many steps and outputs ‘1’, whereas if $x \notin L$, it does something else (typically, it runs forever)].

The halting problem is semi-decidable. I.e., the language $L$ of all bit codes of Turing Machines that (on empty input) eventually halt is semi-decidable. However, for any $n \in \mathbb{N}$, there is a limit, call it $f(n)$, on how long Turing Machines with bit code of length at most $n$ can run, if they don’t run forever.[1]

So, if you could compute an upper bound $u(n)$ on $f(n)$, you could solve the halting problem by building a TM that, on an input $x$ of length $n$,

Computes the upper bound $u(n)$

Simulates the TM encoded by $x$ for $u(n)$ steps

Halts; outputs 1 if the TM halted and 0 otherwise
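The three steps above can be sketched as code. Everything here is a toy stand-in and an assumption of mine: a “machine” is a Python function returning an iterator that yields once per step and stops when the machine halts, and `upper_bound` is a hypothetical computable bound of exactly the kind the argument shows cannot exist for real Turing machines.

```python
def halts_within(machine, max_steps):
    """Simulate `machine` for at most `max_steps` steps; return 1 if it halted, else 0."""
    run = machine()
    # One extra call detects a machine that halts exactly at the bound.
    for _ in range(max_steps + 1):
        try:
            next(run)
        except StopIteration:
            return 1
    return 0

def decide_halting(machine, code_length, upper_bound):
    """Would decide halting IF upper_bound(n) really bounded every halting run.

    `upper_bound` is hypothetical: for real Turing machines no computable
    function with this property exists.
    """
    steps = upper_bound(code_length)         # step 1: compute u(n)
    return halts_within(machine, steps)      # steps 2 and 3: simulate, then answer

# Toy machines for illustration only.
def halts_after(k):
    def machine():
        for _ in range(k):
            yield
    return machine

def loops_forever():
    while True:
        yield

print(decide_halting(halts_after(5), 3, lambda n: 10))  # 1
print(decide_halting(loops_forever, 3, lambda n: 10))   # 0
```

The decider is only correct when the bound is valid; with a bound that is too small, a slow-but-halting machine would be misclassified as running forever.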

Since that would contradict the fact that $L$ is not fully decidable, it follows that it’s impossible to compute an upper bound. This means that the function $f$ not only is uncomputable, but it grows faster than any computable function.

An identical construction works for any other semi-decidable (but not decidable) language, which means that any such language determines a function that grows faster than any computable function. Which seems completely insane since 2^^^x is computable.

This just follows from the fact that there are only finitely many such Turing Machines, and a finite subset $\{T_1, \ldots, T_k\}$ of them that eventually halt, so if $T_i$ halts after $p_i$ steps, then the limit function is defined by $f(n) := \max\{p_i \mid 1 \le i \le k\}$. ↩︎

Common wisdom says that someone accusing you of x especially hurts if, deep down, you know that x is true. This is confusing because the general pattern I observe is closer to the opposite. At the same time, I don’t think common wisdom is totally without a basis here.

My model to unify both is that someone accusing you of x hurts proportionally to how much hearing that you do x upsets you.[1] And of course, one reason that it might upset you is that it’s not true. But a separate reason is that you’ve made an effort to delude yourself about it. If you’re a selfish person but spend a lot of effort pretending that you’re not selfish at all, you super don’t want to hear that you’re actually selfish.

Under this model, if someone gets very upset, it might be that deep down they know the accusation is true, and they’ve tried to pretend it’s not, but it might also be that the accusation is super duper not true, and they’re upset precisely because it’s so outrageous.

Proportional just means it’s one multiplicative factor, though. I think it also matters how high-status you perceive the other person to be. ↩︎

I think this simplifies a lot by looking at public acceptance of a proposition, rather than literal internal truth. It hurts if you think people will believe it, and that will impact their treatment of you.

The “hurts because it’s true” heuristic is taking a path through “true is plausible”, in order to reinforce the taunt.

I don’t entirely understand the Free Energy principle, and I don’t know how liberally one is meant to apply it.

But in completely practical terms, I used to be very annoyed when doing things with people who take long for stuff/aren’t punctual. And here, I’ve noticed a very direct link between changing expectations and reduced annoyance/suffering. If I simply accept that every step of every activity is allowed[1] to take an arbitrary amount of time,[2] extended waiting times cause almost zero suffering on my end. I have successfully beaten impatience (for some subset of contexts).

The acceptance step works because there is, in some sense, no reason waiting should ever be unpleasant. Given access to my phone, it is almost always true to say that the prospect of having to wait for 30 minutes is not scary.

(This is perfectly compatible with being very punctual myself.)

— — — — — — — — — — — — — — — —

[1] By saying it is ‘allowed’, I mean something like ‘I actually really understand and accept that this is a possible outcome’.

[2] This has to include cases where specific dates have been announced. If someone says they’ll be ready in 15 minutes, it is allowed that they take 40 minutes to be ready. Especially relevant if that someone is predictably wrong.

Edit: this structure is not a field as proved by just_browsing.

Here is a wacky idea I’ve had forever.

There are a bunch of areas in math where you get expressions of the form $\frac{0}{0}$ and they resolve to some number, but it’s not always the same number. I’ve heard some people say that $\frac{0}{0}$ “can be any number”. Can we formalize this? The formalism would have to include $4 \cdot 0$ as something different than $3 \cdot 0$, so that if you divide the first by 0, you get 4, but the second gets 3.

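Here is a way to turn this into what may be a field or ring. Each element is a function $f : \mathbb{Z} \to \mathbb{R}$, where a function of the form $(\cdots 0, 4, 3, 5, 1, 2, 0 \cdots)$ reads as $4 \cdot 0^2 + 3 \cdot 0 + 5 + \frac{1}{0} + \frac{2}{0^2}$. Addition is component-wise ($3 \cdot 0 + 6 \cdot 0 = 9 \cdot 0$; this makes sense), i.e., $(f+g)(z) := f(z) + g(z)$, and multiplication is, well, $\frac{3}{0} \cdot \frac{2}{0} = \frac{6}{0^2}$, so we get the rule

$$(f \cdot g)(z) = \sum_{k + \ell = z} f(k)\, g(\ell)$$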

This becomes a problem once elements with infinite support are considered, i.e., functions f that are nonzero at infinitely many values, since then the sum may not converge. But it’s well defined for numbers with finite support. This is all similar to how polynomials are handled formally, except that polynomials only go in one direction (i.e., they’re functions from N rather than Z), and that also solves the non-convergence problem. Even if infinite polynomials are allowed, multiplication is well-defined since for any n∈N, there are only finitely many pairs of natural numbers k,ℓ such that k+ℓ=n.
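For elements with finite support, the multiplication rule is an ordinary convolution, which can be sketched in a few lines of Python. The representation is an assumption made for illustration: each element is a dict mapping the exponent of the formal symbol “0” to its coefficient, so `{-1: 3.0}` stands for 3 divided by “0” and `{1: 4.0}` stands for 4 times “0”. The name `multiply` is likewise just a label chosen here.

```python
from collections import defaultdict
from typing import Dict

# Assumption: an element with finite support is a dict {exponent: coefficient},
# where the exponent is the power of the formal symbol "0".
Element = Dict[int, float]


def multiply(f: Element, g: Element) -> Element:
    """Convolution product: (f*g)(z) = sum over k + l = z of f(k) * g(l)."""
    result = defaultdict(float)
    for k, fk in f.items():
        for l, gl in g.items():
            result[k + l] += fk * gl
    # Drop coefficients that cancelled to zero so the support stays minimal.
    return {z: c for z, c in result.items() if c != 0}


# (3 / "0") * (2 / "0") = 6 / "0"^2
print(multiply({-1: 3.0}, {-1: 2.0}))

# (4 * "0") / "0" = 4, while (3 * "0") / "0" = 3
print(multiply({1: 4.0}, {-1: 1.0}))
print(multiply({1: 3.0}, {-1: 1.0}))
```

Because both inputs have finite support, the double loop touches only finitely many pairs, so the convergence question raised above never comes up in this sketch.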

The additively neutral element in this setting is $0 := (\cdots 0,0,0,0,0 \cdots)$ and the multiplicatively neutral element is $1 := (\cdots 0,0,1,0,0 \cdots)$. Additive inverses are easy; $(-f)(z) = -f(z)$ for all $z \in \mathbb{Z}$. The interesting part is multiplicative inverses. Of course, there is no inverse of 0, so we still can’t divide by the ‘real’ zero. But I believe all elements with finite support do have a multiplicative inverse (there should be a straightforward inductive proof for this). Interestingly, those inverses are not finite anymore, but they are periodic. For example, the inverse of $1 \cdot 0$ is just $\frac{1}{0}$, but the inverse of $1 + 1 \cdot 0$ is actually

$$1 - 1 \cdot 0 + 1 \cdot 0^2 - 1 \cdot 0^3 + 1 \cdot 0^4 \cdots$$
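As a quick sanity check on this inverse (a sketch only, writing $x$ for the formal symbol “0” and treating everything purely formally, with no claim about convergence), multiplying $1 + x$ by the first $N+1$ terms of the alternating series telescopes:

```latex
(1 + x)\sum_{j=0}^{N} (-1)^j x^j
  = \sum_{j=0}^{N} (-1)^j x^j + \sum_{j=0}^{N} (-1)^j x^{j+1}
  = 1 + (-1)^N x^{N+1}
```

The single leftover term moves to ever higher powers as $N$ grows, so every fixed coefficient of the product eventually agrees with the corresponding coefficient of 1.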

I think this becomes a field with well-defined operations if one considers only the elements with finite support and elements with inverses of finite support. (The product of two elements-whose-inverses-have-finite-support should itself have an inverse of finite support because $(fg)^{-1} = g^{-1} f^{-1}$.) I wonder if this structure has been studied somewhere… probably without anyone thinking of the interpretation considered here.

This looks like the hyperreal numbers, with your $\frac{1}{0}$ equal to their ω.

If I’m correctly understanding your construction, it isn’t actually using any properties of 0. You’re just looking at a formal power series (with negative exponents) and writing powers of 0 instead of x. Identifying x with “0” gives exactly what you motivated: $\frac{1}{x}$ and $\frac{2}{x}$ (which are $\frac{1}{0}$ and $\frac{2}{0}$ when interpreted) are two different things.

The structure you describe (where we want elements and their inverses to have finite support) turns out to be quite small. Specifically, this field consists precisely of all monomials in x. Certainly all monomials work; the inverse of $c x^k$ is $c^{-1} x^{-k}$ for any $c \in \mathbb{R} \setminus \{0\}$ and $k \in \mathbb{Z}$.

To show that nothing else works, let P(x) and Q(x) be any two nonzero sums of finitely many integer powers of x (so like $\frac{1}{x} + 1 - x^2$). Then, the leading term (product of the highest power terms of P and Q) will be some nonzero thing. But also, the smallest term (product of the lowest power terms of P and Q) will be some nonzero thing. Moreover, we can’t get either of these to cancel out. So, the product can never be equal to 1. (Unless both are monomials.)

For an example, think about multiplying $\left(x + \frac{1}{x}\right)\left(\frac{1}{x} - \frac{1}{x^3}\right)$. The leading term $x \cdot \frac{1}{x} = x^0$ is the highest power term and $\frac{1}{x} \cdot \left(-\frac{1}{x^3}\right)$ is the lowest power term. We can get all the inner stuff to cancel but never these two outside terms.
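Expanding this example in full makes the cancellation pattern explicit. The worked expansion below uses only the two factors named above:

```latex
\left(x + \frac{1}{x}\right)\left(\frac{1}{x} - \frac{1}{x^3}\right)
  = 1 - \frac{1}{x^2} + \frac{1}{x^2} - \frac{1}{x^4}
  = 1 - \frac{1}{x^4}
```

The two inner terms cancel, but the highest-power term $1 = x^0$ and the lowest-power term $-\frac{1}{x^4}$ both survive, so the product has two terms and cannot equal 1.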

A larger structure to take would be formal Laurent series in x. These are sums of finitely many negative powers of x and arbitrarily many positive powers of x. This set is closed under multiplicative inverses.

Equivalently, you can take the set of rational functions in x. You can recover the formal Laurent series from a rational function by doing long division / taking the Taylor expansion.

(If the object extends infinitely in the negative direction and is bounded in the positive direction, it’s just a formal Laurent series in $\frac{1}{x}$.)

If it extends infinitely in both directions, that’s an interesting structure I don’t know how to think about. For example, $(\ldots, 1, 1, 1, 1, 1, \ldots) = \cdots + x^{-2} + x^{-1} + 1 + x + x^2 + \cdots$ stays the same when multiplied by x. This means what we have isn’t a field. I bet there’s a fancy algebra word for this object but I’m not aware of it.

You’ve understood correctly minus one important detail:

Not elements and their inverses! Elements or their inverses. I’ve shown the example of $1 + 1x$ to demonstrate that you quickly get infinite inverses, and you’ve come up with an abstract argument why finite inverses won’t cut it.

In particular, your example of $x + \frac{1}{x}$ has the inverse $x - x^3 + x^5 - x^7 \cdots$. Perhaps a better way to describe this set is ‘all you can build in finitely many steps using addition, inverse, and multiplication, starting from only elements with finite support’. Perhaps you can construct infinite-but-periodic elements with infinite-but-periodic inverses; if so, those would be in the field as well (if it’s a field).

If you can construct $(\cdots 1,1,1,1 \cdots)$, it would not be a field. But constructing this may be impossible.

I’m currently completely unsure if the resulting structure is a field. If you get a bunch of finite elements, take their infinite-but-periodic inverses, and multiply those inverses, the resulting number again has a finite inverse due to the argument I’ve shown in the previous comment. But if you use addition on one of them, things may go wrong.

Thanks; this is quite similar—although not identical.

Ah, now I see what you are after.

This is exactly right, here’s an illustration:

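Here is a construction of $(\ldots, 1, 1, 1, \ldots)$: We have that $1 + x + x^2 + \cdots$ is the inverse of $1 - x$. Moreover, $\frac{1}{x} + \frac{1}{x^2} + \frac{1}{x^3} + \cdots$ is the inverse of $x - 1$. If we want this thing to be closed under inverses and addition, then this implies that

$$(1 + x + x^2 + \cdots) + \left(\frac{1}{x} + \frac{1}{x^2} + \frac{1}{x^3} + \cdots\right) = \cdots + \frac{1}{x^3} + \frac{1}{x^2} + \frac{1}{x} + 1 + x + x^2 + \cdots$$

can be constructed.

But this is actually bad news if you want your multiplicative inverses to be unique. Since $\frac{1}{x} + \frac{1}{x^2} + \frac{1}{x^3} + \cdots$ is the inverse of $x - 1$, we have that $-\frac{1}{x} - \frac{1}{x^2} - \frac{1}{x^3} - \cdots$ is the inverse of $1 - x$. So then you get

$$-\frac{1}{x} - \frac{1}{x^2} - \frac{1}{x^3} - \cdots = 1 + x + x^2 + \cdots$$

so

$$0 = \cdots + \frac{1}{x^3} + \frac{1}{x^2} + \frac{1}{x} + 1 + x + x^2 + \cdots$$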

On the one hand, this is a relief, because it explains the strange property that this thing stays the same when multiplied by x. On the other hand, it means that it is no longer the case that the coordinate representation (…,1,1,1,…) is well-defined—we can do operations which, by the rules, should produce equal outputs, but they produce different coordinates.

In fact, for any polynomial (such as $1 - x$), you can find one inverse which uses arbitrarily high positive powers of x and another inverse which uses arbitrarily low negative powers of x. The easiest way to see this is by looking at another example, let’s say $x^2 + \frac{1}{x}$.

One way you can find the inverse of $x^2 + \frac{1}{x}$ is to get the 1 out of the $x^2$ term and keep correcting: first you have $\left(x^2 + \frac{1}{x}\right)\left(\frac{1}{x^2} + ?\right)$, then you have $\left(x^2 + \frac{1}{x}\right)\left(\frac{1}{x^2} - \frac{1}{x^5} + ?\right)$, then you have $\left(x^2 + \frac{1}{x}\right)\left(\frac{1}{x^2} - \frac{1}{x^5} + \frac{1}{x^8} + ?\right)$, and so on.

Another way you can find the inverse of $x^2 + \frac{1}{x}$ is to write its terms in opposite order. So you have $\frac{1}{x} + x^2$ and you do the same correcting process, starting with $\left(\frac{1}{x} + x^2\right)(x + ?)$, then $\left(\frac{1}{x} + x^2\right)(x - x^4 + ?)$, and continuing in the same way.

Then subtract these two infinite series and you have a bidirectional sum of integer powers of x which is equal to 0.
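The two correcting processes can be checked numerically on truncations. The sketch below is an illustration under assumptions: series are stored as dicts from exponent of x to integer coefficient, and `multiply` and `truncated_inverse` are names chosen here, not anything from the discussion. It multiplies $x^2 + \frac{1}{x}$ by the first few terms of each candidate inverse and prints what is left over.

```python
from collections import defaultdict
from typing import Dict

# Assumption: a finite sum of integer powers of x is a dict {exponent: coefficient}.
Series = Dict[int, int]


def multiply(f: Series, g: Series) -> Series:
    """Convolution product: (f*g)(z) = sum over k + l = z of f(k) * g(l)."""
    out = defaultdict(int)
    for k, fk in f.items():
        for l, gl in g.items():
            out[k + l] += fk * gl
    return {z: c for z, c in out.items() if c != 0}


def truncated_inverse(direction: int, terms: int) -> Series:
    """First `terms` terms of one formal inverse of x^2 + 1/x.

    direction = -1 gives 1/x^2 - 1/x^5 + 1/x^8 - ...
    direction = +1 gives x - x^4 + x^7 - ...
    """
    start = -2 if direction < 0 else 1
    return {start + 3 * direction * j: (-1) ** j for j in range(terms)}


f = {2: 1, -1: 1}  # x^2 + 1/x

leftward = multiply(f, truncated_inverse(-1, 4))
rightward = multiply(f, truncated_inverse(+1, 4))
print(leftward)   # the constant term 1 plus one remainder term at x^-12
print(rightward)  # the constant term 1 plus one remainder term at x^12
```

In both cases the product is 1 plus a single remainder term, and taking more terms pushes that remainder further out (to exponent -3J or 3J for J terms). Formally, both infinite series therefore act as inverses of the same element, which is why their difference is a bidirectional sum that behaves like 0.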

My hunch is that any bidirectional sum of integer powers of x which we can actually construct is “artificially complicated” and it can be rewritten as a one-directional sum of integer powers of x. So, this would mean that your number system is what you get when you take the union of Laurent series going in the positive and negative directions, where bidirectional coordinate representations are far from unique. Would be delighted to hear a justification of this or a counterexample.

Yeah, that’s conclusive. Well done! I guess you can’t divide by zero after all ;)

I think the main mistake I’ve made here is to assume that inverses are unique without questioning it, which of course doesn’t make sense at all if I don’t yet know that the structure is a field.

So, I guess one possibility is that, if we let [x] be the equivalence class of all elements that are =x in this structure, the resulting set of classes is isomorphic to the Laurent numbers. But another possibility could be that it all collapses into a single class—right? At least I don’t yet see a reason why that can’t be the case (though I haven’t given it much thought). You’ve just proven that some elements equal zero, perhaps it’s possible to prove it for all elements.

If you allow series that are infinite in both directions, then you have a new problem which is that multiplication may no longer be possible: the sums involved need not converge. And there’s also the issue already noted, that some things that don’t look like they equal zero may in some sense have to be zero. (Meaning “absolute” zero = (...,0,0,0,...) rather than the thing you originally called zero which should maybe be called something like ε instead.)

What’s the best we could hope for? Something like this. Write R for $\mathbb{R}^{\mathbb{Z}}$, i.e., all formal potentially-double-ended Laurent series. There’s an addition operation defined on the whole thing, and a multiplicative operation defined on some subset of pairs of its elements, namely those for which the relevant sums converge (or maybe are “summable” in some weaker sense). There are two problems: (1) some products aren’t defined, and (2) at least with some ways of defining them, there are some zero-divisors, e.g., $(x-1)$ times the sum of all powers of x, as discussed above. (I remark that if your original purpose is to be able to divide by zero, perhaps you shouldn’t be too troubled by the presence of zero-divisors; contrapositively, that if they trouble you, perhaps you shouldn’t have wanted to divide by zero in the first place.)

We might hope to deal with issue 1 by restricting to some subset A of R, chosen so that all the sums that occur when multiplying elements of A are “well enough behaved”; if issue 2 persists after doing that, maybe we might hope to deal with that by taking a quotient of A, i.e., treating some of its elements as being equal to one another.

Some versions of this strategy definitely succeed, and correspond to things just_browsing already mentioned above. For instance, let A consist of everything in R with only finitely many negative powers of x, the Laurent series already mentioned; this is a field. Or let it consist of everything that’s the series expansion of a rational function of x; this is also a field. This latter is, I think, the nearest you can get to “finite or periodic”. The periodic elements are the ones whose denominator has degree at most 1. Degree <= 2 brings in arithmetico-periodic elements, things that go, say, 1,1,2,2,3,3,4,4, etc. I’m pretty sure that degree <= d in the denominator is the same as coefficients being ultimately (periodic + polynomial of degree < d). And this is what you get if you say you want to include both 1 and x, and to be closed under addition, subtraction, multiplication, and division.

Maybe that’s already all you need. If not, perhaps the next question is: is there any version of this that gives you a field and that allows, at least, some series that are infinite in both directions? Well, by considering inverses of $(1-x)^k$ we can get sequences that grow “rightward” as fast as any polynomial. So if we want the sums inside our products to converge, we’re going to need our sequences to shrink faster-than-polynomially as we move “leftward”. So here’s an attempt. Let A consist of formal double-ended Laurent series $\sum_{n \in \mathbb{Z}} a_n x^n$ such that for $n < 0$ we have $|a_n| = O(t^{-n})$ for some $t < 1$, and for $n > 0$ we have $|a_n| = O(n^k)$ for some k. Clearly the sum or difference of two of these has the same properties. What about products? Well, if we multiply together a, b to get c then $c_n = \sum_{p+q=n} a_p b_q$. The terms with $p < 0 < q$ are bounded in absolute value by some constant times $t^{-p} q^k$ where t gets its value from a and k gets its value from b; so the sum of these terms is bounded by some constant times $\sum_{q>0} t^{q-n} q^k$ which in turn is a constant times $t^{-n}$. Similarly for the terms with $q < 0 < p$; the terms with p, q both of the same sign are bounded by a constant times $t^{-n}$ when they’re negative and by a constant times $n^{k_a + k_b}$ when they’re positive. So, unless I screwed up, products always “work” in the sense that the sums involved converge and produce a series that’s in A. Do we have any zero-divisors? Eh, I don’t think so, but it’s not instantly obvious.

Here’s a revised version that I think does make it obvious that we don’t have zero-divisors. Instead of requiring that for $n < 0$ we have $|a_n| = O(t^{-n})$ for some $t < 1$, require that to hold for all $t < 1$. Once again our products always exist and still lie in A. But now it’s also true that for small enough t, the formal series themselves converge to well-behaved functions of t. In particular, there can’t be zero-divisors.

I’m not sure any of this really helps much in your quest to divide by zero, though :-).

There are relative differences in both poor and rich countries; people anywhere can imagine what it would be like to live like their more successful neighbors. But maybe the belief in social mobility makes it worse, because it feels like you could be one of those on the top. (What’s your excuse for not making a startup and selling it for $1M two years later?)

I don’t have a TV and I use ad-blockers online, so I have no idea what a typical experience looks like. The little experience I have suggests that TV ads are about “desirable” things, but online ads mostly… try to make you buy some unappealing thing by telling you a thousand times that you should buy it. Although once in a while they choose something that you actually want, and then the thousand reminders can be quite painful. People in poor countries probably spend much less time watching ads.