infinatary

I can’t find this term anywhere else, and “infinitary” means something else, so I think it’s not a typo. Is it your own neologism? If not, where is it from?

Partial answer: I input Solomonoff (2009), “Algorithmic Probability: Theory and Applications” into ChatGPT and Claude, asking the following question:

The attached article describes Solomonoff induction. Does it answer the question if there’s always an UTM that induces a Solomonoff prior that approximates arbitrarily well a given positive distribution?

ChatGPT annoyingly waffles back what’s in the article, as it often did when I tried it on such things in the past. Claude instead is concise and precise. Complete answers follow.

Claude:

No, the article does not directly address the question of whether there always exists a universal Turing machine (UTM) that induces a Solomonoff prior that approximates arbitrarily well a given positive distribution.

The article focuses on describing Solomonoff induction and algorithmic probability more generally. It discusses properties like completeness, incomputability, diversity, and subjectivity of the Solomonoff prior, as well as applications to sequence prediction and grammar induction.

But it does not analyze the specific question of whether for any positive distribution there exists a choice of UTM that makes the corresponding Solomonoff prior arbitrarily close to that distribution. The article does state that the choice of reference machine affects the Solomonoff prior, and that by modifying the computer language it is possible to “express any conceivable a priori probability distributions.” However, it does not formally prove that any positive distribution can be approximated arbitrarily well.

So in summary, while the article provides good background on Solomonoff induction, it does not directly address or prove the specific claim in the question. The universality of approximating arbitrary positive distributions with a Solomonoff prior by choosing the right UTM is not explicitly analyzed.

ChatGPT:

The article “Algorithmic Probability—Theory and Applications” by Ray J. Solomonoff discusses the concept of Algorithmic Probability (ALP) and its properties, including its completeness and incomputability. ALP is described as a method that is theoretically capable of discovering any regularity in a dataset using a relatively small sample. This feature, referred to as the completeness of ALP, implies that it can approximate the true probabilities of symbols in data generated by any unknown probabilistic source P, with a very small error rate. The error in probability approaches zero more rapidly than 1/n, indicating rapid convergence to correct probabilities. This convergence holds for any P describable by a computer program, including many functions that are formally incomputable.

However, ALP is also characterized by its incomputability, meaning it is impossible to find the truly best models with any certainty due to the infinity of models to be tested, some of which take an unacceptably long time to evaluate. Despite this, ALP’s incomputability does not inhibit its practical use for prediction. In practice, approximations to ALP can be used, though it is impossible to determine how close these approximations are to the true ALP. The practical approach involves using approximations of ALP and methods to estimate their effectiveness in future problems (Out of Sample Error), methods to search for good models, and quick and simple methods to compare models. Despite its theoretical incomputability, ALP’s approximations can approach completeness, offering little error in prediction if sufficient time is spent searching for good models. This aspect makes ALP a desirable feature in practical prediction systems despite its incomputable nature.

In summary, the article does address the question of whether there’s always an Universal Turing Machine (UTM) that induces a Solomonoff prior approximating arbitrarily well a given positive distribution. It explains that while ALP in its true form is incomputable, in practice, approximations of ALP can be employed effectively. These approximations can approach the completeness of ALP, thereby allowing for accurate predictions and the discovery of regularities in data, albeit without the ability to ascertain how close these approximations are to the true ALP. This implies that while a UTM can be used to induce a Solomonoff prior, the exactness of this approximation in practice is limited by the incomputability of ALP and relies on the effectiveness of the approximation methods employed.

19. Recursive Taleb’s surgeon

If someone is significantly relying on Taleb’s surgeon to make a choice, then they are not competent enough to discern actual competence.

The actions are inferred from the argmax, but they are also inputs to the prediction models.

The actions sui generis being “inputs to the prediction models” does not distinguish CDT from EDT.

(To be continued, leaving now.)

I always thought AIXI uses CDT because the actions are inputs to the Turing machines rather than outputs, so it’s not looking at short Turing machines that output the action under consideration, the action is a given.

Care to explain why that’s EDT? A link to an existing explanation would be fine.

Lesswrong supports math.

If the core problem is power-seekers inevitably bubbling up whatever social ladder, then take inspiration from politics and put a time limit on duties above a certain rank.

I interpreted it the standard way too initially, but then I had a hunch there was… I dunno, something fishy, and then indeed it turned out @StrivingForLegibility understood it in a completely different way, so somehow it wasn’t clear! Magic.

Oh I see now, just needs to work to pinpoint Nash equilibria, I did not make that connection.

But anyway, the reason I’m suspicious of your interpretation is not that your math is not correct, but that it makes the OP notation so unnatural. The unnatural things are:

being context-dependent.

not having its standard meaning.

used implicitly instead of explicitly, when later it takes on a more important role to change decision theory.

Using as condition without mentioning that already if .

So I guess I will stay in doubt until the OP confirms “yep I meant that”.

I started working on an asymptotics problem, partly to see if I would change my mind. I try to keep my eyes on the ball in general, so I started noticing its applications and practical implications. Previously, I had encountered the topic mostly by reading up-in-the-clouds theoretical stuff.

I also think a tribal instinct was tinging my past thoughts; asymptotics were “Frequentist” while I was “Bayesian”.

To clarify: do you think in about 5 years we will be able to do such a thing to the then state-of-the-art big models?

Off the top of my head: Q1 2023 I was vaguely scornful of asymptotics, Q4 2023 I think they are a useful tool.

I’m weirded out by this. To look at everything together, I write the original expression, and your expression rewritten using the OP’s notation:

Original:

Yours:

(I’m using the notation that a function applied to a set is the image of that set.)

So the big pi symbol stands for

So it’s not a standalone operator: it’s context-dependent because it pops out an implicit . The OP otherwise gives the impression of a more functional mindset, so I suspect the OP may mean something different from your guess.

Other problem with your interpretation: it yields the empty set unless all agents consider doing nothing an option. The only possible non-empty output is . Reason: each set you are intersecting contains tuples with all elements equal to the ones in , but for one. So the intersection will necessarily only contain tuples with all elements equal to those in .

The theory that my mind automatically generates seeing these happenings is that Ilya was in cahoots with Sam&Greg, and the pantomime was a plot to oust external members of the board.

However, I like to think I’m wise enough to give this 5% probability on reflection.

I’m trying to get a quick intuition of this. I’ve not read the papers.

My attempt:

On a compact domain, any continuous function can be uniformly approximated by a polynomial (Weierstrass)

Powers explode quickly, so you need many terms to make a nice function with a power series, to correct the high powers at the edges

As the domain gets larger, it is more difficult to make the approximation

So the relevant question is: how does the degree at training phase transition change with domain size, domain dimensionality, and Fourier series decay rate?

Does this make sense?
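A quick numerical check of the "larger domain is harder" step, as a rough sketch only. Everything here is my own assumption and not from the papers: the target function (sin), the tolerance, the symmetric interval, and the use of a least-squares Chebyshev fit on a uniform grid as a stand-in for the best uniform approximation. The helper name `min_degree` is hypothetical.

```python
import numpy as np
from numpy.polynomial import Chebyshev


def min_degree(f, half_width, tol=1e-3, max_degree=60, n_points=2000):
    """Smallest degree whose least-squares Chebyshev fit on
    [-half_width, half_width] has max grid error below tol.

    Returns None if no degree up to max_degree reaches the tolerance.
    Note: the error is measured on the sample grid only, and a
    least-squares fit is not the true minimax polynomial.
    """
    x = np.linspace(-half_width, half_width, n_points)
    y = f(x)
    for deg in range(max_degree + 1):
        # Chebyshev.fit rescales the sample range to [-1, 1] internally,
        # which keeps the fit well conditioned on wide domains.
        p = Chebyshev.fit(x, y, deg)
        if np.max(np.abs(p(x) - y)) < tol:
            return deg
    return None


# Assumed toy setup: same function, same tolerance, growing domain.
degrees = {L: min_degree(np.sin, L) for L in (1.0, 2.0, 4.0, 8.0)}

for L, deg in degrees.items():
    print(f"half-width {L:>4}: degree {deg}")
```

For this toy case the required degree goes up as the interval widens, which matches the intuition above. It says nothing about the dependence on dimensionality or on the Fourier decay rate.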

A quantifier

Here you mean , right?

I was joking, but taking it seriously: I was thinking about voting effectively; do you mean changing country?

<snark> Your models of intelligent systems collapse to entropy on OOD intelligence levels. </snark>

I skimmed the paper and Figure 5 caught my attention. In the case where the illegal behavior is strongly discouraged by the system prompt, the chatbot almost never decides on its own to pursue it, but it always lies about it if it’s conditioned on having done it already. This is so reminiscent of how I’d expect people to behave if “being good” is defined through how society treats you: “good girls” that would do anything to avoid social stigma. I hypothesize the model is picking this up. Do you think it makes sense?

Ah guess it’s a typo then, and your use is a nonstandard one.