Research Lead at CORAL. Director of AI research at ALTER. PhD student in Shay Moran’s group in the Technion (my PhD research and my CORAL/ALTER research are one and the same). See also Google Scholar and LinkedIn.
E-mail: {first name}@alter.org.il
I’m not sure what you mean by “does not scale much”, but I agree with everything else. (My own ideal outcome is not literally “I am the queen”, but the same principle applies.)
The above treatment of “CDT precommitment games” is problematic: the concept
Definition: A CDT decision problem is the following data. We have a set of variables
The parent relation must induce an acyclic directed graph. We also have a selected subset of decision variables
This is connected to our overall formalism by setting
The CDT counterfactuals and decision-rule are defined via a do-operator that forces
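In Pearl-style notation (my symbols; the post’s own are elided above), the shape of the rule is:

$$\pi^{\mathrm{CDT}} \in \operatorname*{arg\,min}_{\pi}\; \mathbb{E}\big[L \,\big|\, \operatorname{do}(A = \pi(O))\big]$$

where $O$ is the observation, $A$ the decision variable and $L$ the loss; the do-operator severs $A$ from its parents in the graph and sets it by fiat.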
Definition: A CDT precommitment game is a CDT decision problem in which there is some special
1.
2. For some
3. For every
4. For every
This is connected to our abstract notion of precommitment game by setting
The underlying decision problem of the precommitment game is constructed by deleting
The game is said to be trivial when all variables with parent
Proposition: CDT is precommitment-stable in trivial precommitment games.
Definition: Given a CDT precommitment game with
Proposition: If
Above, I compare different decision theories to FDT. At the same time, I claim that in a deeper sense, FDT is ill-defined. One may doubt whether that is a coherent line of reasoning. Therefore, instead of a comparison to FDT, I propose to frame these observations as being about stability to precommitments. Details follow.
Definition: A precommitment game
1. We are given some
2. We are given
3. For any
4. Denote
The restriction of
Definition: An EDT precommitment game
1.
2.
The underlying decision problem is then an EDT decision problem with the belief
(Is there a natural generalization without the assumption
Proposition: EDT is precommitment-stable in formally causal precommitment games. That is, in any such game there is
For example, XOR blackmail can be formalized as an EDT precommitment game which is not formally causal, and EDT is not precommitment-stable there (the only optimal policy is precommitting to reject).
Definition: [EDIT: The treatment of CDT here is problematic, see child post.] A CDT precommitment game
The underlying decision problem is then a CDT decision problem with
Proposition: CDT is precommitment-stable in policy-bottlenecked precommitment games. That is, in any such game there is
For example, Newcomb’s paradox can be formalized as a CDT precommitment game which is not policy-bottlenecked, and CDT is not precommitment-stable there (the only optimal policy is precommitting to one-box).
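As a concrete sanity check, here is a minimal sketch of this comparison using the textbook Newcomb payoffs ($1M in the opaque box iff one-boxing is predicted, $1K always in the transparent box), not the post’s exact formalization:

```python
# Minimal Newcomb sketch with a perfect predictor; the payoffs are the
# textbook ones, not the post's formalization.

def payoff(policy: str) -> int:
    """Omega fills the opaque box with $1M iff it predicts one-boxing;
    the transparent box always holds $1K."""
    opaque = 1_000_000 if policy == "one-box" else 0
    if policy == "two-box":
        return opaque + 1_000
    return opaque

assert payoff("one-box") == 1_000_000  # precommitting to one-box
assert payoff("two-box") == 1_000      # what CDT ends up with

# CDT's in-the-moment reasoning holds the box contents fixed, so two-boxing
# dominates pointwise in each do()-counterfactual:
for opaque in (0, 1_000_000):
    assert opaque + 1_000 > opaque
```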
Definition: A DDT precommitment game
The underlying decision problem is then a DDT decision problem with
Proposition: IDDT is precommitment-stable in pseudocausal precommitment games. That is, in any such game there is
It should be straightforward to also formulate an analogous claim with plain DDT and iterated pseudocausal precommitment games.
To make the claim that DDT/IDDT is precommitment-stable more often than EDT and CDT, we need to somehow compare different decision theories on the same game. For this purpose, we have the following translations.
Definition: Given an EDT precommitment game with
Proposition: If
Definition: Given a CDT precommitment game, its DDT-translation is defined by setting
Proposition: If
Below we only use the case
There is no objective morality, but there is such a thing as objectively rational decision-making. And I never said anything about egoism.
Your comment sounds to me like it’s coming from a particular school of moral philosophy discourse, which (in my view) is built on the erroneous redefinition of words. In particular, “moral” and “rational”, together with various synonym-ish words, mean very different things in colloquial speech, but this type of moral philosophy discourse conflates them. In theory, you can of course define your words any way you like. However, if you do so, you relinquish the right to argue from any common-sense intuitive claim that uses these words in their original meaning. (Which, in my view, is how fallacies are smuggled in during this kind of discourse.)
Similarly, “egoism” and “taking rational actions according to your own preferences” are also very different things.
(Thank you for your comment, my explanation here is a useful addition to the OP, I think.)
Takes on moral philosophy and the history of this community that I mostly mentioned before but should maybe be put together somewhere:
Human preferences are very partial/parochial, and this is meta-endorsed. There is a finite number P s.t. for any N>0, the lives of N strangers are less than P times as terminally-valuable for you as the life of your loved one. If you want to be honest with yourself (which you should if it’s high-value for you to have accurate beliefs), you should endorse this.
(Objective, abstract) Morality is fake, both non-cognitivism and error theory have merit. Parochial altruistic preferences (=empathy) are real, rational and superrational cooperation are real. Morality-as-used-in-practice is a process of continuous negotiations about social norms (the “social contract” if you like).
In particular, utilitarianism is very confused. That said, (super)rational cooperation can cash out as something utilitarianism-ish in some situations. (For example, if it is best for everyone if we precommit to derailing the trolley even if a personal friend is on the other track.)
Paradoxes such as Pascal’s mugging, population ethics and infinite ethics all stem from trying to use a confused framework (impartial and unbounded utility).
This type of confusion contributed to the failure of Old MIRI’s agent foundations programme, by causing it to over-index on ideas like Pascal’s mugging and the procrastination paradox.
The self-deceptive endorsement of impartial unbounded utility obscures the importance of multi-agent considerations in morality-as-used-in-practice, and this contributed to failures of the Effective Altruism movement such as SBF and OpenAI. Ideas such as the “pivotal act” are also sus in a similar way (although I can see versions of that which might be justified).
This argument uses the assumption that Alice can’t change eir beliefs in response to learning that Omega has proposed specific bets and not others.
Not true. Changing her beliefs in response to Omega’s proposal doesn’t help her. Imagine that Alice is given a choice between:
1. Take a bet that pays +2 if X and −1 if not-X.
2. Take a bet that pays +2 if not-X and −1 if X.
3. Refuse both bets.
No matter what probability Alice assigns to X after her update, “normal” Bayesian calculus (really CDT calculus, see below) mandates that she chooses 1 or 2, not 3.
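Spelling out the computation: if Alice assigns probability $p$ to $X$ after her update, then

$$\mathbb{E}[\text{bet 1}] = 2p - (1-p) = 3p - 1, \qquad \mathbb{E}[\text{bet 2}] = 2(1-p) - p = 2 - 3p,$$

and $\max(3p - 1,\, 2 - 3p) \ge \tfrac{1}{2} > 0$ for every $p$, while refusing pays $0$. So option 3 is never chosen, whatever the update.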
It seems clear that a bookie can reliably make money from gamblers if the bookie knows which horse will win which race; this is not, in the classical way of thinking, a testament to the irrationality of the gamblers.
I guess this example assumes the gamblers are not allowed to update on the offered bets? (Otherwise it doesn’t make sense to me.) Like I said, we don’t assume it here.
Instead, infrabayesianism recommends a strict preference for mixed strategies.
Not really, you’re over-indexing on the somewhat outdated six-year-old post you’re replying to. It is true that if Alice has a coin that Omega cannot predict, she can come out ahead by betting according to the coin. But, as my 1-2-3 example above demonstrates, this is not the core idea. The “modern” formulation of infra-Bayesianism only allows deterministic policies, whereas randomization is modeled by means such as “taking the action to flip a coin”.
That version relies on a “causal” assumption that Omega’s choices are probabilistically independent of the gambler’s. This assumption seems inherently contrary to the problem description (since Omega can predict the gambler’s choices, and uses those predictions to make its choices).
What is actually going on here: this is not a Dutch Book argument against Bayesianism per se, it is a Dutch Book argument against Bayesian CDT. CDT-Alice believes that choosing to bet on X doesn’t influence the veracity of X, since there is indeed no physical causal link from the former to the latter (X might even be determined before the bet is offered or made). EDT-Alice can succeed here by noticing that her own choice is correlated with X and therefore the probability of X differs between the “Alice bets on X” counterfactual and the “Alice bets on not-X” counterfactual.
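In symbols (standard notation, not specific to this post): CDT evaluates $\max_a \mathbb{E}[u \mid \operatorname{do}(A = a)]$, where the do-operator cuts the correlation between Alice’s choice and $X$, while EDT evaluates $\max_a \mathbb{E}[u \mid A = a]$, and ordinary conditioning picks that correlation up.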
So, why is this example interesting beyond other examples that undermine CDT?
Mainly, it’s just easier to understand how infra-Bayesianism solves the problem here, and in particular we only need (crisp) credal sets rather than supradistributions (fuzzy credal sets).
Another reason is, the notion of subjective probability is often justified by thinking about bets. But thinking about bets requires a decision theory, and not just a theory of epistemology. Hence, once you notice that you’re confused about decision theory, you should be open to reconsidering the notion of subjective probability as well.
Yet another reason is, there’s something interesting going on where the supra-POMDP method of dealing with Newcombian problems preserves causality in some sense, while the EDT solution “violates” it. I think this is notable, although probably more important are the cases where EDT fails altogether (while infra-Bayesianism / DDT succeeds).
I haven’t tried LZP in practice, but you can guess what results to expect by looking at the size of the LZ77-compression of the text. I expect that any remotely decent text prediction algorithm would be based on stochastic process prediction. The deterministic setting is just a toy model.
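If you want to eyeball this, here is a crude proxy (my sketch; zlib’s DEFLATE is an LZ77-family compressor with Huffman coding on top, so it only approximates pure LZ77):

```python
# Crude proxy for the LZ77-compressed size of a text via zlib's DEFLATE.
import zlib

def compressed_bits_per_char(text: str) -> float:
    raw = text.encode("utf-8")
    return 8 * len(zlib.compress(raw, 9)) / len(raw)

print(compressed_bits_per_char("the quick brown fox jumps over the lazy dog " * 50))
```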
Thanks for the catch!
It’s supposed to look like the control panel of the Enterprise.
A few more observations.
The definition of iteration we had before implicitly assumes that the agent can observe the full outcome of previous iterations. We don’t have to make this assumption. Instead, we can assume a set of possible observations
I believe that Theorem 4 remains valid.
As we remarked before, DDT is not invariant under adding a constant to the loss function. It is interesting to consider what happens when we add an increasingly large constant. In the limit, DDT converges to something I dubbed “Idealized Disambiguative Decision Theory” (IDDT)[1], which works as follows.
For IDDT, it is sufficient to let
For problems coming from unambiguous FDT,
The decision rule is then
Notice that it is now invariant w.r.t. adding constants to
Proposition 5: For any stable problem, it holds that (i) any IDDT-optimal policy is FDT-optimal (ii) there is an FDT-optimal policy which is IDDT-optimal. For any pseudocausal problem, it also holds that any FDT-optimal policy is IDDT-optimal.
One might think, based on this proposition, that IDDT is a superior decision theory to DDT. However, I think that IDDT is incompatible with learning, because of its discontinuous dependence on probabilities.
(Based on Aumann, Hart and Perry.) We will operationalize the problem by assuming the agent’s decision may deterministically depend on observing a coin flip. To simplify the presentation, we assume a single coin flip per intersection, which limits the resulting probabilities to
Denote by
Denote by
Consistently with our source, we set the loss function to be
This problem is formally causal. However, as opposed to all previous examples, it has no extensive form! Hence, EDT in the sense we defined it is ill-posed: to apply EDT reasoning here we need to at least supplement it by a theory of anthropic probabilities. CDT’s counterfactuals agree with FDT’s if we posit that the do-operator is constrained to choosing among “absent-minded” policies.
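For reference, here is a sketch of the policy evaluation, using the classic payoffs from the Piccione-Rubinstein / Aumann-Hart-Perry literature (0 for exiting at the first intersection, 4 for exiting at the second, 1 for continuing through both); the loss function actually used above is elided, so treat the numbers as illustrative:

```python
# Absent-minded driver sketch with the classic payoffs (0 / 4 / 1); the
# post's own loss function is elided, so these numbers are illustrative.

def expected_payoff(p: float) -> float:
    """p = probability of continuing at an intersection (necessarily the
    same at both, since the driver can't tell them apart)."""
    return (1 - p) * 0 + p * (1 - p) * 4 + p * p * 1

# One coin flip per intersection limits p to {0, 1/2, 1}:
for p in (0.0, 0.5, 1.0):
    print(p, expected_payoff(p))  # -> 0.0, 1.25, 1.0
```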
Previously we described the self-coordination problem, but perhaps self-PD is a more striking example.
Here,
Using the obvious notations
The loss is the usual PD loss of the “factual” player.
This problem is not formally causal, because e.g.
The natural CDT interpretation is the one where the factual policy controls the counterfactual player and the counterfactual policy controls the factual player. (Alas, the terminology gets confusing here: in one case the words “factual” and “counterfactual” refer to the agent’s policy, and in the other case to the coin’s outcome.) Both CDT and EDT play
IDDT is related to the old idea of “surmeasures” from the original infra-Bayesianism sequence.
We can also imagine equipping the agent with a “self-belief”
What you propose here doesn’t address the issue of non-realizability at all. For example, let’s say
This is an idea I came up with and presented at the Agent Foundations 2025 conference at CMU.
Here is a nice simple formalism for decision theory, which in particular supports the decision theory coming out of infra-Bayesianism. I now call the latter decision theory “Disambiguative Decision Theory”, since the counterfactuals work by “disambiguating” the agent’s belief.
Let
This data is common for all decision theories, but the rest of the details depend on the theory:
We are given a mapping
We will call an FDT problem “formally causal” when for any
CDT has the same formal structure as FDT, but we always require the problem to be formally causal. Moreover, the interpretation of
Given an FDT problem
Given this data, we define the translation
To formalize EDT, we need to assume the decision process is given in “extensive” form. That is, we have a set
We assume that
We define a policy to be
For every
For every
We further assume that there is a mapping
Here,
For any
This represents the event “the decision point
So far, this notion of extensive form decision problem is useful not just for EDT. Specifically for EDT, we add the assumption that we’re given the agent’s belief
For every
Thus, the agent conditions both on following policy
Given an FDT problem
We are given the agent’s belief
Here,
We then have
This is the reason for the name “disambiguative”:
Given an FDT problem
That is,
DDT does have the odd property of non-invariance w.r.t. shifting
Now, let’s look into how different decision theories compare. We will be using FDT as the “gold standard” throughout, when it comes to choosing the correct policy. Note, though, that FDT assumes we can somehow assign strict meaning to the logical counterfactuals, and it is unclear how to accomplish this. On the other hand, DDT makes the substantially weaker assumption that we can define the supracontribution belief. In particular, it is consistent with learning, as was explained here.
Proposition 1: Consider a formally causal FDT problem
Proposition 2: Consider a formally causal FDT problem in extensive form. Then,
Proposition 3: Consider a formally causal FDT problem. Then,
Thus, in the strictly causal case all decision theories coincide, but even here DDT requires the least precise assumptions for that to work (compared to CDT and EDT). More importantly, DDT allows us to go far beyond the formally causal case. However, we do need a mild assumption about the problem:
Definition 1: An FDT problem is called pseudocausal when for any
It’s easy to see that any formally causal problem is pseudocausal, but there are many counterexamples to the converse.
Essentially, pseudocausality means that the outcome cannot depend on decisions in situations of probability 0. Notice that in reality the agent is never absolutely certain about the decision problem, hence observing a situation of probability 0 should cause it to believe it is in a different decision problem altogether. This makes the pseudocausality condition very natural.
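As an informal paraphrase of Definition 1 (my notation; the formal condition is elided above): writing $\mu_\pi$ for the outcome distribution induced by policy $\pi$,

$$\pi \overset{\text{a.s.}}{=} \pi' \;\Longrightarrow\; \mu_\pi = \mu_{\pi'},$$

i.e., two policies that agree on every situation of positive probability induce the same outcomes.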
Pseudocausality has the nice property of not depending on the loss function. If we do allow dependence on the loss function, we can make do with an even weaker condition.
Definition 2: An FDT problem is called stable when there exists an FDT-optimal
It’s obvious that any stable problem is pseudocausal. Naturally, the converse is false.
Neither pseudocausality nor stability is sufficient to guarantee that DDT and FDT give identical recommendations. However, it becomes true when we iterate the problem.
Definition 3: Given a decision problem and
Given
For FDT, for any
For DDT, we take the belief to be
Note that iterating a problem commutes with converting it from FDT to DDT.
Theorem 4: For a stable FDT problem, there exists
The requirement to iterate doesn’t seem like a terrible cost, since in a learning context some kind of iteration is necessary anyway. It can also be understood as a natural result of the need for stability: problems that are close to being unstable require more iterations.
All these examples besides the last one have natural extensive forms with one decision point.
This problem is formally causal; however, the usual causal interpretation is non-trivial:
As a result,
The problem is pseudocausal but not formally causal. Nevertheless, CDT agrees with FDT thanks to the following causal interpretation:
The problem is pseudocausal but not formally causal.
For simplicity, we postulate that the agent is forced to two-box when seeing a full box, since this choice is a “no-brainer” for all decision theories.
The problem is stable but not pseudocausal. EDT is ill-posed because
As above, we postulate that the agent is forced to two-box when seeing an empty box.
The problem is not stable. EDT is ill-posed because
We now assume Omega has a probability
The problem is pseudocausal, but not formally causal of course. EDT is well-posed and
Here’s an interesting example of a problem with two decision points. Omega flips a coin and shows the result to the agent. The agent then has to choose between buttons A, B and C. Button C always yields 3 dollars. Buttons A and B yield 4 dollars if Omega predicts the agent would choose the same button in the other coin counterfactual, and 0 dollars otherwise.
The rest of the definitions are clear and we won’t write them out. The problem is pseudocausal but not formally causal. CDT and EDT agree here, with their behavior depending on the agent’s self-belief
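Here is a quick enumeration of the deterministic policies for this example (assuming a fair coin, which the post doesn’t state explicitly):

```python
# A policy maps the observed coin to a button. C pays 3; A or B pays 4 iff
# the agent would press the same button in the other coin counterfactual,
# else 0. Fair coin assumed.
from itertools import product

def payoff(policy: dict, coin: str) -> int:
    other = "T" if coin == "H" else "H"
    choice = policy[coin]
    if choice == "C":
        return 3
    return 4 if policy[other] == choice else 0

for h, t in product("ABC", repeat=2):
    policy = {"H": h, "T": t}
    ev = (payoff(policy, "H") + payoff(policy, "T")) / 2
    print(f"H->{h} T->{t}: EV = {ev}")

# Constant A or constant B yields 4; constant C yields 3; splitting A and B
# across the counterfactuals yields 0 on those branches.
```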
the computational complexity of individual hypotheses in the hypothesis class cannot be the thing that characterizes the hardness of learning, but rather it has to be some measure of how complex the entire hypothesis class is.
This is true, of course, but mostly immaterial. Outside of contrived examples, it’s rare for the hypothesis class to be feasible to learn while containing hypotheses that are infeasible to evaluate. It seems extremely implausible that you can find a hypothesis class that is simultaneously (i) possible to specify in practice [1] (ii) feasible to learn and (iii) contains a hypothesis which is an exact description of the real universe. Therefore, non-realizability is unavoidable.
By which I mean, we can construct the learning algorithm without being akin to omniscient beings that already know everything about the universe and are able to hardcode this knowledge into the algorithm. Indeed, the reasons why we need a learning algorithm at all are (i) we don’t know a lot of what we want the agent to know (ii) it’s too labor-intensive to hardcode even the things that we do know. Therefore, we need a hypothesis class that is extremely broad and mostly uninformative.
This idea was described in a presentation I gave in ’23, but wasn’t written down anywhere.
Here is a formalization of recursive self-improvement (more precisely, recursive metalearning) in the metacognitive agent framework.
Let
Let
Consider any symbolic representation of an element of
Define
Given
We now say that an agent is recursively metalearning (w.r.t. the choices involved), if (i) it satisfies a “good enough” regret bound w.r.t.
Intuitively, this reflects the idea that if
For simplicity, we assume that
Just don’t. I understand the frustration of not getting engagement, but don’t spam the site.
Halpern and Leung propose the “minimax weighted expected regret” (MWER) decision-rule, which is a generalization of the minimax-expected-regret (MER) decision-rule. In contrast, our decision rule is a weighted generalization of maximin-expected-utility (MMEU). The problem with MER is that it doesn’t work very well with learning. The closest thing to doing learning with MER is adversarial bandits. However, adversarial regret is statistically intractable for Markov Decision Processes. And even with bandits there is a hidden obliviousness assumption if you try to interpret it in a principled decision-theoretic way.
The truth is outside of my hypothesis class, but my hypothesis class probably contains a non-trivial law that is a coarsening of the truth, which is the whole point.
For example, you can imagine that you start with some kind of intractable simplicity prior. Then, for each hypothesis you choose a tractable law that coarsens it. You end up with a probability distribution over laws.
A different way to view this is, this is just a way to force your policy to have low regret w.r.t. all/most hypotheses while weighting complex hypotheses less. For a complex hypothesis, you naturally expect learning it to be harder, so you weight its regret less. Typically, it’s only possible to have a uniform regret bound if you impose a bound on the complexity of hypotheses in some sense. Absent such a bound, your regret bound must be non-uniform. You can formalize it by explicitly allowing the per-hypothesis regret to depend on some complexity parameter, but the Bayes approach is an alternative. (Also, Bayes regret obviously implies per-hypothesis non-uniform regret with a 1/probability coefficient.)
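The parenthetical in symbols: writing $\zeta$ for the prior and $\mathrm{Reg}_h(\pi) \ge 0$ for the regret against hypothesis $h$ (my notation),

$$\mathbb{E}_{h \sim \zeta}[\mathrm{Reg}_h(\pi)] \le B \;\Longrightarrow\; \mathrm{Reg}_h(\pi) \le \frac{B}{\zeta(h)} \text{ for every } h,$$

since dropping the other nonnegative terms from the expectation leaves $\zeta(h)\,\mathrm{Reg}_h(\pi) \le B$.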
First, Bayes-regret and worst-case-regret are standard concepts in classical RL theory, and the infra-versions are straightforward analogs.
Second, you don’t have to focus on the Bayes-regret necessarily. In fact, in our papers, we focus entirely on uniform (worst-case) regret bounds.
Third, instead of an ordinary prior over laws you can consider an infraprior over laws (i.e. have ambiguity in hypothesis-space and not just in outcome-space). The resulting notion of “infra-Bayes-regret” has both Bayes-regret and worst-case-regret as special cases.
Fourth, the justification is quite straightforward. If you have an (unambiguous i.e. ordinary probability distribution) prior over laws, and your performance metric is the Bayes-infra-expected utility, then the Bayes-regret is just the difference between the performance of your policy and the performance of an optimal policy that magically knows the true hypothesis. So it’s a very natural measure of your policy’s ability to learn the hypothesis.
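In symbols (my notation; the post’s is elided): with prior $\zeta$ over laws and $\mathrm{EU}_h(\pi)$ the infra-expected utility of policy $\pi$ under law $h$,

$$\mathrm{Reg}(\pi) = \mathbb{E}_{h \sim \zeta}\!\left[\max_{\pi'} \mathrm{EU}_h(\pi') - \mathrm{EU}_h(\pi)\right],$$

the gap between $\pi$ and an optimal policy that magically knows the true $h$.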
I like the overall vibe. Two issues:
It says “Top Posts” and the mouse-over text is “by karma”, however in reality I can choose which posts to put there. Now, I like it that I can choose which posts to put there, but once I customized them, the mouse-over becomes a lie.
The “recent comments” disappeared. This is really bad because I use that to find my recent comments when I want to edit them. (For example now I wanted to find this comment to add this second bullet but had to do it manually.) OK, I now see I can find them under “feed” but this might be confusing.
[Context: I’m not a digital minimalist but I am somewhat of a “digital reducetarian”: I don’t have social media (besides LinkedIn) and have a browser plugin that reduces my access to particular websites (like LessWrong).]
Cool post :)
For me, there’s something “strange” here (not surprising, but unlike my own experience), where the implication is that people have huge swaths of “free time” that they use for scrolling and the like (which you instead use for what’s described in this post). I spend the vast majority of my time either working or doing something with kids/lovers/friends. (I did read this post in bed preparing to start my day, and am sneaking in this comment between breakfast and work.) Plus short breaks from work, and a short time in bed before sleeping, during which I read fiction books (admittedly using digital means, but in principle I could use physical books just as well, if I could fit them all into my apartment).
It’s fun to hear about your experience talking to random strangers! Catalogued it under “I would never do this but I’m glad some people do”.
What I meant is not “people only care about ~Dunbar number of people”, but something more like “the closest ~Dunbar number of people have [some fraction around the range 1/1000-1/2] of the total value”. Giuseppe Garibaldi was also influenced by considerations such as increasing his own status (or maybe even posthumous reputation).
As to “humans are not capable to behave this way rationally”, I disagree. (The whole point of decision theories like UDT/FDT is that you don’t need to rewrite your source code to behave in an a priori-optimal way, and I believe that I’m fully capable of following the recommendations of such decision theories, and do follow them.) There is probably also a sense in which we value something vaguely akin to “abstract moral concepts”, but this cashes out to something very different from utilitarianism (closer to virtue ethics).