J Bostock

Karma: 845

Measuring Learned Optimization in Small Transformer Models

J Bostock 8 Apr 2024 14:41 UTC

21 points

0 comments · 11 min read · LW link

Briefly Extending Differential Optimization to Distributions

J Bostock 10 Mar 2024 20:41 UTC

4 points

0 comments · 2 min read · LW link

J Bostock 7 Mar 2024 12:57 UTC
1 point
0
in reply to: Magdalena Wache’s comment on: From Finite Factors to Bayes Nets
Thanks for the feedback. There’s a condition which I assumed when writing this which I have realized is much stronger than I originally thought, and I think I should’ve devoted more time to thinking about its implications.
When I mentioned “no information being lost”, what I meant is that in the interaction $A \to B$, each value $b \in B$ (where $B$ is the domain of $P_{B}$) corresponds to only one value $a \in A$. In terms of FFS, this means that each variable must be the finest partition of the base set that is possible with that variable’s set of factors.
Under these conditions, I am pretty sure that $A \perp C \implies A \perp C \mid B$.
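One way to see why this should hold, as a sketch under the assumption that “no information being lost” means there is a function $g$ with $a = g(b)$ whenever $P(a, b) > 0$ (the name $g$ is introduced here for illustration and is not from the original post). For any $b, c$ with $P(b, c) > 0$:

$$
P(a, c \mid b) = P(c \mid b)\, P(a \mid b, c) = P(c \mid b)\, \mathbb{1}[a = g(b)] = P(c \mid b)\, P(a \mid b).
$$

On this reading the conditional independence follows from $A$ being determined by $B$, so the premise that $A$ and $C$ are marginally independent may not even be needed.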

J Bostock 3 Feb 2024 19:04 UTC
1 point
0
in reply to: tailcalled’s comment on: From Finite Factors to Bayes Nets
I was thinking about causality in terms of forced directional arrows in Bayes nets, rather than in terms of d-separation. I don’t think your example as written is helpful because Bayes nets rely on the independence of variables to do causal inference: $X \to Y \to Z$ is equivalent to $X \leftarrow Y \leftarrow Z$.
It’s more important to think about cases like $X \to Y \leftarrow Z$ where causality can be inferred. If we change this to $\hat{X}, \hat{Y}, \hat{Z}$ by adding noise then we still get a distribution satisfying $\hat{X} \to \hat{Y} \leftarrow \hat{Z}$ (as $\hat{X}$ and $\hat{Z}$ are still independent).
Even if we did have other nodes forcing $X \to Y \to Z$ (such as a node $U$ which is parent to $Y$, and another node $V$ which is parent to $Z$), then I still don’t think adding noise lets us swap the orders round.
On the other hand, there are certainly issues in Bayes nets with more elements, particularly the “diamond-shaped” net with arrows $W \to X, W \to Y, X \to Z, Y \to Z$. Here adding noise does prevent effective temporal inference, since, if $\hat{X}$ and $\hat{Y}$ are no longer d-separated by $\hat{W}$, we cannot prove from correlations alone that no information goes between them through $\hat{Z}$.
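The collider claim can be checked exactly on a toy example. The sketch below assumes binary variables, $Y = X \oplus Z$, and noise that flips each observed bit independently with probability `EPS`; these modelling choices and all function names are illustrative assumptions, not taken from the original discussion.

```python
from fractions import Fraction
from itertools import product

EPS = Fraction(1, 10)  # assumed probability that noise flips an observed bit


def flip_prob(true_bit, seen_bit):
    """Probability of observing seen_bit when the underlying bit is true_bit."""
    return EPS if true_bit != seen_bit else 1 - EPS


def noisy_collider_joint():
    """Exact joint distribution over the noisy observations (x_hat, y_hat, z_hat)."""
    joint = {}
    for x, z in product((0, 1), repeat=2):
        y = x ^ z  # collider: Y is a function of both parents
        p_latent = Fraction(1, 4)  # X and Z are independent fair bits
        for obs in product((0, 1), repeat=3):
            p = p_latent
            for true_bit, seen_bit in zip((x, y, z), obs):
                p *= flip_prob(true_bit, seen_bit)
            joint[obs] = joint.get(obs, Fraction(0)) + p
    return joint


def marginal(joint, keep):
    """Marginal distribution over the variable indices listed in keep."""
    out = {}
    for obs, p in joint.items():
        key = tuple(obs[i] for i in keep)
        out[key] = out.get(key, Fraction(0)) + p
    return out


def independent(joint, i, j):
    """True if variables i and j are marginally independent."""
    pij, pi, pj = marginal(joint, (i, j)), marginal(joint, (i,)), marginal(joint, (j,))
    return all(pij[(a, b)] == pi[(a,)] * pj[(b,)]
               for a, b in product((0, 1), repeat=2))


def independent_given(joint, i, j, k):
    """True if variables i and j are independent conditional on variable k."""
    pijk, pk = marginal(joint, (i, j, k)), marginal(joint, (k,))
    pik, pjk = marginal(joint, (i, k)), marginal(joint, (j, k))
    return all(pijk[(a, b, c)] * pk[(c,)] == pik[(a, c)] * pjk[(b, c)]
               for a, b, c in product((0, 1), repeat=3))


joint = noisy_collider_joint()
x_z_independent = independent(joint, 0, 2)
x_z_independent_given_y = independent_given(joint, 0, 2, 1)
```

In this toy model the noisy copies of $X$ and $Z$ stay marginally independent while becoming dependent once the noisy copy of $Y$ is conditioned on, which is the signature that lets the collider’s arrows be oriented.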

Finite Factored Sets to Bayes Nets Part 2

J Bostock 3 Feb 2024 12:25 UTC

6 points

0 comments · 8 min read · LW link

J Bostock 25 Jan 2024 19:20 UTC
1 point
0
in reply to: Morpheus’s comment on: From Finite Factors to Bayes Nets
I had forgotten about OEIS! Anyway I think the actual number might be 1577 rather than 1617 (this also gives no answers). I was only assuming agnosticism over factors in the overlap region $ABCD$ if all pairs $AB, AC, \ldots, CD$ had factors, but I think that is missing some examples. My current guess is that any overlap region like $ABCD$ should be agnostic iff all of the overlap regions “surrounding” it in the Venn diagram ($ABC$, $ABD$, $ACD$, $BCD$ in this situation) either have a factor present or are agnostic. This gives the series 1, 2, 15, 1577, 3397521 (my computer has not spat out the next element). This also gives nothing on the OEIS.
My reasoning for this condition is that we should be able to “remove” an observable from the system without trouble. If we have an agnosticism in the intersection $ABCD$, then we can only remove observable $B$ if this doesn’t cause trouble for the new intersection $ABD$, which is only true if we already have a factor in $ABD$ (or are agnostic about it).

From Finite Factors to Bayes Nets

J Bostock 23 Jan 2024 20:03 UTC

38 points

7 comments · 8 min read · LW link

J Bostock 16 Jan 2024 22:38 UTC
5 points
3
on: Natural Latents: The Math
I know very, very little about category theory, but some of this work regarding natural latents seems to absolutely smack of it. There seems to be a fairly important three-way relationship between causal models, finite factored sets, and Bayes nets.
To be precise, any causal model consisting of root sets $B$, downstream sets $X$, and functions mapping sets to downstream sets like $f_{4} : (B_{1} \otimes B_{3} \otimes X_{2}) \to X_{4}$ must, when equipped with a set of independent probability distributions over $B$, create a joint probability distribution compatible with the Bayes net that’s isomorphic to the causal model in the obvious way. (So in the previous example, there would be arrows from only $B_{1}$, $B_{3}$, and $X_{2}$ to $X_{4}$.) The proof of this seems almost trivial but I don’t trust myself not to balls it up somehow when working with probability theory notation.
In the resulting Bayes net, one “minimal” natural latent which conditionally separates $X_{i}$ and $X_{j}$ is just the probabilities over the root elements of $B$ which both $X_{i}$ and $X_{j}$ depend on. It might be possible to show that this “minimal” construction of $\Lambda$ satisfies a universal property, so that any other $\Lambda'$ which is also “minimal” in this way must be isomorphic to $\Lambda$.
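As a small sanity check of the separation claim, here is a sketch with a hypothetical toy causal model: three independent fair root bits, with $X_{i}$ depending on $B_{1}, B_{2}$ and $X_{j}$ depending on $B_{2}, B_{3}$. The specific functions (OR and AND) and all names are assumptions chosen for illustration; the code enumerates the joint distribution exactly and checks that the shared root $B_{2}$ conditionally separates the two downstream variables.

```python
from fractions import Fraction
from itertools import product


def x_i(b1, b2):
    """Hypothetical downstream variable depending on roots B1 and B2."""
    return b1 | b2


def x_j(b2, b3):
    """Hypothetical downstream variable depending on roots B2 and B3."""
    return b2 & b3


def joint_distribution():
    """Exact joint over (b2, x_i, x_j) with independent fair root bits."""
    joint = {}
    for b1, b2, b3 in product((0, 1), repeat=3):
        key = (b2, x_i(b1, b2), x_j(b2, b3))
        joint[key] = joint.get(key, Fraction(0)) + Fraction(1, 8)
    return joint


def prob(joint, b2=None, xi=None, xj=None):
    """Probability of the event fixing whichever arguments are not None."""
    total = Fraction(0)
    for (kb2, kxi, kxj), p in joint.items():
        if b2 in (None, kb2) and xi in (None, kxi) and xj in (None, kxj):
            total += p
    return total


def marginally_independent(joint):
    """Check P(x_i, x_j) = P(x_i) P(x_j) for every pair of values."""
    return all(prob(joint, xi=a, xj=b) == prob(joint, xi=a) * prob(joint, xj=b)
               for a, b in product((0, 1), repeat=2))


def separated_by_shared_root(joint):
    """Check P(x_i, x_j | b2) = P(x_i | b2) P(x_j | b2), written without division."""
    return all(
        prob(joint, b2=c, xi=a, xj=b) * prob(joint, b2=c)
        == prob(joint, b2=c, xi=a) * prob(joint, b2=c, xj=b)
        for a, b, c in product((0, 1), repeat=3)
    )


joint = joint_distribution()
dependent_without_latent = not marginally_independent(joint)
independent_given_latent = separated_by_shared_root(joint)
```

Here the two downstream variables are correlated marginally but independent once the shared root is known. This only illustrates the separation property on one example; it says nothing about the conjectured universal property.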

J Bostock 27 Dec 2023 15:11 UTC
1 point
0
in reply to: Charlie Steiner’s comment on: Differential Optimization Reframes and Generalizes Utility-Maximization
I think the position of the ball is in V, since the players are responding to the position of the ball by forcing it towards the goal. It’s difficult to predict the long-term position of the ball based on where it is now. The position of the opponent’s goal would be an example of something in U for both teams. In this case both teams’ utility functions contain a robust pointer to the goal’s position.

Differential Optimization Reframes and Generalizes Utility-Maximization

J Bostock 27 Dec 2023 1:54 UTC

30 points

2 comments · 3 min read · LW link

Mathematically-Defined Optimization Captures A Lot of Useful Information

J Bostock 29 Oct 2023 17:17 UTC

19 points

0 comments · 2 min read · LW link

Defining Optimization in a Deeper Way Part 4

J Bostock 28 Jul 2022 17:02 UTC

7 points

0 comments · 5 min read · LW link

Defining Optimization in a Deeper Way Part 3

J Bostock 20 Jul 2022 22:06 UTC

8 points

0 comments · 2 min read · LW link

Defining Optimization in a Deeper Way Part 2

J Bostock 11 Jul 2022 20:29 UTC

7 points

0 comments · 4 min read · LW link

Defining Optimization in a Deeper Way Part 1

J Bostock 1 Jul 2022 14:03 UTC

7 points

0 comments · 2 min read · LW link

Thinking about Broad Classes of Utility-like Functions

J Bostock 7 Jun 2022 14:05 UTC

7 points

0 comments · 4 min read · LW link

J Bostock 15 May 2022 10:49 UTC
1 point
on: But What’s Your *New Alignment Insight,* out of a Future-Textbook Paragraph?
I’d go for:
Reinforcement learning agents do two sorts of planning. One applies the dynamic (world-modelling) network and runs a Monte Carlo tree search (or something like it) over explicitly-represented world states. The other is implicit in the future-reward-estimate function. You need to have as much planning as possible be of the first type:
1. It’s much more supervisable. An explicitly-represented world state is more interrogable than the inner workings of a future-reward-estimate.
2. It’s less susceptible to value-leaking. By this I mean issues in alignment which arise from instrumentally-valuable (i.e. not directly part of the reward function) goals leaking into the future-reward-estimate.
3. You can also turn down the depth on the tree search. If the agent literally can’t plan beyond a dozen steps ahead it can’t be deceptively aligned.
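To make the distinction concrete, here is a minimal sketch of the first kind of planning with an adjustable depth. It uses plain exhaustive depth-limited lookahead rather than Monte Carlo tree search, and the number-line world, the `dynamics`, `reward`, and `value_estimate` functions, and all other names are hypothetical stand-ins for learned components.

```python
from typing import Callable, List, Sequence, Tuple

State = int
Action = int


def plan(
    state: State,
    depth: int,
    actions: Sequence[Action],
    dynamics: Callable[[State, Action], State],
    reward: Callable[[State, Action, State], float],
    value_estimate: Callable[[State], float],
) -> Tuple[float, List[State]]:
    """Return the best estimated return and the explicit states visited to get it."""
    if depth == 0:
        # Beyond the search horizon, everything rests on the opaque estimate.
        return value_estimate(state), [state]
    best_return, best_path = float("-inf"), [state]
    for action in actions:
        next_state = dynamics(state, action)
        future, path = plan(next_state, depth - 1, actions,
                            dynamics, reward, value_estimate)
        total = reward(state, action, next_state) + future
        if total > best_return:
            best_return, best_path = total, [state] + path
    return best_return, best_path


# Toy world (assumed for illustration): walk along the integers, reward for reaching 3.
ACTIONS = (-1, +1)


def toy_dynamics(state: State, action: Action) -> State:
    return state + action


def toy_reward(state: State, action: Action, next_state: State) -> float:
    return 1.0 if next_state == 3 else 0.0


def uninformative_value(state: State) -> float:
    return 0.0


shallow = plan(0, 2, ACTIONS, toy_dynamics, toy_reward, uninformative_value)
deep = plan(0, 3, ACTIONS, toy_dynamics, toy_reward, uninformative_value)
```

With an uninformative value estimate, the depth-2 planner cannot find the reward three steps away, while the depth-3 planner returns the explicit trajectory `[0, 1, 2, 3]`, which a supervisor can read off directly. Anything the agent “knows” beyond the horizon has to come through `value_estimate`, which is where the supervisability concerns above apply.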

J Bostock 28 Apr 2022 23:13 UTC
6 points
on: The Game of Masks
I would question the framing of mental subagents as “mesa optimizers” here. This sneaks in an important assumption: namely that they are optimizing anything. I think the general view of “humans are made of a bunch of different subsystems which use common symbols to talk to one another” has some merit, but I think this post ascribes a lot more agency to these subsystems than I would. I view most of the subagents of human minds as mechanistically relatively simple.
For example, I might reframe a lot of the elements of talking about the unattainable “object of desire” in the following way:
1. Human minds have a reward system which rewards thinking about “good” things we don’t have (or else we couldn’t ever do things)
2. Human thoughts ping from one concept to adjacent concepts
3. Thoughts of good things associate to assessment of our current state
4. Thoughts of our current state being lacking cause a negative emotional response
5. The reward signal fails to backpropagate enough to the reward system in step 1, so the thoughts of “good” things we don’t have are reinforced
6. The cycle continues
I don’t think this is literally the reason, but framings on this level seem more mechanistic to me.
I also think that any framings along the lines of “you are lying to yourself all the way down and cannot help it” and “literally everyone is messed up in some fundamental way and there are no humans who can function in a satisfying way” are just kind of bad. Seems like a Kafka trap to me.
I’ve spoken elsewhere about the human perception of ourselves as a coherent entity being a misfiring of systems which model others as coherent entities (for evolutionary reasons). I don’t particularly think some sort of societal pressure is the primary reason for our thinking of ourselves as being coherent, although societal pressure is certainly to blame for the instinct to repress certain desires.

J Bostock 23 Apr 2022 15:08 UTC
13 points
on: China Covid #2
I’m interested in the “Xi will be assassinated/otherwise killed if he doesn’t secure this bid for presidency” perspective. Even if he was put in a position where he’d lose the bid for a third term, is it likely that he’d be killed for stepping down? The four previous paramount leaders weren’t. Is the argument that he’s amassed too much power/done too much evil/burned too many bridges in getting his level of power?
Although I think most people who amass Xi’s level of power are best modelled as desiring power (or at least as executing patterns which have in the past maximized power) for its own sake, so I guess the question of threat to his life is somewhat moot with regards to policy.

J Bostock 20 Apr 2022 22:38 UTC
4 points
on: Jemist’s Shortform
Seems like there’s a potential solution to ELK-like problems, if you can force the information to move from the AI’s ontology to (its model of) a human’s ontology and then force it to move back again.

This gets around “basic” deception since we can always compare the AI’s ontology before and after the translation.

The question is how do we force the knowledge to go through the (modeled) human’s ontology, and how do we know the forward and backward translators aren’t behaving badly in some way.
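As a very rough sketch of the comparison step only, the snippet below represents both ontologies as dictionaries and reports which facts fail to survive the round trip. The fact names, the translators, and the dictionary representation are all hypothetical; real ontologies would not be this legible, and the check does nothing about the open questions above.

```python
def round_trip_mismatches(ai_state, to_human, to_ai):
    """Return the keys whose values differ after AI -> human -> AI translation."""
    reconstructed = to_ai(to_human(ai_state))
    all_keys = ai_state.keys() | reconstructed.keys()
    return {k for k in all_keys if ai_state.get(k) != reconstructed.get(k)}


# Hypothetical key mappings between the two ontologies.
AI_TO_HUMAN = {"diamond_in_vault": "the diamond is there",
               "camera_tampered": "the camera was tampered with"}
HUMAN_TO_AI = {v: k for k, v in AI_TO_HUMAN.items()}


def honest_to_human(ai_state):
    """Translate every fact into the (modeled) human's vocabulary."""
    return {AI_TO_HUMAN[k]: v for k, v in ai_state.items()}


def lossy_to_human(ai_state):
    """A translator that silently drops the tampering fact."""
    return {AI_TO_HUMAN[k]: v for k, v in ai_state.items()
            if k != "camera_tampered"}


def back_to_ai(human_state):
    """Translate the human-ontology facts back into the AI's vocabulary."""
    return {HUMAN_TO_AI[k]: v for k, v in human_state.items()}


ai_state = {"diamond_in_vault": True, "camera_tampered": True}
honest_diff = round_trip_mismatches(ai_state, honest_to_human, back_to_ai)
lossy_diff = round_trip_mismatches(ai_state, lossy_to_human, back_to_ai)
```

The lossy translator is caught because the dropped fact cannot be reconstructed. A forward and backward translator that coordinate on a private encoding would pass this check unnoticed, which is the second concern raised above.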