Winter Cross

Karma: 33

I’m a mathematician passionate about AI Safety. Currently, I’m doing agent foundations research with Dovetail. Check out my research on this account!

Winter Cross 14 Jun 2026 21:54 UTC
1 point
0
on: Paying Kids To Do Schoolwork
Not only does this serve as a powerful tool for incentivizing learning — by paying students to do work, we unlock a powerful tool for speeding up education: asynchronous learning.
I don’t really see how paying students would allow for asynchronous learning. It seems to me that this depends more on how daily agendas are structured and how grades are determined than on the incentive structure.

Winter Cross 12 Jun 2026 15:17 UTC
3 points
1
on: AI in a vat: Fundamental limits of efficient world modelling for safe agent sandboxing
Consider a robot that is manipulating a deck of cards. This can be described with a world model with $13!$ possible states, corresponding to the possible arrangements of the deck.
Should this be 52! instead of 13! since there are 52 cards in a deck?
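For scale, the two counts can be compared directly. This is only an illustrative sketch, assuming a standard 52-card deck with 13 cards per suit; the variable names are made up for the example and do not come from the post.

```python
import math

# Number of orderings of a full 52-card deck (assumed standard deck).
full_deck = math.factorial(52)

# Number of orderings of 13 cards, e.g. a single suit.
one_suit = math.factorial(13)

print(f"13! = {one_suit:,}")
print(f"52! has {len(str(full_deck))} digits")
```

The gap is enormous: 13! is about 6.2 billion, while 52! is on the order of 10^67.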

Winter Cross 11 Sep 2025 21:39 UTC
2 points
0
in reply to: Alfred Harwood’s comment on: The Internal Model Principle: A Straightforward Explanation
Thanks for the answer! That confirms what I was thinking.
That second case, $(s_{1}, w_{1}) \to (s_{2}, w_{2})$ and $(s_{1}, w_{3}) \to (s_{2}, w_{2})$, surprised me, since I initially thought the IMP implied an additional isomorphism between the controller and the environment. I guess that isomorphism effectively still exists, since it can be created by coarse-graining over the controller states like you mentioned.

Winter Cross 9 Sep 2025 19:41 UTC
2 points
0
on: The Internal Model Principle: A Straightforward Explanation
This article is really approachable for someone like me who’s just getting acquainted with mathematical AI safety research, so I appreciate that! This definitely helped me better understand the IMP.
I have a question about this part in the “How is the controller ‘modelling’ the environment?” section:
If the joint system is represented by environment-controller pairs $(s, w)$, then $\gamma^{+}$ being injective means that no two pairs (within $X^{+}$) will have the same environment value $s$ or controller value $w$. This means that with appropriate re-labelling, each joint state can be indexed:
$(s_{1}, w_{1}), (s_{2}, w_{2}), (s_{3}, w_{3}), \ldots$ etc.
I don’t see how this follows. Couldn’t the shape of the joint states be more complicated, such as forming a cycle or having multiple joint states evolve to the same joint state? Both of these possibilities would break the indexing. Is there something I’m missing that implies this linear structure?
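To make the two worrying shapes concrete, here is a toy sketch. It assumes the joint dynamics can be written as a finite map from each joint state to its successor; the function `is_linear_chain` and the example maps are hypothetical illustrations, not constructions from the article.

```python
def is_linear_chain(step):
    """Return True if `step` (dict: joint state -> next joint state)
    forms a single simple chain with no merges and no cycles."""
    successors = list(step.values())
    # A merge: two different joint states share the same successor.
    if len(successors) != len(set(successors)):
        return False
    # Start points are states that nothing else maps to.
    successor_set = set(successors)
    starts = [s for s in step if s not in successor_set]
    if len(starts) != 1:
        return False  # no start (a pure cycle) or several separate chains
    # Walk from the start; every state must be visited exactly once.
    seen, current = set(), starts[0]
    while current in step and current not in seen:
        seen.add(current)
        current = step[current]
    return current not in seen and seen == set(step)


# Hypothetical examples, with joint states written as (s, w) pairs.
chain = {("s1", "w1"): ("s2", "w2"), ("s2", "w2"): ("s3", "w3")}
cycle = {("s1", "w1"): ("s2", "w2"), ("s2", "w2"): ("s1", "w1")}
merge = {("s1", "w1"): ("s2", "w2"), ("s1", "w3"): ("s2", "w2")}

print(is_linear_chain(chain), is_linear_chain(cycle), is_linear_chain(merge))
```

Only the first map can be indexed as $(s_{1}, w_{1}), (s_{2}, w_{2}), \ldots$ without repeats. The cycle has no starting state, and the merge sends two distinct joint states to the same successor.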
What links here?
- The Internal Model Principle: A Straightforward Explanation by Alfred Harwood (12 Apr 2025 10:58 UTC; 23 points)