My apologies then, I don’t know how to (compactly) improve your understanding other than to point at my priors more vigorously.
[Bowing out.]
Non sequitur? I did not expect that statement to prove “there is a clean theory of bounded agency.” It’s a terse illustration of why correspondence principles aren’t physics-magic. I am trying to explain why I believe we are disagreeing on priors here: it appears to me that you believe, a priori, that elegant grand unified theories only appear in physics.
https://www.fourmilab.ch/etexts/einstein/specrel/www/
Namely:
I think that we are in the situation where we need to understand something (agency; metaethics) better to achieve the desired effect (safe superintelligence). I think the amount of scientific work to be done on the former far eclipses the engineering work of the latter, to the point that it’s disingenuous to call the scientific work a subproblem. So it is more precise to say that I agree with you on the factual matter, but am dismayed by your rhetorical framing.
“Flaws in the EU maximization paradigm do not necessarily suggest that there is a better paradigm waiting to be found, because unlike in physics, I see no a priori reason to expect an elegant grand unified theory for bounded rationality.”
Disagreed. In my experience, flaws in [X] paradigm usually suggest there is a better paradigm [Y] waiting to be found. Correspondence principles wherein the domain of validity of [X] is subsumed by [Y] are not limited to physics. Specifically, a comprehensive theory of bounded agency (say, with bound $b$) should recover a theory of non-bounded agency in the limit $b \to \infty$.
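To illustrate with the canonical physics example (my gloss): Newtonian kinematics sits inside special relativity’s domain of validity via the low-velocity limit,

$$\gamma = \frac{1}{\sqrt{1 - v^2/c^2}} = 1 + \frac{1}{2}\frac{v^2}{c^2} + O\!\left(\frac{v^4}{c^4}\right) \to 1 \quad \text{as } v/c \to 0,$$

and the analogous desideratum here is $\lim_{b \to \infty} \mathrm{Theory}_b = \mathrm{Theory}_\infty$: the unbounded theory recovered inside the bounded theory’s domain of validity.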
On the object level, some of Einstein’s gedankenexperiments on clock synchronization are pretty hard to distinguish from “a philosopher specializing in metaphysics had invented special relativity by reasoning from first principles about time”.
“More importantly, AI safety is fundamentally an engineering problem, not a scientific problem.”
Disagreed. If it were an engineering problem, then I would expect that we already deeply understand all the relevant principles, so that it’s just a matter of trading off cost against safety, i.e., how much do we want to overengineer this? That does not characterize my epistemic state on AI safety in the least. In the moon landing analogy, I think AI safety research is arguably at about the level of the Tsiolkovsky rocket equation.
OK, point taken. I’m glad to see your research; it’s clearly presented and seems reproducible. I still think the above-quoted implication is broadly misleading. Based on your comments, I’m satisfied that we’re in disagreement about which part of the attack surface is important/worrisome.
[I absolve you of the tacit duty to respond to this.]
If “price=probability”, then changing the pricing curve is equivalent to changing how the AMM updates its probability estimates (on evidence of buy/sell orders).
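To make “price = probability” concrete, here is a minimal toy sketch of a two-outcome CPMM (my own illustration, assuming the standard invariant $y \cdot n = k$; names are illustrative, not any real platform’s implementation):

```python
# Toy two-outcome constant-product AMM (CPMM). Assumes the standard invariant
# y * n = k, where y and n are YES and NO share reserves. Illustrative only.

def price_yes(y: float, n: float) -> float:
    """Instantaneous YES price; under "price = probability" this is the
    market's implied P(YES)."""
    return n / (y + n)

def buy_yes(y: float, n: float, cash_in: float):
    """Spend cash_in on YES: mint cash_in of each share into the pool
    (1 YES + 1 NO redeems for $1), then pay out enough YES shares to
    restore the invariant y * n = k."""
    k = y * n
    n2 = n + cash_in          # NO reserve after minting
    y2 = k / n2               # YES reserve after paying out shares
    yes_out = (y + cash_in) - y2
    return y2, n2, yes_out

y, n = 100.0, 100.0
print(price_yes(y, n))        # 0.5
y, n, got = buy_yes(y, n, 20.0)
print(price_yes(y, n))        # ~0.59: the buy moved the implied probability up
```

In this framing, changing the pricing curve means changing `price_yes` together with the invariant, which is exactly changing how buy/sell flow moves the implied probability.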
I will try to dig up some references for you. Sorry, it really was a small side project, and it has been several years.
Ah, so I can’t imagine a probability function for that market that isn’t just the price. Yet one can write a perfectly fine pricing function that doesn’t appear to adhere to the rules of probability theory. If I try to compose two markets, one conditional on the other, can I multiply their prices to find the joint probability? Does this violate “price = probability”?
“Price = probability is a general rule for prediction markets” is a very interesting claim. It seems obvious, but then you have to ask yourself: whose probability?
I’m familiar with some of those operations (I’ve skimmed the Boyd paper). Unfortunately, there are a lot of different ways of expressing the same constraints, so I can’t immediately tell whether Manifold’s implementation is equivalent to what I had imagined.
Thanks for your answers, I’ll look into some of the other ideas you referenced.
“These results cast doubt on the worry that optimization pressure on the CoT would lead to encoded reasoning.”
This interpretation of your results doesn’t hold up for me. That “less sophisticated strategies, like jailbreaking the monitor” are found first seems more like a property of the monitor than of the optimization pressure in general. If you use a weak monitor, then you will tend to see a weak strategy (e.g., jailbreaking the monitor) emerge. If you use a human monitor, you should expect to see a strategy that defeats human monitoring (e.g., steganography) emerge. If you use a superhuman monitor, you should expect to see an even more sophisticated strategy emerge.
Why should I be assuaged that encoded reasoning didn’t emerge when the monitor itself applied little optimization pressure in that direction?
I’ve seen similar derivations before, but it’s been a few years since I looked at AMMs in detail. I’ve spent some time recently looking at mechanism design for prediction markets, so this is a timely reminder!
Three questions—
Would you agree that this captures your main conclusion for a binary prediction market:
CPMM “price = probability”?
I seem to recall that CPMMs easily generalize to multiple assets: instead of $xy = k$ you have $xyz = k$ and so on (see the sketch after these questions). Do you happen to know if that matches the generalization of your prediction market with binary outcome to one with categorical outcome?
You mention in your 2nd footnote that there are “different probability functions matching these desiderata”. Care to say more on this?
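For reference on question 2, the generalization I have in mind (my guess, with invariant $\prod_i x_i = k$; not necessarily what your derivation gives) prices outcome $i$ proportional to $1/x_i$:

```python
def cpmm_prices(reserves: list[float]) -> list[float]:
    """Implied outcome prices for a categorical CPMM with invariant
    prod(x_i) = k: marginal prices come out proportional to 1/x_i,
    normalized to sum to 1 so they read as a probability distribution."""
    inv = [1.0 / x for x in reserves]
    total = sum(inv)
    return [v / total for v in inv]

print(cpmm_prices([100.0, 100.0]))        # [0.5, 0.5] -- recovers the binary case
print(cpmm_prices([50.0, 100.0, 100.0]))  # [0.5, 0.25, 0.25] -- scarcer share is pricier
```

Note the two-asset case recovers $p_{\text{yes}} = n/(y+n)$.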
Your notation is clear to me! It can be shown that even though $f$ and $g$ are both convex, their composition $f \circ g$ is not necessarily convex. Sorry, I don’t have any clear counterexamples at the ready for this particular problem. (The Hessian determinants all vanish in the graphene/bit-string model.) It’s likely that I’ve configured my numerical solvers badly for this example.
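For the general one-dimensional statement (not this model specifically), a standard counterexample: take $f(x) = x^2$ and $g(x) = x^2 - 1$, both convex. Then

$$(f \circ g)(x) = (x^2 - 1)^2, \qquad (f \circ g)''(x) = 12x^2 - 4 < 0 \ \text{ at } x = 0,$$

so $f \circ g$ is not convex; composition preserves convexity only under extra conditions, e.g. the outer function convex and nondecreasing.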
As far as binding to reality:
Power laws are ubiquitous in nature. The exponential family doesn’t admit power laws; f-divergences do. (They admit many other important non-exponential distributions as well; see the B-E example above.)
Symmetries in your data that are weaker than, or just different from, independence are ubiquitous in nature. The exponential family is only congruent with independence; f-divergences are not so limited.
KL/Gibbs entropy is inadequate for many observable systems. This typically appears under the heading of nonextensive entropy (Gell-Mann & Tsallis, with contributions by Touchette and others) or Tsallis entropy. This paper is far enough outside my field that I can only loosely validate it, but it’s an attempt to explain dark energy by applying observational data to determine Bayes factors of different nonextensive-entropy models of our cosmological horizon.
I consider Jensen’s inequality, the data-processing inequality (DPI), and coarse-graining monotonicity to be pretty important desiderata (see Nielsen). The f-divergences are the largest family that supports these, and they are uniquely determined by them.
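For concreteness, the standard definitions I’m leaning on (not taken from this post): an f-divergence is

$$D_f(P \,\|\, Q) = \sum_x q(x)\, f\!\left(\frac{p(x)}{q(x)}\right), \qquad f \text{ convex},\ f(1) = 0,$$

with KL recovered at $f(t) = t \log t$; and the Tsallis $q$-exponential

$$e_q(x) = \big[1 + (1 - q)\,x\big]_+^{1/(1-q)},$$

which recovers $e^x$ as $q \to 1$ and has power-law tails for $q > 1$. That is one precise sense in which these generalized families reach distributions that the exponential family and Gibbs entropy cannot.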
[I could write several more posts worth of material]
Zero-math-just-words summary:
So the natural world seems to be full of this symmetry thing, which is great because I suspect otherwise we wouldn’t have any hope of making a good map of it. Here’s some math that makes constructive epistemic claims about a particularly simple example—permutation symmetry—that people have been generally confused about for 90 years. Then here’s how that idea would testably interact with a thought experiment that you could almost pull off in a lab (BYO magic box).
Remember T is a program that’s supposed to emit all the computable parts of D that are relevant for doing inference on A. That’s $P(A|D) = P(A|T(D))$. (Naturally, the identity function on D does a perfectly fine job of preserving the relevant parts of D, so to be non-trivial you really want T to compress D in some way.) Here $\pi$ is some permutation of D (this doesn’t compress D). $T(D) = T(\pi(D))$ just says that T doesn’t care about the ordering of D. Then you combine these facts to resolve that inference on A also doesn’t care about the ordering of D. That’s $P(A|D) = P(A|\pi(D))$.
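Here is a toy concrete instance (my example, with i.i.d. coin flips; all names illustrative):

```python
# Toy instance of P(A|D) = P(A|T(D)) with a permutation-invariant T.
# D = a tuple of coin flips; A = "the coin is biased toward heads";
# T = head count, which ignores ordering: T(D) == T(pi(D)).

def T(D):
    return sum(D)  # permutation-invariant sufficient statistic

def p_biased_given(D, p_fair=0.5, p_biased=0.8, prior=0.5):
    """P(A|D) for i.i.d. flips; it depends on D only through T(D)."""
    k, n = T(D), len(D)
    like_fair = p_fair**k * (1 - p_fair)**(n - k)
    like_biased = p_biased**k * (1 - p_biased)**(n - k)
    return prior * like_biased / (prior * like_biased + (1 - prior) * like_fair)

D = (1, 1, 0, 1, 0)
pi_D = (0, 1, 1, 0, 1)                            # a permutation of D
assert T(D) == T(pi_D)                            # T doesn't care about ordering...
assert p_biased_given(D) == p_biased_given(pi_D)  # ...so neither does inference on A
```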
Negative example:
A = “this sensory input contains a cat”
D = “the pixels of the sensory input”
T = “AlexNet without the final softmax layer”
Observe that $\pi$ destroys inference for $A$: $P(A|D) \neq P(A|\pi(D))$. Classification of natural images isn’t invariant to arbitrary reordering of pixels. AlexNet isn’t invariant to reordering of pixels because gradient descent would rapidly exit that region of weight space: $T(D) \neq T(\pi(D))$.
Perhaps there are some other (maybe approximate) symmetries ($\sigma$) like reflections, translations, blurring, recoloring, rescaling: $P(A|D) \approx P(A|\sigma(D))$. AlexNet might be in or near those regions of weight space: $T(D) \approx T(\sigma(D))$. This should also apply to your cortex or aliens or whatever has a notion of a cat!
I would be very surprised if this didn’t connect to natural latents. It connects to universality. I don’t think it connects directly to infrabayes but they might be compatible. It also connects to SLT, but for reasons that are beyond the scope of this post.
In the Ising model, the inability (through local fluctuations, well below $T_c$) of the coarse-grained state $+m$ to transition to/from $-m$ is an example of superselection. Unfortunately, I can’t find a good primer on the subject, but here’s something: https://physics.stackexchange.com/a/56570
To be able to take your desired step, reasoning from local->global, you also have to make additional measurements to rule out non-uniform temperature and non-uniform magnetic fields.
Thanks for this. Yet it’s odd to see a post about mechanism design with nary a mention of mechanism design.