Couldn’t a superintelligence generate thousands of non-substantive variations of an effective but dangerous action, while electing not to do so for other actions?
I’m not sure what you mean. The model here is that you have a way to sample from a learned approximation to the distribution of what humans/experts/whatever “safe” source you trust would do, and then the superintelligence picks uniformly from the top x% of those samples.
I don’t think shooting for a lower region would help much in practice; I expect most cases to also have bad actions in the 85th-(85+p)th percentile range.
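A minimal sketch of the “pick uniformly from the top x%” scheme described above (the function names, scoring setup, and numbers are illustrative assumptions, not from the thread):

```python
import random

def pick_from_top_fraction(sample_action, score, n_samples=1000,
                           top_frac=0.05, rng=None):
    """Draw n_samples actions from a learned 'safe' distribution
    (sample_action), then pick uniformly at random from the top
    top_frac of them as ranked by the score function."""
    rng = rng or random.Random()
    # Sample candidate actions from the trusted/learned distribution.
    samples = [sample_action(rng) for _ in range(n_samples)]
    # Rank candidates best-first by the (hypothetical) score.
    samples.sort(key=score, reverse=True)
    # Keep the top x% and choose uniformly among them.
    k = max(1, int(len(samples) * top_frac))
    return rng.choice(samples[:k])

# Toy usage: actions are integers 0-99, score is the value itself,
# so the pick lands uniformly in roughly the top 5% of draws.
choice = pick_from_top_fraction(lambda rng: rng.randrange(100),
                                lambda a: a,
                                rng=random.Random(0))
```

Shooting for a “lower region” instead would correspond to replacing `samples[:k]` with a slice further down the ranking (e.g. the 85th-(85+p)th percentile band), which is the variant the reply is skeptical of.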