Circling back to this. I’m interested in your thoughts.
I think the Algorithmic Statistics framework [including the Kolmogorov structure function] is a good fit for what you want here in 2.
To recall, the central idea is that any object is ultimately just a binary string s that we encode through a two-part code: a code c encoding a finite set of strings A_c such that s ∈ A_c, together with a pointer to s within A_c.
For example, s could encode a dataset while c would encode the ε-typical data strings for a given model probability distribution in a set of hypotheses, for some small ε > 0. This is a way to talk completely deterministically about a probabilistic model, e.g. an LLM with a transformer architecture.
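A minimal runnable sketch of this two-part code, under assumptions of my own choosing (a Bernoulli(0.8) model over binary strings of length 12, and "ε-typical" meaning per-bit surprisal within ε of the entropy):

```python
import math
from itertools import product

# Illustrative sketch (my parameters, not from the comment): a Bernoulli(0.8)
# model over binary strings of length n. The ε-typical set A_c contains the
# strings whose per-bit surprisal is within ε of the source entropy; the
# two-part code for s is then (description of the model/A_c, index of s in A_c).

n, p, eps = 12, 0.8, 0.12
H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))  # entropy per bit

def surprisal_per_bit(s):
    ones = sum(s)
    prob = (p ** ones) * ((1 - p) ** (n - ones))
    return -math.log2(prob) / n

# Enumerate the ε-typical set A_c (feasible only for tiny n).
A_c = [s for s in product((0, 1), repeat=n)
       if abs(surprisal_per_bit(s) - H) <= eps]

s = (1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1)       # a toy "dataset" string
assert s in A_c
pointer = A_c.index(s)                          # second part of the code
pointer_bits = math.ceil(math.log2(len(A_c)))   # ≈ log2 |A_c| bits
print(len(A_c), pointer, pointer_bits)
```

The two-part code length is roughly (bits to describe the model) + log2 |A_c|, which is the trade-off the structure function tracks as the model class varies.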
This framework is flexible enough to describe two codes c_1, c_2 encoding A_{c_1}, A_{c_2} such that s ∈ A_{c_1} and s ∈ A_{c_2}, as required. One can easily find simple examples of this using mixtures of Gaussians.
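A one-dimensional toy version of this (my construction: two Gaussian components with means 0 and 2, and typicality measured as surprisal close to differential entropy) exhibits a single sample lying in both typical sets:

```python
import math

# Toy illustration (my construction, not from the comment): a sample x that is
# ε-typical under two different Gaussian components of a mixture, so two
# distinct codes c1, c2 (one per component's typical set) both contain x.

def in_typical_set(x, mu, sigma, eps):
    # One-dimensional analogue: x is "typical" for N(mu, sigma^2) if its
    # surprisal is within eps of the differential entropy of the Gaussian.
    surprisal = 0.5 * math.log2(2 * math.pi * sigma**2) + \
                ((x - mu) ** 2) / (2 * sigma**2 * math.log(2))
    entropy = 0.5 * math.log2(2 * math.pi * math.e * sigma**2)
    return abs(surprisal - entropy) <= eps

x = 1.0  # lies symmetrically between the two component means
print(in_typical_set(x, mu=0.0, sigma=1.0, eps=0.5),
      in_typical_set(x, mu=2.0, sigma=1.0, eps=0.5))  # → True True
```

A point far from a component's mean (say x = 5.0 for the first component) fails the same test, so the two typical sets genuinely differ while sharing x.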
I’d be curious what you think!
Got around to interrogating Gemini for a bit.
Seems like KSF talks about programs generating sets. It doesn’t say anything about the internal structure of the programs (but that’s where the objects such as “real diamonds” live). So let’s say s is a very long video about dogs doing various things. If I apply KSF, I get programs (aka “codes”) generating sets of videos. But it doesn’t help me identify “the most dog-like thing” inside each program. For example, one of the programs might be an atomic model of physics, where “the most dog-like things” are stable clouds of atoms. But KSF doesn’t help me find those clouds. A similarity metric between videos doesn’t help either.
My conceptual solution to the above problem, proposed in the post: if you have a simple program with special internal structure describing simple statistical properties of “dog-shaped pixels” (such program is guaranteed to exist), there also exists a program with very similar internal structure describing “valuable physical objects causing dog-shaped pixels” (if such program doesn’t exist, then “valuable physical objects causing dog-shaped pixels” don’t exist either).[1] Finding “the most dog-like things” in such program is trivial. Therefore, we should be able to solve ontology identification by heavily restricting the internal structure of programs (to structures which look similar to simple statistical patterns in sensory data).
So, to formalize my “conceptual solution” we need models which are visually/structurally/spatially/dynamically similar to the sensory data they model. I asked Gemini about it, multiple times, with Deep Research. The only interesting reference Gemini found is Agent-based models (AFAIU, “agents” just means “any objects governed by rules”).
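To make "models structurally similar to the sensory data they model" concrete, here is a toy agent-based sketch (entirely my construction; names like `DogAgent` and the drift rule are hypothetical): the model's internal objects render one-to-one into the pixel grid they explain, so locating "the dog inside the model" is trivial by construction.

```python
# Toy agent-based model: internal objects map directly onto "dog-shaped
# pixels", unlike an opaque program that merely outputs the frames.

class DogAgent:
    """An object governed by a simple rule: drift one cell right per step."""
    def __init__(self, x, y):
        self.x, self.y = x, y

    def step(self, width):
        self.x = (self.x + 1) % width

def render(agents, width, height):
    # The model state maps one-to-one onto lit pixels.
    grid = [[0] * width for _ in range(height)]
    for a in agents:
        grid[a.y][a.x] = 1
    return grid

agents = [DogAgent(0, 1), DogAgent(3, 0)]
frames = []
for _ in range(4):
    frames.append(render(agents, width=5, height=3))
    for a in agents:
        a.step(width=5)

# Ontology identification is trivial here: each lit pixel traces back to a
# named agent inside the model.
print(frames[0][1][0], frames[1][1][1])  # → 1 1
```

This is the sense in which I read "agents just means any objects governed by rules": the point is not agency but that the model's internal structure mirrors the spatial/dynamical structure of the sensory stream.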
[1] This is not obvious; it requires analyzing basic properties of human values.