Donald Hobson comments on What’s up with LLMs representing XORs of arbitrary features?

Donald Hobson 6 Jan 2024 19:42 UTC
LW: 4 AF: 2
0
AF
Take ${0, 1}^{n}$ this set contains exponentially many points. Is there Any function $f : {0, 1}^{n} \to R^{p o l y n}$ such that all exponentially many xor combos can be found by a linear probe?
This is a question of pure maths, it involves no neural networks. And I think it would be highly informative.
- Olli Järviniemi 7 Jan 2024 23:14 UTC
  1 point
  0
  Parent
  FWIW, I thought about this for an hour and didn’t really get anywhere. This is totally not my field though, so I might be missing something easy or well-known.
  My thoughts on the subject are below in case someone is interested.
  Problem statement:
  “Is it possible to embed the $2^{n}$ points ${0, 1}^{n}$ to $R^{poly (n)}$ (i.e. construct a function $f : {0, 1}^{n} \to R^{poly (n)}$ ) such that, for any XOR-subset $S$ of ${0, 1}^{n}$ , the set $f (S)$ and its complement $f (S^{c})$ are linearly separable (i.e. there is a hyperplane separating them)?”
  Here XOR-subset refers to a set of form
  $S = {x \in {0, 1}^{n} | x_{i_{1}} \oplus x_{i_{2}} \oplus \dots x_{i_{k}} = 0}$ ,
  where $k$ is a positive integer and $i_{j} \in {1, \dots, n}$ are indices.
  Observations I made:
  - One can embed ${0, 1}^{2}$ to $R ²$ in the desired way: take a triangle and a point from within it.
  - One cannot embed ${0, 1}^{n}$ to $R^{n - 1}$ .
    (Proof: consider the $n$ XOR-subsets where $k = 1$ . The corresponding hyperplanes divide $R^{n - 1}$ to at most $2^{n} - 1$ regions; hence some two points are embedded in the same region. This is a contradiction.)
  - If you have $m$ points in $R^{d}$ , only at most $m^{d + 1}$ of the exponentially many subsets of the $m$ points can be linearly separated from their complement.
    (Proof: If a subset can be separated, it can also be separated by a hyperplane parallel to a hyperplane formed by some $d$ of our points. This limits our hyperplanes to $(\frac{m}{d}) \approx m^{d}$ orientations. Any orientation gives you at most $m$ subsets.)
    This doesn’t really help: the number of XOR-subsets is linear in the number of points. In other words, we are looking at only very few specific subsets, and general observations about all subsets are too crude.
  - XOR-subsets have the property “if $A$ and $B$ are XOR-subsets, then $(A \cap B) \cup (A^{c} \cap B^{c})$ is too” (the “XOR of XOR-subsets is a XOR-subset”)
    This has a nice geometric interpretation: if you draw the separating hyperplanes, which divide the space into 4 regions, the points landing in the “diagonally opposite” regions form a XOR-subset.
    This feels like, informally, that you need to use a new dimension for the XOR, meaning that you need a lot of dimensions.
  I currently lean towards the answer to the problem being “no”, though I’m quite uncertain. (The connection to XORs of features is also a bit unclear to me, as I have the same confusion as DanielVarga in another comment.)
  - Donald Hobson 8 Jan 2024 23:28 UTC
    2 points
    0
    Parent
    The connection to features is that if the answer is no, there is no possible way the network could have arbitrary X-or combos of features that are linearly represented. It must be only representing some small subset of them. (probably the xor’s of 2 or 3 features, but not 100 features.)
    Also, your maths description of the question matches what I was trying to express.