To start with, note that if you push your speed bias far enough (e.g. a strong enough circuit depth complexity or Turing machine time complexity penalty), you just get a lookup table that memorizes everything.
This is true in the TM model[1]. This is not true in the circuit-depth complexity model. Remember that an arbitrary lookup table is O(log n) circuit depth. If the function I'm trying to memorize is f(x) = (x & 1), the fastest circuit is O(1), whereas a lookup table is O(log n).
(This gets even worse in models where lookup is O(n^(1/3))[2] or O(n^(1/2))[3].)
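To make the latency gap concrete, here's a small Python sketch (my own illustration, not from the original discussion) comparing the best-case read latency of a memorizing lookup table under the three cost models against the O(1) circuit for f(x) = x & 1. Constant factors are ignored throughout, as the post suggests:

```python
import math

def lookup_latency(n_entries, model):
    """Best-case latency to read one entry from an n-entry table,
    under the three cost models discussed above (constants ignored)."""
    if model == "circuit-depth":   # balanced mux tree: O(log n)
        return math.log2(n_entries)
    if model == "3d-volume":       # memory packed in 3d space: O(n^(1/3))
        return n_entries ** (1 / 3)
    if model == "2d-surface":      # heat-limited packing: O(n^(1/2))
        return n_entries ** (1 / 2)
    raise ValueError(model)

# f(x) = x & 1 only inspects one input bit: O(1) depth, independent of n.
direct_latency = 1

for model in ("circuit-depth", "3d-volume", "2d-surface"):
    for bits in (10, 20, 30):
        table = lookup_latency(2 ** bits, model)
        print(f"{model:13s} {bits}-bit input: "
              f"table ~{table:10.1f} vs direct ~{direct_latency}")
```

Even in the friendliest (circuit-depth) model the table's latency grows without bound while the direct circuit stays flat, and the realistic models diverge far faster.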
I humbly suggest a variant family of priors: realizable-speed[4] priors. That is:
Pick a particular physics model
This could even be things like "Conway's Game of Life".
Don’t generally worry too much about constant factors[5].
The algorithm that gives an answer in the least time is the most probable.
A simplish example of this prior might be the following: assume a simple 3d circuit model where gates take O(1) time and space and wires take O(length) time and space, and neither gates nor wires can overlap.
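A minimal Python sketch of that 3d circuit model, assuming unit gate delay and wire delay equal to Euclidean wire length; the gate names and placements below are invented for illustration (and overlap checking is omitted):

```python
import math

def wire_delay(p, q):
    """Wire latency = Euclidean distance between gate positions (O(length))."""
    return math.dist(p, q)

def critical_path_latency(gates, wires, outputs):
    """gates: name -> (x, y, z) position; wires: name -> list of input gate names.
    Returns the worst-case latency over the given output gates."""
    memo = {}
    def latency(g):
        if g not in memo:
            memo[g] = 1 + max(  # 1 unit of time for the gate itself
                (latency(src) + wire_delay(gates[src], gates[g])
                 for src in wires.get(g, ())),
                default=0,
            )
        return memo[g]
    return max(latency(g) for g in outputs)

# Toy circuit: two inputs feed an AND, whose result feeds a distant OR.
gates = {"in0": (0, 0, 0), "in1": (0, 1, 0), "and": (1, 0, 0), "or": (5, 0, 0)}
wires = {"and": ["in0", "in1"], "or": ["and"]}
print(critical_path_latency(gates, wires, ["or"]))
```

Under the realizable-speed prior, the circuit (with placement) minimizing this latency is the most probable; note that the long wire to the OR gate dominates its cost.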
This prior discourages giant lookup tables even in the limiting case, while retaining many of the advantages of speed priors.
(It does have the interesting issue that it says nothing about what happens outside the lightcone of the computation...)
(I realize here that I’m being very sloppy with asymptotic notation. I’m bouncing between bits / gates / elements / etc.)
It also isn’t true in the TM model variant where you include the depth of the FSM decode circuit instead of treating a single TM step as constant-time.
Volume accessible within time t scales as t^3.
Assuming power/heat scales linearly with #gates[6], max gates within a radius r scales with r^2 and so you get t^2 instead. Ditto, eventually you run into e.g. Bekenstein bound issues, although this isn't a particularly realistic concern.
This is a terrible name. I am fully open to better ones.
Which is a terrible idea given that galactic algorithms are a thing.
Assuming it scales superlinearly, you scale even worse...
Certainly, I’m assuming that the intended function is not in O(log n), though I think that’s a very mild assumption for any realistic task.
I think the prior you’re suggesting is basically a circuit size prior. How do you think it differs from that?
In t time, the brain (or any realistic agent) can do O(t) processing… but receives O(t) sensory data.
Realizable-speed priors are certainly correlated with circuit size priors to some extent, but there are some important differences:
The naive circuit size prior assumes gates take O(1) space and wiring takes zero space, and favors circuits that take less space.
There are more complex circuit size priors that e.g. assign O(1) space to a gate and O(length) space to wiring.
The O(log n) variant of the realizable-speed prior has no simple analog in the circuit size prior, but roughly corresponds to the circuit-depth prior.
The O(n^(1/2)) variant of the realizable-speed prior has no simple analog in the circuit size prior.
The O(n^(1/3)) variant of the realizable-speed prior roughly corresponds to the complex circuit-size prior described above, with differences described below.
Circuit size priors ignore the effects of routing congestion.
A circuit-size prior will prefer one complex circuit of N-1 gates over two simpler circuits of N/2 gates.
A realizable-speed prior will tend to prefer the two simpler circuits, as, essentially, they are easier to route (read: lower overall latency due to shorter wiring).
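As a back-of-the-envelope illustration of that routing point, assume (heuristically; this is my gloss, not a claim from the original discussion) that a 3d-packed circuit of g unit-size gates has linear extent, and hence worst-case internal wire length, of roughly g^(1/3):

```python
# Heuristic: a 3d-packed blob of g unit-size gates has linear extent
# ~ g**(1/3), so its worst-case internal wire is about that long.

def wiring_latency(gates):
    """Rough wiring-latency contribution for a single 3d-packed circuit."""
    return gates ** (1 / 3)

N = 1_000_000
one_big = wiring_latency(N - 1)     # one complex circuit of N-1 gates
two_small = wiring_latency(N // 2)  # each of two decoupled N/2-gate circuits

print(f"one {N - 1}-gate circuit:   wire latency ~ {one_big:.1f}")
print(f"two {N // 2}-gate circuits: wire latency ~ {two_small:.1f} each")
# The size prior sees N-1 < N and picks the single big circuit; the
# realizable-speed prior sees the shorter wires of the decomposed version.
```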
My (untested) intuition here is that a realizable-speed prior will be better at structuring and decomposing problems than a circuit-size prior, as a result.
Circuit size priors prefer deeper circuits than realizable-speed priors.
A circuit-size prior will prefer a circuit of max depth 10D and N-1 gates over a circuit of max depth D and N gates.
A realizable-speed prior will (typically) prefer the slightly larger far shallower circuit.
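A toy scoring of that trade-off (the numbers are illustrative, and the realizable-speed prior is simplified here to pure depth, ignoring wiring):

```python
# The circuit-size prior ranks candidates by gate count alone; the
# (simplified) realizable-speed prior ranks them by depth alone.

def size_prior_cost(depth, gates):
    return gates   # smaller circuit wins, depth ignored

def speed_prior_cost(depth, gates):
    return depth   # shallower (faster) circuit wins, size ignored

D, N = 10, 1000
deep_small = (10 * D, N - 1)   # max depth 10D, N-1 gates
shallow_big = (D, N)           # max depth D, N gates

assert size_prior_cost(*deep_small) < size_prior_cost(*shallow_big)
assert speed_prior_cost(*shallow_big) < speed_prior_cost(*deep_small)
print("size prior picks the deep circuit; speed prior picks the shallow one")
```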
Note that the human brain is surprisingly shallow, when you consider speed of neuron activation versus human speed of response. But also very wide...