OK, I have a neat construction for z=1 that works pretty well: https://www.lesswrong.com/posts/g9uMJkcWj8jQDjybb/ping-pong-computation-in-superposition. It gets T = D^2/d^2 with width D(1+2d) and L+3 layers, and zero error. Note that T = D^2/d^2 is exact here, not asymptotic.
I’m trying to find a similar construction where the number of parameters required scales linearly with z, without just scaling up the number of layers. I’d also be curious whether it’s possible to avoid scaling parameters linearly with z at all, but that seems quite difficult.