I see another obstruction in attention span. I strongly suspect that whenever an LLM is tasked with writing the next token, attention mechanisms compress all potentially relevant information into less than a hundred thousand numbers, preventing the model from taking many nuances into account when writing the token. A human brain, on the other hand, takes into account billions of bits of information stored in neuron activations.
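A rough back-of-the-envelope sketch of what I mean, assuming a hypothetical GPT-3-scale configuration (the dimensions below are illustrative assumptions, not a claim about any particular deployed model):

```python
# Illustrative per-token information budget for a transformer LM.
# All sizes are assumptions chosen to be roughly GPT-3-scale.

d_model = 12_288   # assumed width of the residual stream
n_layers = 96      # assumed number of transformer blocks

# The next-token logits are a function of a single residual-stream vector
# at the final position, i.e. d_model floating-point numbers.
final_hidden = d_model
print(f"numbers directly feeding the next-token logits: {final_hidden:,}")
# -> 12,288, i.e. well under a hundred thousand

# Even counting the final position's residual stream at every layer,
# the budget stays small next to billions of bits of neural activation.
per_position_all_layers = d_model * n_layers
print(f"final-position activations across all layers: {per_position_all_layers:,}")
# -> 1,179,648
```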
@TsviBT, we had a warning shot of LLMs becoming useful in research or writing a coherent short story in @Tomás B.’s experiment (the post on LW describing it was removed for an unknown reason).
I moved it into my drafts. I've published it again for you. I figured it was unlikely to be referenced again, and I tend to take down stuff I don't want people reading as one of the first things on my author's page.
(FWIW, I’ve referenced that post 2-4 times since it was posted)