Why the Architecture of LLMs Makes Them Bad at Deep Thinking: They’re Too Wide
GPT-3 is 96 layers deep (where each layer is only a few “operations”), but 49,152 “neurons” wide at its widest point. That makes it an insanely wide, very shallow network, and it's that way for good reasons: wide networks are easier to run efficiently on GPUs, and deep networks are apparently hard to train.
I don’t find this argument compelling, because the human brain is much wider and possibly shallower than GPT-3. Humans have a conscious reaction time of about 200 milliseconds, while a neuron takes about 1 ms to influence its neighbors, so an upper bound on the depth of a conscious reaction is about 200 sequential neuron-to-neuron steps, comparable to (and possibly fewer than) GPT-3’s 96 layers of a few operations each.
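To make the comparison concrete, here is a back-of-envelope sketch using the figures above; ops_per_layer is my rough assumption about how many sequential operations a transformer layer contains, not a measured number.

    # Back-of-envelope depth comparison using the figures above.
    gpt3_layers = 96
    ops_per_layer = 4  # rough assumption: attention + MLP sublayers, etc.
    gpt3_effective_depth = gpt3_layers * ops_per_layer  # ~384 sequential operations

    reaction_time_ms = 200  # conscious human reaction time
    neuron_step_ms = 1      # time for a neuron to influence its neighbors
    brain_depth_bound = reaction_time_ms // neuron_step_ms  # <= 200 sequential steps

    print(gpt3_effective_depth, brain_depth_bound)  # 384 200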
For Linux users on US keyboards, you might want to try making Caps Lock the Multi key (also called the Compose key). On Cinnamon this can be done by going to Keyboard > Layouts > Options… > Position of Compose key, and other desktop environments probably have similar settings.
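If your desktop environment doesn’t expose this in a settings dialog, a one-line sketch using setxkbmap should work on most X11 sessions (it only lasts for the current session unless you put it in a startup script):

    # Make Caps Lock act as the Compose key for this X session
    setxkbmap -option compose:caps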
A Compose key lets me type umlauts (ä, ü, ö), foreign currencies (£, €, ¥), copyright/trademark (©, ™), and a bunch of other stuff. For example, “ü” is made by typing Compose, u, " (the double-quote key) in sequence. I also added a line to my ~/.XCompose file so that I can type λ efficiently; this is useful when writing Lisp code.
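The exact λ binding isn’t shown above; as a hypothetical example, an ~/.XCompose entry mapping Compose, l, l to λ would look like this (the l, l sequence is my placeholder, not necessarily the one actually used):

    # Keep the system default sequences, then add a custom one
    include "%L"
    <Multi_key> <l> <l> : "λ" U03BB   # GREEK SMALL LETTER LAMDA

Note that applications generally pick up changes to ~/.XCompose only when they are restarted.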