[Question] What could natural language look like if it hadn’t evolved from speech?

Written language is a linear progression of symbols—in other words, a function from some “time” to a finite alphabet. This fact is a direct result of prehistoric humans primarily communicating by fluctuating air pressure, which one can model as a real-valued function of time.

So suppose it wasn’t that way—i.e. imagine aliens communicating by displaying pictures on their chests or projecting holograms. What might their “language” look like?

I imagine two things (and ask for more ideas in the comments):

  • In a 2D (or 3D) structure, it would be much easier to refer to previously introduced concepts by arranging them close to each other—the language would look more like a directed acyclic graph than a linear progression of symbols.

  • It would be easier to refer to quantities directly and precisely by depicting them visually (human languages typically rely on lots of imprecise words for quantities, e.g. “a few”, “several”, “many”). A rough sketch of both ideas follows this list.
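
To make this concrete, here is a minimal Python sketch of what an “utterance” in such a graph-shaped language could look like. All node and relation names are invented for illustration; the point is only that later statements refer to a concept by pointing at its node directly, and that quantities can be attached as exact values.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One semantic unit in a hypothetical graph-shaped 'utterance'."""
    concept: str
    attributes: dict = field(default_factory=dict)
    relations: list = field(default_factory=list)  # (label, Node) pairs: direct references,
                                                   # so "referring back" is just reusing the object

# "A trader bought 1500 barrels of oil and resold them two days later."
oil = Node("oil", attributes={"barrels": 1500})    # exact quantity instead of "lots of oil"
trader = Node("trader")
buy = Node("buy", relations=[("agent", trader), ("object", oil)])
sell = Node("sell", relations=[("agent", trader),
                               ("object", oil),    # the English "them" is simply the oil node again
                               ("after", Node("duration", {"days": 2}))])
utterance = [buy, sell]                            # a DAG, not a linear string of words
```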

Math already uses multidimensional “languages” in some places—e.g. categorical diagrams or tensor networks. Of course, engineers and architects often display their thoughts graphically as well. But as of now, all of these systems are special-purpose. So what might a general-purpose artificial language look like if it were not constrained to be a one-dimensional function?

I think AI/ML systems might have a much easier time processing such a language (which might be an internal representation of knowledge) because coreference resolution could become much easier, or even trivial. To elaborate:

For many years, AI researchers have devoted much attention to processing and generating natural language—partly because that’s how an AI can be useful to humans, partly in the hope that a sufficiently advanced language model would itself become indistinguishable from intelligence.

A basic problem in natural language processing is known as coreference resolution: understanding which expressions in a text refer to the same concept, and having the NLP system pay “attention” to them at the right times. The task is nontrivial in natural language because a word in a linear text has only two direct neighbours—if one wants to connect more than two concepts semantically, one needs nontrivial grammar rules that mostly stumped AIs until a few years ago and that they still get wrong much of the time. Consider e.g. Winograd schemas such as “The trophy doesn’t fit in the suitcase because it is too big”, where resolving “it” takes world knowledge rather than grammar alone.

Starting with the original attention mechanisms in NLP (see e.g. here), AI researchers have developed a plethora of tricks to increase the context length over which, and the accuracy with which, models can resolve coreferences (Longformer, Compressive Transformer, Reformer...).
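
For readers who have not seen the mechanism spelled out, here is a minimal NumPy sketch of plain scaled dot-product attention, the quadratic-cost building block that models like Longformer and Reformer modify to handle longer contexts. The function and the toy data are mine, and the learned projection matrices of a real transformer are left out to keep it short.

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: each query position receives a weighted
    mix of all value positions, so distant mentions of the same concept can
    be linked directly, at O(seq_len**2) cost."""
    d = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # pairwise affinities, (seq_len, seq_len)
    scores -= scores.max(axis=-1, keepdims=True)    # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over key positions
    return weights @ V

# Toy self-attention: 5 "token" embeddings of dimension 8.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
print(attention(x, x, x).shape)                     # -> (5, 8)
```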

But now imagine an AI architecture that uses an internal “language” to generate and analyze “intermediate” thoughts—possibly involving a population of agents co-evolving that language. The individual neural networks could then be substantially unburdened by letting the language develop in a medium that is not just one-dimensional (as “time” or “position” is). In the extreme case, allowing arbitrary connections between semantic units would make coreference resolution trivial.
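
As a toy illustration of that last point (the graph format and all names below are invented, not an existing system): in linear text, resolving the pronoun in a Winograd schema requires world knowledge, whereas in a representation with explicit connections the referent is simply a stored edge and “resolution” is a lookup.

```python
# Linear text: which noun "it" refers to must be inferred from world knowledge.
sentence = "The trophy doesn't fit in the suitcase because it is too big."

# Hypothetical graph-shaped representation of the same content: the property
# "too big" points directly at its referent, so there is nothing to resolve.
graph = {
    "trophy":   {"type": "object"},
    "suitcase": {"type": "container"},
    "fit":      {"agent": "trophy", "location": "suitcase", "negated": True},
    "too_big":  {"of": "trophy"},       # the coreference, stored as an explicit edge
}

def resolve(graph, predicate, role):
    """'Coreference resolution' in this format: follow the stored edge."""
    return graph[predicate][role]

print(resolve(graph, "too_big", "of"))  # -> trophy
```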