I work on NLP AI systems and have spent a lot of the past decade working on developing training data, so I have a degree of expertise here.
There are a lot of things that go into how hard a language is to learn. How close the spelling is to pronunciation is one of them, but not the dominant one for alphabetic languages (grammatical gender is another, which you don’t mention).
But you are totally correct that learning a character-based system is much, much harder than learning an alphabetic (or syllabic) one. I think it’s a huge disadvantage for e.g. China, where kids have to devote years of study and memorization to become literate, many more than with alphabetic writing systems. I would go further and say that they are forced to have their whole school system revolve around immense amounts of rote memorization, which I think could lead to the kind of relative lack of creativity that Chinese school systems are sometimes criticized for. I don’t think you can learn written Chinese without massive amounts of memorization, leaving little room for other ways of thinking for a decade or more.
Further reading:
Why Chinese is so damn hard
Difficult and easy languages (Chinese is easy to learn to speak, super hard to learn to read/write)
Linguist survey data on easy/hard languages
Of languages that got multiple votes, Catalan and Spanish (and Esperanto) rated as very easy, and written Chinese (and especially Literary Sinitic, which is old written Chinese) rated very hard.
The linguist survey isn’t a great source since it’s significantly biased towards languages that are easy for *English speakers* to learn, which is quite different from languages that are easy for children to learn as a first language. A lot of the languages at the beginning of the list are Romance and Germanic languages, and other languages that are morphosyntactically similar to English (e.g. most of the “1” languages don’t have noun classes, like you mentioned).
I think that’s a fair criticism.
Two questions:
Do you think there are significant things other than how phonetic it is or whether it’s logographic that make a writing system significantly easier or harder to learn/use?
In terms of language difficulty more generally, what do you think are the most important factors which determine difficulty?
I think characters versus alphabets dwarfs everything else for written language, but here are a few other factors:
how phonetic it is if you want to learn the writing system.
how regular it is: English is full of exceptions (because of its history drawing on three language families). Spanish is very regular; Russian also has a ton of exceptions.
how complicated the morphology is (Finnish is tough because of the super complex morphology, see also Russian). Mandarin is very easy on this dimension.
whether there’s a phoneme distinction that you didn’t learn as a child—so for a Japanese speaker, l vs r is hard in English (“Engrish”), and for an English speaker, o versus ō is hard in Japanese.
In general, of course, the more similar it is to your native tongue, the easier it is. Tones are hard to learn if you don’t speak a tonal language, but if you do then they are super intuitive. The same goes for heavy morphology, fixed versus fluid word order, etc.
The other angle is speaking versus listening. Portuguese (especially) and French are much easier to speak than to understand because of the various ways that fluent speakers elide sounds or mush them together. So you can get basic sentences out before you can understand something at full speed. That is generally true, but much more so for some languages.