This just came up on the feed of one of my classmates, who’s here on a Fulbright studying the formation of Tatar identity (what that means–still to come).


Can we please make up our minds on whether pan-Turkic culture is a thing? Looks like nope.

Tangentially relevant: I went to an interesting conference today: “3rd International Conference on Computer Processing of Turkic Languages.” It was my first time inside the Tatarstan Academy of Science as well as my first time crashing a party I was not registered for. It hurt a little to leave without a swag bag (they had DICTIONARIES in them, y’all!).

Even though the presentations were all by computer scientists, which left me feeling like a fish flopping on a beach, it was really useful to see how people on the ground are actually doing the things that linguists spend all day talking about, namely, reassigning domains of use (even in very small ways). Here are some of the topics covered:

  • IT and the Tatar language: Status and prospects (Summary: You guys we should totally work together! Let’s, like, organize a conference or something!)
  • Evaluation of criteria for IT technology in the case of the Wikimedia interface (How to create computer jargon that’s not borrowed from Russian/English)
  • The experience of creating linguistic software in Uzbekistan (No idea what this guy was on about, but he apparently thought the ceiling was SUUUUUPER cool)
  • Development of a Tatar keyboard for Android (Right on (see below)!)
  • The web as an instrument for the improvement of morphological analysis (flop…flop…flop…)

Basically, the Turkic languages include most of the languages of Central Asia (Kazakh, Kyrgyz, Uzbek, Uyghur and Turkmen; Tajik is the exception, being Iranian) plus many of the languages of native populations of Russia, in the Volga region and Siberia especially (Tatar, Bashkir, Chuvash, Sakha/Yakut, Tuvan, Crimean Tatar). They are fairly closely related and, being concentrated largely within the former Soviet republics, have shared a similar political and social trajectory. Thanks to that, they have a lot of the same problems as well as potential to build off each other and help each other out when answering questions like these:

  • Hey guys, are we ever going to have a keyboard that actually lets us type in our own language? (Example: the very common Tatar letter ә does not exist on the Russian keyboard, so Tatar speakers and Russians alike replace it with e, which is another letter with a slightly different sound and potentially totally different meanings. It’s also really problematic from a language revitalization standpoint if it’s impossible to write the language correctly.)
  • How do we create corpora and tagging systems that are compatible with the morphology of Turkic languages*?
  • How do we make spell-check programs that are compatible with the morphology of Turkic languages*?
  • How do we create native neologisms for technology terms and ensure that they are viable? (For example – How do you say “Computer” in German? der Computer. Did they come up with a more German-sounding equivalent? Yes: die Rechenmaschine. Did it catch on? No. On the other hand, take “link.” In German? der Link. In Russian? Ssylka. What made people accept Ssylka and reject Rechenmaschine, and how can we predict that?)
  • Why are online dictionaries sooooooo bad?
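
To make the keyboard question a bit more concrete, here is a minimal Python sketch of why swapping ә for e loses information. It assumes the "e" in question is the Cyrillic letter е (the post doesn't say which one), and the function name `flatten` and the placeholder strings are made up purely for illustration:

```python
import unicodedata

SCHWA = "\u04d9"  # ә, the Tatar letter missing from the Russian layout
IE = "\u0435"     # е, assumed here to be the "e" people type instead

def flatten(text: str) -> str:
    """Mimic typing without ә: every ә becomes е."""
    return text.replace(SCHWA, IE)

# The two letters are separate characters as far as the computer is concerned.
for ch in (SCHWA, IE):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Placeholder strings (not real words) that differ only in that one letter.
a = "x" + SCHWA
b = "x" + IE
print(a != b)                    # distinct before the substitution
print(flatten(a) == flatten(b))  # indistinguishable after it
```

Once the substitution has happened there is no way to tell from the text alone which letter was meant, which is the whole problem.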

* Sorry, I know that phrasing is unnecessarily opaque. What I believe they were mostly referring to are the properties of agglutination and vowel harmony. Basically, agglutination means that you have a word root, and then you snap suffixes onto it Lego-style to achieve different functions (pluralization, question marking, tense, subjunctive, etc.). At the opposite end of the scale is an isolating language (example: Vietnamese); in those languages, every word serves exactly one function. Compare Spanish “corrí,” which means “I ran” and fulfills three functions with one word: the meaning “to run,” past tense, and first person singular. In a pure isolating language you would say something like “I run previously” to create the same meaning. English falls somewhere in between, leaning towards isolating. We do use affixes, but typically not more than one or two at a time; instead, we either add more words or change the root of the word (go – went – has gone – would have gone – has been going). Isolating languages are super compatible with spell check, because all the computer has to do is ask “Is this form in the dictionary?” and flag the word any time the answer is no (note: I am deeply, deeply sorry to any computer scientists who are reading this for the way I’m butchering your field. If I knew you were here, I probably wouldn’t be writing this post). In the case of agglutinating languages, however, it’s not so simple, as the number of “words” that can be derived through suffixation is effectively infinite.
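
Here is a rough Python sketch of the spell-check problem, with apologies again to the computer scientists. It is not how any real spell checker works; the tiny word list, the suffix list and both function names are invented for illustration, borrowing the romanized names and suffixes from the examples further down in this post. It also ignores vowel harmony completely, so it would happily accept a mismatched form.

```python
# Toy "dictionary" and suffix list (illustrative only).
ROOTS = {"Azat", "Zile"}
SUFFIXES = ["nuhkuh", "nehkeh", "muh", "meh"]

def naive_check(word: str) -> bool:
    """Whole-word lookup: is this exact form in the dictionary?"""
    return word in ROOTS

def stripping_check(word: str) -> bool:
    """Peel known suffixes off the end until a known root is left."""
    if word in ROOTS:
        return True
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            if stripping_check(word[: -len(suffix)]):
                return True
    return False

print(naive_check("Azat"))                # True
print(naive_check("Azatnuhkuhmuh"))       # False: the stacked form is not listed
print(stripping_check("Azatnuhkuhmuh"))   # True: root plus two known suffixes
print(stripping_check("Azatt"))           # False: no known suffix fits
```

The whole-word lookup would need every root-plus-suffix combination listed in advance, while the second function only needs the roots and the suffixes. Getting that second approach right for a real language is the hard part.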

Vowel harmony complicates this. Here’s how it works: all vowels are pronounced somewhere in your mouth (we’re starting small here), and that’s how we categorize them. The Turkic languages have two classes of vowels: front (the vowel sounds in “eat”, “apple” and “end”, for instance) and back (“ought”, “up”, “poodle”). You cannot mix front and back vowels within the same word (unless it’s a loanword – for instance, “kitap” means “book” and is from Arabic). That’s why Turkish sometimes looks so crazy with all those umlauts–it’s because if one vowel in a word is fronted, they all have to be, and so you wind up repeating the same vowels lots of times in close proximity. For instance:

Azat and Zile (Zee-LEH) are proper names.

–muh and –meh are suffixes that indicate a question.

So if I want to ask, “Is that Azat?” I would need to add the question suffix to Azat’s name, using the back-vowel form: “Azat=muh?” If, however, I were asking about Zile, I would use front vowels: “Zile=meh?” (I added the equal signs to improve readability for us poor Westerners–the words should really be written without them.)

The difficulty for the computer is already apparent, but let’s keep going.

Now I want to ask “Does this belong to Azat/Zile?” To form the possessive, we need to add “-nehkeh-” or “-nuhkuh-.” But we still need our question suffix.

“Azat=nuhkuh=muh?” “Zile=nehkeh=meh?”
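
For the curious, the suffix-picking rule above can be sketched in a few lines of Python. This is a toy under loud assumptions: it works on my ad hoc romanization rather than real Tatar spelling, it treats a, o, u as back vowels and e, i as front vowels, and it decides based on the last vowel of the root only. The names `harmony_class` and `attach` are made up for the example.

```python
BACK = set("aou")   # assumed back vowels in this post's romanization
FRONT = set("ei")   # assumed front vowels in this post's romanization

# (back form, front form) of each suffix, spelled as in this post
SUFFIXES = {
    "possessive": ("nuhkuh", "nehkeh"),
    "question": ("muh", "meh"),
}

def harmony_class(root: str) -> str:
    """Return 'back' or 'front' according to the last vowel in the root."""
    for ch in reversed(root.lower()):
        if ch in BACK:
            return "back"
        if ch in FRONT:
            return "front"
    raise ValueError(f"no vowel found in {root!r}")

def attach(root: str, *functions: str, sep: str = "=") -> str:
    """Snap the harmonizing form of each requested suffix onto the root."""
    index = 0 if harmony_class(root) == "back" else 1
    return sep.join([root, *(SUFFIXES[f][index] for f in functions)])

print(attach("Azat", "question"))                # Azat=muh
print(attach("Zile", "question"))                # Zile=meh
print(attach("Azat", "possessive", "question"))  # Azat=nuhkuh=muh
print(attach("Zile", "possessive", "question"))  # Zile=nehkeh=meh
```

Passing `sep=""` gives the forms without the equal signs, which is how they would actually be written.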

Making that morphological system play nicely with computers that were designed for/by/in other languages appears to be the thing that keeps Turkologists up at night. On the other hand, vowel harmony is a poet’s Candy Land. You win some, you lose some.

(Tired of reading? Listen to the first few seconds of this video to hear the effect of vowel harmony in Kyshka kich/Кышка кич (Winter Evening) by Gabdullah Tukay, the national poet of Tatarstan!)