Index for Spices in Thai and Lao
The Thai Script (akson thai [อักษรไทย]) is (almost) exclusively used to write the Thai Language (phasa thai [ภาษาไทย]), which itself is split into several differing dialects. The Thai language is part of the Kradai Language Family (also known as Kadai, Tai–Kadai or Daic) and not related to the majority of South–East Asian languages. It is, though, closely related to Lao (phasa lao [ພາສາລາວ]), spoken in neighbouring Laos. These two languages are so similar that they form a dialect continuum and are to a fair degree mutually intelligible. Also, the Lao Script (akson lao [ອັກສອນລາວ]) is so similar to Thai Script that I found it convenient to include both in this index.
Thai Script belongs to the Indic Script Family, although it shows heavy modifications from the Indian original to account for the many more vowels and also for the phonemic tones of Thai. Like all Indic Scripts, it consists of syllabic consonant letters that are combined with vowel signs; the vowel signs may appear left, right, above or below the consonant they are attached to. A rather unique feature of Thai script (sometimes also found in Lao) is that those diacritics going to the top of a consonant arrange in two different vertical levels: The tone marks reside at a higher position than the vowel signs, even if none of the latter are present (not all fonts show this behaviour, although they should).
Introduction
[Table: Indic consonant series. Columns: unvoiced inaspirate; unvoiced aspirate; voiced inaspirate; voiced aspirate; nasal.]
Thai orthography is, like that of most South East Asian languages using Indic scripts, very complicated and full of irregularities. In principle, Thai Script is a member of the Indic Script Family; yet because of the largely different language structure, it became necessary to bend the Indic script principles to their extreme. That means that the writing system had to be massively reworked in order to fit Thai, abandoning most of the Indic core features. At the same time, for cultural reasons, the script still had to maintain some compatibility with Indic languages, chiefly Pali and Sanskrit, to ensure easy access to Indian Buddhist writings. Thus, the script has to serve two very different languages, and this means it is asking for trouble.
Indic languages have many consonants (shown on the right side for later reference), few vowels and no tones; Thai has few consonants, a large number of vowels (some of which are complex), and five phonemic tones. While Sanskrit allows word size to grow almost indefinitely, Thai’s core vocabulary is mostly monosyllabic (polysyllabic Thai words exist, but are typically Sanskrit loans); it is essential for the reader to parse the text syllable by syllable, so onset and coda consonants must be easily distinguishable. Many consonant phones are restricted to the syllable onset (or, put differently, in final position many phonemic differences become neutralized). Clearly, it is not easy to have a script catering to both sets of requirements; but Thai does so.
Consonants
The fundamentally Indian structure of the script is well visible among the consonants. Sanskrit has five series of obstruents (velar, palatal [usually analyzed as an affricate instead of a stop, but I’ll ignore that], retroflex, dental and labial), and all of them have survived in Thai Script. As Thai has no retroflexes, the pronunciation of the retroflexes and the dentals has merged, and the former appear only in Indic loanwords. Thai has hardly any voiced stops, and thus the two voiced series of Sanskrit code for voiceless aspirated sounds in Thai. Therefore, Thai has three voiceless aspirated series and one voiceless inaspirated series.
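[Table: Thai consonant letters arranged by Indic series, with consonant classes coded by colour. Columns: voiceless inaspirate (Indian k, c, ṭ, t, p); voiceless aspirate (Indian kh, ch, ṭh, th, ph); voiced inaspirate (Indian g, j, ḍ, d, b); voiced aspirate (Indian gh, jh, etc.); nasal. Rows: Thai velar; Thai palatal/; Thai dental; Thai dental; Thai labial.]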
The three voiceless aspirate series are, however, not fully equivalent: The Thai system distinguishes three different consonant classes, which are indicated by colour in the above table. The low class (white) corresponds to voiced Sanskrit sounds, the high class (orange) contains those letters which in Sanskrit denote voiceless aspirates or fricatives/
As a consequence, the few voiced plosives of Thai are not written with their Sanskrit equivalents, but rather with new signs derived from the voiceless inaspirates. There are some more derived letters (in satellites to columns two and three) which mostly mean fricatives (the velar ones are obsolete, and I am not sure about their sound value).
The Indic alphabet has an appendix of 8 approximants (or similar) and sibilants; these appear in exactly the same way also in Thai, but the three sibilants have phonetically merged into one (as in many Indic languages), and they are pronounced as plosives in syllable-final position. The last three letters are non-Sanskrit additions. By far the most common of those is the glottal stop, which is used for two rather different purposes: It forms the syllable onset for those syllables starting with a vowel (these really begin with a glottal stop, as in many versions of English), and inside the syllable it appears in various vowel sequences of varying length, where it usually contributes an O-like sound.
The table on the right side summarises the consonant letter system in a way that stresses the underlying derivation from Sanskrit. Consonant classes are coded by colour, and each entry also gives the conventional Thai name in the spelling adopted by the Unicode Standard. Thai letters are named acrophonically, and thus their names have two parts: The first is phonetic (letter sound plus O), and the second simply is a selected word from the language featuring the consonant in question. The name Cho Chang thus must be understood as the phonetic part cho followed by the example word chang.
Thai lacks an established transliteration scheme. The only standard in existence is ISO 11940, which suffers from multiple deficiencies and is hardly used at all; I will not even introduce its character set here. In all real-world examples, Thai is romanized in a more or less phonetic way which loosely resembles the pronunciation hints given in the table. Thus, both cho chan and cho chang are rendered CH in the onset (ignoring the difference in aspiration), and T in the coda. This aids pronunciation, but the native spelling cannot be reconstructed from such a romanization.
I therefore resort to a home-brew transliteration scheme which is based on the Indic model and therefore rather systematic. Points of articulation (velar, palatal etc.) are indicated as in the mainstream Indic transliteration, but the articulation modes are represented in a way that actually reflects Thai usage. All letters spoken as aspirated are written with a superscript H (ʰ), and the base letter is chosen with respect to the Thai sound value, e. g., K for all voiceless velar stops. Those letters deriving from the Sanskrit voiced series are distinguished by a diacritic: The former voiced inaspirates get a macron (they are common), and the former voiced aspirates get a circumflex (they are very rare); thus Indic g equals Thai k̄ʰ, and Indic gh is rendered as k̂ʰ. A small inconsistency is the letter So So, which appears in a Thai aspirate series but is rendered as an S with macron s̄ according to the plain S pronunciation (actually, it is a modification of Cho Chang c̄ʰ which in turn corresponds to Sanskrit j).
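The mapping from Indic stops to this home-brew transliteration is regular enough to be written down as a few lines of Python. This is only a sketch of the rule as stated in the paragraph above: the function name thai_translit is made up for illustration, the superscript H is assumed to be U+02B0, the diacritics are assumed to be the combining macron and circumflex, and the exceptional letters (So So, the newly derived Thai letters) are deliberately left out.

```python
# Sketch of the consonant transliteration rule described in the text.
# Assumptions: superscript H = U+02B0, macron = U+0304, circumflex = U+0302.
ASPIRATION = "\u02b0"
MACRON = "\u0304"
CIRCUMFLEX = "\u0302"

MANNERS = ("voiceless inaspirate", "voiceless aspirate",
           "voiced inaspirate", "voiced aspirate")

# Base letter (Thai sound value) -> the four Indic stops of that series.
SERIES = {
    "k": ("k", "kh", "g", "gh"),                            # velar
    "c": ("c", "ch", "j", "jh"),                            # palatal
    "\u1e6d": ("\u1e6d", "\u1e6dh", "\u1e0d", "\u1e0dh"),   # retroflex (t/d with dot below)
    "t": ("t", "th", "d", "dh"),                            # dental
    "p": ("p", "ph", "b", "bh"),                            # labial
}

# Indic stop -> (base letter, Indic manner of articulation)
INDIC_STOPS = {
    indic: (base, manner)
    for base, row in SERIES.items()
    for indic, manner in zip(row, MANNERS)
}

def thai_translit(indic: str) -> str:
    """Return the home-brew transliteration for one Indic stop."""
    base, manner = INDIC_STOPS[indic]
    if manner == "voiceless inaspirate":
        return base                            # plain letter, no aspiration
    if manner == "voiceless aspirate":
        return base + ASPIRATION               # aspirated in Thai as well
    if manner == "voiced inaspirate":
        return base + MACRON + ASPIRATION      # e.g. Indic g
    return base + CIRCUMFLEX + ASPIRATION      # e.g. Indic gh

for stop in ("k", "kh", "g", "gh", "b"):
    print(stop, "->", thai_translit(stop))
```

The diacritic is placed before the aspiration mark so that it attaches to the base letter and not to the superscript.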
Consonant classes are easy to rationalize as soon as the Indic roots of the writing systems are understood. The mid class (yellow) holds all sounds which in Sanskrit are voiceless inaspirate plosives (the new Thai glottal stop also follows this rule). The high class (orange) derives from voiceless aspirates and related sounds (the three sibilants and H). The rest, which is voiced in Sanskrit, makes up the low class (white); this means all Sanskrit voiced stops (those with a top diacritic in transliteration) and various continuants (nasals, laterals) including the new Thai letters Lo Chula and Ho Nokhuk (I guess that the latter was voiced at some point in the past).
Sanskrit loanwords (or, in the case of Buddhist scriptures, true Sanskrit texts) are written etymologically, i. e., with the historically corresponding letter, not the closest phonological match. For example, the birthplace of the Buddha, Lumbinī, is naturally spelled Lump̄ʰinī in Thai, and is pronounced as such, and also with mid, high and mid tones for the three syllables, respectively (following rules that will be explained a few paragraphs later). This makes Sanskrit spoken by a Thai basically unintelligible to an Indian brahmin.
The writing of Thai vowels is extremely involved, owing to the many phonemically different vowels in the language (and the lack of built-in vowel support in the Indic script core). As seen from the point of graphical representation, they fall into three classes: The implicit vowel need not be written because it is implied in each consonant letter. The simple vowels are written with diacritic vowel signs that get attached to the preceding consonant (the syllable onset consonants) according to the Indic model. The remaining ones are the complex vowels, typically diphthongs or triphthongs: They have no Indic counterpart and are written with sometimes lengthy sequences involving one or more vowel signs and/or one or more consonant letters. Three consonant letters can appear in vowel sequences: O Ang (also used for simple vowels in some specific cases), Wo Waen and Yo Yak.
An additional complication comes from the distinction between open and closed syllables. Thai has a rather simple syllable structure C(C)V(C), with only a few allowed onset clusters (phonetically, [kkh]+[rlw], [pph]+[rl] and t+r). The syllable boundary is not indicated directly (there is no virama), yet to allow the reader to isolate the syllables easily, many vowels have different notation in open C(C)V and closed C(C)VC syllables. This method, though indirect, is amazingly effective: I do not speak a single word of Thai, yet following the rules I was able to identify the syllables in all the spice names shown here with only one or two ambiguous cases in the whole set.
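The small inventory of onset clusters can be spelled out explicitly. The following is a minimal sketch that takes the bracket notation above at face value (k or kh before r, l, w; p or ph before r, l; and t before r); the phoneme spellings and the function name is_valid_onset are my own choices for illustration.

```python
from itertools import product

# Allowed two-consonant onsets, read off the notation
# [k kh]+[r l w], [p ph]+[r l] and t+r given in the text.
ALLOWED_ONSET_CLUSTERS = (
    set(product(("k", "kh"), ("r", "l", "w")))
    | set(product(("p", "ph"), ("r", "l")))
    | {("t", "r")}
)

def is_valid_onset(onset):
    """Check an onset given as a tuple of one or two consonant phonemes.

    Assumption: any single consonant is accepted; only clusters are
    checked against the list above. An empty onset is rejected because
    the structure C(C)V(C) requires an initial consonant.
    """
    onset = tuple(onset)
    if len(onset) == 1:
        return True
    if len(onset) == 2:
        return onset in ALLOWED_ONSET_CLUSTERS
    return False

print(sorted(ALLOWED_ONSET_CLUSTERS))
print(is_valid_onset(("p", "l")))   # cluster from the list
print(is_valid_onset(("t", "l")))   # not in the list
```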
The table on the right side summarizes all Thai vowel sequences; wherever necessary, one cell has two entries for open and closed syllables, respectively. Most vowels come in short/
Even the implicit vowel is complicated. In open syllables, it sounds a and in closed syllables o. In some cases, e. g. whenever the next syllable starts with a cluster, it may become necessary to explicitly write the implicit vowel; otherwise, a word like kapla would be ambiguous (ka-pla or kap-la). The sign Sara A is used for the implicit vowel in such cases, and it has further use in several vowel sequences where it denotes shortness (replaced by Mai Taikhu in closed syllables).
The following signs are used for simple vowels: Sara AA (long A, ā), Sara I (short I, i), Sara II (long I, ī), Sara UE (short Ü, ü), Sara UUE (long Ü, ǖ), Sara U (short U, u), Sara UU (long U, ū), Sara E (long E, ē), Sara AE (long Ä, ǟ) and Sara O (closed long O, ō). The short variants of E, AE and O are arrived at with the shortening marks mentioned above, and two more vowels (open O, which I transliterate as ɔ, and Ö) require short sequences some of which involve the letter O Ang.
Diphthongs ending in U involve sequences with a final Wo Waen, and those ending in I have sequences that end in Yo Yak. Yet, AI and AU have special representations which clearly trace back to the original Sanskrit diphthong signs which have been inherited by Thai script (Sanskrit has diphthongs E,AI,O,AU, where the classification of E and O as diphthongs is just a peculiarity of Sanskrit grammar). AU is basically written by simultaneously applying Sara E and Sara AA (mirroring the construction of the O and AU signs from E and AA in most Indic scripts), and for AI, there are two typographically slightly different versions of the South Indian AI vowel sign. AI can also be written by a sequence with Yo Yak; thus there are three possible representations for that sound, normalized by orthographic rules.
The vowel signs Sara E, Sara AE, Sara O and the two AI-signs graphically appear at the beginning of the syllable, left of the onset consonant (in case of an initial cluster, left of the entire consonant group); in the table, they are marked with an asterisk for clarity. Sara I, Sara II, Sara UE, Sara UUE and the vowel shorteners (Mai Han-Akat, Mai Taikhu) appear on top of the consonant (in case of a cluster, the second consonant), and Sara U and Sara UU appear below the consonant. The remaining vowel signs (Sara A, Sara AA and the special case Sara AM) follow the consonant.
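The placement rules of this paragraph can be collected in a small lookup table. The sketch below hard-codes the four groups exactly as listed above (the grouping is taken from the text, not computed from any Unicode property) and uses Python's unicodedata module only to print the official character names; SIGN_POSITIONS and position_of are names invented for this illustration.

```python
import unicodedata

# Position of each vowel sign relative to the onset consonant,
# following the grouping given in the text.
SIGN_POSITIONS = {
    "left":  tuple("\u0e40\u0e41\u0e42\u0e43\u0e44"),        # E, AE, O and the two AI signs
    "above": tuple("\u0e34\u0e35\u0e36\u0e37\u0e31\u0e47"),  # I, II, the two UE signs, Mai Han-Akat, Mai Taikhu
    "below": tuple("\u0e38\u0e39"),                          # U, UU
    "right": tuple("\u0e30\u0e32\u0e33"),                    # A, AA, AM
}

def position_of(sign: str) -> str:
    """Return 'left', 'above', 'below' or 'right' for a Thai vowel sign."""
    for position, signs in SIGN_POSITIONS.items():
        if sign in signs:
            return position
    raise ValueError(f"not a vowel sign covered by the table: {sign!r}")

for position, signs in SIGN_POSITIONS.items():
    names = [unicodedata.name(s).replace("THAI CHARACTER ", "") for s in signs]
    print(f"{position:5s} {', '.join(names)}")
```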
The notorious Sanskrit letters for vocalic liquids (RU, its counterpart LU and the corresponding long forms) also make an appearance. They are not fully obsolete even when writing Thai, for they appear in some Sanskrit loanwords and, rather amazingly, also in some neologisms derived from English.
The nasal mark Nikhahit (Thai incarnation of the Indian Anusvara) is not exactly a vowel, but behaves typographically similar to vowel signs. It is not used in true Thai words, except in the very frequent combination with Sara AA (open syllables only). The ligature of those two signs is so common that it is usually considered a vowel sign in its own right, Sara AM. Although derived from the long form AA, it is realized with a short a sound. There is phonetic contrast between a syllable ending in AM and one ending in A plus consonant M.
If merits are sticky, then the Unicode Standard certainly has not stained its hands when encoding the Thai Script. Coding Thai texts follows a visual model, meaning that the signs are written and stored in typographical order, as opposed to the logical order used for nearly all other Brahmi-derived scripts. This means that in the encoded text, the left-attaching vowel signs (E etc.) appear before the consonant they follow in speech. As a consequence, there is no joining behaviour defined for these vowel signs; typographically, they are just letters (the Standard tries to push this to the extreme by also defining no joining behaviour for A and AA, where it could easily have done so, but the otherwise very similar AM indeed is a spacing accent). Electronic processing of Thai texts becomes a dire nightmare dwarfing that of Elm Street, because everything is different from every other language and must be done differently. A virama model similar to that used for Khmer was considered, but had to be discarded for compatibility with a misbegotten existing Thai standard.
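The difference between visual and logical order is easy to see by listing the stored code points of a syllable. The sketch below compares the Thai syllable เก (spoken ke) with the Devanagari syllable कि (ki), whose vowel sign is also drawn to the left of the consonant; the Devanagari example is my own choice of contrast and not taken from this index.

```python
import unicodedata

def stored_order(text: str):
    """List (code point, general category, name) in the order the text is stored."""
    return [(f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))
            for ch in text]

thai = "\u0e40\u0e01"        # Thai: the vowel sign is stored first, then the consonant
devanagari = "\u0915\u093f"  # Devanagari: consonant first, although the sign is drawn left

for label, text in (("Thai", thai), ("Devanagari", devanagari)):
    print(label)
    for code_point, category, name in stored_order(text):
        print("  ", code_point, category, name)
```

In the Thai case the vowel sign carries the general category Lo (an ordinary letter), whereas the Devanagari sign is a combining mark (Mc) that follows its consonant in memory and is only moved to the left by the rendering engine.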
In transliterating the vowels, I do the obvious and go the phonetic way. Everything long gets a macron somewhere, and this poses a problem with the umlauted vowels ÄÖÜ: Their long counterparts need to carry both a diaeresis and a macron ǞȪǕ, which isn’t really reader-friendly (thankfully, Unicode offers precomposed letters for all of them, which improves the rendering in real-world engines). The improper diphthongs are marked with a grave accent on their last part (representing the semi-vocalic element). I try to follow Indic conventions wherever possible, and this means that the anusvara would be transliterated as ṃ; however, Sara AM should be different, and so I chose a superscript m (ᵐ). The latter character is well known for not being supported by Windows XP, but frankly, that transliteration is so fiendishly overdecorated, often having more diacritics than base letters in a word, that XP would perform miserably even if it did not fail on ᵐ.
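The remark about precomposed letters can be checked with Unicode normalization. The sketch below builds each long umlaut from a base letter plus combining diaeresis and combining macron and lets NFC compose it; the helper name long_umlaut is made up. For contrast, the open O with macron (ɔ̄) used in this index is assumed here to be spelled with U+0254 and has no precomposed form, so it stays a two-character sequence.

```python
import unicodedata

DIAERESIS = "\u0308"  # combining diaeresis
MACRON = "\u0304"     # combining macron

def long_umlaut(base: str) -> str:
    """Compose base + diaeresis + macron into a single letter where Unicode has one."""
    return unicodedata.normalize("NFC", base + DIAERESIS + MACRON)

for base in "aouAOU":
    composed = long_umlaut(base)
    print(base, "->", composed, [f"U+{ord(ch):04X}" for ch in composed])

# Open O with macron: no precomposed letter exists, so two code points remain.
open_o = unicodedata.normalize("NFC", "\u0254" + MACRON)
print(len(open_o))
```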
Thai has five different tones: Mid 33, low 21, falling 41, high 34 and rising 25. Each syllable can be pronounced with two to five different tones, depending on the consonant/vowel distribution. Syllables differing only in tone may exhibit completely unrelated meanings, and therefore it is vital for the script to code all tones unambiguously. In order to achieve that goal, four different tone marks are used (Mai Ek, Mai Tho, Mai Tri and Mai Chattawa).
Yet, it would not be Thai if it were easy to determine the tone for a given written syllable. Rather, the tone is a function of consonant class, vowel length and syllable coda, with optional overriding of the last two by a tone mark (in fact, fewer than 50% of all written Thai syllables need a tone mark). The table at the right side summarizes the rules.
There is an important additional rule: A syllable beginning with a nasal, approximant or lateral (all of which are voiced, thus belonging to the low class) can be preceded by a Ho Hip character which, although mute, lends its high class to the entire syllable. Consequently, many syllables that would be considered low (only two or three different tones possible) can gain access to more possible tones (four or five).
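As a rough illustration of how consonant class, coda, vowel length and tone mark interact, here is a sketch of the tone lookup in Python. The rule set is the one usually given in Thai textbooks, since the table itself is not reproduced in this text; the function name, the parameter names and the notion of a "live" syllable (ending in a long vowel or a sonorant, as opposed to a "dead" one ending in a short vowel or a stop) are my own wording.

```python
# Tone selected by a tone mark, per consonant class (standard textbook rules,
# assumed here rather than copied from the table mentioned in the text).
MARKED = {
    "mid":  {"ek": "low", "tho": "falling", "tri": "high", "chattawa": "rising"},
    "high": {"ek": "low", "tho": "falling"},
    "low":  {"ek": "falling", "tho": "high"},
}

def thai_tone(consonant_class, live, long_vowel, tone_mark=None, leading_ho_hip=False):
    """Return the tone of one written syllable.

    consonant_class: 'low', 'mid' or 'high' (class of the onset consonant)
    live:            True if the syllable ends in a long vowel or a sonorant
    long_vowel:      True if the vowel is long (matters for dead low-class syllables)
    tone_mark:       None, 'ek', 'tho', 'tri' or 'chattawa'
    leading_ho_hip:  True if a mute Ho Hip precedes a low-class onset
    """
    if leading_ho_hip and consonant_class == "low":
        consonant_class = "high"           # Ho Hip lends its high class
    if tone_mark is not None:
        try:
            return MARKED[consonant_class][tone_mark]
        except KeyError:
            raise ValueError(f"{tone_mark!r} is not used with {consonant_class} class")
    if live:
        return "rising" if consonant_class == "high" else "mid"
    if consonant_class == "low":
        return "falling" if long_vowel else "high"
    return "low"                           # dead syllable, mid or high class

# The three syllables of the Lumbini example: all low class, no tone marks.
print([thai_tone("low", live=True, long_vowel=False),    # lum
       thai_tone("low", live=False, long_vowel=False),   # short open syllable
       thai_tone("low", live=True, long_vowel=True)])    # final long syllable
```

Under these assumptions the Lumbini example comes out as mid, high, mid, which agrees with the tones stated earlier for that word.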
In transcribing the tones, I follow the Thai Script in just rendering the tone marks, which makes the lookup of the correct tone as complicated as in the native writing. Since the names of the tone marks derive from the Sanskrit numerals One to Four (think of, for example, eins, two, treis and quatuor), I just use superscript numbers. In Thai script, all tone marks are nonspacing diacritics attached to the consonants and floating higher than vowel signs (in Unicode, they follow the consonant and, if present, diacritic vowel signs, but they precede any spacing vowel signs). To improve the readability of the transliteration, I have decided to show the superscript numbers at the end of the syllable.
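Moving the tone mark to the end of the syllable is a purely mechanical step. The sketch below pulls the tone mark out of an encoded Thai syllable and returns the matching superscript digit; it assumes that the input is already a single syllable, and the function name split_tone_mark is invented for this example.

```python
# Tone marks (Mai Ek, Mai Tho, Mai Tri, Mai Chattawa) and the superscript
# digits one to four that stand for them in the transliteration.
TONE_MARK_DIGIT = {
    "\u0e48": "\u00b9",  # Mai Ek       -> superscript 1
    "\u0e49": "\u00b2",  # Mai Tho      -> superscript 2
    "\u0e4a": "\u00b3",  # Mai Tri      -> superscript 3
    "\u0e4b": "\u2074",  # Mai Chattawa -> superscript 4
}

def split_tone_mark(syllable: str):
    """Return (syllable without its tone mark, superscript digit or '')."""
    digit = ""
    rest = []
    for ch in syllable:
        if ch in TONE_MARK_DIGIT:
            digit = TONE_MARK_DIGIT[ch]   # remember the mark, drop it from the text
        else:
            rest.append(ch)
    return "".join(rest), digit

# Consonant, tone mark, spacing vowel sign: the mark precedes Sara AA in storage.
bare, digit = split_tone_mark("\u0e01\u0e49\u0e32")
print(bare, digit)
```

The digit can then be appended after the transliterated syllable, wherever the mark sat in the Thai spelling.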
ก์
Another sign hovering as high as the tone marks is the cancellation mark Thanthakat (shown right with the letter K). It marks consonants or syllables that are no longer spoken but have been orthographically fossilized. It never appears in true Thai words, but appears in quite a few Sanskrit loanwords and even in more recent English loans (e. g. marking the R in pepper). This sign can be applied to a single consonant or an entire syllable. In transliteration, I represent it by a superscript zero immediately after the consonant (k⁰).
Cho Chang is thus “the letter spoken cho which appears in the word chang” (elephant), distinguishing it from the homophone Cho Choe, “that letter cho used to write choe” (tree); there are two more cho-letters. Right next to the letter is the approximate pronunciation (initial/final) which is often used in non-scientific Romanization, and below this you will find the transliteration character used in this index.
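Vowels
[Table of Thai vowel sequences, shown with the consonant ก. Only the following rows survive; signs marked with an asterisk are written to the left of the consonant.]
กัก (Mai Han-Akat); ā: กา (AA)
กึก (UUE)
เก็ก (E* + Mai Taikhu); ē: เก (E*)
แก็ก (AE* + Mai Taikhu); ǟ: แก (AE*)
กก (implied); ō: โก (O*)
ก็อก (Mai Taikhu + O Ang); ɔ̄: กอ (O Ang)
does not occur in closed syllables; ȫ: เกอ (E* + O Ang)
เกิก (E* + I)
กวก (Wo Waen)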
Tone marks etc.
The Lao Script
[Table: Lao consonant letters arranged by Indic series. Columns: voiceless inaspirate (Indian k, c, ṭ, t, p); voiceless aspirate (Indian kh, ch, ṭh, th, ph); voiced inaspirate (Indian g, j, ḍ, d, b); voiced aspirate (Indian gh, jh, etc.); nasal. Rows follow.]
Lao velar