- Source: Tai Tham (Unicode block)
Tai Tham is a Unicode block containing characters of the Lanna script used for writing the Northern Thai (Kam Mu'ang), Tai Lü, and Khün languages.
History
123 of the 127 code points initially encoded were proposed in L2/07-007R, two more (U+1A5C and U+1A7C) in L2/08-037R2 and a final pair (U+1A5D and U+1A5E) in L2/08-073. The last of these three documents modified the definitions of U+1A37 and U+1A38 given in the first of the three.
The following Unicode-related documents record the purpose and process of defining specific characters in the Tai Tham block:
Encoding of Subscript Consonants
Base and subscript consonants have different encodings because words such as ᨲᩥ᩠ᨠ and ᨲᩥᨠ differ in both appearance and sound.
Subscript consonants are encoded as a sequence of 2 characters. The second is the base character and the first is the special character U+1A60 TAI THAM SIGN SAKOT.: Section 2
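The two-character rule can be illustrated with a minimal Python sketch. The helper names `subscript` and `describe` are hypothetical, and the sample code points (HIGH TA, SIGN I, HIGH KA) are an assumed reading of the two words quoted above rather than sequences taken from the proposal.

```python
import unicodedata

SAKOT = "\u1A60"  # TAI THAM SIGN SAKOT

def subscript(consonant):
    """Encode the subscript form of a consonant: SAKOT first, then the base character."""
    return SAKOT + consonant

def describe(text):
    """List each code point of `text` with its Unicode character name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

# Assumed reading of the sample words: HIGH TA, SIGN I, then HIGH KA
HIGH_TA, SIGN_I, HIGH_KA = "\u1A32", "\u1A65", "\u1A20"

with_subscript = HIGH_TA + SIGN_I + subscript(HIGH_KA)  # final KA written below
with_base = HIGH_TA + SIGN_I + HIGH_KA                  # final KA written on the line

for line in describe(with_subscript):
    print(line)
```

Passing any word from this article to `describe` lists its code points in stored order, which is a convenient way to inspect the sequences discussed below.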
If a consonant has two subscript forms and the choice affects the meaning, the form typically used for syllable-final consonants is encoded with SAKOT, and the other form has its own code point. Seven consonants have different subscript forms in this way, namely ᩁ RA, ᩃ LA, ᨷ BA, ᩈ HIGH SA, ᨾ MA, ᨳ HIGH RATHA, and ᨻ LOW PA.
ᨣᩕᩪ (Northern Thai pronunciation: [kʰuː]) is encoded as but
ᨠᩣ᩠ᩁ (IPA: [kaːn]) is encoded as U+1A60 SAKOT, U+1A41 RA>: Section 4
ᩆᩦ᩠ᩃ (IPA: [siːn]) is encoded as U+1A60 SAKOT, U+1A43 LA>: Section 14.5 but ᨸᩖᩦ (IPA: [piː]) is encoded as .: Section 4 (For the use of LA as a syllable final letter, compare ᩁᨭᩛᨷᩣ᩠ᩃ: Section 4 (Northern Thai pronunciation: [lat tha baːn]).
U+1A57 SIGN LA TANG LAI looks like but is in origin a ligature of it with . Tai Lue uses it to write the word ᨴᩢ᩵ᩗᩣ (IPA: [taŋ laːi]).
ᨣᩝᩴ (IPA: [kɔː bɔː])is encoded as , but ᨠᩢ᩠ᨷ (IPA: [kap]) is encoded as and
ᨠᩢᨷ᩠ᨷ᩺ (IPA: [kap]) is encoded as
In the final proposal,: 1 which the Unicode Consortium accepted that what is now SIGN BA (as in ᨣᩝᩴ) would be encoded as
Pali uses HIGH PA instead of BA in Laos and northeast Thailand. One should therefore be prepared to find
Tai Khuen has two ways of writing subscript HIGH SA. They are not interchangeable.
In Tai Khuen, ᩃᩮᩞ is the correct spelling and ᩃᩮ᩠ᩈ is wrong, whereas ᩈᨶ᩠ᨶᩥᩅᩤ᩠ᩈ is correct and ᩈᨶ᩠ᨶᩥᩅᩤᩞ is wrong.
ᩃᩮᩞ is encoded as
while the incorrect ᩃᩮ᩠ᩈ is encoded as .
Tai Khuen has an additional way of writing subscript MA. There is a special code point for this additional method: Item 9
The word which Northern Thai writes as ᨵᨾ᩠ᨾ᩺ is written in Tai Khuen both as ᨵᨾ᩠ᨾ᩼ encoded as and as ᨵᨾᩜ᩼ encoded as
.
There are two ways of writing the subscript for both HIGH RATHA and LOW PA.
ᨶᩥᨣᨱᩛ: 368 is encoded as U+1A31 RANA, U+1A5B SIGN HIGH RATHA OR LOW PA>:
ᩁᩣᨩᨽᩢ᩠ᨮ: 3 is encoded
.
ᨶᩥᨻᩛᩣᨶ is encoded as :
ᨴᩮ᩠ᨻ is encoded as .
The latter word is also written as ᨴᩮ᩠ᨷ.
The Lao-style consonant conjunct ᨲ᩠ᨳ (encoded as ) looks as though it is ᨲᩛ encoded as . The shape of U+1A5B depends upon the consonant it is subscript to.
The dependent vowel of words like ᨯᩬᨠ 'flower' is encoded by the special vowel ; one should not use the sequence There is also an encoded dependent vowel for words like Tai Khuen, Tai Lue and Lao words such as ᨶ᩶ᩭ, namely U+1A6D SIGN OY. This vowel is not encoded as (which is what Northern Thai uses for the corresponding words; nor is it the sequence : Section 5
Superscript Consonants
Superscript consonants are encoded independently of the base consonants. Some characters serve both as superscript consonants and in other roles, and are therefore discussed further in this section.
Niggahita and is encoded as U+1A74 MAI KANG. Superscript WA is not encoded separately. It is encoded as MAI KANG. For example, Tai Khuen ᨯ᩠ᨿᩴ (IPA: [deu]) is encoded as . For the purposes of character sequencing, it is generally treated as a vowel.
Superscript cluster-initial NGA is encoded as U+1A58 MAI KANG LAI. Note that Lao generally uses the same glyph for MAI KANG LAI and U+1A59 SIGN FINAL NGA.
U+1A62 MAI SAT serves three roles: it is a vowel, a final consonant, and a vowel shortener.
Choosing the encoding of the superscript form of RA and the vowel killers was difficult. In the 1940s the Tai Khuen wrote the consonant and the vowel killer the same way. The proposers of the encoding made enquiries and were told that the glyphs were still the same and therefore encoded them both as U+1A7A RA HAAM. It was then learnt that the Tai Khuen had changed the glyphs of the vowel killer, and a new character U+1A7C KARAN was added for the Tai Khuen style of the vowel killer. Some Northern Thai writers prefer to use U+1A7C as the vowel killer, and indeed the use of its glyph is not unknown in Northern Thai handwriting.
Special Consonants
The special forms ᩓ and ᩕ are encoded by the code points U+1A53 and U+1A55 respectively.
If the glyphs of U+1A36 NA and U+1A63 SIGN AA would be side by side they are written as the ligature ᨶᩣ rather than as two separate glyphs ᨶᩣ. They are written as a ligature even if the NA has a subscript consonant or a non-following mark attached. Examples: ᨾᨶ᩠ᨲᩣ (IPA: [man taː], encoding ) and ᨶᩮᩢᩣ (IPA: [nau], encoding ). Subscript NA and SIGN AA do not similarly ligate, e.g. ᩉ᩠ᨶᩣ ((IPA: [naː]), encoded )
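The ligation condition described above can be approximated by a pattern over the encoded text. This is only a sketch: the set of "non-following" marks and the three sample sequences (given as code points) are assumptions made for illustration, and actual ligation is decided by the font and rendering engine.

```python
import re

SAKOT = "\u1A60"    # TAI THAM SIGN SAKOT
NA = "\u1A36"       # TAI THAM LETTER NA
SIGN_AA = "\u1A63"  # TAI THAM VOWEL SIGN AA

# Assumption: marks that do not take up the space to the right of NA
# (MAI SAT, vowels above/below, leading vowels, MAI KANG, tone marks, killers).
NON_FOLLOWING_MARKS = "\u1A62\u1A65-\u1A6C\u1A6E-\u1A7C"

NA_AA_LIGATURE = re.compile(
    f"(?<!{SAKOT}){NA}"             # a base NA, i.e. not itself a subscript
    f"(?:{SAKOT}[\u1A20-\u1A4C])*"  # optional subscript consonants under NA
    f"[{NON_FOLLOWING_MARKS}]*"     # optional marks that do not follow NA
    f"{SIGN_AA}"                    # the SIGN AA that ligates with NA
)

def has_na_aa_ligature(text):
    """Return True if `text` contains a sequence expected to show the NA-AA ligature."""
    return NA_AA_LIGATURE.search(text) is not None

# Assumed sequences for the three examples in the paragraph above
manta = "\u1A3E\u1A36\u1A60\u1A32\u1A63"  # MA, NA, SAKOT, HIGH TA, SIGN AA
nau = "\u1A36\u1A6E\u1A62\u1A63"          # NA, SIGN E, MAI SAT, SIGN AA
hnaa = "\u1A49\u1A60\u1A36\u1A63"         # HIGH HA, SAKOT, NA, SIGN AA

print(has_na_aa_ligature(manta), has_na_aa_ligature(nau), has_na_aa_ligature(hnaa))
```

The negative lookbehind is what excludes subscript NA, matching the statement that subscript NA and SIGN AA do not ligate.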
The geminate consonant ᩔ is encoded separately because the word ᩅᩥᩈᩮ᩠ᩈ (Northern Thai pronunciation: [wiseːt], encoding ) has an appearance very different from ᩅᩥᩔᩮ, but one may have occasion to fold the final syllable to
By contrast, the geminate consonant ᨬ᩠ᨬ is encoded as the conjunct , even though some of its glyphs may resemble the hypothetical conjunct ᨱ᩠ᨬ .
Independent Vowels
The independent vowel ᩋ and the consonant ᩋ are the same character, U+1A4B.
The independent vowel ᩋᩣ and the sequence of the consonant ᩋ and dependent vowel ᩣ have the same appearance ᩋᩣ and are therefore both encoded .
Northern Thai uses 5 independent vowels with their own code points, namely ᩍ, ᩎ, ᩏ, ᩐ and ᩑ.: Section 3
In Northern Thai the 8th independent vowel is no different from the sequence of the consonant ᩋ and dependent vowel ᩰ, i.e. ᩋᩰ, and they are therefore both encoded . Other languages use a distinct character ᩒ U+1A52 LETTER OO for the independent vowel.
Character Order within Text
The encoding proposal defined the ordering of Unicode characters.
As in the encoding of Burmese, Khmer, and the Indic scripts, Unicode characters are ordered according to the order of the sounds, except in special cases. Where two sounds have merged into a single sound, the older order is used; this order is usually the same as in Siamese. Where the sounds have no inherent order, the visual order or a special alternative order is used.
There are special rules for:
(a) The ordering of vowels
(b) The writing of mai kia in all its variants
(c) The writing of mai kua in all its variants
(d) The writing of mai kam
(e) The writing of tone marks
The ordering of Unicode characters for consonants and vowels is: onset letters, true vowel marks, coda consonants, onset letters, true vowel marks, coda consonants.: Section 14 For convenience, one reckons that symbols killing vowels are vowels.
The 'onset letters' are consonants, independent vowels or special symbols. The consonants in a group are ordered according to the order in which they are sounded or used to be sounded.
Example: ᨻᩩᨴ᩠ᨵ (Northern Thai pronunciation: [put thaʔ])
onset letter: ᨻ
pure vowel: ᩩ
final 'consonant': ᨴ
onset letter: ᨵ
pure vowel: no symbol
final consonant: none
The encoding is
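The slot-by-slot breakdown above can be assembled mechanically. The following Python sketch assumes that the listed glyphs are LOW PA, SIGN U, LOW TA and LOW THA, and that the second onset is written as a subscript and therefore preceded by SAKOT; the function name `encode_syllables` is hypothetical.

```python
SAKOT = "\u1A60"  # TAI THAM SIGN SAKOT

# Each chained syllable contributes onset letters, vowel marks, coda consonants, in that order.
syllables = [
    {"onset": "\u1A3B", "vowel": "\u1A69", "coda": "\u1A34"},  # LOW PA, SIGN U, LOW TA
    {"onset": SAKOT + "\u1A35", "vowel": "", "coda": ""},      # subscript LOW THA, no vowel sign
]

def encode_syllables(parts):
    """Concatenate the slots of each syllable in onset, vowel, coda order."""
    return "".join(p["onset"] + p["vowel"] + p["coda"] for p in parts)

word = encode_syllables(syllables)
print(" ".join(f"U+{ord(ch):04X}" for ch in word))
```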
Example: ᨻᩕ has a single consonant sound Northern Thai pronunciation: [pʰ], but formerly had 2 sounds, namely those of ᨻ and then ᩁ as in central Thai. This word is encoded as
Apart from MEDIAL RA, the order of the consonant glyphs is the same as the order of the sounds. In most cases MEDIAL RA is the last consonant but the WA of /ua/ and the LOW YA of /ia/ follow MEDIAL RA.
Examples:
ᩆᩣᩈ᩠ᨲᩕ᩺ is encoded .
ᨠᩕᩈᩢ᩠ᨲ is encoded ᩈᩕ᩠ᩅᨾ is encoded .
But ᨲᩕ᩠ᨶᩬᨾ (Northern Thai pronunciation: [tʰa nɔːm]): 269 is encoded
For words like ᨧᩮᩢ᩶ᩣ there is the rule that symbols for vowels and tones have the order:: Section 5 first part, 5.3 and 13
(1) leading vowels
(2) vowels below (top to bottom)
(3) vowels above (bottom to top)
(4) tone marks (left to right)
(5) trailing vowels (left to right)
In applying these rules, MAI KANG is reckoned as a vowel even when it functions as niggahita or as a consonant. The Unicode character MAI SAT is likewise reckoned as a vowel even when it functions as a final consonant (mai kak) or as a vowel shortener, as in ᨸᩮᩢ᩠ᨯ.
The relative ordering of the marks above and below should follow Thai and Lao as in เจ้า เกี่ว ชุํ and ບິ່.
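The five-step ordering of vowel and tone marks can be sketched as a stable sort. The assignment of individual code points to the five classes below is an assumption made for illustration, not a table from the encoding proposal, and the function names are hypothetical. Because the sort is stable, marks within one class keep the order in which they were supplied.

```python
# Assumed classification of Tai Tham vowel and tone marks
LEADING = set("\u1A6E\u1A6F\u1A70\u1A71\u1A72")            # E, AE, OO, AI, THAM AI
BELOW = set("\u1A69\u1A6A\u1A6C")                          # U, UU, OA BELOW
ABOVE = set("\u1A62\u1A65\u1A66\u1A67\u1A68\u1A6B\u1A73\u1A74")  # MAI SAT, I, II, UE, UUE, O, OA ABOVE, MAI KANG
TONES = set("\u1A75\u1A76\u1A77\u1A78\u1A79")              # tone marks
TRAILING = set("\u1A61\u1A63\u1A64\u1A6D")                 # A, AA, TALL AA, OY

CLASSES = [LEADING, BELOW, ABOVE, TONES, TRAILING]

def mark_rank(ch):
    """Return the position (0-4) of a mark's class in the ordering rule."""
    for rank, members in enumerate(CLASSES):
        if ch in members:
            return rank
    raise ValueError(f"U+{ord(ch):04X} is not a classified vowel or tone mark")

def order_marks(marks):
    """Sort marks into leading, below, above, tone, trailing order (stable)."""
    return "".join(sorted(marks, key=mark_rank))

# SIGN AA, TONE-2, MAI SAT, SIGN E supplied out of order
scrambled = "\u1A63\u1A76\u1A62\u1A6E"
ordered = order_marks(scrambled)
print(" ".join(f"U+{ord(ch):04X}" for ch in ordered))
```

MAI KANG and MAI SAT are placed among the vowels above, in line with the statement that both are reckoned as vowels for ordering purposes.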
Examples:
ᨧᩮᩢ᩶ᩣ is encoded as : Section 5 no. 29
ᨾᩢᩣ (IPA: [maːk]) is encoded as
ᩃᩪᩢ (IPA: [luːk]) is encoded as
ᨶᩮᩢᩣ is encoded as
ᩋᩫᨶ᩠ᨲᩕᩣ᩠ᨿ (Northern Thai pronunciation: [on thaʔ laːi]) is encoded as
For /ia/ and /ua/ in all their forms, subscript LOW YA and WA are reckoned as onset consonants.: Section 14.3
Examples:
ᩈ᩠ᨿᩮ is actually encoded : Section 5 No. 33
ᨸ᩠ᩃ᩠ᨿ᩵ᩁ is actually encoded : Section 14.9
ᨲ᩠ᩅᩫ is actually encoded : Section 14.3
ᩈ᩠ᩅ᩵ᩁ is actually encoded
ᨠᩖ᩠ᩅ᩠᩶ᨿ is actually encoded as
( is canonically equivalent to )
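Canonical equivalence of this kind arises because SAKOT has a lower canonical combining class than the tone marks, so normalization reorders a tone mark written before SAKOT. A minimal sketch using Python's standard `unicodedata` module follows; the sample sequence (WA, TONE-2, SAKOT, LOW YA) is an illustrative assumption and is not claimed to be the sequence elided above.

```python
import unicodedata

WA, LOW_YA = "\u1A45", "\u1A3F"
SAKOT, TONE_2 = "\u1A60", "\u1A76"

tone_first = WA + TONE_2 + SAKOT + LOW_YA   # tone mark stored before SAKOT
sakot_first = WA + SAKOT + TONE_2 + LOW_YA  # SAKOT stored before the tone mark

def canonically_equivalent(a, b):
    """Two strings are canonically equivalent if their NFD forms are identical."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# Combining classes drive the reordering: the lower class sorts first.
print(unicodedata.combining(SAKOT), unicodedata.combining(TONE_2))
print(canonically_equivalent(tone_first, sakot_first))
```

Software that normalizes text will therefore store the SAKOT-first order regardless of which order was typed.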
Outside Northern Thailand, the MAI KANG in the symbol for /am/ is written on the SIGN AA component. In Northern Thailand, it is positioned variously – on the consonant, on the SIGN AA and between them. The Unicode Consortium refused a special character for the combination. The word ᨷᩴ᩠᩵ᨾᩣ (Northern Thai pronunciation: [bɔːmaː]) should not appear to have the same vowel as ᨲ᩵ᩣᩴ (IPA: [tam]). The combination for /am/ is therefore encoded as . The word ᨷᩴ᩠᩵ᨾᩣ is encoded as . The word ᨲ᩵ᩣᩴ is encoded as . The combination for /am/ with SIGN TALL AA is encoded as .
U+1A5A SIGN LOW PA is a special case; the Tai Lue word ᨣᨽᩚ (IPA: [kap phaʔ]) is encoded as .: Section 4
Examples showing mai kang lai and la tang lai:
Pali word ᩈᩘᨥᩮᩣ (saṅgho) is encoded .
Northern Thai word ᨴᩘ᩠ᩃᩣ᩠ᨿ (Northern Thai pronunciation: [taŋ laːi]) is encoded Tai Lue word ᨴᩢᩗᩣ (Tai Lue pronunciation: [taŋ laːi]) is encoded .
External links
Chew, P., Saengboon, P., & Wordingham, R. (2015). "Tai Tham: A Hybrid Script that Challenges Current Encoding Models". Presented at the Internationalization and Unicode Conference (IUC 39).