• Source: European ordering rules
  • The European ordering rules (EOR / EN 13710) define an ordering for strings in languages written with the Latin, Greek and Cyrillic alphabets. The standard covers languages used by the European Union, the European Free Trade Association, and parts of the former Soviet Union. It is a tailoring of the Common Tailorable Template (CTT) of ISO/IEC 14651. EOR can in turn be tailored for individual European languages, but in inter-European contexts it can be used without further tailoring.


    Method


    Like ISO/IEC 14651, on which it is based, EOR has four levels of weights. An optional fifth level is usually omitted.


    Level 1

    The first level orders the letters. The Latin letters at this level are, in order:

    a b c d ð e f g h i j k l m n o p q r s t u v w x y z þ
    The Greek alphabet has the following order:

    α β γ δ ε ϝ ϛ ζ η θ ι κ λ μ ν ξ ο π ϟ ρ σ τ υ φ χ ψ ω ϡ
    Cyrillic script has the following order:

    а б в г ґ д ђ ѓ е ё є ж з з́ ѕ и і ї й ј к л љ м н њ о п р с с́ т ћ ќ у ў ф х ц ч џ ш щ ъ ы ь ѣ э ю я
    The order for the three alphabets is:

    Latin alphabet
    Greek alphabet
    Cyrillic alphabet
    The Georgian and Armenian alphabets were not included in ENV 13710:2000. However, they were covered in CR 14400:2001, "European ordering rules – Ordering for Latin, Greek, Cyrillic, Georgian and Armenian scripts". Both documents have been incorporated into, and replaced by, EN 13710:2011.
    All scripts encoded in ISO/IEC 10646 (Unicode) are covered by ISO/IEC 14651 (with its CTT data file) and by the Unicode Collation Algorithm (UCA, with its associated DUCET table), both of which are available at no charge.
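
    As a rough illustration of Level 1, the sketch below builds a sort key from the three letter sequences and the script order listed above. The names `WEIGHTS` and `level1_key` are hypothetical, and the handling of accented or unlisted characters (reduce to the base letter, otherwise skip) is a simplifying assumption, not something the standard is quoted as saying here.

```python
import unicodedata

# Level 1 letter sequences copied from the lists above.
LATIN = "a b c d ð e f g h i j k l m n o p q r s t u v w x y z þ"
GREEK = "α β γ δ ε ϝ ϛ ζ η θ ι κ λ μ ν ξ ο π ϟ ρ σ τ υ φ χ ψ ω ϡ"
CYRILLIC = ("а б в г ґ д ђ ѓ е ё є ж з з́ ѕ и і ї й ј к л љ м н њ о п р с с́ "
            "т ћ ќ у ў ф х ц ч џ ш щ ъ ы ь ѣ э ю я")

# Map each letter to (script rank, position in its alphabet).
# Script rank follows the stated order: Latin, then Greek, then Cyrillic.
WEIGHTS = {}
for script_rank, letters in enumerate((LATIN, GREEK, CYRILLIC)):
    for position, letter in enumerate(letters.split()):
        WEIGHTS[unicodedata.normalize("NFC", letter)] = (script_rank, position)


def level1_key(text):
    """Return a list of Level 1 weights for text (illustrative only)."""
    s = unicodedata.normalize("NFC", text.lower())
    key = []
    i = 0
    while i < len(s):
        pair, single = s[i:i + 2], s[i]
        if pair in WEIGHTS:
            # Covers letters written as two code points, such as з́.
            key.append(WEIGHTS[pair])
            i += len(pair)
        elif single in WEIGHTS:
            key.append(WEIGHTS[single])
            i += 1
        else:
            # Assumption: an accented letter sorts as its base letter at
            # Level 1; characters with no listed base letter are skipped.
            base = unicodedata.normalize("NFD", single)[0]
            if base in WEIGHTS:
                key.append(WEIGHTS[base])
            i += 1
    return key
```

    Passing `level1_key` as the `key` argument of `sorted` places Latin-script words before Greek ones and Greek before Cyrillic, with ð sorting between d and e.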


    Level 2

    The second level orders additions to the letters, such as diacritics and other variations. Letters with diacritical marks (like ⟨à⟩, ⟨î⟩, ⟨õ⟩, and ⟨ü⟩) are ordered as variants of the base letter. ⟨æ⟩, ⟨œ⟩, ⟨ĳ⟩ and ⟨ŋ⟩ are ordered as modifications of ⟨ae⟩, ⟨oe⟩, ⟨ij⟩ and ⟨n⟩ respectively, and similar cases are handled the same way.
    Level 2 defines the following order of diacritics and other modifications:

    Acute accent (á)
    Grave accent (à)
    Breve (ă)
    Circumflex (â)
    Caron (š)
    Ring (å)
    Diaeresis (ä)
    Double acute accent (ő)
    Tilde (ã)
    Dot (ż)
    Cedilla (ç)
    Ogonek (ą)
    Macron (ā)
    With stroke through (ø)
    Modified letter(s) (æ)
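
    The diacritic order above can be sketched as a table of ranks applied to a decomposed string. This is a minimal illustration, not the standard's data: `level2_key` and the tables are hypothetical names, base letters are compared by code point as a stand-in for Level 1, the diacritic weights are compared left to right (an assumption), and only ⟨ø⟩, ⟨æ⟩ and ⟨œ⟩ are handled among letters that Unicode does not decompose.

```python
import unicodedata

# Combining marks in the Level 2 order listed above (lower rank sorts first).
DIACRITIC_ORDER = [
    "\u0301",  # acute accent
    "\u0300",  # grave accent
    "\u0306",  # breve
    "\u0302",  # circumflex
    "\u030C",  # caron
    "\u030A",  # ring
    "\u0308",  # diaeresis
    "\u030B",  # double acute accent
    "\u0303",  # tilde
    "\u0307",  # dot
    "\u0327",  # cedilla
    "\u0328",  # ogonek
    "\u0304",  # macron
]
RANK = {mark: i + 1 for i, mark in enumerate(DIACRITIC_ORDER)}  # 0 = unmodified
STROKE = len(DIACRITIC_ORDER) + 1   # "with stroke through"
MODIFIED = STROKE + 1               # "modified letter(s)"

# Letters with no canonical decomposition; a small illustrative subset.
# Assumption: every expanded letter carries the modification weight.
SPECIAL = {"ø": ("o", STROKE), "æ": ("ae", MODIFIED), "œ": ("oe", MODIFIED)}


def level2_key(text):
    """Return (base letters, diacritic weights), one weight per base letter."""
    base, weights = [], []
    for ch in unicodedata.normalize("NFD", text.lower()):
        if ch in RANK:
            if weights:
                weights[-1] = RANK[ch]   # attach the mark to the previous letter
        elif unicodedata.combining(ch):
            continue                     # marks outside the list are ignored here
        elif ch in SPECIAL:
            letters, rank = SPECIAL[ch]
            base.extend(letters)
            weights.extend([rank] * len(letters))
        else:
            base.append(ch)
            weights.append(0)
    return base, weights
```

    Because the base letters come first in the returned tuple, two strings are separated by their diacritics only when their base letters are identical, which is the sense in which accented letters are "variants of the base letter".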


    Level 3

    The third level distinguishes capital letters from small letters, as in "Polish" and "polish".


    Level 4

    The fourth level concerns punctuation and whitespace characters. It distinguishes "MacDonald" from "Mac Donald" and "its" from "it's".


    Level 5

    An optional, and usually omitted, fifth level can distinguish typographical differences, including whether the text is italic, normal or bold.
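
    How the levels interact can be sketched with a tuple key in which each level is consulted only when all earlier levels tie. This is an illustrative simplification with a hypothetical function name, `multilevel_key`. It compares letters and marks by code point instead of using the EOR tables, assumes small letters sort before capitals, and records punctuation by its position among the letters. The fifth level is left out because plain strings carry no typographical information.

```python
import unicodedata


def multilevel_key(text):
    """Return (level1, level2, level3, level4) lists for tuple comparison."""
    level1, level2, level3, level4 = [], [], [], []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch):
            if level2:
                level2[-1] += ch          # Level 2: marks on the previous letter
        elif ch.isalnum():
            level1.append(ch.lower())     # Level 1: the letter itself
            level2.append("")             # no marks seen yet for this letter
            level3.append(ch.isupper())   # Level 3: False (small) < True (capital)
        else:
            # Level 4: punctuation and whitespace, ignored at Levels 1 to 3.
            level4.append((len(level1), ch))
    return level1, level2, level3, level4
```

    With this key, "Polish" and "polish" tie on the first two components and differ only in the third, while "MacDonald" and "Mac Donald" tie on the first three and differ only in the fourth.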


    See also


    Collation
    Common Locale Data Repository (CLDR)
    Unicode
    Universal Character Set
    DIN 91379 – a European Unicode subset (also includes Greek and Cyrillic for Bulgarian), uses UTF-8 at interfaces, normalization form C (NFC) – a German 2022 standard; will be mandatory for German authorities and organizations in the exchange of data from 1 November 2024
    UTF-8


    External links


    European Ordering Rules, ENV 13710 – a "European Pre-Standard"
