• Source: Caverphone
  • Caverphone is a phonetic matching algorithm, used in linguistics and computing, that encodes English names according to how they sound. It was originally built to process a custom dataset compiled between 1893 and 1938 in southern Dunedin, New Zealand. It started from a concept similar to Metaphone and has since been developed to handle general English.


    Etymology


    Caverphone was created by David Hood as part of the Caversham Project at the University of Otago, New Zealand, in 2002 and revised in 2004. It was created to assist in data matching between late 19th-century and early 20th-century electoral rolls, where the name only needed to be in a "commonly recognisable form". The algorithm was intended for names that could not easily be matched between electoral rolls after the exact matches had been removed from the pool of potential matches. It is optimised for the accents present in the study area (the southern part of the city of Dunedin, New Zealand).


    Procedure




    = Caverphone 1.0 =
    The rules of the algorithm are applied consecutively to any particular name, as a series of replacements.
    The algorithm is as follows:

    Convert to lowercase
    Remove anything not a-z
    If the name starts with...
    cough, replace it by cou2f
    rough, replace it by rou2f
    tough, replace it by tou2f
    enough, replace it by enou2f
    gn, replace it by 2n
    If the name ends with
    mb, replace it by m2
    Replace
    cq with 2q
    ci with si
    ce with se
    cy with sy
    tch with 2ch
    c with k
    q with k
    x with k
    v with f
    dg with 2g
    tio with sio
    tia with sia
    d with t
    ph with fh
    b with p
    sh with s2
    z with s
    any initial vowel with an A
    all other vowels with a 3
    3gh3 with 3kh3
    gh with 22
    g with k
    groups of the letter s with a S
    groups of the letter t with a T
    groups of the letter p with a P
    groups of the letter k with a K
    groups of the letter f with a F
    groups of the letter m with a M
    groups of the letter n with a N
    w3 with W3
    wy with Wy
    wh3 with Wh3
    why with Why
    w with 2
    any initial h with an A
    all other occurrences of h with a 2
    r3 with R3
    ry with Ry
    r with 2
    l3 with L3
    ly with Ly
    l with 2
    j with y
    y3 with Y3
    y with 2
    remove all 2s
    remove all 3s
    put six 1s on the end
    take the first six characters as the code
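
    The rules above can be sketched in Python as an ordered table of regular-expression substitutions. This is a minimal illustration built only from the steps listed in this section, not the reference implementation. The names caverphone1 and _RULES_V1 are assumptions made for the example, and the four "ough" prefixes are folded into one pattern for brevity.

```python
import re

# Ordered (pattern, replacement) pairs; each rule runs once, in this order.
# Lowercase letters are still "unprocessed"; uppercase letters and the
# digits 2 and 3 are intermediate markers produced by earlier rules.
_RULES_V1 = [
    # Prefix and suffix special cases.
    (r"^(c|r|t|en)ough", r"\g<1>ou2f"), (r"^gn", "2n"), (r"mb$", "m2"),
    # Consonant normalisation.
    ("cq", "2q"), ("ci", "si"), ("ce", "se"), ("cy", "sy"), ("tch", "2ch"),
    ("c", "k"), ("q", "k"), ("x", "k"), ("v", "f"), ("dg", "2g"),
    ("tio", "sio"), ("tia", "sia"), ("d", "t"), ("ph", "fh"), ("b", "p"),
    ("sh", "s2"), ("z", "s"),
    # Vowels: an initial vowel becomes A, every other vowel becomes 3.
    (r"^[aeiou]", "A"), (r"[aeiou]", "3"),
    ("3gh3", "3kh3"), ("gh", "22"), ("g", "k"),
    # Collapse runs of the same consonant into one uppercase letter.
    (r"s+", "S"), (r"t+", "T"), (r"p+", "P"), (r"k+", "K"),
    (r"f+", "F"), (r"m+", "M"), (r"n+", "N"),
    # Semi-vowels and liquids survive only in the listed contexts.
    ("w3", "W3"), ("wy", "Wy"), ("wh3", "Wh3"), ("why", "Why"), ("w", "2"),
    (r"^h", "A"), ("h", "2"),
    ("r3", "R3"), ("ry", "Ry"), ("r", "2"),
    ("l3", "L3"), ("ly", "Ly"), ("l", "2"),
    ("j", "y"), ("y3", "Y3"), ("y", "2"),
    # Drop the placeholder digits.
    (r"[23]", ""),
]


def caverphone1(name: str) -> str:
    """Return a six-character code following the Caverphone 1.0 steps above."""
    code = re.sub(r"[^a-z]", "", name.lower())
    for pattern, replacement in _RULES_V1:
        code = re.sub(pattern, replacement, code)
    # Pad with six 1s, then keep the first six characters.
    return (code + "111111")[:6]


if __name__ == "__main__":
    for sample in ("Lee", "Thompson"):
        print(sample, caverphone1(sample))
```

    With this sketch, caverphone1("Lee") gives L11111 and caverphone1("Thompson") gives TMPSN1. Keeping the rules in a single ordered table makes the dependence on rule order explicit, since later rules rely on the markers written by earlier ones.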


    = Caverphone 2.0 =
    Start with a word
    Convert to lowercase
    Remove anything not in the standard alphabet (typically a-z)
    Remove final e
    If the name starts with
    cough make it cou2f
    rough make it rou2f
    tough make it tou2f
    enough make it enou2f
    trough make it trou2f
    gn make it 2n
    If the name ends with
    mb make it m2
    Replace
    cq with 2q
    ci with si
    ce with se
    cy with sy
    tch with 2ch
    c with k
    q with k
    x with k
    v with f
    dg with 2g
    tio with sio
    tia with sia
    d with t
    ph with fh
    b with p
    sh with s2
    z with s
    an initial vowel with an A
    all other vowels with a 3
    j with y
    an initial y3 with Y3
    an initial y with A
    y with 3
    3gh3 with 3kh3
    gh with 22
    g with k
    groups of the letter s with a S
    groups of the letter t with a T
    groups of the letter p with a P
    groups of the letter k with a K
    groups of the letter f with a F
    groups of the letter m with a M
    groups of the letter n with a N
    w3 with W3
    wh3 with Wh3
    if the name ends in w replace the final w with 3
    w with 2
    an initial h with an A
    all other occurrences of h with a 2
    r3 with R3
    if the name ends in r replace the final r with 3
    r with 2
    l3 with L3
    if the name ends in l replace the final l with 3
    l with 2
    remove all 2s
    if the name ends in 3, replace the final 3 with A
    remove all 3s
    put ten 1s on the end
    take the first ten characters as the code
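
    The revised rules can be sketched the same way, as an ordered table of regular-expression substitutions. Again, this is only an illustration derived from the steps listed in this section, not the reference implementation. The names caverphone2 and _RULES_V2 are assumptions made for the example, and the five "ough" prefixes are folded into one pattern.

```python
import re

# Ordered (pattern, replacement) pairs for the revised rules.
# The input is already lowercase a-z when these are applied.
_RULES_V2 = [
    # Remove a final e, then handle prefix and suffix special cases.
    (r"e$", ""),
    (r"^(c|r|t|en|tr)ough", r"\g<1>ou2f"), (r"^gn", "2n"), (r"mb$", "m2"),
    # Consonant normalisation.
    ("cq", "2q"), ("ci", "si"), ("ce", "se"), ("cy", "sy"), ("tch", "2ch"),
    ("c", "k"), ("q", "k"), ("x", "k"), ("v", "f"), ("dg", "2g"),
    ("tio", "sio"), ("tia", "sia"), ("d", "t"), ("ph", "fh"), ("b", "p"),
    ("sh", "s2"), ("z", "s"),
    # Vowels, with y now treated as a vowel except in an initial "y3".
    (r"^[aeiou]", "A"), (r"[aeiou]", "3"),
    ("j", "y"), (r"^y3", "Y3"), (r"^y", "A"), ("y", "3"),
    ("3gh3", "3kh3"), ("gh", "22"), ("g", "k"),
    # Collapse runs of the same consonant into one uppercase letter.
    (r"s+", "S"), (r"t+", "T"), (r"p+", "P"), (r"k+", "K"),
    (r"f+", "F"), (r"m+", "M"), (r"n+", "N"),
    # A final w, r or l is kept as a vowel marker rather than dropped.
    ("w3", "W3"), ("wh3", "Wh3"), (r"w$", "3"), ("w", "2"),
    (r"^h", "A"), ("h", "2"),
    ("r3", "R3"), (r"r$", "3"), ("r", "2"),
    ("l3", "L3"), (r"l$", "3"), ("l", "2"),
    # Clean up: drop 2s, turn a final 3 into A, drop remaining 3s.
    ("2", ""), (r"3$", "A"), ("3", ""),
]


def caverphone2(name: str) -> str:
    """Return a ten-character code following the Caverphone 2.0 steps above."""
    code = re.sub(r"[^a-z]", "", name.lower())
    for pattern, replacement in _RULES_V2:
        code = re.sub(pattern, replacement, code)
    # Pad with ten 1s, then keep the first ten characters.
    return (code + "1111111111")[:10]


if __name__ == "__main__":
    for sample in ("Lee", "Thompson"):
        print(sample, caverphone2(sample))
```

    With this sketch, caverphone2("Lee") gives LA11111111 and caverphone2("Thompson") gives TMPSN11111. The visible differences from version 1.0 are the removal of a final e, the treatment of y as a vowel, the handling of a trailing w, r, l or vowel, and the longer ten-character code.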


    Examples




    = Caverphone 1.0 =
    Lee -> lee
    lee -> l33
    l33 -> L33
    L33 -> L
    L -> L111111
    L111111 -> L11111

    Thompson -> thompson
    thompson -> th3mps3n
    th3mps3n -> th3mpS3n
    th3mpS3n -> Th3mpS3n
    Th3mpS3n -> Th3mPS3n
    Th3mPS3n -> Th3MPS3n
    Th3MPS3n -> Th3MPS3N
    Th3MPS3N -> T23MPS3N
    T23MPS3N -> TMPSN
    TMPSN -> TMPSN111111
    TMPSN111111 -> TMPSN1


    = Caverphone 2.0 =
    Lee -> lee
    lee -> le
    le -> l3
    l3 -> L3
    L3 -> LA
    LA -> LA1111111111
    LA1111111111 -> LA11111111

    Thompson -> thompson
    thompson -> th3mps3n
    th3mps3n -> th3mpS3n
    th3mpS3n -> Th3mpS3n
    Th3mpS3n -> Th3mPS3n
    Th3mPS3n -> Th3MPS3n
    Th3MPS3n -> Th3MPS3N
    Th3MPS3N -> T23MPS3N
    T23MPS3N -> TMPSN
    TMPSN -> TMPSN1111111111
    TMPSN1111111111 -> TMPSN11111


    See also


    Soundex
    New York State Identification and Intelligence System
    Match rating approach
    Metaphone
    Cologne phonetics



    External links


    Caversham Project - Caversham data set of names and accents in the southern part of Dunedin, New Zealand in 1893-1938.
    Original (2002) Caverphone algorithm
    Revised (2004) Caverphone algorithm
    Implementations:
    C# Revised Implementation
    Java implementation in the Apache Commons Codec project
    PHP implementation
    Python Implementation caverphone algorithm (version 2.0) - AdvaS Advanced Search project
