• Source: Tversky index
  • The Tversky index, named after Amos Tversky, is an asymmetric similarity measure on sets that compares a variant to a prototype. The Tversky index can be seen as a generalization of the Sørensen–Dice coefficient and the Jaccard index.
    For sets X and Y the Tversky index is a number between 0 and 1 given by

    $$S(X,Y)=\frac{|X\cap Y|}{|X\cap Y|+\alpha |X\setminus Y|+\beta |Y\setminus X|}$$

    Here, $X\setminus Y$ denotes the relative complement of Y in X.
    Further, $\alpha ,\beta \geq 0$ are parameters of the Tversky index. Setting $\alpha =\beta =1$ produces the Jaccard index; setting $\alpha =\beta =0.5$ produces the Sørensen–Dice coefficient.
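
The definition and its two special cases can be checked with a short Python sketch. The function name `tversky_index` and the example sets are illustrative assumptions, not part of the source. The formula is undefined when the denominator is zero, so this sketch raises an error in that case rather than picking a convention.

```python
def tversky_index(x: set, y: set, alpha: float, beta: float) -> float:
    """Tversky index S(X, Y) for finite sets, with alpha, beta >= 0."""
    if alpha < 0 or beta < 0:
        raise ValueError("alpha and beta must be non-negative")
    common = len(x & y)   # |X ∩ Y|
    only_x = len(x - y)   # |X \ Y|
    only_y = len(y - x)   # |Y \ X|
    denominator = common + alpha * only_x + beta * only_y
    if denominator == 0:
        # Assumption of this sketch: treat the 0/0 case as an error.
        raise ValueError("Tversky index is undefined for a zero denominator")
    return common / denominator


# Hypothetical example sets: |X ∩ Y| = 2, |X \ Y| = 2, |Y \ X| = 1.
X = {1, 2, 3, 4}
Y = {3, 4, 5}

jaccard = tversky_index(X, Y, alpha=1.0, beta=1.0)   # 2 / (2 + 2 + 1)
dice = tversky_index(X, Y, alpha=0.5, beta=0.5)      # 2 / (2 + 1 + 0.5)

# With unequal weights the measure depends on argument order.
forward = tversky_index(X, Y, alpha=1.0, beta=0.0)   # 2 / (2 + 2)
backward = tversky_index(Y, X, alpha=1.0, beta=0.0)  # 2 / (2 + 1)
```

For these sets the Jaccard case evaluates to 2/5 and the Sørensen–Dice case to 4/7, matching the usual formulas $|X\cap Y|/|X\cup Y|$ and $2|X\cap Y|/(|X|+|Y|)$.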
    If we consider X to be the prototype and Y to be the variant, then $\alpha$ corresponds to the weight of the prototype and $\beta$ corresponds to the weight of the variant. Tversky measures with $\alpha +\beta =1$ are of special interest.
    Because of the inherent asymmetry, the Tversky index does not meet the criteria for a similarity metric. However, if symmetry is needed, a variant of the original formulation has been proposed using max and min functions:

    $$S(X,Y)=\frac{|X\cap Y|}{|X\cap Y|+\beta \left(\alpha a+(1-\alpha )b\right)}$$

    where

    $$a=\min \left(|X\setminus Y|,|Y\setminus X|\right),\qquad b=\max \left(|X\setminus Y|,|Y\setminus X|\right).$$

    This formulation also re-arranges parameters $\alpha$ and $\beta$. Thus, $\alpha$ controls the balance between $|X\setminus Y|$ and $|Y\setminus X|$ in the denominator. Similarly, $\beta$ controls the effect of the symmetric difference $|X\,\triangle \,Y|$ versus $|X\cap Y|$ in the denominator.
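
The symmetric variant can be sketched the same way. The name `symmetric_tversky_index` and the example sets are again illustrative assumptions, and a zero denominator is treated as an error because the formula does not define that case.

```python
def symmetric_tversky_index(x: set, y: set, alpha: float, beta: float) -> float:
    """Symmetric Tversky variant built from min and max of the set differences."""
    common = len(x & y)                    # |X ∩ Y|
    a = min(len(x - y), len(y - x))        # smaller set difference
    b = max(len(x - y), len(y - x))        # larger set difference
    denominator = common + beta * (alpha * a + (1 - alpha) * b)
    if denominator == 0:
        # Assumption of this sketch: treat the 0/0 case as an error.
        raise ValueError("symmetric Tversky index is undefined for a zero denominator")
    return common / denominator


# Hypothetical example sets: |X ∩ Y| = 2, a = 1, b = 2.
X = {1, 2, 3, 4}
Y = {3, 4, 5}

s_xy = symmetric_tversky_index(X, Y, alpha=0.3, beta=1.0)
s_yx = symmetric_tversky_index(Y, X, alpha=0.3, beta=1.0)  # same value as s_xy

# With alpha = 0.5 the denominator term is beta * (a + b) / 2, where a + b = |X △ Y|.
dice_like = symmetric_tversky_index(X, Y, alpha=0.5, beta=1.0)     # 2 / (2 + 1.5)
jaccard_like = symmetric_tversky_index(X, Y, alpha=0.5, beta=2.0)  # 2 / (2 + 3)
```

Because a and b are computed with min and max, swapping the arguments leaves the result unchanged. Under this parameterization, $\alpha = 0.5$ with $\beta = 1$ reproduces the Sørensen–Dice value and $\alpha = 0.5$ with $\beta = 2$ reproduces the Jaccard value for these sets, which follows directly from substituting into the formula.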

