# Find word similarity using PPMI score

## Introduction :-

In our last article, we learned how to calculate the PMI score. In this article, we'll talk about how to find word similarity using the PPMI score.

## Identifying word similarity or vector similarity :-

In distributional models, every word is a point in n-dimensional space. How do we measure the similarity between two points/vectors?

• So, let's assume we have context vectors for two target words, \overrightarrow { V } and \overrightarrow { W }.
• Each contains the PMI (or PPMI) values for all context words.
• One way to think of these vectors is as points in high-dimensional space.

For example, in 2-dimensional space: cat = (v1, v2), computer = (w1, w2).

One obvious choice could be Euclidean distance.

## Find word similarity using Euclidean Distance:-

We could measure similarity (or rather, distance) using the Euclidean distance:

Euclidean Distance =\sqrt { \sum _{ i=1 }^{ N }{ { \left( { x }_{ i }-{ y }_{ i } \right)  }^{ 2 } }  }

But this doesn't work well if even one dimension has an extreme value, because that single dimension dominates the sum of squared differences.
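As a quick illustration, here is a minimal Python sketch (the PPMI values are made up for the example) showing the Euclidean distance and how one extreme dimension can swamp it:

```python
import numpy as np

def euclidean(x, y):
    """L2 distance: square root of the summed squared differences."""
    return np.sqrt(np.sum((x - y) ** 2))

# Hypothetical 3-dimensional PPMI context vectors.
cat = np.array([2.0, 1.5, 0.5])
dog = np.array([1.8, 1.4, 0.6])

print(euclidean(cat, dog))         # small distance -> similar words

# One extreme value in a single dimension dominates the distance.
dog_skewed = np.array([1.8, 1.4, 50.0])
print(euclidean(cat, dog_skewed))  # huge distance, driven by one dimension
```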

There are a few more possible measures we can use to calculate similarity.

## Other possible similarity measures :-

The following are the most commonly used measures of distance or similarity (a code sketch implementing them follows the list):

• Manhattan distance (city-block distance, L1 norm) –
• { dist }_{ L1 }(\overrightarrow { x } ,\overrightarrow { y } )=\sum _{ i=1 }^{ N }{ \left| { x }_{ i }-{ y }_{ i } \right| }
• Euclidean distance (L2 norm) –
• { dist }_{ L2 }(\overrightarrow { x } ,\overrightarrow { y } )=\sqrt { \sum _{ i=1 }^{ N }{ { \left( { x }_{ i }-{ y }_{ i } \right) }^{ 2 } } }
• Jaccard similarity –
• { Sim }_{ Jaccard }(\overrightarrow { V } ,\overrightarrow { W } )=\frac { \sum _{ i=1 }^{ N }{ \min \left( { V }_{ i },{ W }_{ i } \right) } }{ \sum _{ i=1 }^{ N }{ \max \left( { V }_{ i },{ W }_{ i } \right) } }
• Dice similarity –
• { Sim }_{ Dice }(\overrightarrow { V } ,\overrightarrow { W } )=\frac { 2\times \sum _{ i=1 }^{ N }{ \min \left( { V }_{ i },{ W }_{ i } \right) } }{ \sum _{ i=1 }^{ N }{ \left( { V }_{ i }+{ W }_{ i } \right) } }
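Here is a minimal sketch implementing these four measures exactly as defined above; the two vectors are toy PPMI values, not real corpus statistics:

```python
import numpy as np

def manhattan(x, y):
    """L1 distance: sum of absolute coordinate differences."""
    return np.sum(np.abs(x - y))

def euclidean(x, y):
    """L2 distance: square root of summed squared differences."""
    return np.sqrt(np.sum((x - y) ** 2))

def jaccard(v, w):
    """Generalized Jaccard: sum of element-wise mins over sum of maxes."""
    return np.sum(np.minimum(v, w)) / np.sum(np.maximum(v, w))

def dice(v, w):
    """Dice: twice the sum of element-wise mins over the sum of all values."""
    return 2 * np.sum(np.minimum(v, w)) / np.sum(v + w)

# Toy PPMI vectors for two target words.
v = np.array([2.0, 1.5, 0.5])
w = np.array([1.8, 1.4, 0.6])

print(manhattan(v, w))  # 0.4
print(euclidean(v, w))  # ~0.245
print(jaccard(v, w))    # ~0.90
print(dice(v, w))       # ~0.95
```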

## Dot product :-

Another possibility is to take the dot product of \overrightarrow { V } and \overrightarrow { W }.

The dot product operator from linear algebra is also called the inner product:

{ Sim }_{ DP }(\overrightarrow { V } ,\overrightarrow { W } )=\overrightarrow { V } .\overrightarrow { W }

{ Sim }_{ DP }(\overrightarrow { V } ,\overrightarrow { W } )={ V }_{ 1 }{ W }_{ 1 }+{ V }_{ 2 }{ W }_{ 2 }+{ V }_{ 3 }{ W }_{ 3 }+\dots +{ V }_{ n }{ W }_{ n }

The dot product is high when the two vectors have large values in the same dimensions, regardless of actual similarity. Since frequent words have large PPMI values across many contexts, more frequent words end up with higher dot products.

And it is low (in fact, exactly 0) for orthogonal vectors, i.e., vectors whose non-zero values fall in complementary dimensions.
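A small sketch of this frequency bias (the numbers are invented for illustration): scaling up a word's PPMI vector, as higher frequency tends to do, inflates its dot product with everything, even though its direction in space is unchanged:

```python
import numpy as np

v = np.array([2.0, 1.5, 0.5])   # toy PPMI vector for one word
w = np.array([1.8, 1.4, 0.6])   # toy PPMI vector for another word

print(np.dot(v, w))             # baseline dot product: 6.0

# Scale w as if the word were far more frequent: the dot product doubles,
# although the vector points in exactly the same direction as before.
print(np.dot(v, 2 * w))         # 12.0

# Orthogonal vectors (non-zero values in complementary dimensions) score 0.
a = np.array([1.0, 0.0, 2.0])
b = np.array([0.0, 3.0, 0.0])
print(np.dot(a, b))             # 0.0
```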

The vector length is given by:

\left| \overrightarrow { V }  \right| =\sqrt { \sum _{ i=1 }^{ N }{ { V }_{ i }^{ 2 } }  }

But we don’t want a similarity metric that’s sensitive to word frequency.

## Normalized dot product (or cosine similarity) :-

To correct this problem with the raw dot product, we normalize it: divide by the length of each vector.

{ Sim }_{ NDP }(\overrightarrow { V } ,\overrightarrow { W } )=\frac { (\overrightarrow { V } .\overrightarrow { W } ) }{ \left| \overrightarrow { V } \right| \left| \overrightarrow { W } \right| }

{ Sim }_{ NDP }(\overrightarrow { V } ,\overrightarrow { W } )=\frac { \sum _{ i=1 }^{ N }{ { V }_{ i }\times { W }_{ i } }  }{ \sqrt { \sum _{ i=1 }^{ N }{ { V }_{ i }^{ 2 } }  } \sqrt { \sum _{ i=1 }^{ N }{ { W }_{ i }^{ 2 } }  }  }


If you look closely, the above formula is exactly the cosine of the angle between two vectors. Recall from linear algebra:

\overrightarrow { a } .\overrightarrow { b } =\left| \overrightarrow { a } \right| \left| \overrightarrow { b } \right| \cos { \theta }

\cos { \theta =\frac { \overrightarrow { a } .\overrightarrow { b } }{ \left| \overrightarrow { a } \right| \left| \overrightarrow { b } \right| } }

The normalized dot product is nothing but the cosine of the angle between vectors.

sim(\overrightarrow { V } ,\overrightarrow { W } )=cos(\overrightarrow { V } ,\overrightarrow { W } )

So sim(\overrightarrow { V } ,\overrightarrow { W } )=1 when \overrightarrow { V } and \overrightarrow { W } point in the same direction.

sim(\overrightarrow { V } ,\overrightarrow { W } )=0 when \overrightarrow { V } and \overrightarrow { W } are orthogonal.

sim(\overrightarrow { V } ,\overrightarrow { W } )=-1 when \overrightarrow { V } and \overrightarrow { W } point in opposite directions.

Where

• { V }_{ i } is the PPMI value for word V in context i.
• { W }_{ i } is the PPMI value for word W in context i.

So the cosine ranges from -1 (vectors pointing in opposite directions) to 1 (vectors pointing in the same direction). Since PPMI values are never negative, cosine similarity between PPMI vectors in practice ranges from 0 to 1.

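Putting it all together, here is a minimal cosine-similarity sketch over the same kind of toy PPMI vectors; note how scaling a vector (the frequency effect that inflated the raw dot product) leaves the cosine unchanged:

```python
import numpy as np

def cosine(v, w):
    """Normalized dot product: dot(v, w) / (|v| * |w|)."""
    return np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))

v = np.array([2.0, 1.5, 0.5])   # toy PPMI vector for word V
w = np.array([1.8, 1.4, 0.6])   # toy PPMI vector for word W

print(cosine(v, w))       # ~0.998: nearly the same direction
print(cosine(v, 2 * w))   # identical value: length normalization removes
                          # the word-frequency effect
print(cosine(np.array([1.0, 0.0]), np.array([0.0, 1.0])))  # 0.0: orthogonal
```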

## Article Credit:-

Name:  Praveen Kumar Anwla