
CBOW (Continuous Bag of Words)

Introduction

In this article, we’ll talk about the CBOW (Continuous Bag of Words) model and how it works internally.

CBOW is a variant of the word2vec model that predicts the center word from a (bag of) context words. So, given all the words in the context window (excluding the middle one), CBOW tells us the most likely word at the center.

For example, say we have a window size of 2 on the following sentence. Given the words (“PM”, “American”, “and”), we want the network to predict “Modi”.
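To make this concrete, here is a minimal sketch of how one such CBOW training sample could be built. The sentence below is a hypothetical stand-in (the article does not give the full sentence), chosen so that a window of size 2 around “Modi” yields the context words from the example:

```python
# A minimal sketch: building one CBOW training sample with a window size of 2.
# The sentence is a hypothetical stand-in for the example in the article.
sentence = ["PM", "Modi", "and", "American", "President"]
window_size = 2

center_idx = sentence.index("Modi")
# Context = up to `window_size` words on each side of the center word
context = (
    sentence[max(0, center_idx - window_size):center_idx]
    + sentence[center_idx + 1:center_idx + 1 + window_size]
)
print(context, "->", sentence[center_idx])
# ['PM', 'and', 'American'] -> Modi
```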

Compared to the Skip-gram network, the input needs to change to take in multiple words. Instead of a “one-hot” vector as the input, we use a “bag-of-words” vector. It’s the same concept, except that we put 1s in multiple positions (corresponding to the context words).
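A quick sketch of what such a bag-of-words (multi-hot) input vector looks like, assuming a toy vocabulary with made-up indices:

```python
import numpy as np

# A small illustrative vocabulary; the word-to-index mapping is an assumption for this sketch.
vocab = {"PM": 0, "Modi": 1, "and": 2, "American": 3, "President": 4}

def bag_of_words_vector(context_words, vocab):
    """Multi-hot encoding: 1 at the index of every context word, 0 elsewhere."""
    vec = np.zeros(len(vocab))
    for word in context_words:
        vec[vocab[word]] = 1
    return vec

print(bag_of_words_vector(["PM", "and", "American"], vocab))
# [1. 0. 1. 1. 0.]
```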

CBOW Model Architecture-

The CBOW architecture then looks like the following:

Fig: CBOW Architecture

The training samples for CBOW look different than those generated for skip-gram.

Fig: Training sample examples for CBOW

With a window size of 2, skip-gram will generate (up to) four training samples per center word, whereas CBOW only generates one. With skip-gram, we saw that multiplying with a one-hot vector just selects a row from the hidden-layer weight matrix. What happens when you multiply with a bag-of-words vector instead? The result is that it selects the corresponding rows and sums them together.
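A small sketch to verify that claim, using toy dimensions and a random weight matrix (all values here are assumptions for illustration):

```python
import numpy as np

np.random.seed(0)
vocab_size, hidden_size = 5, 3                 # toy dimensions for illustration
W = np.random.rand(vocab_size, hidden_size)    # input-to-hidden weight matrix

x = np.array([1, 0, 1, 1, 0])                  # bag-of-words vector for 3 context words

# Multiplying by the multi-hot vector...
summed = x @ W
# ...is exactly the same as selecting those rows and summing them.
assert np.allclose(summed, W[0] + W[2] + W[3])
```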

For the CBOW architecture, we also divide this sum by the number of context words to calculate their average word vector. So the output of the hidden layer in the CBOW architecture is the average of all the context word vectors. From there, the output layer is identical to the one in skip-gram.
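Putting the pieces together, here is a minimal sketch of the CBOW forward pass under the same toy assumptions as above (random weights, a 5-word vocabulary):

```python
import numpy as np

np.random.seed(0)
vocab_size, hidden_size = 5, 3
W = np.random.rand(vocab_size, hidden_size)       # input-to-hidden weights
W_out = np.random.rand(hidden_size, vocab_size)   # hidden-to-output weights

x = np.array([1, 0, 1, 1, 0])      # bag-of-words vector, 3 context words
num_context_words = x.sum()

# Hidden layer: the average of the context word vectors
h = (x @ W) / num_context_words

# Output layer (identical to skip-gram): scores, then softmax over the vocabulary
scores = h @ W_out
probs = np.exp(scores) / np.exp(scores).sum()
print(probs)   # predicted probability of each vocabulary word being the center word
```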
 

You can read other articles related to Word2vec-

Article Credit:-

Name:  Praveen Kumar Anwla
Founder: TowardsMachineLearning.Org
