Introduction

In this article, we’ll talk about the CBOW (Continuous Bag of Words) model and how it works internally.

CBOW is a variant of the word2vec model that predicts the center word from a bag of context words. Given all the words in the context window (excluding the middle one), CBOW tells us the most likely word at the center.

For example, say we have a window size of 2 on the following sentence. Given the words (“PM”, “American”, “and”), we want the network to predict “Modi”.

To do this, the input of the skip-gram network needs to change so that it takes in multiple words. Instead of a “one-hot” vector as the input, we use a “bag-of-words” vector. It’s the same concept, except that we put 1s in multiple positions (one for each context word).
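To make the difference between the two input encodings concrete, here is a minimal sketch in plain Python. The five-word vocabulary and the helper names `one_hot` and `bag_of_words` are assumptions made up for this illustration; only the three context words and the center word come from the example above.

```python
def one_hot(word, vocab):
    """Return a vector with a single 1 at the word's vocabulary index."""
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec


def bag_of_words(context, vocab):
    """Return a vector with a 1 at the index of every context word."""
    vec = [0] * len(vocab)
    for word in context:
        vec[vocab.index(word)] = 1
    return vec


# Hypothetical toy vocabulary; a real one would hold thousands of words.
vocab = ["PM", "Modi", "American", "and", "president"]

# Skip-gram style input: one position set, for the center word.
center_vec = one_hot("Modi", vocab)

# CBOW style input: one position set per context word.
context_vec = bag_of_words(["PM", "American", "and"], vocab)

print(center_vec)
print(context_vec)
```

The only difference between the two helpers is how many positions get set, which is why the rest of the network can stay the same.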

CBOW Model Architecture

The CBOW architecture then looks like the following:

The training samples for CBOW look different than those generated for skip-gram.

**Fig: Example training samples for CBOW**
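One way to see how the samples differ is to generate both kinds from the same sentence. The sketch below is an illustration under assumptions, not the reference word2vec code: the sentence is made up, and the function names `skipgram_samples` and `cbow_samples` are hypothetical. Skip-gram emits one (center, context word) pair per neighbor, while CBOW emits a single (context words, center) pair per position.

```python
def skipgram_samples(tokens, window=2):
    """One (center, context_word) pair for every neighbor inside the window."""
    samples = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                samples.append((center, tokens[j]))
    return samples


def cbow_samples(tokens, window=2):
    """One (context_words, center) pair per position in the sentence."""
    samples = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        # Context is everything in the window except the center word itself.
        context = tokens[lo:i] + tokens[i + 1:hi]
        samples.append((context, center))
    return samples


# Hypothetical sentence, used only for illustration.
tokens = ["the", "quick", "brown", "fox", "jumps"]

sg = skipgram_samples(tokens)
cb = cbow_samples(tokens)

print(len(sg), len(cb))
print(cb[2])
```

Words near the edges of the sentence have fewer neighbors, so skip-gram produces fewer than four pairs for them, while CBOW still produces exactly one sample with a shorter context.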

With a window size of 2, skip-gram generates up to four training samples per center word, whereas CBOW generates only one. With skip-gram, we saw that multiplying with a one-hot vector just selects a row from the hidden layer weight matrix. What happens when you multiply with a bag-of-words vector instead? It selects the corresponding rows and sums them together.
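The row-selection behavior is easy to check by hand. The sketch below uses a made-up 4 x 3 weight matrix and a hypothetical helper `vec_mat`, written with plain Python lists so that it runs without any dependencies; in practice a numerical library would do this multiplication.

```python
def vec_mat(vec, matrix):
    """Multiply a row vector of length V by a V x N matrix."""
    n_cols = len(matrix[0])
    return [
        sum(vec[i] * matrix[i][j] for i in range(len(vec)))
        for j in range(n_cols)
    ]


# Hypothetical hidden layer weights: 4 vocabulary words, 3 hidden units.
W = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12],
]

one_hot_input = [0, 1, 0, 0]   # skip-gram style: a single word
bow_input = [1, 0, 1, 1]       # CBOW style: three context words

selected = vec_mat(one_hot_input, W)   # picks out row 1 of W
summed = vec_mat(bow_input, W)         # adds rows 0, 2 and 3 of W

print(selected)
print(summed)
```

Because every position holding a 0 contributes nothing, the product reduces to adding up the rows of `W` at the positions holding a 1. Some implementations go one step further and divide this sum by the number of context words, so the hidden layer receives an average instead.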