Recurrent Neural Network (RNN) architecture explained in detail


In this article, I assume that you have a basic understanding of neural networks. We’ll talk about Recurrent Neural Networks, aka RNNs, which made a major breakthrough in predictive analytics for sequential data. This article covers what an RNN is, why RNNs are needed, their architecture, how they work, various applications of RNNs, and their advantages and disadvantages.

What is Recurrent Neural Network (RNN):-

Recurrent Neural Networks, or RNNs, are a very important variant of neural networks heavily used in Natural Language Processing. They are a class of neural networks that allow previous outputs to be used as inputs while maintaining hidden states.

An RNN has a concept of “memory”, which retains information about what has been computed up to time step t. RNNs are called recurrent because they perform the same task for every element of a sequence, with the output depending on the previous computations.

Before we dive into the details of what a recurrent neural network is, let’s first understand why we use RNNs in the first place.

Why Recurrent Neural Network (RNN):-

In a general neural network, an input is fed to an input layer, processed through a number of hidden layers, and a final output is produced, with the assumption that two successive inputs are independent of each other, i.e. the input at time step t has no relation to the input at time step t-1.

However, this assumption does not hold in a number of real-life scenarios. For instance, if one wants to predict the price of a stock at a given time, or the next word in a sequence, then it is imperative that the dependence on previous observations is considered.

To understand the need for RNNs and how they can be helpful, let’s consider a real incident that happened recently.

You may have come across a recent incident where Pakistani batsman Umar Akmal was trolled after he posted a photo on Twitter with an obvious error. The caption of the post was ‘Brother from another mother’.

Following this incident, many such sentences surfaced over the internet, like the ones below:

  • If being crime is arrest then sexy me
  • You don’t have to be well to travel rich
  • If I’m bad, you’re my dad
  • Policy is the best honesty
  • Health is injurious to smoking
  • If opportunity doesn’t door then build a knock
  • Don’t happy be worry
  • Cure is best prevention
  • Everything is war in fair and love
  • A day a doctor, keeps an apple away
  • I love to cry in walking so nobody see I’m raining
  • Blood is in my cricket
  • Consumption of health is injurious to liquor
  • Always a ravian, once a ravian
  • Work hard in success, let your silence make noise

So you see, a little jumble in the words made the sentences incoherent. There are multiple such tasks in everyday life which get completely disrupted when their sequence is disturbed.

For instance, in the sentences we just saw above, the sequence of words defines their meaning; in time series data, time defines the occurrence of events; in genome sequence data, every sequence has a different meaning. There are multiple such cases wherein the sequence of information determines the event itself.

So if we’re trying to use such data to predict any reasonable output, we need a network which has access to some prior knowledge about the data in order to completely understand it. That’s where recurrent neural networks come to the rescue.

To understand what memory is in an RNN, what the recurrent unit is, and how RNNs store information about the previous sequence, let’s first understand the architecture of RNNs.

Architecture of Recurrent Neural Network :-

Consider a simple feed-forward neural network with one hidden layer. If [latex]{ X }_{ t }[/latex] is the input and [latex]{ Y }_{ t }[/latex] is the output at time step t, then all we need to do is create a feedback connection from the hidden layer to itself to access information from time step t-1. The feedback loop implies that there’s a delay of one time unit. So one of the inputs into [latex]{ h }_{ t }[/latex] is [latex]{ h }_{ t-1 }[/latex]; in other words, the hidden layer takes in both [latex]{ X }_{ t }[/latex] and its own last value. In a nutshell, this feedback loop allows information to be passed from one step of the network to the next and hence acts as memory in the network. The right diagram in the below figure represents a simple recurrent unit.


Below diagram depicts the architecture with weights –


So from the above figure we can write the below equations –

[latex]{ h }_{ t }=f({ W }_{ h }{ h }_{ t-1 }+{ W }_{ x }{ X }_{ t })[/latex]

[latex]{ Y }_{ t }={ W }_{ o }{ h }_{ t }[/latex]

f = sigmoid, tanh, or ReLU

Note:- The function f could be any of the usual hidden non-linearities, usually sigmoid, tanh, or ReLU. It’s a hyperparameter, just like in other types of neural networks.
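As a concrete sketch, one step of this recurrence can be written in plain NumPy. The dimensions and random weights here are illustrative assumptions, not values from the article:

```python
import numpy as np

# Illustrative sizes (assumptions): a 4-dimensional input, a 3-unit hidden layer
input_dim, hidden_dim = 4, 3

rng = np.random.default_rng(0)
W_x = rng.standard_normal((hidden_dim, input_dim))   # weight at input neuron
W_h = rng.standard_normal((hidden_dim, hidden_dim))  # weight at recurrent neuron

def rnn_step(x_t, h_prev, f=np.tanh):
    # h_t = f(W_h h_{t-1} + W_x X_t) -- the feedback loop feeds h_{t-1} back in
    return f(W_h @ h_prev + W_x @ x_t)

h0 = np.zeros(hidden_dim)            # initial hidden state
x1 = rng.standard_normal(input_dim)  # input at time step 1
h1 = rnn_step(x1, h0)                # hidden state at time step 1
```

Note how the same function is called with `h1` and the next input to get `h2`, which is exactly the “memory” being carried forward.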

This feedback loop makes recurrent neural networks seem kind of mysterious, and it is quite hard to visualize the whole training process of an RNN. So let’s unfold this recurrent neural network to understand its working.

Unfolding a Recurrent Neural Network:-

Here we’ll try to visualize an RNN in terms of a feedforward network.

A recurrent neural network can be thought of as multiple copies of the same network, each passing a message to a successor.

Now consider what happens if we unroll the loop: imagine we have a sequence of length 5. If we were to unfold the recurrent neural network in time such that it has no recurrent connections at all, we get a feedforward neural network with 5 hidden layers, as shown in the below figure –

It is as if [latex]{ h }_{ 0 }[/latex] is the input and each [latex]{ X }_{ t }[/latex] is just some additional control signal at each step. We can see that the hidden-to-hidden weight [latex]{ W }_{ h }[/latex] is simply repeated at every layer, so it’s like a deep network with the same shared weights between each layer. Similarly, [latex]{ W }_{ x }[/latex] is shared between each of the five Xs going into the hidden layers.
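The unrolling described above can be made explicit in code. This is a minimal sketch with assumed dimensions; the point is that one set of weights is reused at every time step:

```python
import numpy as np

rng = np.random.default_rng(1)
input_dim, hidden_dim, seq_len = 4, 3, 5   # a sequence of length 5, as above

# One set of weights, created once
W_x = rng.standard_normal((hidden_dim, input_dim))
W_h = rng.standard_normal((hidden_dim, hidden_dim))

X = rng.standard_normal((seq_len, input_dim))  # X_1 ... X_5
h = np.zeros(hidden_dim)                       # h_0

# Unrolled loop: the SAME W_h and W_x are applied at every time step,
# just like a 5-layer feedforward net whose layers share one set of weights.
states = []
for t in range(seq_len):
    h = np.tanh(W_h @ h + W_x @ X[t])
    states.append(h)
```

After the loop, `states` holds the five hidden states h_1 … h_5, each computed from its predecessor with the shared weights.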

Where ,

[latex]{ h }_{ t }[/latex]− Current state i.e. state at time step t

[latex]{ h }_{ t-1 }[/latex] − previous state i.e. state at time step t−1

[latex]{ X }_{ t }[/latex]− Input at time step t

[latex]{ W }_{ h }[/latex]− Weight at recurrent neuron

[latex]{ W }_{ x }[/latex]− Weight at input neuron

[latex]{ Y }_{ t }[/latex]− Output at time step t

How Recurrent Neural Network works:-

The recurrent neural network works as follows:

  1. Unlike a traditional deep neural network, which uses different parameters at each layer, an RNN converts independent activations into dependent activations by providing the same weights and biases to all the layers. The RNN shares the same parameters [latex] (W_{ x },W_{ h },W_{ o }) [/latex] across all steps, which reflects the fact that we are performing the same task at each step, just with different inputs. This greatly reduces the total number of parameters we need to learn, and the network remembers each previous output by feeding it as input to the next hidden layer.
  2. All 5 of these layers, with the same weights and biases, merge into one single recurring structure.

The above diagram has outputs at each time step, but depending on the task this may not be necessary. For example, when predicting the sentiment of a sentence we may only care about the final output, not the prediction after each word. Similarly, we may not need inputs at each time step. The main feature of an RNN is its hidden state, which captures some information about a sequence.
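The per-step versus final-output distinction can be sketched as follows. The dimensions, weights, and the hypothetical output matrix `W_o` are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
input_dim, hidden_dim, out_dim, seq_len = 4, 3, 2, 6

W_x = rng.standard_normal((hidden_dim, input_dim))
W_h = rng.standard_normal((hidden_dim, hidden_dim))
W_o = rng.standard_normal((out_dim, hidden_dim))   # hidden-to-output weights

X = rng.standard_normal((seq_len, input_dim))
h = np.zeros(hidden_dim)

per_step_outputs = []
for t in range(seq_len):
    h = np.tanh(W_h @ h + W_x @ X[t])
    per_step_outputs.append(W_o @ h)  # e.g. next-word prediction: an output at every step

final_output = per_step_outputs[-1]   # e.g. sentiment: only the last output matters
```

Either way, the hidden state `h` is the part that carries information about the sequence from step to step.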

Applications of RNNs in real life:-

Now that we’ve seen how a recurrent neural network works, let’s take a glimpse of the kinds of tasks one can achieve using such networks.

The beauty of recurrent neural networks lies in their diversity of application: one can use RNNs to leverage an entire sequence of information for classification or prediction, or to predict the next value in a sequence with the help of information about past words or values. Data scientists have praised RNNs for their ability to deal with various input and output types.

  • Varying input size and fixed output size – I’ve listed down a couple of examples where one can leverage RNNs for a varying input size but a fixed-size output.
    • Voice classification:- For example, if you want to classify between male and female voices, you’d have sound samples of male and female voices, and you classify each sample into one of the classes by making use of the entire sequence of information. Here the input is a voice sample of varying length, while the output is of a fixed type and size. One can extrapolate the same idea to classify different animal or bird voices.
    • Sentiment classification – One can leverage RNNs to classify the sentiment of a text, like a movie or product review or a tweet, into different classes. Here the input is of varying length in each scenario, while the output is of a fixed type and size, and we leverage the entire sequence of information for classification.

Fixed input size and varying output size –

  • Image Captioning – For tasks like image captioning, we have a single input, the image, and a sequence of words as output. Here the image is of a fixed size, but the output is a description of varying length.

Varying input size and varying output size –

  • Language Translation – This basically means we have some text in a particular language, let’s say Spanish, and we wish to translate it into English. Each language has its own semantics, and the same sentence would have different lengths in each language. So here the inputs as well as the outputs are of varying lengths.

So now we have a fair idea of how RNNs map inputs to outputs of varying types and lengths and are fairly generalized in their application. Let’s now list a few advantages of RNNs.


Advantages of Recurrent Neural Network :-

Since we now understand what an RNN is, its architecture, how it works, and how it stores previous information, let’s list a couple of advantages of using RNNs.

  • Possibility of processing input of any length
  • Model size not increasing with size of input
  • Computation takes into account historical information
  • Weights are shared across time

Note:- In theory, RNNs can make use of information in arbitrarily long sequences, but in practice they are limited to looking back only a few steps. This is also called the problem of Long-Term Dependencies.

In the next article, we’ll talk about:

  • Long-Term Dependencies
  • BPTT (Back Propagation through time)
  • Disadvantages of RNNs
  • Vanishing Gradient
  • Exploding Gradient
  • LSTM
  • GRU







