Why sequence models?
One of the most exciting areas in deep learning.
Models: RNNs have transformed speech recognition, NLP, and more.
Examples
Tasks vary: X and/or Y may be a sequence, and the input and output lengths may be equal or different.
Notations
$X^{<t>}$: the element (word) at position $t$ of a sequence
$T_X$: length of sequence $X$
$X^{(i)}$: the $i$-th training example
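A quick worked example of the notation, using a sentence that reappears below: for the input "Teddy bears are on sale", $X^{<1>} = \text{"Teddy"}$, $X^{<5>} = \text{"sale"}$, and $T_X = 5$; combining both indices, $X^{(i)<t>}$ is word $t$ of training example $i$.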
Representing words → build a Vocabulary (from the words occurring in the training sequences, or use an existing pre-built vocabulary) → represent each word as a one-hot vector over that vocabulary
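A minimal sketch of the one-hot representation, assuming a toy vocabulary (the names `vocab` and `one_hot` are illustrative, not from the course; real vocabularies are on the order of 10,000+ words):

```python
import numpy as np

# Toy vocabulary mapping each known word to an index (illustrative only).
vocab = {"a": 0, "are": 1, "bears": 2, "on": 3, "sale": 4, "teddy": 5, "<UNK>": 6}

def one_hot(word, vocab):
    """Return the one-hot column vector for `word` over `vocab`.
    Words not in the vocabulary map to the <UNK> token."""
    vec = np.zeros((len(vocab), 1))
    idx = vocab.get(word.lower(), vocab["<UNK>"])
    vec[idx] = 1.0
    return vec

x_t = one_hot("Teddy", vocab)  # shape (7, 1), with a 1 at index 5
```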
RNN Model
Why not use a standard network? Inputs and outputs can have different lengths in different examples, and a standard fully connected network doesn't share features learned across positions in the sequence.
RNN (Unidirectional)
![Unrolled (left) vs. rolled (right) RNN diagram](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/a6162c16-005b-4393-a617-4e601f548551/Untitled.png)
The right diagram is the rolled version; it means the same as the unrolled one on the left. It appears in some textbooks, but it is less clear and harder to implement, so Andrew doesn't use it in the course.
This is a "Unidirectional RNN": a prediction at position $t$ can only use information from the preceding words. That's a real limitation. For example, in 'He said, "Teddy Roosevelt was a great President."' Teddy is part of a person's name, while in 'He said, "Teddy bears are on sale!"' it is not, and the words before "Teddy" cannot distinguish the two cases.
We use the notation $W_{ax}, W_{aa}, W_{ya}$ for the parameters: the first subscript is the quantity being computed, the second is what the weight multiplies (e.g., $W_{ax}$ multiplies $x$ to help compute $a$).
Forward propagation
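The forward equations, in the notation above (in the course $g$ is typically $\tanh$ for the hidden state and softmax or sigmoid for the output, with $a^{<0>} = \vec{0}$):

$$a^{<t>} = g\left(W_{aa}\, a^{<t-1>} + W_{ax}\, x^{<t>} + b_a\right), \qquad \hat{y}^{<t>} = g\left(W_{ya}\, a^{<t>} + b_y\right)$$

A minimal numpy sketch of one forward step, under the same equations (function and variable names are illustrative):

```python
import numpy as np

def rnn_cell_forward(x_t, a_prev, Waa, Wax, Wya, ba, by):
    """One time step of a basic RNN cell, following the equations above.
    x_t: input at time t, shape (n_x, 1); a_prev: hidden state, shape (n_a, 1)."""
    a_t = np.tanh(Waa @ a_prev + Wax @ x_t + ba)   # hidden state update
    z = Wya @ a_t + by
    y_hat = np.exp(z) / np.sum(np.exp(z))          # softmax over outputs
    return a_t, y_hat
```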
Backpropagation through time (the red arrows in the figure below)
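For a per-position binary label (e.g., deciding whether each word is part of a person's name), the course uses the cross-entropy loss at each time step and sums over all steps; backpropagation through time pushes gradients backward through this sum:

$$\mathcal{L}^{<t>}\left(\hat{y}^{<t>}, y^{<t>}\right) = -\,y^{<t>} \log \hat{y}^{<t>} - \left(1 - y^{<t>}\right) \log\left(1 - \hat{y}^{<t>}\right)$$

$$\mathcal{L}(\hat{y}, y) = \sum_{t=1}^{T_y} \mathcal{L}^{<t>}\left(\hat{y}^{<t>}, y^{<t>}\right)$$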