Why sequence models?
One of the most exciting areas in deep learning.
Models: RNNs have transformed speech recognition, NLP, and more.
Examples
Tasks vary: X and/or Y may be a sequence, and the input and output lengths may be equal or different.
Notations
$X^{<t>}$: the element (word) at position $t$ of a sequence
$T_X$: length of sequence $X$
$X^{(i)}$: the $i$-th training example
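A quick worked example of the notation, using a sentence that reappears below: for the input "Teddy bears are on sale", $X^{<1>} = \text{"Teddy"}$, $X^{<5>} = \text{"sale"}$, and $T_X = 5$; combining both indices, $X^{(i)<t>}$ is word $t$ of training example $i$.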
Representing words → build a Vocabulary (from the words occurring in the training sequences, or use an existing pre-built vocabulary) → represent each word as a one-hot vector over that vocabulary
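A minimal sketch of the one-hot representation, assuming a toy vocabulary (the names `vocab` and `one_hot` are illustrative, not from the course; real vocabularies are on the order of 10,000+ words):

```python
import numpy as np

# Toy vocabulary mapping each known word to an index (illustrative only).
vocab = {"a": 0, "are": 1, "bears": 2, "on": 3, "sale": 4, "teddy": 5, "<UNK>": 6}

def one_hot(word, vocab):
    """Return the one-hot column vector for `word` over `vocab`.
    Words not in the vocabulary map to the <UNK> token."""
    vec = np.zeros((len(vocab), 1))
    idx = vocab.get(word.lower(), vocab["<UNK>"])
    vec[idx] = 1.0
    return vec

x_t = one_hot("Teddy", vocab)  # shape (7, 1), with a 1 at index 5
```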
RNN Model
Why not use a standard network? Inputs and outputs can have different lengths in different examples, and a standard fully connected network doesn't share features learned across positions in the sequence.
RNN (Unidirectional)
![Unrolled (left) vs. rolled (right) RNN diagram](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/a6162c16-005b-4393-a617-4e601f548551/Untitled.png)
The right diagram is the rolled version; it means the same as the unrolled one on the left. It appears in some textbooks, but it is less clear and harder to implement, so Andrew doesn't use it in the course.
This is a "Unidirectional RNN": a prediction at position $t$ can only use information from the preceding words. That's a real limitation. For example, in 'He said, "Teddy Roosevelt was a great President."' Teddy is part of a person's name, while in 'He said, "Teddy bears are on sale!"' it is not, and the words before "Teddy" cannot distinguish the two cases.
We use the notation $W_{ax}, W_{aa}, W_{ya}$ for the parameters: the first subscript is the quantity being computed, the second is what the weight multiplies (e.g., $W_{ax}$ multiplies $x$ to help compute $a$).
Forward propagation
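The forward equations, in the notation above (in the course $g$ is typically $\tanh$ for the hidden state and softmax or sigmoid for the output, with $a^{<0>} = \vec{0}$):

$$a^{<t>} = g\left(W_{aa}\, a^{<t-1>} + W_{ax}\, x^{<t>} + b_a\right), \qquad \hat{y}^{<t>} = g\left(W_{ya}\, a^{<t>} + b_y\right)$$

A minimal numpy sketch of one forward step, under the same equations (function and variable names are illustrative):

```python
import numpy as np

def rnn_cell_forward(x_t, a_prev, Waa, Wax, Wya, ba, by):
    """One time step of a basic RNN cell, following the equations above.
    x_t: input at time t, shape (n_x, 1); a_prev: hidden state, shape (n_a, 1)."""
    a_t = np.tanh(Waa @ a_prev + Wax @ x_t + ba)   # hidden state update
    z = Wya @ a_t + by
    y_hat = np.exp(z) / np.sum(np.exp(z))          # softmax over outputs
    return a_t, y_hat
```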
Backpropagation through time (the red arrows in the figure below)
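For a per-position binary label (e.g., deciding whether each word is part of a person's name), the course uses the cross-entropy loss at each time step and sums over all steps; backpropagation through time pushes gradients backward through this sum:

$$\mathcal{L}^{<t>}\left(\hat{y}^{<t>}, y^{<t>}\right) = -\,y^{<t>} \log \hat{y}^{<t>} - \left(1 - y^{<t>}\right) \log\left(1 - \hat{y}^{<t>}\right)$$

$$\mathcal{L}(\hat{y}, y) = \sum_{t=1}^{T_y} \mathcal{L}^{<t>}\left(\hat{y}^{<t>}, y^{<t>}\right)$$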