This is my note for the course (Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization). The code in this note has been rewritten to be clearer and more concise.
Course’s information
This course will teach you the "magic" of getting deep learning to work well. Rather than the deep learning process being a black box, you will understand what drives performance, and be able to more systematically get good results. You will also learn TensorFlow.
`layers_dims` contains the size of each layer, from layer $0$ (the input) to layer $L$ (the output); for example, `layers_dims = [2, 4, 1]` describes a network with 2 inputs, one hidden layer of 4 units, and 1 output unit.

Initializing every parameter to zero fails to break symmetry: all units in a layer compute the same value and receive the same gradient, so they stay identical forever.
parameters = {}
for l in range(1, len(layers_dims)):  # layer 0 is the input, so start at 1
    parameters['W'+str(l)] = np.zeros((layers_dims[l], layers_dims[l-1]))
    parameters['b'+str(l)] = np.zeros((layers_dims[l], 1))
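A minimal sketch (my own toy example, not from the course) of the symmetry problem: with zero weights, every hidden unit produces the same activation, and each would also receive the same gradient, so the units can never become different.

```python
import numpy as np

np.random.seed(0)
X = np.random.randn(3, 5)                  # 3 input features, 5 examples
W1, b1 = np.zeros((2, 3)), np.zeros((2, 1))
A1 = np.maximum(0, W1 @ X + b1)            # ReLU activations of 2 hidden units
print(np.allclose(A1[0], A1[1]))           # True: both units are identical
```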
To break symmetry, let's initialize the weights randomly (the biases can stay at zero). Scaling the random values by a large factor, as below, is something you shouldn't do: with sigmoid or tanh activations, large weights push units into saturation, where gradients are tiny and learning is slow.
parameters = {}
for l in range(1, len(layers_dims)):
    parameters['W'+str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1]) * 10
    # 👆 a LARGE factor (an example of what you SHOULDN'T do)
    parameters['b'+str(l)] = np.zeros((layers_dims[l], 1))
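A quick sketch (my own illustration, with made-up sizes) of why large weights hurt: a large $|z|$ lands on the flat parts of the sigmoid, where the derivative $a(1-a)$ is nearly zero.

```python
import numpy as np

np.random.seed(1)
W = np.random.randn(1, 3) * 10     # LARGE initial weights
X = np.random.randn(3, 1000)
a = 1 / (1 + np.exp(-(W @ X)))     # sigmoid activations
grad = a * (1 - a)                 # sigmoid derivative: a * (1 - a)
print((grad < 1e-3).mean())        # a large fraction of derivatives are near zero
```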
He initialization: multiply the randomly initialized $W$ by $\sqrt{\frac{2}{n^{[l-1]}}}$, where $n^{[l-1]}$ is the number of units in the previous layer; it works well with ReLU activations. It's similar to Xavier initialization, whose multiplier is $\sqrt{\frac{1}{n^{[l-1]}}}$.
parameters = {}
for l in range(1, len(layers_dims)):
    parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1]) * np.sqrt(2./layers_dims[l-1])
    parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
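Putting it together, here is a minimal self-contained sketch of a He initializer (the function name `initialize_parameters_he` and the example layer sizes are my own choices, not taken from this note):

```python
import numpy as np

def initialize_parameters_he(layers_dims):
    """Return He-initialized weights and zero biases for layers 1..L."""
    parameters = {}
    for l in range(1, len(layers_dims)):
        parameters['W' + str(l)] = (np.random.randn(layers_dims[l], layers_dims[l-1])
                                    * np.sqrt(2. / layers_dims[l-1]))
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
    return parameters

params = initialize_parameters_he([2, 4, 1])   # hypothetical sizes: 2 inputs, 4 hidden, 1 output
print(params['W1'].shape, params['W2'].shape)  # (4, 2) (1, 4)
```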