This is my note for the course (Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization). The code in this note has been rewritten to be clearer and more concise.
Course’s information
This course will teach you the "magic" of getting deep learning to work well. Rather than the deep learning process being a black box, you will understand what drives performance, and be able to more systematically get good results. You will also learn TensorFlow.
`layers_dims` contains the size of each layer, from layer $0$ (the input) to layer $L$ (the output); for example, `layers_dims = [2, 4, 1]` describes a network with 2 inputs, one hidden layer of 4 units, and 1 output unit.

Initializing every parameter to zero fails to break symmetry: all units in a layer compute the same value and receive the same gradient, so they stay identical forever.
parameters = {}
for l in range(1, len(layers_dims)):  # layer 0 is the input, so start at 1
    parameters['W'+str(l)] = np.zeros((layers_dims[l], layers_dims[l-1]))
    parameters['b'+str(l)] = np.zeros((layers_dims[l], 1))
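A minimal sketch (my own toy example, not from the course) of the symmetry problem: with zero weights, every hidden unit produces the same activation, and each would also receive the same gradient, so the units can never become different.

```python
import numpy as np

np.random.seed(0)
X = np.random.randn(3, 5)                  # 3 input features, 5 examples
W1, b1 = np.zeros((2, 3)), np.zeros((2, 1))
A1 = np.maximum(0, W1 @ X + b1)            # ReLU activations of 2 hidden units
print(np.allclose(A1[0], A1[1]))           # True: both units are identical
```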
To break symmetry, let's initialize the weights randomly (the biases can stay at zero). Scaling the random values by a large factor, as below, is something you shouldn't do: with sigmoid or tanh activations, large weights push units into saturation, where gradients are tiny and learning is slow.
parameters = {}
for l in range(1, len(layers_dims)):
    parameters['W'+str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1]) * 10
    # 👆 a LARGE factor (an example of what you SHOULDN'T do)
    parameters['b'+str(l)] = np.zeros((layers_dims[l], 1))
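A quick sketch (my own illustration, with made-up sizes) of why large weights hurt: a large $|z|$ lands on the flat parts of the sigmoid, where the derivative $a(1-a)$ is nearly zero.

```python
import numpy as np

np.random.seed(1)
W = np.random.randn(1, 3) * 10     # LARGE initial weights
X = np.random.randn(3, 1000)
a = 1 / (1 + np.exp(-(W @ X)))     # sigmoid activations
grad = a * (1 - a)                 # sigmoid derivative: a * (1 - a)
print((grad < 1e-3).mean())        # a large fraction of derivatives are near zero
```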
He initialization: multiply the randomly initialized $W$ by $\sqrt{\frac{2}{n^{[l-1]}}}$, where $n^{[l-1]}$ is the number of units in the previous layer; it works well with ReLU activations. It's similar to Xavier initialization, whose multiplier is $\sqrt{\frac{1}{n^{[l-1]}}}$.
parameters = {}
for l in range(1, len(layers_dims)):
    parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1]) * np.sqrt(2./layers_dims[l-1])
    parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
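Putting it together, here is a minimal self-contained sketch of a He initializer (the function name `initialize_parameters_he` and the example layer sizes are my own choices, not taken from this note):

```python
import numpy as np

def initialize_parameters_he(layers_dims):
    """Return He-initialized weights and zero biases for layers 1..L."""
    parameters = {}
    for l in range(1, len(layers_dims)):
        parameters['W' + str(l)] = (np.random.randn(layers_dims[l], layers_dims[l-1])
                                    * np.sqrt(2. / layers_dims[l-1]))
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
    return parameters

params = initialize_parameters_he([2, 4, 1])   # hypothetical sizes: 2 inputs, 4 hidden, 1 output
print(params['W1'].shape, params['W2'].shape)  # (4, 2) (1, 4)
```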