Reading: Hands-On ML - Quick notes (Chapter 14 & 15)

👉 List of all notes for this book. IMPORTANT UPDATE November 18, 2024: I've stopped taking detailed notes from the book and now only highlight and annotate directly in the PDF files/book. With so many books to read, I don't have time to type everything. In the future, if I make notes while reading a book, they'll contain only the most notable points (for me).

Chapter 14 — Deep Computer Vision Using Convolutional Neural Networks (CNN)

Related notes: DL by DL.AI — Course 4: CNN , R-CNN & Fast R-CNN & Faster R-CNN & Mask R-CNN, TF by DL.AI — Course 2: CNN in TF
Good: [PDF] A guide to convolution arithmetic for deep learning + animations of its figures. ← This document explains convolution, pooling, their parameters, and the arithmetic formulas relating those parameters (padding, strides,…)
CNNs are not restricted to visual perception: they are also successful at many other tasks, such as voice recognition and natural language processing
In the visual cortex, some neurons react only to images of horizontal lines, while others react only to lines with different orientations. This observation inspired CNNs.

Figure 14-1
An important milestone was a 1998 paper by Yann LeCun et al. that introduced the famous LeNet-5 architecture, which became widely used by banks to recognize handwritten digits on checks.
Why not simply use a deep neural network with fully connected layers for image recognition tasks? → It works fine for small images, but a big image → huge number of params → CNNs solve this issue with partially connected layers and weight sharing.
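To make the "huge number of params" point concrete, here is a minimal sketch that counts weights (biases ignored). The helper names and the 32-filter 3x3 conv layer are my own illustrative assumptions, not from the book's code.

```python
def dense_connections(height, width, channels, n_neurons):
    """Weights in a fully connected layer fed the flattened image (biases excluded)."""
    return height * width * channels * n_neurons


def conv_weights(kernel_h, kernel_w, channels, n_filters):
    """Weights in a convolutional layer (biases excluded); image size plays no role."""
    return kernel_h * kernel_w * channels * n_filters


# Illustrative numbers: a 100x100 grayscale image and 1,000 first-layer neurons.
print(dense_connections(100, 100, 1, 1000))  # 10000000
# A hypothetical conv layer with 32 filters of size 3x3 on the same image.
print(conv_weights(3, 3, 1, 32))             # 288
```

The dense layer grows with the image size, while the conv layer's weight count depends only on the kernel size, channels, and number of filters.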
The most important building block of a CNN is the convolutional layer
Convolution

No padding and 1x1 strides

1x1 zero-padding border and 2x2 strides.
Zero padding: to give a layer the same height and width as the previous layer, add zeros around the input.

Figure 14-3. Connections between layers and zero padding
Stride: the horizontal or vertical step size from one receptive field to the next.

Figure 14-4. Reducing dimensionality using a stride of 2
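A rough sketch of how zero padding and stride interact, written as a naive single-channel convolution in pure Python. The function name `conv2d` and the 5x5 all-ones input are assumptions for illustration only; real frameworks do this far more efficiently.

```python
def conv2d(image, kernel, stride=1, pad=0):
    """Naive single-channel 2D convolution (cross-correlation, as in deep learning)."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    # Surround the input with `pad` rows/columns of zeros on every side.
    padded = [[0] * (w + 2 * pad) for _ in range(h + 2 * pad)]
    for i in range(h):
        for j in range(w):
            padded[i + pad][j + pad] = image[i][j]
    out_h = (h + 2 * pad - kh) // stride + 1
    out_w = (w + 2 * pad - kw) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # The receptive field starts at (i * stride, j * stride).
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += padded[i * stride + di][j * stride + dj] * kernel[di][dj]
            row.append(total)
        out.append(row)
    return out


image = [[1] * 5 for _ in range(5)]   # 5x5 input of ones
kernel = [[1] * 3 for _ in range(3)]  # 3x3 kernel of ones

no_pad = conv2d(image, kernel, stride=1, pad=0)
print(len(no_pad), len(no_pad[0]))    # 3 3

padded = conv2d(image, kernel, stride=1, pad=1)
print(len(padded), len(padded[0]))    # 5 5 (same size as the input)

strided = conv2d(image, kernel, stride=2, pad=1)
print(len(strided), len(strided[0]))  # 3 3
```

With a 1-pixel zero border and stride 1 the output keeps the input's size; raising the stride to 2 shrinks it.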
Filters = convolution kernels = kernels.
(Not in the book) Understanding components in plain English.
Feature map: a layer full of neurons using the same filter outputs a feature map, which highlights the areas in an image that activate the filter the most.
A convolutional layer has multiple filters and outputs one feature map per filter.
CNN vs fully connected network: The fact that all neurons in a feature map share the same parameters dramatically reduces the number of parameters in the model. Once the CNN has learned to recognize a pattern in one location, it can recognize it in any other location. In contrast, once a fully connected neural network has learned to recognize a pattern in one location, it can only recognize it in that particular location.
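A small sketch of the "recognize it in any other location" idea: the same 3x3 filter is slid over two images whose only difference is where a vertical line sits. The filter values and the helper names (`feature_map`, `vertical_line_image`, `peak_column`) are hand-picked assumptions, not learned weights.

```python
def feature_map(image, kernel):
    """Slide one shared kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def vertical_line_image(col, size=7):
    """Hypothetical test image: zeros with a single vertical line of ones."""
    return [[1 if j == col else 0 for j in range(size)] for _ in range(size)]


def peak_column(fmap):
    """Column index of the strongest activation in the first row."""
    return max(range(len(fmap[0])), key=lambda j: fmap[0][j])


# One hand-picked 3x3 filter that responds to vertical lines.
kernel = [[0, 1, 0],
          [0, 1, 0],
          [0, 1, 0]]

left = feature_map(vertical_line_image(col=2), kernel)
right = feature_map(vertical_line_image(col=5), kernel)
print(peak_column(left), peak_column(right))  # 1 4
```

The nine shared weights detect the line wherever it appears; the peak in the feature map simply moves with it.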
In Keras, padding="valid" means no zero-padding, while padding="same" adds as much zero-padding as needed.
Some cases with padding and stride
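The output size for each padding mode can be sketched with a small helper. The function name `conv_output_size` is hypothetical; the formulas follow the usual Keras conventions for "valid" and "same", and the input sizes are just illustrative.

```python
import math


def conv_output_size(input_size, kernel_size, stride=1, padding="valid"):
    """Output length along one dimension, following the Keras padding conventions."""
    if padding == "valid":
        # No zero padding: only positions where the kernel fits entirely.
        return (input_size - kernel_size) // stride + 1
    if padding == "same":
        # Zero padding is added as needed; the size depends only on the stride.
        return math.ceil(input_size / stride)
    raise ValueError(f"unknown padding: {padding!r}")


print(conv_output_size(13, 6, stride=5, padding="valid"))  # 2
print(conv_output_size(13, 6, stride=5, padding="same"))   # 3
print(conv_output_size(28, 3, stride=1, padding="same"))   # 28
```

With "valid", input positions that do not fit a full receptive field are ignored; with "same" and stride 1, the output keeps the input's size.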
Note that the height and width of the input images do not appear in the kernel’s shape
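A quick sketch of why this holds, assuming the Keras Conv2D weight layout [kernel_height, kernel_width, input_channels, output_channels]. The helper names and the 7x7, 32-filter layer on RGB input are illustrative assumptions.

```python
def conv2d_kernel_shape(kernel_size, in_channels, n_filters):
    """Weight tensor shape in the layout assumed here:
    (kernel_height, kernel_width, input_channels, output_channels)."""
    kh, kw = kernel_size
    return (kh, kw, in_channels, n_filters)


def n_params(shape):
    """Total trainable parameters: all kernel weights plus one bias per filter."""
    kh, kw, c_in, c_out = shape
    return kh * kw * c_in * c_out + c_out


# No image height or width is passed in anywhere.
shape = conv2d_kernel_shape((7, 7), in_channels=3, n_filters=32)
print(shape)            # (7, 7, 3, 32)
print(n_params(shape))  # 4736
```

Since the same kernels slide over every position, the layer can be applied to images of any height and width (with the right number of channels).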