Sometimes we need to "compress" our data to speed up algorithms or to visualize data. One way is to use dimensionality reduction which is the process of reducing the number of random variables under consideration by obtaining a set of principal variables. We can think of 2 approaches:
Figure 1. An idea of using PCA from 2D to 1D.
Figure 2. An idea of using PCA from 5D to 2D.
<aside> ❓ Questions: How can we choose the green arrows like in Figure 1 and 2 (their directions and their magnitudes)?
</aside>
From a data points, there are many ways of projections, for examples,
Figure 3. We will project the points to the green line or the violet line? Which one is the best choice?
Intuitively, the green line is better with more separated points. But how can we choose it "mathematically" (precisely)? We need to know about:
Subtract the mean to move to the original axes.
From the original data (a lot of features $x_1, x_2, \ldots, x_N$), we construct a covariance matrix $U$.
Find the eigenvalues $\lambda_1, \lambda_2,\ldots$ and correspondent eigenvectors $v_1, v_2, \ldots$ of that matrix (we call them eigenstuffs). Choose $K < N$ couples $\lambda$ and $v$ (the highest eigenvalues) and we get a reduced matrix $U_K$.
Projection original data points to the $K$-dimensional plane created based on these new eigenstuffs. This step creates new data points on a new dimensional space ($K$).
$$ Z = U_K^TX $$
Now, instead of solving the original problem ($N$ features), we only need to solve a new problem with $K$ features ($K<N$).
Figure 5. A big picture of the idea of PCA algorithm. "Eigenstuffs" are eigenvalues and eigenvectors. Source.