The key idea is that for each point of a cluster, the neighborhood of a given radius has to contain at least a minimum number of points.

- "DBSCAN" = **D**ensity-**b**ased **s**patial **c**lustering of **a**pplications with **n**oise.
- Separates clusters of high density from regions of low density.
- Can sort data into clusters of varying shapes.

**Input**: a set of points, a neighborhood N (of a given radius), and minpts (the density threshold).

**Output**: dense clusters (plus noise points).

- Each point is either:
  - *Core point*: has at least minpts points in its neighborhood.
  - *Border point*: not a core point, but has at least 1 core point in its neighborhood.
  - *Noise point*: neither a core point nor a border point.
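
To make the three point types concrete, here is a minimal pure-Python sketch. It rests on a few assumptions that are not in the note: Euclidean distance, a neighborhood defined as all points within radius `eps`, and a point counting as its own neighbor. The names `classify_points`, `eps`, and `minpts` are illustrative and do not come from any library.

```python
from math import dist


def classify_points(points, eps, minpts):
    """Label each point as 'core', 'border', or 'noise'."""
    # Neighborhood of p: indices of all points within distance eps
    # (p itself included, which is an assumption; conventions differ).
    neighborhoods = [
        [j for j, q in enumerate(points) if dist(p, q) <= eps]
        for p in points
    ]
    core = {i for i, hood in enumerate(neighborhoods) if len(hood) >= minpts}

    labels = []
    for i, hood in enumerate(neighborhoods):
        if i in core:
            labels.append("core")
        elif any(j in core for j in hood):
            labels.append("border")  # not core, but next to a core point
        else:
            labels.append("noise")   # neither core nor border
    return labels


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (2.0, 1.0), (8.0, 8.0)]
labels = classify_points(points, eps=1.5, minpts=4)
print(labels)
# -> ['core', 'core', 'core', 'core', 'border', 'noise']
```

The point `(2.0, 1.0)` has only three points in its neighborhood, so it is not a core point, but two of them are core points, which makes it a border point. The isolated point `(8.0, 8.0)` is noise.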

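**Phases**:

- Choose an unvisited point and check whether it is a core point.
  - If yes → start a cluster and expand it: add the points in its neighborhood, keep expanding from those that are core points, and stop at border points.
  - If no → label it as noise for now (it can still become a border point of a cluster found later).
- Repeat with the remaining points to form other clusters.
- Eliminate noise points.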
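
The phases above can be sketched as a small, unoptimized implementation. This is an illustration under stated assumptions, not a reference implementation: it uses Euclidean distance, counts a point in its own neighborhood, and uses the label `-1` for noise (a convention chosen here). The function `dbscan` and its helper `region_query` are hypothetical names.

```python
from math import dist

NOISE = -1  # label for points that belong to no cluster (assumed convention)


def dbscan(points, eps, minpts):
    """Return one cluster label per point; NOISE marks noise points."""

    def region_query(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(i)
        if len(neighbors) < minpts:
            labels[i] = NOISE  # provisional: may become a border point later
            continue
        # i is a core point: start a new cluster and expand it.
        cluster_id += 1
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # reached from a core point: border
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster_id
            j_neighbors = region_query(j)
            if len(j_neighbors) >= minpts:  # j is core: keep expanding
                queue.extend(j_neighbors)
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
labels = dbscan(points, eps=1.5, minpts=3)
print(labels)
# -> [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Each call to `region_query` scans every point, so this sketch runs in quadratic time. Practical implementations usually rely on a spatial index instead.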

**Pros**:

- Discovers any number of clusters (unlike K-Means and K-Medoids clustering, which need the number of clusters as input).
- Finds clusters of varying sizes and shapes.
- Detects and ignores outliers.

**Cons**:

- Sensitive to the choice of neighborhood parameters (e.g. if minpts is too small, noise is identified incorrectly).
- Produces noise points, and it is unclear how to calculate metric indexes when there is noise.
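
The sensitivity to parameters can be seen on a toy data set. The sketch below is an assumption-laden illustration (Euclidean distance, a point counted in its own neighborhood, made-up data, and a hypothetical helper `count_noise`): it keeps the radius fixed and only changes minpts, then counts how many points end up as noise.

```python
from math import dist


def count_noise(points, eps, minpts):
    """Count points that are neither core nor border for the given parameters."""
    # Neighborhoods include the point itself (an assumption; conventions differ).
    hoods = [[j for j, q in enumerate(points) if dist(p, q) <= eps]
             for p in points]
    core = {i for i, hood in enumerate(hoods) if len(hood) >= minpts}
    # A point is noise when its neighborhood contains no core point.
    return sum(1 for hood in hoods if not core.intersection(hood))


# Made-up data: one dense blob of 5 points, a stray pair, one isolated point.
points = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (0.05, 0.05),
          (3, 3), (3.1, 3),
          (6, 6)]

for minpts in (2, 3, 6):
    print(minpts, count_noise(points, eps=0.2, minpts=minpts))
# -> 2 1
# -> 3 3
# -> 6 8
```

With minpts = 2 the stray pair counts as dense enough to form its own cluster, so only the isolated point is noise. With minpts = 3 the pair becomes noise as well, and with minpts = 6 even the dense blob is rejected and every point is noise.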

- "HDBSCAN" = **H**ierarchical DBSCAN.
- Difference between DBSCAN and HDBSCAN:
  - HDBSCAN: focuses on regions of high density.
  - DBSCAN: creates the right clusters but also creates clusters with a very low density of examples (Figure 1).
  - Check more in this note.
- Reduces clustering time in comparison with other methods (Figure 2).
- HDBSCAN has the parameter minimum cluster size (`min_cluster_size`), which is how big a cluster needs to be in order to form.

**Figure 1**. Difference between DBSCAN (left) and HDBSCAN (right). Source of figure.

**Figure 2**. Performance comparison of different clustering methods. HDBSCAN is much faster than DBSCAN with more data points. Source of figure.