DBSCAN
Description
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is an unsupervised clustering algorithm that groups together closely packed points (points with many nearby neighbors), while marking points that lie alone in low-density regions as outliers. Unlike K-Means, it does not require the number of clusters to be specified in advance and can find arbitrarily shaped clusters.
How It Works
- Core Point: A point with at least a minimum number of neighboring points (MinPts) within a given radius (ε).
- Border Point: A point within ε of a core point, but with fewer than MinPts neighbors.
- Noise Point: A point that is neither a core point nor a border point.
- The algorithm expands clusters from core points by including density-reachable points.
Examples
Python Example of DBSCAN
from sklearn.datasets import make_moons
from sklearn.cluster import DBSCAN
import matplotlib.pyplot as plt
# Generate sample data
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)
# Apply DBSCAN
db = DBSCAN(eps=0.2, min_samples=5)
labels = db.fit_predict(X)
# Plot the clusters
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='plasma')
plt.title("DBSCAN Clustering")
plt.show()
Real-World Applications
DBSCAN Applications
- Geospatial Data: Identifying geographical clusters such as hotspots or cities
- Anomaly Detection: Identifying outliers in transaction records or sensor data
- Astronomy: Detecting celestial bodies in space images based on density
- Social Media Analysis: Grouping users based on behavior without predefining groups

Resources
The following resources will be manually added later:
Video Tutorials
PDF/DOC Materials
Interview Questions
1. What makes DBSCAN different from K-Means?
Unlike K-Means, DBSCAN does not require the number of clusters to be specified beforehand and can find clusters of arbitrary shapes. It also handles outliers naturally by labeling them as noise.
2. What are the parameters of DBSCAN and how do they affect clustering?
eps (ε): Radius to consider neighbors around a point. Too small may result in more noise; too large may merge clusters.
min_samples: Minimum number of points to form a dense region. Affects sensitivity to noise and size of clusters.
3. What are the limitations of DBSCAN?
DBSCAN struggles with clusters of varying densities and can be sensitive to the choice of ε and MinPts. It also does not scale well to high-dimensional data.
4. How does DBSCAN classify noise?
Any point that is not a core point and not reachable from any other core point is classified as noise and assigned a label of -1.
5. Is DBSCAN deterministic?
Yes, DBSCAN is deterministic as long as the input data and parameters remain the same. It does not involve random initialization like K-Means.