K-Means Clustering
Description
K-Means Clustering is an unsupervised machine learning algorithm that partitions a dataset into K distinct clusters of similar points. It works by iteratively assigning each data point to its nearest cluster center (centroid) and recomputing the centroids until the assignments stabilize (convergence).
How It Works
- Choose the number of clusters (K)
- Initialize K centroids randomly
- Assign each data point to the nearest centroid
- Recompute the centroids as the mean of assigned points
- Repeat assignment and update steps until centroids do not change significantly
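The steps above can be sketched directly in NumPy. This is a minimal illustration of Lloyd's algorithm, not a production implementation (it assumes, for simplicity, that no cluster ever ends up empty):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal K-Means sketch: assign points to the nearest centroid, then update."""
    rng = np.random.default_rng(seed)
    # Step 2: initialize the k centroids by picking k distinct data points
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 3: assign each point to its nearest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: recompute each centroid as the mean of its assigned points
        # (assumes every cluster keeps at least one point)
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 5: stop once the centroids no longer change significantly
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```

The function name `kmeans` and its parameters are illustrative choices, not a library API.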
Examples
Python Example of K-Means Clustering
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
# Create sample data
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)
# Fit KMeans
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)  # fixed seed for reproducible clusters
kmeans.fit(X)
y_kmeans = kmeans.predict(X)
# Plot results
plt.scatter(X[:, 0], X[:, 1], c=y_kmeans, s=50, cmap='viridis')
centers = kmeans.cluster_centers_
plt.scatter(centers[:, 0], centers[:, 1], c='black', s=200, alpha=0.75)
plt.title('K-Means Clustering')
plt.show()
Real-World Applications
- Customer Segmentation: Grouping customers based on purchasing behavior
- Image Compression: Reducing the number of colors in an image using color quantization
- Document Clustering: Grouping similar documents for topic discovery
- Anomaly Detection: Identifying unusual data patterns by analyzing cluster outliers
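As a concrete sketch of the image-compression use case, color quantization treats each pixel as an RGB point and clusters the pixels into a small palette; replacing each pixel with its cluster's centroid reduces the number of distinct colors. The image below is synthetic random data, used only so the example is self-contained:

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for a real image: 64x64 pixels, 3 color channels
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64, 3)).astype(float)
pixels = image.reshape(-1, 3)  # one RGB point per pixel

# Cluster the pixel colors into an 8-color palette
n_colors = 8
kmeans = KMeans(n_clusters=n_colors, n_init=10, random_state=0).fit(pixels)

# Replace every pixel with its cluster's centroid color
quantized = kmeans.cluster_centers_[kmeans.labels_].reshape(image.shape)
```

After quantization the image contains at most `n_colors` distinct colors, which is what makes palette-based compression possible.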

Resources
The following resources will be manually added later:
Video Tutorials
PDF/DOC Materials
Interview Questions
1. What is the main objective of the K-Means algorithm?
The main objective is to minimize the within-cluster sum of squares (WCSS) by finding cluster centers that reduce the total distance between points and their centroids.
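This objective can be checked numerically: scikit-learn exposes the WCSS of a fitted model as `inertia_`, and computing it by hand gives the same value. A small sketch:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

# WCSS: sum of squared distances from each point to its assigned centroid
wcss = sum(
    np.sum((X[km.labels_ == j] - c) ** 2)
    for j, c in enumerate(km.cluster_centers_)
)
# sklearn reports the same quantity as km.inertia_
```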
2. How do you choose the value of K in K-Means?
The "elbow method" is commonly used: WCSS is plotted for a range of K values, and the "elbow" point, where the rate of decrease slows sharply, suggests a good K. (WCSS always shrinks as K grows, so the raw minimum is not informative on its own.)
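A minimal sketch of the elbow method on the same synthetic blobs used earlier:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)

# Fit K-Means for a range of K and record the WCSS (sklearn's `inertia_`)
wcss = []
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss.append(km.inertia_)

# Plot WCSS against K; the bend ("elbow") suggests a good K
plt.plot(range(1, 9), wcss, marker='o')
plt.xlabel('K')
plt.ylabel('WCSS')
plt.title('Elbow Method')
plt.show()
```

Since the data was generated with 4 centers, the curve should bend around K = 4.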
3. Is K-Means sensitive to the initial placement of centroids?
Yes, different initial placements can lead to different results. K-Means++ initialization is commonly used to improve centroid selection.
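The two initialization schemes can be compared directly, since scikit-learn exposes both through the `init` parameter (`'k-means++'` is its default). A sketch using a single run of each (`n_init=1`) so the initialization actually matters:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)

# Random initialization: centroids drawn uniformly from the data;
# a single run can converge to a poor local optimum
km_random = KMeans(n_clusters=4, init='random', n_init=1, random_state=0).fit(X)

# k-means++: spreads the initial centroids apart, which tends to give
# a lower WCSS from the same single run
km_pp = KMeans(n_clusters=4, init='k-means++', n_init=1, random_state=0).fit(X)

print(km_random.inertia_, km_pp.inertia_)
```

On easy data both runs may land on the same optimum; the difference shows up more on harder datasets or unlucky seeds.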
4. Can K-Means handle non-spherical clusters?
No. K-Means assumes clusters are roughly spherical and similar in size, since each point is assigned to its nearest centroid by Euclidean distance; this makes it less effective for elongated, irregular, or overlapping clusters.
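This limitation is easy to demonstrate with scikit-learn's `make_moons`, which generates two interleaving crescent-shaped clusters:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons

# Two interleaving half-moons: the true clusters are not spherical
X, y_true = make_moons(n_samples=300, noise=0.05, random_state=0)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
# K-Means separates the points with a roughly straight boundary, so the
# tips of each moon end up grouped with the wrong crescent
```

Comparing `labels` against `y_true` (up to label swapping) shows a substantial fraction of misassigned points, whereas the same comparison on well-separated spherical blobs would be nearly perfect.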
5. What is the computational complexity of K-Means?
The time complexity is O(n * k * i * d), where n = number of data points, k = number of clusters, i = number of iterations, d = dimensionality.