K-Means Clustering

Description

K-Means Clustering is an unsupervised machine learning algorithm used to partition a dataset into K distinct clusters based on similarity. It works by iteratively assigning data points to clusters and updating the cluster centers (centroids) until convergence.

How It Works

  • Choose the number of clusters (K)
  • Initialize K centroids randomly
  • Assign each data point to the nearest centroid
  • Recompute the centroids as the mean of assigned points
  • Repeat the assignment and update steps until the centroids stop changing significantly (a minimal from-scratch sketch of these steps follows below)
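
These steps map directly onto a short NumPy implementation. The sketch below is a minimal, illustrative version of the standard (Lloyd's) algorithm rather than a production implementation; the function name kmeans_sketch and its default parameters are chosen only for this example.

import numpy as np

def kmeans_sketch(X, k, n_iters=100, tol=1e-4, seed=0):
    """Minimal K-Means (Lloyd's algorithm) following the steps above."""
    rng = np.random.default_rng(seed)
    # Step 2: initialize centroids by picking k random data points
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 3: assign each point to its nearest centroid
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step 4: recompute each centroid as the mean of its assigned points
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 5: stop once the centroids no longer move significantly
        if np.linalg.norm(new_centroids - centroids) < tol:
            centroids = new_centroids
            break
        centroids = new_centroids
    return labels, centroids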

Examples

Python Example of K-Means Clustering

from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

# Create sample data
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)

# Fit KMeans (n_init and random_state set so the result is reproducible)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
y_kmeans = kmeans.fit_predict(X)

# Plot results
plt.scatter(X[:, 0], X[:, 1], c=y_kmeans, s=50, cmap='viridis')
centers = kmeans.cluster_centers_
plt.scatter(centers[:, 0], centers[:, 1], c='black', s=200, alpha=0.75)
plt.title('K-Means Clustering')
plt.show()

Real-World Applications

K-Means Clustering Applications

  • Customer Segmentation: Grouping customers based on purchasing behavior
  • Image Compression: Reducing the number of colors in an image using color quantization (see the sketch after this list)
  • Document Clustering: Grouping similar documents for topic discovery
  • Anomaly Detection: Identifying unusual data patterns by analyzing cluster outliers

Figure: Clustered customer segments
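
As a concrete illustration of the image-compression use case, the sketch below quantizes an image's colors with K-Means: pixels are treated as points in RGB space and each pixel is replaced by its cluster centroid. The filename photo.jpg and the choice of 16 colors are placeholders for this example.

import numpy as np
from sklearn.cluster import KMeans
from PIL import Image

# Load an image and flatten it to a (num_pixels, 3) array of RGB values
# ('photo.jpg' is a placeholder filename for this example)
img = np.asarray(Image.open('photo.jpg').convert('RGB'), dtype=np.float64) / 255.0
pixels = img.reshape(-1, 3)

# Cluster pixel colors into 16 groups and replace each pixel
# with its cluster centroid (the "quantized" palette color)
kmeans = KMeans(n_clusters=16, n_init=10, random_state=0).fit(pixels)
quantized = kmeans.cluster_centers_[kmeans.labels_].reshape(img.shape)

Image.fromarray((quantized * 255).astype(np.uint8)).save('photo_16_colors.jpg')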

Resources

The following resources will be manually added later:

Video Tutorials

Interview Questions

1. What is the main objective of the K-Means algorithm?

The main objective is to minimize the within-cluster sum of squares (WCSS), i.e., to find centroids that minimize the total squared distance between each point and the centroid of its assigned cluster.
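
In scikit-learn this quantity is exposed as the fitted model's inertia_ attribute. The brief sketch below, which assumes the X, y_kmeans, and kmeans objects from the example above, shows that it matches the WCSS computed by hand.

import numpy as np

# WCSS computed by hand: sum of squared distances from each point
# to the centroid of its assigned cluster
wcss = sum(
    np.sum((X[y_kmeans == j] - kmeans.cluster_centers_[j]) ** 2)
    for j in range(kmeans.n_clusters)
)

print(wcss, kmeans.inertia_)  # the two values should match (up to float rounding)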

2. How do you choose the value of K in K-Means?

The "elbow method" is commonly used, where WCSS is plotted for various K values. The "elbow" point (where WCSS starts to decrease slowly) indicates the optimal K.

3. Is K-Means sensitive to the initial placement of centroids?

Yes. Different initial placements can converge to different local optima and therefore different results. K-Means++ initialization (the default in scikit-learn) spreads the initial centroids apart and usually leads to better, more stable solutions.
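
The sketch below makes the initialization choice explicit and contrasts k-means++ with purely random seeding, again reusing X from the example above. Setting n_init=1 runs each variant from a single starting configuration, which makes the sensitivity to initialization easier to observe.

from sklearn.cluster import KMeans

# Explicit k-means++ seeding (the scikit-learn default)
km_pp = KMeans(n_clusters=4, init='k-means++', n_init=1, random_state=0).fit(X)

# Purely random initial centroids, for comparison
km_rand = KMeans(n_clusters=4, init='random', n_init=1, random_state=0).fit(X)

print(km_pp.inertia_, km_rand.inertia_)  # random init often ends with higher WCSS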

4. Can K-Means handle non-spherical clusters?

Not well. Because points are assigned by Euclidean distance to the nearest centroid, K-Means favors compact, roughly spherical clusters of similar size, which makes it less effective for elongated, irregular, or overlapping clusters.
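
A quick way to see this limitation is to run K-Means on the two interleaving half-moons from sklearn.datasets.make_moons. The sketch below is illustrative only; the resulting plot typically shows each crescent split between the two clusters rather than recovered as a whole.

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons

# Two crescent-shaped (non-spherical) clusters
X_moons, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# K-Means cuts the plane into two roughly circular regions,
# so each crescent typically ends up split between the clusters
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_moons)

plt.scatter(X_moons[:, 0], X_moons[:, 1], c=labels, s=50, cmap='viridis')
plt.title('K-Means on non-spherical (moon-shaped) clusters')
plt.show()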

5. What is the computational complexity of K-Means?

The time complexity is O(n * k * i * d), where n = number of data points, k = number of clusters, i = number of iterations, d = dimensionality.
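
The n * k * d factor comes from the assignment step: each iteration computes the distance from every point to every centroid. The sketch below illustrates that per-iteration cost in terms of array shapes; the sizes n, k, and d are arbitrary values chosen for this example.

import numpy as np

n, k, d = 10_000, 8, 16          # points, clusters, dimensions (arbitrary sizes)
X_demo = np.random.rand(n, d)
centroids = np.random.rand(k, d)

# One assignment step: an (n, k) matrix of point-to-centroid distances,
# built from n * k * d elementwise operations
distances = np.linalg.norm(X_demo[:, None, :] - centroids[None, :, :], axis=2)
labels = distances.argmin(axis=1)   # picking the nearest centroid is O(n * k)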