k-Nearest Neighbors (k-NN)
Description
k-Nearest Neighbors (k-NN) is a simple, non-parametric, instance-based learning algorithm used for both classification and regression tasks. It makes predictions based on the closest training examples in the feature space.
How k-NN Works
When a new data point is encountered, the algorithm:
- Calculates the distance (e.g., Euclidean) from the new point to all training points
- Selects the k nearest neighbors
- For classification: Predicts the majority class among the neighbors
- For regression: Averages the values of the neighbors
It is a lazy learning algorithm, meaning it does not learn an explicit model but stores all training data.
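To make these steps concrete, here is a minimal from-scratch sketch of k-NN classification using NumPy; the function knn_predict and the toy data are illustrative assumptions, not part of any library.

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # Step 1: Euclidean distance from the new point to every training point
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Step 2: indices of the k nearest neighbors
    nearest = np.argsort(distances)[:k]
    # Step 3: majority vote among the neighbors' labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy data: two well-separated 2-D clusters (labels 0 and 1)
X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.5, 0.5])))  # expected: 0
print(knn_predict(X_train, y_train, np.array([5.5, 5.5])))  # expected: 1

For regression, the majority vote in the last step would simply be replaced by the mean of the neighbors' target values.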
Examples
Python Code for k-NN Classification
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
# Load dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize and train k-NN classifier
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
# Predict and evaluate
y_pred = knn.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")
Real-World Applications
- Healthcare: Disease classification based on patient symptoms
- Finance: Credit risk prediction, fraud detection
- Recommender Systems: Suggesting products based on user similarity
- Pattern Recognition: Handwriting and facial recognition

Resources
The following resources will be manually added later:
- Video Tutorials
- PDF/DOC Materials
Interview Questions
1. What is k-Nearest Neighbors (k-NN) and how does it work?
k-NN is a supervised learning algorithm that classifies or predicts a data point based on the majority label (for classification) or average value (for regression) of its k closest neighbors in the training dataset.
2. What are the main advantages and disadvantages of k-NN?
Advantages: Simple, effective, no training phase, adaptable to multi-class problems.
Disadvantages: Computationally expensive at prediction time, sensitive to irrelevant features and feature scaling, affected by noisy data.
3. How do you choose the value of k in k-NN?
The value of k is chosen using cross-validation. A small k can lead to overfitting, while a large k may smooth out class boundaries too much (underfitting). Odd values are often preferred to break ties in binary classification.
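As a sketch of this selection process, the snippet below evaluates odd values of k on the iris dataset with 5-fold cross-validation using scikit-learn's cross_val_score; the range of candidate values is an arbitrary choice for illustration.

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Try odd values of k and keep the one with the best mean cross-validation accuracy
scores = {}
for k in range(1, 20, 2):
    knn = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(knn, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print(f"Best k: {best_k} (mean CV accuracy: {scores[best_k]:.3f})")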
4. What distance metrics can be used in k-NN?
Common metrics include (see the sketch after this list):
- Euclidean distance
- Manhattan distance
- Minkowski distance
- Cosine similarity
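In scikit-learn, the distance metric is selected through the metric parameter of KNeighborsClassifier; the sketch below assumes default settings otherwise, and note that cosine distance generally requires the brute-force neighbor search.

from sklearn.neighbors import KNeighborsClassifier

# Euclidean distance (the default Minkowski metric with p=2 is equivalent)
knn_euclidean = KNeighborsClassifier(n_neighbors=3, metric="euclidean")

# Manhattan distance (equivalent to Minkowski with p=1)
knn_manhattan = KNeighborsClassifier(n_neighbors=3, metric="manhattan")

# General Minkowski distance with an explicit power parameter p
knn_minkowski = KNeighborsClassifier(n_neighbors=3, metric="minkowski", p=3)

# Cosine distance, paired with the brute-force search strategy
knn_cosine = KNeighborsClassifier(n_neighbors=3, metric="cosine", algorithm="brute")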
5. How does feature scaling impact k-NN?
Feature scaling is crucial because k-NN relies on distance metrics. Features with larger scales can dominate the distance calculation, so normalization (e.g., Min-Max or Standard Scaling) is often applied.
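To make the effect of scaling concrete, here is a sketch that standardizes the iris features inside a scikit-learn pipeline before fitting k-NN; fitting the scaler inside the pipeline ensures it is learned from the training split only.

from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features so that each contributes comparably to the distance calculation
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print(f"Accuracy with scaling: {accuracy_score(y_test, y_pred):.2f}")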