t-SNE
Description
t-SNE (t-Distributed Stochastic Neighbor Embedding) is a non-linear dimensionality reduction technique used primarily for data visualization. It maps high-dimensional data into a low-dimensional space (usually 2D or 3D) while preserving local structure, so that points that are close neighbors in the original space stay close in the embedding.
How t-SNE Works
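t-SNE converts pairwise distances into probabilities. In the high-dimensional space, the similarity between two points is modeled with a Gaussian kernel centered on each point; in the low-dimensional space, it is modeled with a heavy-tailed Student's t-distribution (the "t" in the name). The algorithm then moves the low-dimensional points by gradient descent to minimize the Kullback-Leibler divergence between the two distributions. This keeps similar points close together while pushing dissimilar points apart in the visualization.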
- Focuses on preserving local neighborhoods
- Uses a probabilistic approach to model similarities
- Produces visually meaningful 2D or 3D representations of complex data
- Effective for visually inspecting cluster structure during exploratory data analysis (t-SNE itself does not assign cluster labels)
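The cost function described above can be sketched in a few lines of NumPy. This is a simplified illustration, not the library implementation: the function names are hypothetical, and a single fixed Gaussian bandwidth `sigma` is assumed, whereas real t-SNE tunes one bandwidth per point to match the chosen perplexity. The sketch only evaluates the Kullback-Leibler divergence for two candidate layouts; it does not perform the gradient descent.

```python
import numpy as np

def pairwise_sq_dists(Z):
    # Squared Euclidean distance between every pair of rows in Z.
    sq = np.sum(Z ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    return np.maximum(D, 0.0)  # clip tiny negatives from rounding

def high_dim_affinities(X, sigma=1.0):
    # Gaussian similarities with one fixed bandwidth (assumption: real
    # t-SNE searches for a per-point bandwidth matching the perplexity).
    W = np.exp(-pairwise_sq_dists(X) / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    cond = W / W.sum(axis=1, keepdims=True)    # conditional p(j | i)
    return (cond + cond.T) / (2.0 * X.shape[0])  # symmetric joint P

def low_dim_affinities(Y):
    # Student-t kernel with one degree of freedom (heavy tails).
    W = 1.0 / (1.0 + pairwise_sq_dists(Y))
    np.fill_diagonal(W, 0.0)
    return W / W.sum()

def kl_divergence(P, Q, eps=1e-12):
    # KL(P || Q), summed over pairs where P is non-zero.
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))

# Toy data: two well-separated clusters of 10 points each in 10-D.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.5, size=(10, 10)),
    rng.normal(10.0, 0.5, size=(10, 10)),
])
P = high_dim_affinities(X)

# A layout that keeps neighbors together (first two coordinates of X) ...
Y_good = X[:, :2]
# ... and one that interleaves the two clusters, breaking neighborhoods.
perm = np.arange(20).reshape(2, 10).T.ravel()
Y_bad = Y_good[perm]

kl_good = kl_divergence(P, low_dim_affinities(Y_good))
kl_bad = kl_divergence(P, low_dim_affinities(Y_bad))
print(f"KL (neighbors kept):   {kl_good:.3f}")
print(f"KL (neighbors broken): {kl_bad:.3f}")
```

The layout that keeps each point next to its original neighbors yields the lower divergence, which is the quantity t-SNE's optimizer tries to drive down.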
Examples
Python Example: t-SNE with scikit-learn
from sklearn.manifold import TSNE
from sklearn.datasets import load_digits
import matplotlib.pyplot as plt
# Load Digits dataset
digits = load_digits()
X = digits.data
y = digits.target
# Apply t-SNE to reduce dimensions to 2
tsne = TSNE(n_components=2, random_state=42)
X_tsne = tsne.fit_transform(X)
# Plotting the t-SNE result
plt.scatter(X_tsne[:, 0], X_tsne[:, 1], c=y, cmap='tab10', edgecolor='k', s=50)
plt.xlabel('t-SNE Dimension 1')
plt.ylabel('t-SNE Dimension 2')
plt.title('t-SNE on Digits Dataset')
plt.colorbar()
plt.show()
Real-World Applications
t-SNE Applications
- Image Processing: Visualizing complex image datasets to identify patterns or clusters
- Genomics and Bioinformatics: Exploring gene expression data and identifying cell types
- Natural Language Processing: Visualizing word embeddings and document clusters
- Anomaly Detection: Visually spotting potential outliers in high-dimensional datasets
- Exploratory Data Analysis: Understanding data structure before applying other algorithms

Interview Questions
1. What is the primary goal of t-SNE?
The primary goal of t-SNE is to reduce high-dimensional data to a lower-dimensional space (2D or 3D) for visualization, preserving local similarities between points.
2. How does t-SNE differ from PCA?
Unlike PCA, which is a linear technique focused on maximizing variance globally, t-SNE is a non-linear technique that preserves local neighborhood structures and is mainly used for visualization.
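One way to make this difference concrete is to embed the same data with both methods and measure how well each embedding keeps the original nearest neighbors. The sketch below assumes scikit-learn's `trustworthiness` score with `n_neighbors=5` as the measure of local structure, and uses only the first 300 digits to keep the run short; both choices are illustrative, not part of the original example.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE, trustworthiness

# Small subset of the Digits dataset (assumption: 300 samples for speed).
X = load_digits().data[:300]

# Linear projection onto the two directions of largest variance.
X_pca = PCA(n_components=2).fit_transform(X)

# Non-linear embedding that focuses on local neighborhoods.
X_tsne = TSNE(n_components=2, perplexity=30, random_state=42).fit_transform(X)

# Trustworthiness lies in [0, 1]; higher means the embedding's nearest
# neighbors are more often true neighbors in the original space.
trust_pca = trustworthiness(X, X_pca, n_neighbors=5)
trust_tsne = trustworthiness(X, X_tsne, n_neighbors=5)

print(f"PCA   trustworthiness: {trust_pca:.3f}")
print(f"t-SNE trustworthiness: {trust_tsne:.3f}")
```

On data like this, t-SNE would typically be expected to score higher on local-neighborhood preservation, but the exact values depend on the subset, the random seed, and the library version.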
3. What are some limitations of t-SNE?
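t-SNE is computationally expensive (the exact algorithm scales quadratically with the number of points; the Barnes-Hut approximation reduces this to roughly O(n log n)), sensitive to hyperparameters such as perplexity, and does not preserve global structure well: distances between clusters and apparent cluster sizes in the plot are not reliable. It also does not learn a reusable mapping, so new points cannot be projected into an existing embedding without rerunning the algorithm. It is mainly a visualization tool, not a general-purpose dimensionality reduction method.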
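Two of these limitations can be checked directly in scikit-learn. The sketch below assumes the first 200 digits as data and uses `init="random"` so that the effect of the random seed is visible; both are illustrative choices. It checks that `TSNE` exposes no `transform` method for new data, and that two runs with different seeds produce different coordinates.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# Small subset of the Digits dataset (assumption: 200 samples for speed).
X = load_digits().data[:200]

# TSNE offers fit and fit_transform only; there is no transform method
# for projecting unseen points into an existing embedding.
has_transform = hasattr(TSNE, "transform")
print("TSNE has a transform method:", has_transform)

# Same data, same settings, different seeds: the layouts differ,
# so absolute coordinates in a t-SNE plot carry no meaning.
emb_a = TSNE(n_components=2, init="random", random_state=0).fit_transform(X)
emb_b = TSNE(n_components=2, init="random", random_state=1).fit_transform(X)
same_layout = np.allclose(emb_a, emb_b)
print("Identical coordinates across seeds:", same_layout)
```

Fixing `random_state`, as the main example does, makes a single plot reproducible, but it does not make the coordinates comparable across datasets or parameter settings.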
4. What is the role of perplexity in t-SNE?
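Perplexity controls the balance between local and global aspects of the data by setting the effective number of neighbors each point considers during the embedding. Low values emphasize very local structure, while higher values take more of the surrounding data into account. Typical values lie between 5 and 50, and the shape and separation of clusters in the plot can change noticeably with the value chosen.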
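A common way to judge this sensitivity is to run t-SNE at several perplexity values and compare the resulting plots. The sketch below assumes the first 200 digits and the values 5, 30, and 50; these are illustrative choices (in scikit-learn, perplexity must be smaller than the number of samples). It stores each embedding along with the final KL divergence reported by the fitted estimator.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# Small subset of the Digits dataset (assumption: 200 samples for speed).
X = load_digits().data[:200]

results = {}
for perplexity in (5, 30, 50):
    tsne = TSNE(n_components=2, perplexity=perplexity, random_state=42)
    embedding = tsne.fit_transform(X)
    # kl_divergence_ is the value of the cost function after optimization.
    results[perplexity] = (embedding, tsne.kl_divergence_)
    print(f"perplexity={perplexity:>2}  final KL divergence={tsne.kl_divergence_:.3f}")
```

Each stored embedding can be passed to `plt.scatter` as in the main example. The KL values are best read as a per-run diagnostic: the high-dimensional distribution itself changes with perplexity, so a lower value at one setting does not by itself mean a better plot.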