Decision Trees

Description

Decision Trees are supervised learning models used for classification and regression tasks. They work by splitting the dataset into branches based on feature values, ultimately leading to a prediction outcome at the leaves.

How Decision Trees Work

The tree is built by recursively choosing the feature and threshold that split the data so as to improve the purity of the resulting subsets (e.g., by reducing entropy or Gini impurity). This process continues until a stopping criterion is met, such as reaching a maximum depth or a minimum number of samples per node.
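
To make the split criterion concrete, the following sketch scores a single candidate split on a toy label array with both Gini impurity and entropy. The data and the split point are made up purely for illustration; a real tree would evaluate many candidate splits and keep the one with the highest gain.

import numpy as np

def gini(labels):
    # Gini impurity: 1 - sum of squared class proportions
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    # Entropy: -sum(p * log2(p)) over class proportions
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# Toy binary labels and an arbitrary candidate split into left/right subsets
y = np.array([0, 0, 0, 1, 1, 1, 1, 1])
left, right = y[:4], y[4:]

# Information gain: parent impurity minus the weighted impurity of the children
w_left, w_right = len(left) / len(y), len(right) / len(y)
gain = entropy(y) - (w_left * entropy(left) + w_right * entropy(right))
print(f"Parent entropy: {entropy(y):.3f}, information gain: {gain:.3f}")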

Key characteristics:

  • Simple to understand and visualize
  • Can handle both categorical and numerical data
  • Prone to overfitting, often mitigated using pruning techniques

Examples

Python Code for Decision Tree Classifier

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load data
iris = load_iris()
X = iris.data
y = iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
clf = DecisionTreeClassifier(criterion='gini', max_depth=3, random_state=42)
clf.fit(X_train, y_train)

# Predict and evaluate
y_pred = clf.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")

Real-World Applications

  • Healthcare: Diagnosing diseases based on patient symptoms
  • Finance: Credit scoring, loan approval decision-making
  • Marketing: Customer segmentation, targeted promotions
  • Manufacturing: Quality control and fault detection

Resources

The following resources will be manually added later:

Video Tutorials

Interview Questions

1. What is a Decision Tree?

A decision tree is a tree-like model used to make decisions based on input features. Each internal node represents a decision on a feature, each branch represents an outcome, and each leaf represents a class label or value.

2. How does a decision tree decide where to split?

It uses criteria such as Gini impurity, entropy (information gain), or mean squared error to choose the best split that improves the homogeneity of the resulting subsets.
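
For regression, the same recursive splitting is driven by mean squared error. Below is a minimal sketch with a regression tree (assuming scikit-learn 1.0+, where the criterion is named 'squared_error'; the toy data is arbitrary):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy 1-D regression data, made up for illustration
rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 5, size=(40, 1)), axis=0)
y = np.sin(X).ravel()

# 'squared_error' chooses splits that minimize within-node MSE
reg = DecisionTreeRegressor(criterion='squared_error', max_depth=2, random_state=0)
reg.fit(X, y)
print(reg.predict([[1.5], [4.0]]))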

3. What are the advantages and disadvantages of decision trees?

Advantages: Easy to interpret, no need for feature scaling, handles both numerical and categorical data.

Disadvantages: Prone to overfitting, can be biased with imbalanced data, small changes in data can lead to different trees.

4. What is pruning in decision trees?

Pruning is the process of removing nodes from a tree to reduce its complexity and avoid overfitting. It can be done using pre-pruning (setting limits like max depth) or post-pruning (removing nodes after tree construction).
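
As a minimal post-pruning sketch (assuming scikit-learn 0.22+, which provides cost-complexity pruning), the tree below is refit with a nonzero ccp_alpha. Picking ccp_alphas[-2] here is arbitrary; in practice alpha is usually chosen by cross-validation.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Compute candidate alpha values from the fully grown tree
path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(X, y)

# Refit with a nonzero ccp_alpha: subtrees whose complexity cost
# exceeds alpha are collapsed (post-pruning)
pruned = DecisionTreeClassifier(random_state=42, ccp_alpha=path.ccp_alphas[-2])
pruned.fit(X, y)
print(f"Leaves after pruning: {pruned.get_n_leaves()}")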

5. How are decision trees different from random forests?

Decision trees are single models that can overfit easily. Random forests are ensembles of decision trees that reduce overfitting by averaging predictions from multiple trees trained on random subsets of data and features.
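
A quick side-by-side sketch on the iris data (the scores depend on the dataset and are not a general claim):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Compare a single unpruned tree against an ensemble of 100 trees
tree = DecisionTreeClassifier(random_state=42)
forest = RandomForestClassifier(n_estimators=100, random_state=42)

print(f"Tree CV accuracy:   {cross_val_score(tree, X, y, cv=5).mean():.3f}")
print(f"Forest CV accuracy: {cross_val_score(forest, X, y, cv=5).mean():.3f}")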