Logistic Regression

Description

Logistic Regression is a supervised machine learning algorithm used for binary classification problems. Instead of predicting continuous values like linear regression, it predicts the probability that a given input belongs to a particular class.

How Logistic Regression Works

Logistic Regression passes a linear combination of the input features through the logistic (sigmoid) function, which maps it to a probability between 0 and 1.
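As a minimal sketch of that mapping, the snippet below computes a sigmoid and applies it to a weighted sum of features. The helper names (sigmoid, predict_probability) and the coefficient values are hypothetical and chosen only for illustration; they do not come from a fitted model.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + e^(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z use the equivalent form e^z / (1 + e^z)
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_probability(features, weights, intercept):
    """Probability of class 1 for one sample: sigmoid of the linear score."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical coefficients (assumed values, not learned from data)
weights = [0.8, -0.5]
intercept = -0.2

p = predict_probability([2.0, 1.0], weights, intercept)
print(f"P(class 1) = {p:.3f}")
```

Whatever the size of the linear score, the output stays between 0 and 1, which is what allows it to be read as a probability.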

Key points:

  • Predicts the probability of the positive class (conventionally labeled 1)
  • Uses the sigmoid function: σ(z) = 1 / (1 + e^(-z)), where z = β₀ + β₁x₁ + ... + βₙxₙ
  • Outputs probabilities, which are then converted to class labels using a threshold (usually 0.5)
  • Fits the parameters β by Maximum Likelihood Estimation, typically with an iterative numerical optimizer because no closed-form solution exists
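The key points above can be sketched in plain Python. This is only an illustration under stated assumptions: it uses batch gradient descent on the mean negative log-likelihood, which is one simple way to approximate the maximum likelihood estimate and is not the solver scikit-learn uses by default. The function names, learning rate, iteration count, and toy dataset are all made up for the example.

```python
import math

def sigmoid(z):
    # Clamp z so math.exp cannot overflow on extreme inputs
    z = max(-500.0, min(500.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, learning_rate=0.1, n_iters=2000):
    """Approximate the MLE by gradient descent on the mean negative log-likelihood."""
    n_samples, n_features = len(X), len(X[0])
    intercept, weights = 0.0, [0.0] * n_features
    for _ in range(n_iters):
        grad_b, grad_w = 0.0, [0.0] * n_features
        for xi, yi in zip(X, y):
            p = sigmoid(intercept + sum(w * x for w, x in zip(weights, xi)))
            error = p - yi  # derivative of the per-sample log-loss w.r.t. z
            grad_b += error
            for j in range(n_features):
                grad_w[j] += error * xi[j]
        intercept -= learning_rate * grad_b / n_samples
        weights = [w - learning_rate * g / n_samples for w, g in zip(weights, grad_w)]
    return intercept, weights

def predict_label(x, intercept, weights, threshold=0.5):
    """Convert the predicted probability into a 0/1 label."""
    p = sigmoid(intercept + sum(w * xj for w, xj in zip(weights, x)))
    return int(p >= threshold)

# Toy one-feature dataset (invented); the classes overlap, so the MLE is finite
X = [[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]]
y = [0, 0, 0, 1, 0, 1, 1, 1]

intercept, weights = fit_logistic(X, y)
preds = [predict_label(x, intercept, weights) for x in X]
print("intercept:", round(intercept, 3), "weight:", round(weights[0], 3))
print("predictions:", preds)
```

Raising the threshold above 0.5 makes the model more conservative about predicting class 1, and lowering it does the opposite.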

Examples

Python Code for Logistic Regression

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load dataset and prepare binary classification problem
iris = load_iris()
X = iris.data
y = (iris.target == 2).astype(int)  # Classify if species is Iris-Virginica or not

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train Logistic Regression model
model = LogisticRegression()
model.fit(X_train, y_train)

# Predict on test set
y_pred = model.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
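The example above only looks at hard class labels. As a hedged follow-up sketch, the snippet below repeats the same setup and inspects the predicted probabilities and learned coefficients instead. The threshold of 0.3 is an arbitrary illustrative value, and max_iter=1000 is an assumption added as a precaution against convergence warnings; neither comes from the original example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Same binary problem as above: Iris-Virginica vs. the rest
iris = load_iris()
X = iris.data
y = (iris.target == 2).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# max_iter is raised from its default here only as a precaution (assumption)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# predict_proba returns one column per class; column 1 is P(class 1)
proba_virginica = model.predict_proba(X_test)[:, 1]

# Apply a custom decision threshold instead of the default 0.5
threshold = 0.3  # arbitrary value for illustration
y_pred_custom = (proba_virginica >= threshold).astype(int)

print("First five probabilities:", proba_virginica[:5].round(3))
print("Coefficients:", model.coef_[0])
print("Intercept:", model.intercept_[0])
```

A lower threshold can only add positive predictions, which trades precision for recall.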

Real-World Applications

  • Healthcare: Disease diagnosis (e.g., predicting presence or absence of cancer)
  • Finance: Credit scoring and loan default prediction
  • Marketing: Customer churn prediction and targeted advertising
  • Natural Language Processing: Spam email detection, sentiment analysis (binary sentiment)

Interview Questions

1. What is the main difference between Linear Regression and Logistic Regression?

Linear Regression predicts continuous numerical values, whereas Logistic Regression predicts probabilities of discrete classes for classification tasks.

2. Why do we use the sigmoid function in Logistic Regression?

The sigmoid function maps any real number into the open interval (0, 1), which makes its output suitable for modeling the probability of a class.

3. How do you interpret the coefficients of a logistic regression model?

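Each coefficient is the change in the log-odds of the outcome for a one-unit increase in its predictor, holding the other predictors constant. Exponentiating a coefficient gives the odds ratio: the factor by which the odds are multiplied for that one-unit increase.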
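A small worked check of this interpretation, using a hypothetical one-predictor model (the intercept and coefficient values are assumed for illustration, not taken from any dataset):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)

# Hypothetical fitted parameters (assumed values)
intercept = -1.0
beta = 0.7

# Predicted probabilities at x = 2 and x = 3 (a one-unit increase)
p_at_2 = sigmoid(intercept + beta * 2.0)
p_at_3 = sigmoid(intercept + beta * 3.0)

# The ratio of the odds should equal exp(beta)
odds_ratio_observed = odds(p_at_3) / odds(p_at_2)
odds_ratio_from_beta = math.exp(beta)

print(f"Odds ratio from predictions: {odds_ratio_observed:.4f}")
print(f"exp(beta):                   {odds_ratio_from_beta:.4f}")
```

The two printed values agree, while the change in probability itself (p_at_3 minus p_at_2) depends on the starting point x, which is why coefficients are read on the log-odds scale.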

4. What are some common assumptions of Logistic Regression?

  • Linear relationship between log-odds of the outcome and predictors
  • Independent observations
  • No or little multicollinearity among predictors
  • Large sample size for stable estimates

5. How can you evaluate the performance of a logistic regression model?

Common choices are accuracy, precision, recall, F1-score, the ROC curve and its AUC, and the confusion matrix. Accuracy alone can be misleading when the classes are imbalanced.
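To make the threshold-based metrics concrete, here is a minimal sketch that derives them from confusion-matrix counts. The function names and the label lists are invented for illustration; ROC-AUC is left out because it needs predicted probabilities rather than hard labels.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def classification_metrics(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a class is never predicted or present
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Made-up labels for illustration
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = classification_metrics(y_true, y_pred)
print(confusion_counts(y_true, y_pred))
print(metrics)
```

In practice, scikit-learn's sklearn.metrics module provides equivalents such as confusion_matrix, precision_score, recall_score, f1_score, and roc_auc_score.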