
Metrics: Accuracy, Precision, Recall, F1, AUC

Description

In machine learning, evaluating the performance of a model is crucial. Several metrics are used to quantify how well a model performs, especially for classification tasks:

  • Accuracy: Measures the overall correctness of the model’s predictions.
  • Precision: Indicates how many predicted positives are actually positive.
  • Recall: Shows how many actual positives are correctly identified.
  • F1 Score: The harmonic mean of precision and recall, balancing both.
  • AUC (Area Under the Curve): Evaluates the model’s ability to distinguish between classes at various threshold levels.

Choosing the right metric depends on the problem context, such as the balance of classes and the cost of false positives vs false negatives.
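
These definitions can be sketched directly from the four confusion-matrix counts before reaching for a library; the labels below are illustrative:

```python
# Illustrative labels for computing the metrics from raw counts
y_true = [0, 1, 1, 0, 1, 1, 0]
y_pred = [0, 0, 1, 0, 1, 1, 1]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # true negatives

accuracy = (tp + tn) / (tp + tn + fp + fn)       # overall correctness
precision = tp / (tp + fp)                        # correct share of predicted positives
recall = tp / (tp + fn)                           # share of actual positives found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
```

These hand-computed values match what scikit-learn returns for the same labels, as the example in the next section shows.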

Examples

Here is an example using scikit-learn to calculate these metrics in Python:


from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score

# Example true and predicted labels
y_true = [0, 1, 1, 0, 1, 1, 0]
y_pred = [0, 0, 1, 0, 1, 1, 1]
y_scores = [0.1, 0.4, 0.8, 0.35, 0.7, 0.9, 0.2]  # predicted probabilities

accuracy = accuracy_score(y_true, y_pred)    # (TP + TN) / total
precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall
auc = roc_auc_score(y_true, y_scores)        # AUC needs scores, not hard labels

print(f"Accuracy: {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1 Score: {f1:.2f}")
print(f"AUC: {auc:.2f}")

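The four counts behind these metrics can also be pulled out directly; a minimal sketch using scikit-learn's confusion_matrix with the same labels:

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 1, 0, 1, 1, 0]
y_pred = [0, 0, 1, 0, 1, 1, 1]

# For binary labels, ravel() flattens the 2x2 matrix in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)
```

Inspecting these counts is often more informative than any single summary metric, especially when diagnosing which kind of error a model makes.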
Real-World Applications

Medical Diagnosis

In disease detection, precision and recall must be balanced: false negatives miss sick patients, while false positives trigger unnecessary follow-up tests.

Spam Detection

High precision reduces the chance of misclassifying important emails as spam.

Fraud Detection

Recall is important to catch as many fraudulent cases as possible, even at the cost of some false alarms.

Marketing Campaigns

Accuracy and F1 score help evaluate customer targeting models to maximize response rates.

Credit Scoring

AUC helps assess the model's ability to rank potential defaulters accurately.


Interview Questions

What is accuracy and when is it useful?

Accuracy is the ratio of correctly predicted observations to the total observations.

Formula: (TP + TN) / (TP + TN + FP + FN)

It is useful when the dataset is balanced but can be misleading with imbalanced data.
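This pitfall is easy to demonstrate with synthetic labels: on a 95/5 imbalanced dataset, a trivial model that always predicts the majority class still scores high accuracy while catching no positives at all.

```python
# Synthetic imbalanced data: 95 negatives, 5 positives
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # trivial model: always predict the majority class

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred)) / sum(t == 1 for t in y_true)

print(accuracy)  # 0.95 -- looks strong
print(recall)    # 0.0  -- but no positive case is ever caught
```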

What is precision and why is it important?

Precision measures how many of the positive predictions were actually correct.

Formula: TP / (TP + FP)

High precision means fewer false positives, which is important in cases like spam detection.

Explain recall and when it matters.

Recall (or sensitivity) measures how many actual positives were correctly identified.

Formula: TP / (TP + FN)

It is crucial in scenarios where missing positive cases is costly, like disease diagnosis.
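In such scenarios, recall can often be raised by lowering the decision threshold on the model's scores, trading some precision away. A small sketch with illustrative scores:

```python
y_true = [0, 1, 1, 0, 1, 1, 0]
y_scores = [0.1, 0.4, 0.8, 0.35, 0.7, 0.9, 0.2]  # illustrative predicted probabilities

def recall_at(threshold):
    # Convert scores to hard predictions at the given threshold, then compute recall
    preds = [1 if s >= threshold else 0 for s in y_scores]
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, preds))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, preds))
    return tp / (tp + fn)

print(recall_at(0.5))  # the positive scored 0.4 is missed
print(recall_at(0.3))  # the looser threshold recovers it
```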

What is the F1 score and why is it used?

The F1 score is the harmonic mean of precision and recall.

Formula: 2 × (Precision × Recall) / (Precision + Recall)

It is used to balance precision and recall, especially on imbalanced datasets.
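A quick worked example of the formula, with illustrative numbers: for precision 0.75 and recall 0.60, F1 = 2 × 0.45 / 1.35 ≈ 0.667, which sits below the arithmetic mean of 0.675 because the harmonic mean penalizes imbalance between the two.

```python
precision, recall = 0.75, 0.60  # illustrative values

f1 = 2 * precision * recall / (precision + recall)
arithmetic_mean = (precision + recall) / 2

print(round(f1, 3))               # 0.667
print(round(arithmetic_mean, 3))  # 0.675 -- F1 is pulled toward the weaker metric
```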

What does AUC-ROC represent in model evaluation?

AUC (Area Under the Curve) measures the model’s ability to distinguish between classes.

Values range from 0 to 1, with 1 being perfect classification and 0.5 meaning random guessing.

A higher AUC indicates better overall model performance across all classification thresholds.
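AUC also has a useful probabilistic interpretation: it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. A minimal sketch computing it pairwise (ties counted as half), using the same illustrative scores as the earlier example:

```python
y_true = [0, 1, 1, 0, 1, 1, 0]
y_scores = [0.1, 0.4, 0.8, 0.35, 0.7, 0.9, 0.2]

pos = [s for t, s in zip(y_true, y_scores) if t == 1]
neg = [s for t, s in zip(y_true, y_scores) if t == 0]

# Fraction of positive/negative pairs ranked correctly; ties count as 0.5
auc = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg) / (len(pos) * len(neg))
print(auc)  # 1.0 here: every positive outranks every negative
```

For these scores the ranking is perfect, so the pairwise AUC is 1.0, matching what roc_auc_score reports.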