Metrics: Accuracy, Precision, Recall, F1, AUC
Description
In machine learning, evaluating the performance of a model is crucial. Several metrics are used to quantify how well a model performs, especially for classification tasks:
- Accuracy: Measures the overall correctness of the model’s predictions.
- Precision: Indicates how many predicted positives are actually positive.
- Recall: Shows how many actual positives are correctly identified.
- F1 Score: The harmonic mean of precision and recall, balancing both.
- AUC (Area Under the ROC Curve): Evaluates the model’s ability to distinguish between classes across all classification thresholds.
Choosing the right metric depends on the problem context, such as the balance of classes and the cost of false positives vs false negatives.
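The first four metrics are just different ratios over the confusion matrix. As a quick illustration, here is a minimal sketch in Python that computes them directly from hypothetical true-positive, true-negative, false-positive, and false-negative counts (the counts are made up for illustration):
tp, tn, fp, fn = 3, 2, 1, 1  # hypothetical confusion-matrix counts
accuracy = (tp + tn) / (tp + tn + fp + fn)  # overall correctness
precision = tp / (tp + fp)  # of the predicted positives, how many are real
recall = tp / (tp + fn)  # of the real positives, how many are found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall
print(accuracy, precision, recall, f1)  # 0.7142..., 0.75, 0.75, 0.75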
Examples
Here is an example using scikit-learn to calculate these metrics in Python:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score
# Example true and predicted labels
y_true = [0, 1, 1, 0, 1, 1, 0]
y_pred = [0, 0, 1, 0, 1, 1, 1]
y_scores = [0.1, 0.4, 0.8, 0.35, 0.7, 0.9, 0.2]  # predicted probabilities for the positive class
accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
auc = roc_auc_score(y_true, y_scores)
print(f"Accuracy: {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1 Score: {f1:.2f}")
print(f"AUC: {auc:.2f}")
Real-World Applications
Medical Diagnosis
Precision and recall must be balanced in disease detection, since false positives mean unnecessary follow-up tests and false negatives mean missed disease.
Spam Detection
High precision reduces the chance of misclassifying important emails as spam.
Fraud Detection
Recall is important to catch as many fraudulent cases as possible, while precision keeps false alarms on legitimate transactions in check.
Marketing Campaigns
Accuracy and F1 score help evaluate customer targeting models to maximize response rates.
Credit Scoring
AUC helps assess the model's ability to rank potential defaulters accurately.
Resources
Recommended Books
- Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville
- Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by Aurélien Géron
- Deep Learning with Python by François Chollet
Interview Questions
What is accuracy and when is it useful?
Accuracy is the ratio of correctly predicted observations to the total observations.
Formula: (TP + TN) / (TP + TN + FP + FN)
It is useful when the dataset is balanced but can be misleading with imbalanced data.
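For example, on a dataset where 99% of cases are negative, a model that always predicts the negative class reaches 99% accuracy while identifying none of the positives.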
What is precision and why is it important?
Precision measures how many of the positive predictions were actually correct.
Formula: TP / (TP + FP)
High precision means fewer false positives, which is important in cases like spam detection.
Explain recall and when it matters.
Recall (or sensitivity) measures how many actual positives were correctly identified.
Formula: TP / (TP + FN)
It is crucial in scenarios where missing positive cases is costly, like disease diagnosis.
What is the F1 score and why is it used?
The F1 score is the harmonic mean of precision and recall.
Formula: 2 × (Precision × Recall) / (Precision + Recall)
It is used to balance precision and recall, especially on imbalanced datasets.
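For example, precision = 0.9 and recall = 0.1 give F1 = 2 × 0.09 / 1.0 = 0.18, far below the arithmetic mean of 0.5, so a strong precision cannot mask a weak recall.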
What does AUC-ROC represent in model evaluation?
AUC (Area Under the ROC Curve) measures the model’s ability to distinguish between classes.
Values range from 0 to 1, with 1 being perfect classification and 0.5 meaning random guessing.
A higher AUC indicates better overall model performance across all classification thresholds.
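A useful way to internalize this, shown in the minimal sketch below (the labels and scores are made up), is that ROC AUC equals the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one:
from itertools import product
from sklearn.metrics import roc_auc_score
y_true = [0, 1, 1, 0, 1]
y_scores = [0.2, 0.6, 0.9, 0.4, 0.3]
pos = [s for y, s in zip(y_true, y_scores) if y == 1]  # scores of actual positives
neg = [s for y, s in zip(y_true, y_scores) if y == 0]  # scores of actual negatives
# Fraction of (positive, negative) pairs ranked correctly; ties count as half
pairs = list(product(pos, neg))
auc = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs) / len(pairs)
print(auc)  # 0.8333...
print(roc_auc_score(y_true, y_scores))  # matches: 0.8333...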