Evaluation Metrics: Accuracy, Precision, Recall, F1-Score, etc.
Description
Evaluation metrics quantify how well a machine learning model performs and guide improvements. Different problem types (classification vs. regression) call for different metrics. For classification tasks, the key metrics include accuracy, precision, recall, and F1-score.
Accuracy
Accuracy is the ratio of correctly predicted instances to the total number of instances. In this and the formulas below, TP, TN, FP, and FN denote the counts of true positives, true negatives, false positives, and false negatives from the confusion matrix.
Formula: (TP + TN) / (TP + TN + FP + FN)
Precision
Precision measures the proportion of true positive predictions among all positive predictions.
Formula: TP / (TP + FP)
Recall
Recall (also known as sensitivity or true positive rate) measures the proportion of true positives identified out of all actual positives.
Formula: TP / (TP + FN)
F1-Score
The F1-Score is the harmonic mean of precision and recall. It balances the two and is especially useful when classes are imbalanced, since it stays low whenever either precision or recall is low.
Formula: 2 * (Precision * Recall) / (Precision + Recall)
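To make the formulas concrete, here is a small worked example in Python (the confusion-matrix counts are made up for illustration):
# Hypothetical confusion-matrix counts for a binary classifier
TP, TN, FP, FN = 40, 45, 10, 5
accuracy = (TP + TN) / (TP + TN + FP + FN)            # 85 / 100 = 0.85
precision = TP / (TP + FP)                            # 40 / 50  = 0.80
recall = TP / (TP + FN)                               # 40 / 45  ≈ 0.889
f1 = 2 * (precision * recall) / (precision + recall)  # ≈ 0.842
print(f"Accuracy={accuracy:.3f}, Precision={precision:.3f}, Recall={recall:.3f}, F1={f1:.3f}")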
Examples
Python Example Using Classification Metrics
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
# Load dataset
X, y = load_iris(return_X_y=True)
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Train model
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
# Predictions
y_pred = model.predict(X_test)
# Evaluation (macro averaging treats each of the three iris classes equally)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred, average='macro'))
print("Recall:", recall_score(y_test, y_pred, average='macro'))
print("F1 Score:", f1_score(y_test, y_pred, average='macro'))
Real-World Applications
Evaluation Metric Applications
- Healthcare: Evaluating diagnostic models (e.g., recall in cancer screening, where a missed case is far costlier than a false alarm)
- Cybersecurity: Intrusion detection systems rely on recall to catch real attacks and on precision to limit false alarms
- Search Engines: Precision measures how many of the returned results are actually relevant
- Recommendation Systems: Precision and F1-score help tune how accurately items are recommended

Resources
The following resources will be manually added later:
Video Tutorials
PDF/DOC Materials
Interview Questions
1. What is the difference between precision and recall?
Precision: Measures how many of the predicted positives are actually positive.
Recall: Measures how many actual positives were correctly predicted by the model.
2. When should you prefer using F1-score over accuracy?
F1-score should be used when the dataset is imbalanced. Accuracy can be misleading in such cases, while F1-score provides a better measure by considering both precision and recall.
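A minimal sketch of this point, using a made-up 95/5 class split: a model that always predicts the majority class reaches 95% accuracy yet has an F1-score of zero on the minority class.
from sklearn.metrics import accuracy_score, f1_score
# Imbalanced ground truth: 95 negatives, 5 positives (illustrative data)
y_true = [0] * 95 + [1] * 5
# A useless model that always predicts the majority class
y_pred = [0] * 100
print("Accuracy:", accuracy_score(y_true, y_pred))             # 0.95 (looks great)
print("F1 Score:", f1_score(y_true, y_pred, zero_division=0))  # 0.0 (reveals the failure)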
3. What does a high recall and low precision indicate?
It indicates that the model is identifying most of the actual positive cases but is also predicting many false positives.
4. Can a model have high precision and high recall?
Yes, although there is often a trade-off between the two. Achieving both requires a well-tuned model and typically sufficient, clean data.
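One way to see the trade-off is to sweep the decision threshold of a probabilistic classifier: raising the threshold usually gains precision at the cost of recall. The sketch below uses a synthetic dataset purely for illustration:
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split
# Synthetic, mildly imbalanced binary dataset (illustrative only)
X, y = make_classification(n_samples=1000, weights=[0.8], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
model = LogisticRegression(max_iter=200).fit(X_train, y_train)
probs = model.predict_proba(X_test)[:, 1]  # probability of the positive class
# Raising the threshold trades recall for precision
for threshold in (0.3, 0.5, 0.7):
    preds = (probs >= threshold).astype(int)
    print(f"threshold={threshold}: "
          f"precision={precision_score(y_test, preds, zero_division=0):.2f}, "
          f"recall={recall_score(y_test, preds):.2f}")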
5. How do you calculate F1-score?
F1 = 2 * (Precision * Recall) / (Precision + Recall)