How Are Confusion Matrices Useful in Machine Learning?
May 26, 2021 · 5 min read
Confusion matrices can be confusing, as the name suggests, but they can be helpful in removing your confusion when solving classification problems. A confusion matrix is an N x N matrix, where N is the number of target classes.
It is a table layout that allows us to understand the performance of a classification model. It is useful for getting an overall picture of the classifications done by our model.
The sub-classifications in a confusion matrix can be further analyzed to determine various performance metrics of such models. The matrix compares the actual values with the values predicted by the model.
There are four sub-classifications on which a confusion matrix is defined and laid out:
1. True Positives
Consider two classes (a binary classification problem): one is Positive (or Correct, or whatever you want it to be) and the other is Negative (or Incorrect, or the opposite of what you wanted it to be).
True Positives are the values that our model correctly classified as positive. These are the values that were actually positive and were put in the positive box by our model.
The patient was classified as being infected and they’re infected.
2. True Negatives
On the other hand, True Negatives are the values that were correctly classified as belonging to the second class of our problem. These values were also put in the right box by our model, but that box carries a different label.
Negative in True Negatives doesn’t mean that the values were misclassified; it means they were put in the other box than before. Hence, they are correctly classified and are called ‘True’.
…another patient was classified as not having an infection and they were not having an infection…
3. False Positives
Now, what if values that were supposed to go in the second box were put in the first box by our classifier? This is an error, more specifically a Type I error, in which negative values are misclassified as positive.
The model predicted that a patient had an infection, but they did not have one and were fine. False alarm.
Such errors are not good, but they are not completely disastrous, because the flagged cases can be tested further to verify their actual class.
4. False Negatives
As with False Positives, False Negatives are values that were misclassified. They are the values that were actually positive but were put in the negative box, which is a Type II error.
The patient was infected but was sent home because the model classified them as negative. A true alarm, but nobody was alarmed.
Such errors can be dangerous and even lethal, probably most of all in the medical, banking or judicial fields.
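The four boxes described above can be counted by hand. The following is a minimal sketch, assuming made-up labels where 1 means infected (positive) and 0 means not infected (negative); the data and the `count_outcomes` helper are hypothetical and only there for illustration.

```python
# Hypothetical labels: 1 = infected (positive), 0 = not infected (negative).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # what was actually the case
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]  # what the model said


def count_outcomes(y_true, y_pred, positive=1):
    """Count TP, TN, FP and FN for a binary classification problem."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative (Type I)
        else:
            if actual == positive:
                fn += 1  # predicted negative, actually positive (Type II)
            else:
                tn += 1  # predicted negative, actually negative
    return tp, tn, fp, fn


tp, tn, fp, fn = count_outcomes(y_true, y_pred)
print(tp, tn, fp, fn)  # 3 3 1 1
```

Every prediction lands in exactly one of the four boxes, so the four counts always add up to the number of samples.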
But what do we infer from this?
We have various methods to analyze the performance of our model:
Accuracy or Classification Accuracy:
We can calculate the proportion of values that were correctly classified out of the total values given to our model, that is, (TP + TN) divided by (TP + TN + FP + FN). This is the accuracy of our classifier.
Accuracy tells us how well our model performed overall, but its limitation is that it is only reliable when each class has roughly the same number of observations.
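To see why class balance matters, here is a small sketch of the accuracy calculation. The counts are invented for illustration, and `accuracy` is a hypothetical helper, not a library function.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that were correct."""
    return (tp + tn) / (tp + tn + fp + fn)


# Roughly balanced classes (hypothetical counts).
print(accuracy(tp=40, tn=45, fp=5, fn=10))  # 0.85

# Imbalanced classes: 990 negatives, 10 positives, and a model that
# predicts "negative" for everyone. It never finds a single positive,
# yet its accuracy looks excellent.
print(accuracy(tp=0, tn=990, fp=0, fn=10))  # 0.99
```

The second model would send every infected patient home, which is why accuracy alone can be misleading on imbalanced data.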
Precision
Precision tells us, out of all the values the model put in the positive class, how many are actually positive. It tells us how precise our model was in classifying positive values.
It tells us how much we can trust the model's positive predictions. It is also known as Positive Predictive Value.
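As a rough sketch, precision can be computed from True Positives and False Positives alone. The counts are hypothetical, and returning 0.0 when the model predicts no positives at all is an assumption made here to avoid dividing by zero.

```python
def precision(tp, fp):
    """Share of predicted positives that are actually positive."""
    predicted_positive = tp + fp
    if predicted_positive == 0:
        return 0.0  # assumption: report 0 when nothing was predicted positive
    return tp / predicted_positive


# The model flagged 50 patients as infected and 40 of them really were.
print(precision(tp=40, fp=10))  # 0.8
```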
Recall or Sensitivity:
If you think about it, False Negatives are actually positive values that were misclassified as negative, hence the name False Negatives.
We can use False Negatives and True Positives to determine how many positive values the model is able to recall from the total actual positive values given to it.
Recall gives us the percentage of actual positives that the model managed to find. This is also known as the True Positive Rate.
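A minimal sketch of that calculation, using hypothetical counts; the `recall` helper and its zero-division fallback are assumptions for illustration.

```python
def recall(tp, fn):
    """Share of actual positives that the model found (True Positive Rate)."""
    actual_positive = tp + fn
    if actual_positive == 0:
        return 0.0  # assumption: report 0 when there are no actual positives
    return tp / actual_positive


# 10 patients were infected; the model caught 7 and sent 3 home.
print(recall(tp=7, fn=3))  # 0.7
```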
Specificity:
This is similar to Recall but for the negatives. It is the ratio of the values correctly classified as negative to all the values that were actually negative.
Specificity gives us the percentage of actual negatives that the model correctly identified. This is also known as the True Negative Rate.
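The same idea in code, mirroring recall but built from True Negatives and False Positives. Counts and the `specificity` helper are hypothetical.

```python
def specificity(tn, fp):
    """Share of actual negatives correctly classified (True Negative Rate)."""
    actual_negative = tn + fp
    if actual_negative == 0:
        return 0.0  # assumption: report 0 when there are no actual negatives
    return tn / actual_negative


# 100 patients were healthy; the model cleared 90 and raised 10 false alarms.
print(specificity(tn=90, fp=10))  # 0.9
```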
F1 Score
F1 Score is the harmonic mean of precision and recall. It varies from 0 to 1, where 1 indicates perfect Precision and Recall and 0 means that either Precision or Recall is zero.
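A short sketch of the harmonic mean shows why F1 is stricter than a plain average. The helper name `f1` and the input values are made up for illustration.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall; 0 if both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Both pairs have the same arithmetic mean (0.5), but the harmonic mean
# punishes the lopsided pair heavily.
balanced = f1(0.5, 0.5)
lopsided = f1(0.9, 0.1)
print(balanced)            # 0.5
print(round(lopsided, 2))  # 0.18
```

A model cannot get a high F1 score by doing well on only one of the two metrics.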
How to Create a Confusion Matrix in Python?
We can import confusion_matrix from Scikit-Learn’s metrics module. We can then pass our test values and our predicted values to the confusion_matrix() function. The printed result is a nested array containing four values. With Scikit-Learn's default label ordering for labels 0 and 1, these are TN, FP, FN and TP.
from sklearn.metrics import confusion_matrix
# y_test holds the true labels, y_pred the labels predicted by the model
cm = confusion_matrix(y_test, y_pred)  # use a new name so the function is not overwritten
print(cm)
# For binary labels 0 and 1 (rows = actual, columns = predicted) this prints
# [[TN FP]
#  [FN TP]]
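The snippet above assumes y_test and y_pred already exist. Here is a self-contained sketch with made-up labels that shows the layout Scikit-Learn uses and how to unpack the four counts. The data is hypothetical; `ravel()` and the `labels` parameter are part of the NumPy and Scikit-Learn APIs.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 1 = positive, 0 = negative.
y_test = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0]

# Rows are actual classes, columns are predicted classes, and labels are
# sorted, so class 0 (negative) comes first.
cm = confusion_matrix(y_test, y_pred)
print(cm)
# [[5 1]
#  [1 3]]

# Flatten the 2 x 2 matrix to get the four counts in one line.
tn, fp, fn, tp = cm.ravel()

# Passing labels=[1, 0] puts the positive class first: [[TP FN], [FP TN]].
cm_positive_first = confusion_matrix(y_test, y_pred, labels=[1, 0])
print(cm_positive_first)
# [[3 1]
#  [1 5]]
```

Checking which class sits in the first row before reading off TP and TN is worthwhile, because the default order is by sorted label, not by which class you consider positive.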
Further, we can use classification_report() to create a text report containing Precision, Recall and F1 score for each class, which helps in analyzing the performance of our model.
from sklearn.metrics import classification_report
# y_test holds the true labels, y_pred the labels predicted by the model
report = classification_report(y_test, y_pred)  # use a new name so the function is not overwritten
print(report)
# prints a table with columns: precision, recall, f1-score, support
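When only the individual numbers are needed rather than the full table, Scikit-Learn also provides precision_score, recall_score and f1_score. The sketch below uses made-up labels and relies on the default setting of these functions, which treats label 1 as the positive class.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical labels: 1 = positive, 0 = negative.
# This data contains TP = 3, TN = 4, FP = 1, FN = 2.
y_test = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0]

p = float(precision_score(y_test, y_pred))  # TP / (TP + FP) = 3 / 4
r = float(recall_score(y_test, y_pred))     # TP / (TP + FN) = 3 / 5
f = float(f1_score(y_test, y_pred))         # harmonic mean of p and r

print(round(p, 3), round(r, 3), round(f, 3))  # 0.75 0.6 0.667
```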