
Blind Spot about Sklearn Confusion Matrix


Introduction

Evaluating the models we develop is a crucial part of any machine learning or deep learning project. The most common way to check whether the predicted values are well classified is a confusion matrix. The confusion_matrix function in the sklearn package, however, lays the matrix out differently from the convention we often see on other websites.

Comparing Wikipedia with sklearn

On the Wikipedia page, the main table places the predicted classes in the rows and the actual classes in the columns (the article notes that the opposite convention also appears in the literature). sklearn's confusion_matrix() function uses the transposed layout: each row of the matrix represents the instances of an actual class (ground truth), while each column represents the instances of a predicted class. In other words, entry C[i, j] is the number of samples known to be in class i and predicted to be in class j, so it is easy to misread the matrix if you assume the other convention.

from sklearn import metrics

# ground-truth labels and the model's predictions for six samples
y_true = ["cat", "dog", "cat", "cat", "dog", "penguin"]
y_pred = ["dog", "dog", "cat", "cat", "dog", "cat"]

# rows = true classes, columns = predicted classes, both in `labels` order
metrics.confusion_matrix(y_true, y_pred, labels=["cat", "dog", "penguin"])

This will return

array([[2, 1, 0],
       [0, 2, 0],
       [1, 0, 0]], dtype=int64)
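
Reading this with sklearn's layout in mind: the first row corresponds to the true class "cat" (two samples predicted as "cat", one as "dog", none as "penguin"), the second row to "dog", and the third to "penguin". If you prefer the other orientation (rows as predicted classes), transposing the array is enough; the snippet below is a small sketch reusing the same example.

from sklearn import metrics

y_true = ["cat", "dog", "cat", "cat", "dog", "penguin"]
y_pred = ["dog", "dog", "cat", "cat", "dog", "cat"]
labels = ["cat", "dog", "penguin"]

# rows = actual classes, columns = predicted classes
cm = metrics.confusion_matrix(y_true, y_pred, labels=labels)

# samples that are truly "cat" but were predicted as "dog"
print(cm[labels.index("cat"), labels.index("dog")])  # 1

# transpose to get rows = predicted classes, columns = actual classes
print(cm.T)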

Conclusion

In predictive analytics, a (binary) confusion matrix is a two-row, two-column table that reports the numbers of true positives, false positives, true negatives, and false negatives. This allows more detailed analysis than the simple proportion of correct classifications (accuracy), since metrics such as precision, recall, and the F1 score can all be computed from it.
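
For the binary case, the four counts can be unpacked directly from the flattened matrix; because sklearn puts actual classes in rows and predicted classes in columns, the row-major order is tn, fp, fn, tp. Below is a minimal sketch with made-up 0/1 labels.

from sklearn.metrics import confusion_matrix

# made-up binary example: 1 = positive class, 0 = negative class
y_true = [0, 1, 0, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]

# rows = actual, columns = predicted, so flattening gives tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
print(tn, fp, fn, tp)  # 2 1 1 2

# derived metrics such as precision and recall follow directly
precision = tp / (tp + fp)
recall = tp / (tp + fn)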

References

  1. https://blog.csdn.net/m0_38061927/article/details/77198990
  2. https://en.wikipedia.org/wiki/Confusion_matrix

Author: Yang Wang