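The Complete Guide to the Confusion Matrix

Overview and Definition

A confusion matrix (also called an error matrix) is one of the most basic and important tools for evaluating the performance of a classification model[1][2]. It tabulates the model's predictions against the actual outcomes, showing at a glance how often the model predicted correctly and which types of errors it made[1][3].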

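Structure of the Confusion Matrix in Binary Classification

Basic Structure

In a binary classification problem, the confusion matrix is a 2×2 matrix made up of the following four elements[1][4]:

TP (True Positive): an actual positive correctly predicted as positive
TN (True Negative): an actual negative correctly predicted as negative
FP (False Positive): an actual negative incorrectly predicted as positive (Type I error)
FN (False Negative): an actual positive incorrectly predicted as negative (Type II error)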
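
The four cells can be counted directly from paired labels. The sketch below is a minimal illustration in plain Python; the function name `confusion_counts` and the sample labels are assumptions made for this example, not part of any library.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for binary labels (hypothetical helper)."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # actual positive, predicted positive
            else:
                fp += 1  # actual negative, predicted positive (Type I error)
        else:
            if actual == positive:
                fn += 1  # actual positive, predicted negative (Type II error)
            else:
                tn += 1  # actual negative, predicted negative
    return tp, tn, fp, fn


# Illustrative labels: 1 = positive, 0 = negative
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)  # 3 3 1 1
```

The four counts always add up to the number of samples, which is a quick sanity check on any hand-built matrix.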

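Understanding Through a Medical Example

Taking cancer diagnosis as an example[4][5]:

TP: an actual cancer patient is diagnosed with cancer (correct diagnosis)
TN: an actually healthy person is diagnosed as healthy (correct diagnosis)
FP: a healthy person is misdiagnosed with cancer (overdiagnosis)
FN: a cancer patient is misdiagnosed as healthy (missed diagnosis)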

Computing the Main Performance Metrics

A variety of performance metrics can be computed from the confusion matrix[1][6]:

Accuracy

Accuracy = (TP + TN) / (TP + TN + FP + FN)

The proportion of all predictions that are correct[1][6].

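Precision

Precision = TP / (TP + FP)

The proportion of samples predicted positive that are actually positive[1][6]. It is an important metric in spam filtering, where the focus is on reducing the error of classifying legitimate mail as spam[3].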

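Recall (Sensitivity)

Recall = TP / (TP + FN)

The proportion of actual positives that the model correctly predicts as positive[1][6]. It matters in situations such as cancer diagnosis, where actual positives must not be missed[3].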

F1 Score

F1 Score = 2 × (Precision × Recall) / (Precision + Recall)

The harmonic mean of precision and recall, which takes both metrics into account in a balanced way[5][7]. It is especially useful for class-imbalance problems[7].
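
As a worked example of the four formulas above, the following sketch computes them from raw counts. The counts (TP=40, TN=30, FP=10, FN=20) are made up for illustration, and `safe_div` and `binary_metrics` are hypothetical helper names; returning 0.0 when a denominator is zero is one possible convention, not the only one.

```python
def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when the denominator is zero (an assumed convention)."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1 from the four confusion-matrix cells."""
    accuracy = safe_div(tp + tn, tp + tn + fp + fn)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Illustrative counts, not taken from a real model
metrics = binary_metrics(tp=40, tn=30, fp=10, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts, accuracy is 0.700, precision 0.800, recall 0.667, and F1 0.727. Because F1 is a harmonic mean, it sits closer to the lower of precision and recall than the arithmetic mean would.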

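The Confusion Matrix in Multiclass Classification

Structure and Computation

In multiclass classification with N classes, the confusion matrix is an N×N matrix[8][9]. TP, TN, FP, and FN are computed for each class by treating the problem as a binary classification of "that class vs. all other classes"[8][9].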
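
The "one class vs. the rest" bookkeeping can be sketched with NumPy as follows. The 3×3 matrix is invented for illustration, and the code assumes rows are true classes and columns are predicted classes (the layout scikit-learn uses).

```python
import numpy as np

# Hypothetical 3-class confusion matrix: rows = true class, columns = predicted class
cm = np.array([
    [5, 1, 0],
    [2, 6, 2],
    [0, 1, 8],
])

tp = np.diag(cm)                # correctly predicted samples of each class
fp = cm.sum(axis=0) - tp        # predicted as class k, but actually another class
fn = cm.sum(axis=1) - tp        # actually class k, but predicted as another class
tn = cm.sum() - (tp + fp + fn)  # everything that involves class k in neither role

for k in range(cm.shape[0]):
    print(f"class {k}: TP={tp[k]} FP={fp[k]} FN={fn[k]} TN={tn[k]}")
```

For each class the four values sum to the total number of samples (25 here), just as in the binary case.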

Macro vs Micro Average

Macro Average[10][11]:

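Computes the metric for each class separately, then averages the results
Treats every class equally, regardless of how many samples it has
Well suited to balanced data; on imbalanced data it gives rare classes the same weight as frequent ones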

Micro Average[10][11]:

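Sums TP, FP, and FN over all classes, then computes the metric
Gives more weight to classes with more samples
On imbalanced data it reflects overall per-sample performance, but it can hide poor performance on rare classes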
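
The difference between the two averages is easiest to see on an imbalanced example. The sketch below computes macro and micro precision from a made-up 3-class matrix in which class 0 dominates; the matrix values are assumptions chosen only to make the gap visible.

```python
import numpy as np

# Hypothetical imbalanced matrix: rows = true class, columns = predicted class
cm = np.array([
    [90, 5, 5],
    [4, 5, 1],
    [3, 1, 6],
])

tp = np.diag(cm)
fp = cm.sum(axis=0) - tp

# Macro: compute precision per class, then take the unweighted mean
per_class_precision = tp / (tp + fp)
macro_precision = per_class_precision.mean()

# Micro: pool TP and FP over all classes, then compute precision once
micro_precision = tp.sum() / (tp.sum() + fp.sum())

print("per-class:", np.round(per_class_precision, 3))
print("macro:", round(float(macro_precision), 3))
print("micro:", round(float(micro_precision), 3))
```

Here the macro value is pulled down by the two small classes, while the micro value stays close to the precision of the dominant class. In single-label multiclass problems, micro precision equals overall accuracy, since every misclassified sample is counted exactly once as a false positive.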

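Class Imbalance and the Confusion Matrix

The Limits of Accuracy

When class imbalance is severe, accuracy alone cannot properly evaluate model performance[2][12]. For example, on data that is 99% negative and 1% positive, predicting everything as negative still yields 99% accuracy[2][13].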
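
The 99%/1% scenario can be reproduced in a few lines. This sketch uses synthetic labels (10 positives out of 1,000) and a "model" that always predicts negative; both are assumptions for illustration.

```python
# Synthetic data: 1% positive, 99% negative
y_true = [1] * 10 + [0] * 990
# A degenerate classifier that predicts negative for every sample
y_pred = [0] * len(y_true)

correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
accuracy = correct / len(y_true)

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
recall = tp / (tp + fn)

print(f"accuracy = {accuracy:.2f}")  # accuracy = 0.99
print(f"recall   = {recall:.2f}")    # recall   = 0.00
```

Accuracy looks excellent, yet the classifier does not find a single positive case, which is exactly what the FN cell of the confusion matrix exposes.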

Alternative Evaluation Metrics

The following metrics are more useful on imbalanced data[2][12]:

Precision
Recall
F1 Score
AUC-ROC

ROC Curve and AUC

ROC Curve

The ROC (Receiver Operating Characteristic) curve is a graph of the relationship between TPR (True Positive Rate) and FPR (False Positive Rate) across different classification thresholds[14][15]:

TPR (Sensitivity) = TP / (TP + FN)
FPR = FP / (FP + TN)
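
Each point on the ROC curve comes from one threshold applied to the model's scores. The sketch below computes the (FPR, TPR) pair at several thresholds in plain Python; `roc_point` is a hypothetical helper, the scores are invented, and the code assumes both classes are present so neither denominator is zero.

```python
def roc_point(y_true, scores, threshold):
    """Return (FPR, TPR) when samples with score >= threshold are predicted positive."""
    tp = fp = fn = tn = 0
    for actual, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 1:
            fn += 1
        else:
            tn += 1
    tpr = tp / (tp + fn)  # TPR = TP / (TP + FN)
    fpr = fp / (fp + tn)  # FPR = FP / (FP + TN)
    return fpr, tpr


# Illustrative labels and predicted scores
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

for threshold in (0.9, 0.5, 0.3, 0.0):
    fpr, tpr = roc_point(y_true, scores, threshold)
    print(f"threshold={threshold:.1f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Lowering the threshold moves the point from (0, 0) toward (1, 1): more positives are caught, at the cost of more false alarms.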

AUC (Area Under the Curve)

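The area under the ROC curve. It takes values between 0 and 1; a classifier that does better than chance scores between 0.5 and 1.0[14][16]:

AUC = 1.0: perfect classification performance
AUC = 0.5: performance equivalent to random guessing
AUC > 0.7: as a rule of thumb, often regarded as good performance, although the acceptable level depends on the application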
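
AUC can also be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes it that way by comparing every positive/negative pair; `auc_by_ranking` is a hypothetical helper written for this example, and the quadratic loop is meant for small inputs only.

```python
def auc_by_ranking(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Illustrative labels and predicted scores
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

auc = auc_by_ranking(y_true, scores)
inverted_auc = auc_by_ranking(y_true, [-s for s in scores])
print(auc, inverted_auc)  # 0.75 0.25
```

Negating the scores reverses the ranking and yields 1 minus the original AUC, which shows that values below 0.5 are possible and indicate a ranking that is systematically inverted.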

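Implementing a Confusion Matrix in Python

Using scikit-learn

from sklearn.metrics import confusion_matrix, classification_report
import seaborn as sns
import matplotlib.pyplot as plt

# Example binary labels (illustrative values; replace with your own data)
y_true = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1, 1, 1]

# Compute the confusion matrix (rows = true labels, columns = predicted labels)
cm = confusion_matrix(y_true, y_pred)

# Visualize as a heatmap
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.show()

# Compute performance metrics
# (ravel() unpacks into four values only for a 2x2 binary matrix)
tn, fp, fn, tp = cm.ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * (precision * recall) / (precision + recall)

# Per-class precision, recall, and F1 in a single report
print(classification_report(y_true, y_pred))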

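Normalization Options

The confusion_matrix function in sklearn offers several normalization options through its normalize parameter[17][18]:

normalize=None: raw counts (default)
normalize='true': normalize over each true class (each row sums to 1)
normalize='pred': normalize over each predicted class (each column sums to 1)
normalize='all': normalize by the total number of samples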
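
The effect of each option can be checked on a small example. The sketch below assumes a scikit-learn version that supports the `normalize` parameter of `confusion_matrix` (0.22 or later); the labels are invented for illustration.

```python
from sklearn.metrics import confusion_matrix

# Illustrative binary labels
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 0, 1, 1, 1]

raw = confusion_matrix(y_true, y_pred)                         # raw counts
by_true = confusion_matrix(y_true, y_pred, normalize='true')   # each row sums to 1
by_pred = confusion_matrix(y_true, y_pred, normalize='pred')   # each column sums to 1
overall = confusion_matrix(y_true, y_pred, normalize='all')    # all cells sum to 1

print(raw)
print(by_true)
print(by_pred)
print(overall)
```

With normalize='true' the diagonal holds per-class recall, and with normalize='pred' it holds per-class precision, which makes the normalized forms convenient for comparing classes of very different sizes.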

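Practical Use Cases

Medical Diagnosis

Prioritize high recall: for life-threatening diseases such as cancer, actual patients must not be missed, so recall is important[2][3]
Minimize FN: errors that classify an actual positive as negative must be kept to a minimum

Spam Filtering

Prioritize high precision: reducing the error of classifying legitimate mail as spam is important[2][3]
Minimize FP: avoid classifying important mail as spam

Fraud Detection

A balanced approach: both missing fraud and flagging legitimate transactions as fraud are costly, so the F1 Score is useful[12]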

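Conclusion

The confusion matrix is a basic yet powerful tool for evaluating the performance of a classification model. Rather than looking at accuracy alone, it is important to understand what kinds of errors the model makes and to choose performance metrics that fit the business requirements[1][2]. In real-world problems with class imbalance in particular, metrics such as precision, recall, and the F1 Score should be considered together to evaluate model performance comprehensively[12][13].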

Sources
[1] 분류모델 Confusion Matrix – 파이썬으로 데이터 다루기 기초 – 위키독스 https://wikidocs.net/194087
[2] [머신러닝] 분류 모델의 성능 평가 지표 (Confusion Matrix, Accuracy … https://velog.io/@kimjo/%EB%A8%B8%EC%8B%A0%EB%9F%AC%EB%8B%9D-v6c30r1l
[3] [멋사] [AI] 분류성능평가지표 – Confusion matrix – velog https://velog.io/@chjy100418/%EB%A9%8B%EC%82%AC-AI-%EB%B6%84%EB%A5%98%EC%84%B1%EB%8A%A5%ED%8F%89%EA%B0%80%EC%A7%80%ED%91%9C-Confusion-matrix
[4] Confusion Matrix의 손쉬운 이해 https://shinminyong.tistory.com/28
[5] F1 Score, Confusion Matrix, Precision & Recall (trade-off) 왕초보를 위한 설명 https://chanmuzi.tistory.com/137
[6] [머신러닝] 분류모형 평가 Confusion Matrix (Accuracy, Recall … https://seo-seon.tistory.com/entry/%EB%A8%B8%EC%8B%A0%EB%9F%AC%EB%8B%9D-%EB%B6%84%EB%A5%98%EB%AA%A8%ED%98%95-%ED%8F%89%EA%B0%80-Confusion-Matrix
[7] F1 스코어란? – EITCA 아카데미 https://ko.eitca.org/artificial-intelligence/eitc-ai-gcml-google-cloud-machine-learning/introduction/what-is-machine-learning/what-is-an-f1-score/
[8] 머신러닝 모델의 평가 (2. 다중 분류) https://rython.tistory.com/14
[9] 이진 분류 및 멀티 클래스 분류에서 TP, TN, FP, FN, Recall, Precision … https://cn-c.tistory.com/67
[10] 매크로 평균(Macro-average)과 마이크로 평균(Micro-average) https://euriion.com/413796/
[11] 매크로 평균(Macro-average) vs 마이크로 평균(Micro-average) https://junklee.tistory.com/116
[12] Unbalanced Classes (Machine Learning) – How to balance your data https://marini.systems/en/glossary/unbalanced-classes-machine-learning/
[13] Failure of Classification Accuracy for Imbalanced Class Distributions https://www.machinelearningmastery.com/failure-of-accuracy-for-imbalanced-class-distributions/
[14] AUC-ROC 커브 – BioinformaticsAndMe – 티스토리 https://bioinformaticsandme.tistory.com/328
[15] AUC와 ROC Curve – Data Analysis & Study – 티스토리 https://shinminyong.tistory.com/29
[16] [용어 설명] ROC curve와 AUC란? – BLOG – 티스토리 https://bioinfoblog.tistory.com/221
[17] confusion_matrix() – 파이썬으로 데이터 다루기 기초 – 위키독스 https://wikidocs.net/194464
[18] confusion_matrix — scikit-learn 1.7.0 documentation https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html
[19] 혼동 행렬 (Confusion matrix) – 공예소 – 티스토리 https://d-craftshop.tistory.com/27
[20] 오차 행렬 (Confusion Matrix), 정밀도 (Precision), 재현율 (Recall) https://velog.io/@gangjoo/ML-%ED%8F%89%EA%B0%80-%EC%98%A4%EC%B0%A8-%ED%96%89%EB%A0%AC-Confusion-Matrix-%EC%A0%95%EB%B0%80%EB%8F%84-Precision-%EC%9E%AC%ED%98%84%EC%9C%A8-Recall
[21] 딥러닝 모델 평가 지표: Confusion Matrix, Accuracy, Precision, Recall … https://www.blog.data101.io/432
[22] F1 score란? – velog https://velog.io/@jadon/F1-score%EB%9E%80
[23] Confusion Matrix 이해하기 https://velog.io/@zxxzx1515/Confusion-matrix-%EC%9D%B4%ED%95%B4%ED%95%98%EA%B8%B0
[24] confusion matrix (혼동행렬) python – DataAnalyst – 티스토리 https://signature95.tistory.com/48
[25] Confusion Matrix로 분류모델 성능평가 지표(precision, recall, f1 … https://kyull-it.tistory.com/99
[26] confusion matrix 이해하기 – 일편단씸의 블로그 https://mechurak.github.io/2023-11-25_confusion-matrix/
[27] [AI/ML] 파이썬 머신러닝 완벽가이드 (10) – 평가, Confusion Matrix … https://velog.io/@2jihan000/AIML-%ED%8C%8C%EC%9D%B4%EC%8D%AC-%EB%A8%B8%EC%8B%A0%EB%9F%AC%EB%8B%9D-%EC%99%84%EB%B2%BD%EA%B0%80%EC%9D%B4%EB%93%9C-10-%ED%8F%89%EA%B0%80-Confusion-Matrix-%EC%A0%95%EB%B0%80%EB%8F%84-%EC%9E%AC%ED%98%84%EC%9C%A8
[28] Confusion Matrix(혼돈 행렬)과 분류 성능 평가 지표 – velog https://velog.io/@jjw9599/ConfusionMatrix-ClassificationEvaluation
[29] 4. 임계값 영상 (threshold) – 띠그랭 – 티스토리 https://ikso2000.tistory.com/42
[30] [OpenCV] Threshold 처리 – 영화처럼 Tistory – 티스토리 https://cho001.tistory.com/130
[31] The Role of the Confusion Matrix in Addressing Imbalanced Datasets https://opendatascience.com/the-role-of-the-confusion-matrix-in-addressing-imbalanced-datasets/
[32] 8. 스레시홀딩(Thresholding), 오츠의 알고리즘(Otsu’s Method) https://bkshin.tistory.com/entry/OpenCV-8-%EC%8A%A4%EB%A0%88%EC%8B%9C%ED%99%80%EB%94%A9Thresholding
[33] [파이썬 sklearn] 오차행렬(혼동행렬, confusion matrix) 공부하기 https://spine-sunbi.tistory.com/entry/%ED%8C%8C%EC%9D%B4%EC%8D%AC-sklearn-%EC%98%A4%EC%B0%A8%ED%96%89%EB%A0%AC%ED%98%BC%EB%8F%99%ED%96%89%EB%A0%AC-confusion-matrix-%EA%B3%B5%EB%B6%80%ED%95%98%EA%B8%B0-%ED%8F%89%EA%B0%80-%EC%A7%80%ED%91%9C-%EC%9D%B4%ED%95%B41
[34] 분류 모델 성능 평가 지표 – Confusion Matrix란? – 슈퍼짱짱 – 티스토리 https://leedakyeong.tistory.com/entry/%EB%B6%84%EB%A5%98-%EB%AA%A8%EB%8D%B8-%EC%84%B1%EB%8A%A5-%ED%8F%89%EA%B0%80-%EC%A7%80%ED%91%9C-Confusion-Matrix%EB%9E%80-%EC%A0%95%ED%99%95%EB%8F%84Accuracy-%EC%A0%95%EB%B0%80%EB%8F%84Precision-%EC%9E%AC%ED%98%84%EB%8F%84Recall-F1-Score
[35] [모델 평가] Confusion matrix (TP, TN, FP, FN) 및 단일/다중 클래스 … https://neosla.tistory.com/18
[36] Classification: ROC and AUC | Machine Learning https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc
[37] f1-score 종류와 의미 (macro, weighted, micro) – data-minggeul https://data-minggeul.tistory.com/11
[38] 컴프레서, Threshold 개념!?………….아.. 도저히.. ㅜㅜ – 큐오넷 https://www.cuonet.com/bbs/board.php?bo_table=qna2&wr_id=1174642