🧠 Accuracy in Transfer Learning: Validity or Reliability?

“Have you ever felt that a model's performance was not trustworthy even though its accuracy was high?”

In deep learning, accuracy is the most widely used performance metric. In transfer learning especially, when a pretrained model is applied to a new task, its performance is usually judged by accuracy.

But there is a question worth asking here.

In transfer learning, is the accuracy we look at
really **validity**, or is it **reliability**?


🎯 Validity and Reliability: What Is the Difference?

These concepts come up most often in psychology and education, but they are also important criteria in machine learning evaluation.

| Concept | Description | Example |
| --- | --- | --- |
| Validity | How well the model performs the intended task | 95% accuracy on cat vs. dog classification indicates high validity |
| Reliability | Whether results are consistent when the same input is processed repeatedly | Does accuracy come out about the same every time the experiment is run? |

🔍 Accuracy Measures “Validity”

In transfer learning, the accuracy obtained after fine-tuning a pretrained model on a new dataset indicates whether the model actually performs that task properly. In other words, it corresponds to **validity**.

For example, suppose a ResNet model pretrained on ImageNet is transferred to medical image classification and reaches 90% accuracy:

  • → We can judge that this model is also “valid” for the medical image classification task.

🤔 How Can We Assess Reliability?

Reliability in transfer learning can be judged in the following ways:

  1. ๋ฐ˜๋ณต ์‹คํ—˜ (Repeated trials)
    โ†’ ํ•™์Šต์„ ์—ฌ๋Ÿฌ ๋ฒˆ ๋ฐ˜๋ณตํ–ˆ์„ ๋•Œ accuracy๊ฐ€ ํฌ๊ฒŒ ๋‹ฌ๋ผ์ง€์ง€ ์•Š์œผ๋ฉด ์‹ ๋ขฐ๋„๊ฐ€ ๋†’๋‹ค๊ณ  ๋ด…๋‹ˆ๋‹ค.
  2. ๊ต์ฐจ ๊ฒ€์ฆ (Cross-validation)
    โ†’ Fold๋งˆ๋‹ค ์ •ํ™•๋„๊ฐ€ ํฐ ์ฐจ์ด๋ฅผ ๋ณด์ด์ง€ ์•Š๋Š”๋‹ค๋ฉด ์ผ๊ด€์„ฑ์ด ์žˆ๋‹ค๊ณ  ํŒ๋‹จํ•ฉ๋‹ˆ๋‹ค.
  3. Confusion Matrix, F1-score ๋“ฑ ๋ณด์กฐ ์ง€ํ‘œ ํ™œ์šฉ
    โ†’ ํŠน์ • ํด๋ž˜์Šค์—๋งŒ ๊ณผ๋„ํ•˜๊ฒŒ ํŽธํ–ฅ๋˜์–ด ์žˆ๋‹ค๋ฉด, ์‹ ๋ขฐ๋„๊ฐ€ ๋‚ฎ๋‹ค๊ณ  ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
  4. Validation Curve, Learning Curve ๋ถ„์„
    โ†’ ๋ถˆ์•ˆ์ •ํ•œ ํ•™์Šต ํŒจํ„ด(๊ณผ์ ํ•ฉ/๊ณผ์†Œ์ ํ•ฉ)์€ ๋‚ฎ์€ ์‹ ๋ขฐ๋„๋ฅผ ์‹œ์‚ฌํ•ฉ๋‹ˆ๋‹ค.
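
As a rough illustration of the repeated-trials and cross-validation checks, the spread of accuracy across runs (or folds) can be summarized with a mean and standard deviation. This is only a minimal sketch: the helper name `summarize_runs` and the accuracy values are hypothetical, not results from any real experiment.

```python
from statistics import mean, stdev


def summarize_runs(accuracies):
    """Summarize accuracies from repeated runs (or cross-validation folds)."""
    if len(accuracies) < 2:
        raise ValueError("need at least two runs to estimate the spread")
    return {
        "mean": mean(accuracies),                      # average accuracy
        "std": stdev(accuracies),                      # sample standard deviation
        "range": max(accuracies) - min(accuracies),    # best minus worst run
    }


# Hypothetical accuracies from five fine-tuning runs with different seeds
stable_runs = [0.90, 0.91, 0.89, 0.90, 0.90]
unstable_runs = [0.90, 0.78, 0.93, 0.71, 0.88]

stable = summarize_runs(stable_runs)
unstable = summarize_runs(unstable_runs)

for name, summary in (("stable", stable), ("unstable", unstable)):
    print(f"{name}: mean={summary['mean']:.3f} "
          f"std={summary['std']:.3f} range={summary['range']:.3f}")
```

Both hypothetical models could report a single run near 90%, but only the first would count as reliable under the criteria above, because its accuracy barely moves between runs.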

✅ Summary: How to Read Transfer Learning Evaluation Results

| Metric | Interpretation | Validity or reliability |
| --- | --- | --- |
| Accuracy | Is the model performing the task well? | Validity |
| Variance across repeated experiments | How consistent are the results? | Reliability |
| Confusion matrix analysis | Is the model strong only on certain classes? | Supplements reliability |
| Cross-validation | Is accuracy stable across folds? | Reliability |
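
To make the confusion matrix row concrete, here is a small sketch of how per-class recall and macro F1 can expose a class bias that overall accuracy hides. The function name `per_class_metrics` and the two-class counts are assumptions chosen for illustration only.

```python
def per_class_metrics(confusion):
    """Compute accuracy, macro F1, and per-class precision/recall/F1.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))

    metrics = []
    for k in range(n):
        tp = confusion[k][k]
        actual = sum(confusion[k])                          # samples truly in class k
        predicted = sum(confusion[i][k] for i in range(n))  # samples predicted as k
        recall = tp / actual if actual else 0.0
        precision = tp / predicted if predicted else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append({"precision": precision, "recall": recall, "f1": f1})

    accuracy = correct / total
    macro_f1 = sum(m["f1"] for m in metrics) / n
    return accuracy, macro_f1, metrics


# Hypothetical imbalanced result: 90 samples of class 0, 10 samples of class 1
confusion = [
    [88, 2],   # true class 0
    [8, 2],    # true class 1
]

accuracy, macro_f1, metrics = per_class_metrics(confusion)
print(f"accuracy={accuracy:.2f} macro_f1={macro_f1:.2f}")
for k, m in enumerate(metrics):
    print(f"class {k}: precision={m['precision']:.2f} "
          f"recall={m['recall']:.2f} f1={m['f1']:.2f}")
```

In this made-up case overall accuracy is 90%, yet only 2 of the 10 class-1 samples are recognized, which is the kind of class-specific weakness the table describes.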

📌 Wrapping Up

A single accuracy figure is not enough to judge whether a model is “good or bad.” In transfer learning in particular, the following two aspects should be considered together:

  • Validity: Does the model fit the task I actually care about?
  • Reliability: Are the results reproduced consistently?

In the end, evaluating deep learning performance comes down to balancing accuracy and reliability.
