Scaling Summary

🔹 1. Simple Range-Based

1.1 Min-Max Scaling

  • Formula: $x' = \frac{x - \min(x)}{\max(x) - \min(x)}$
  • Range: [0, 1]
  • Pros: intuitive; weights are easy to interpret
  • Cons: vulnerable to outliers, because a single extreme value sets the range
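
As a rough illustration of the formula and of the outlier weakness, here is a minimal sketch in plain Python. The function name `min_max_scale` and the choice to return zeros for constant input are assumptions made for this example, not part of the original notes.

```python
def min_max_scale(values):
    """Map values linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Assumption: with no spread there is nothing to separate, so return zeros.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


scores = [2.0, 4.0, 6.0, 100.0]
scaled = min_max_scale(scores)
# The single outlier (100.0) takes the value 1.0 and squeezes the rest toward 0.
print(scaled)
```

With one large outlier in the list, the three ordinary scores all land below 0.05, which is the weakness noted in the last bullet.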

1.2 Max Normalization

  • Formula: $x' = \frac{x}{\max(|x|)}$
  • Maps the largest absolute value to 1 and shrinks the rest proportionally
  • Often used in IR to rescale cosine or dot-product scores
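
A small sketch of the same idea, assuming a plain list of scores. The helper name `max_abs_scale` is hypothetical. Note that signs are preserved, so negative dot products stay negative.

```python
def max_abs_scale(values):
    """Divide every value by the largest absolute value."""
    peak = max(abs(v) for v in values)
    if peak == 0:
        # Assumption: an all-zero list is returned unchanged (as floats).
        return [0.0 for _ in values]
    return [v / peak for v in values]


dot_scores = [12.0, -3.0, 6.0]
scaled = max_abs_scale(dot_scores)
print(scaled)  # [1.0, -0.25, 0.5]
```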

🔹 2. Mean and Variance-Based

2.1 Z-score Standardization

  • Formula: $x' = \frac{x - \mu}{\sigma}$
  • Result has mean 0 and standard deviation 1
  • Pros: makes differently scaled distributions comparable
  • Cons: extreme values have a large influence on heavy-tailed distributions
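
The standardization above can be sketched with the standard library alone. This version assumes the population standard deviation (`statistics.pstdev`); using the sample standard deviation instead is an equally reasonable choice that the notes do not settle.

```python
import statistics


def z_score(values):
    """Shift to mean 0 and scale to standard deviation 1."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # assumption: population std, not sample std
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


z = z_score([1.0, 2.0, 3.0, 4.0, 5.0])
print(z)
```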

2.2 Robust Scaling (Median & IQR)

  • Formula: $x' = \frac{x - \text{median}(x)}{\text{IQR}(x)}$ (IQR = Q3 - Q1)
  • Pros: robust to outliers
  • Widely used in recommender systems and for adjusting log scores
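
Here is a minimal sketch of median/IQR scaling. Quartile definitions differ between libraries; this example assumes the "inclusive" interpolation method of `statistics.quantiles`, so results may differ slightly from other tools.

```python
import statistics


def robust_scale(values):
    """Center on the median and divide by the interquartile range."""
    med = statistics.median(values)
    # quantiles(..., n=4) returns the three cut points [Q1, Q2, Q3].
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        # Assumption: with no spread in the middle half, return zeros.
        return [0.0 for _ in values]
    return [(v - med) / iqr for v in values]


print(robust_scale([1.0, 2.0, 3.0, 4.0, 5.0]))  # [-1.0, -0.5, 0.0, 0.5, 1.0]
```

Because the median and quartiles barely move when one value becomes extreme, the typical values keep roughly the same scaled positions even if an outlier is added.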

🔹 3. Nonlinear Compression (Flattening)

3.1 Log Transform (Log Scaling)

  • Formula: $x' = \log(1 + x)$
  • Flattens long-tail distributions
  • Well suited to scores with a skewed distribution, such as BM25

3.2 Square-Root Transform (Sqrt Scaling)

  • Formula: $x' = \sqrt{x}$
  • Damps extreme values and gives more weight to mid-range values
  • Often used to adjust count-based scores (occurrence frequencies)
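
To compare the log and square-root transforms side by side, the following sketch applies both to a list of counts. It assumes non-negative inputs; the function names are made up for this example.

```python
import math


def log_scale(values):
    """x' = log(1 + x); log1p keeps precision for small x."""
    return [math.log1p(v) for v in values]


def sqrt_scale(values):
    """x' = sqrt(x); assumes every value is non-negative."""
    return [math.sqrt(v) for v in values]


counts = [0, 1, 10, 100, 10000]
logged = log_scale(counts)
rooted = sqrt_scale(counts)
for c, lg, rt in zip(counts, logged, rooted):
    print(c, round(lg, 3), round(rt, 3))
```

Both transforms keep the ordering of the inputs, but the log compresses the tail far more strongly: the largest count ends up below 10 after the log and at 100 after the square root.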

3.3 Sigmoid / Logistic Scaling

  • Formula: $x' = \frac{1}{1 + e^{-x}}$
  • Maps (-∞, ∞) → (0, 1)
  • Turns scores into probability-like values
  • Allows a "probabilistic interpretation" when fusing ranking scores
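
A direct translation of the formula can overflow for large negative inputs, so this sketch branches on the sign of `x`. That branching is an implementation choice added here, not something stated in the notes.

```python
import math


def sigmoid(x):
    """Logistic function, written so exp() never sees a large positive argument."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


for raw in (-5.0, 0.0, 5.0):
    print(raw, sigmoid(raw))
```

In practice raw scores are often shifted and scaled before this step, since the sigmoid saturates quickly once inputs move far from 0.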

3.4 Tanh Scaling

  • Formula: $x' = 0.5 \times \left(\tanh\left(0.01 \cdot (x - \mu)\right) + 1\right)$
  • Centers on the mean and compresses with tanh to (-1, 1); the outer $0.5 \times (\cdot + 1)$ then shifts the result into (0, 1)
  • One of the standard techniques for normalized score scaling in IR experiments
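
The formula above can be sketched as follows. The constant 0.01 comes from the formula itself; exposing it as a `gain` parameter is an assumption made here so its effect is easy to try out.

```python
import math
import statistics


def tanh_scale(values, gain=0.01):
    """0.5 * (tanh(gain * (x - mean)) + 1), which lands in (0, 1)."""
    mu = statistics.fmean(values)
    return [0.5 * (math.tanh(gain * (v - mu)) + 1.0) for v in values]


scaled = tanh_scale([10.0, 20.0, 30.0])
print(scaled)  # the mean (20.0) maps to exactly 0.5
```

With a gain as small as 0.01, inputs within a few tens of the mean stay in the nearly linear part of tanh, so the output changes slowly and stays close to 0.5.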

🔹 4. Rank-Based (Uses Only Ranks, Not Values)

4.1 Rank Normalization

  • Maps ranks to values between 0 and 1: $x' = \frac{\text{rank}(x)}{N}$
  • Pros: independent of the score distribution; fair
  • Cons: discards the size of the gaps between scores
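
The notes do not say which direction ranks run or how ties are handled. The sketch below assumes rank 1 goes to the smallest score (so the best score maps to 1.0) and that ties are broken by input order.

```python
def rank_normalize(values):
    """Replace each value by rank / N, with rank 1 for the smallest value."""
    n = len(values)
    # Indices sorted by value; sorted() is stable, so ties keep input order.
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        out[idx] = rank / n
    return out


print(rank_normalize([0.3, 50.0, 0.1, 7.0]))  # [0.5, 1.0, 0.25, 0.75]
```

The huge gap between 7.0 and 50.0 becomes the same 0.25 step as every other gap, which is exactly the loss of score differences listed as the drawback.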

4.2 Reciprocal Rank Fusion (RRF)

  • Formula: $\text{score}(d) = \sum_{s \in \text{systems}} \frac{1}{C + \text{rank}_s(d)}$
  • Powerful for fusing the rankings of several retrievers
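
A minimal sketch of the RRF sum, assuming each system returns an ordered list of document IDs with rank 1 first. The notes leave C unspecified; the default of 60 used here is a commonly cited choice and should be treated as an assumption.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, c=60):
    """Sum 1 / (c + rank) over every ranking in which a document appears."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (c + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


bm25_run = ["a", "b", "c"]   # hypothetical lexical ranking
dense_run = ["b", "c", "a"]  # hypothetical embedding ranking
fused = reciprocal_rank_fusion([bm25_run, dense_run])
print([doc for doc, _ in fused])  # ['b', 'a', 'c']
```

Document "b" wins because it is near the top of both lists, even though it is first in only one of them. No score normalization is needed, since only ranks enter the sum.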

4.3 Borda Count

  • Assigns points to each rank (e.g., N - rank) and sums them
  • Grounded in voting theory; simple and stable
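
The N - rank example can be sketched like this. It assumes N is the length of each individual ranking, ranks start at 1, and a document missing from a list simply earns no points from it. Ties in the final tally are broken alphabetically, which is an arbitrary choice for this example.

```python
from collections import defaultdict


def borda_count(rankings):
    """Give each document (N - rank) points per ranking and sum the points."""
    points = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for rank, doc in enumerate(ranking, start=1):
            points[doc] += n - rank
    # Most points first; alphabetical order breaks ties.
    return sorted(points.items(), key=lambda kv: (-kv[1], kv[0]))


votes = [["a", "b", "c"], ["b", "a", "c"], ["b", "c", "a"]]
print(borda_count(votes))  # [('b', 5), ('a', 3), ('c', 1)]
```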

🔹 5. Learning-Based

5.1 Platt Scaling

  • Fits a sigmoid to the scores of a linear classifier; the sigmoid's parameters are learned from labeled data
  • Typically used to convert SVM or IR scores into probabilities
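
As a simplified sketch of the idea, the code below fits p = sigmoid(a·s + b) to labeled scores with plain gradient descent on the log loss. This is an assumption-laden stand-in: the usual Platt procedure uses a more careful optimizer and smoothed targets, and the learning rate, epoch count, and toy data here are invented for illustration.

```python
import math


def _sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_platt(scores, labels, lr=0.1, epochs=2000):
    """Learn slope a and offset b so that sigmoid(a*s + b) matches the labels."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = _sigmoid(a * s + b) - y  # gradient of log loss w.r.t. the logit
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def platt_predict(score, a, b):
    return _sigmoid(a * score + b)


# Toy, deliberately non-separable data: raw scores with 0/1 relevance labels.
raw = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
rel = [0, 0, 1, 0, 1, 1]
a, b = fit_platt(raw, rel)
print(platt_predict(-2.0, a, b), platt_predict(2.0, a, b))
```

Only two parameters are learned, so the method works with little labeled data but can only produce a sigmoid-shaped calibration curve.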

5.2 Isotonic Regression

  • Calibrates scores into probabilities with nonparametric monotonic regression
  • Effective when enough data is available
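
One common way to compute a monotonic fit is the pool-adjacent-violators approach, sketched below under the assumption of 0/1 labels and unweighted points. The function name and return shape are made up for this example.

```python
def isotonic_fit(scores, labels):
    """Return (sorted scores, non-decreasing fitted values) via pool adjacent violators."""
    pairs = sorted(zip(scores, labels))
    # Each block is [sum of labels, count]; its mean is the fitted value.
    blocks = []
    for _, y in pairs:
        blocks.append([float(y), 1])
        # Merge backwards while the previous block's mean exceeds the new one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return [s for s, _ in pairs], fitted


xs, probs = isotonic_fit([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], [0, 1, 0, 1, 1, 1])
print(probs)  # [0.0, 0.5, 0.5, 1.0, 1.0, 1.0]
```

The out-of-order pair (label 1 at 0.2, label 0 at 0.3) is pooled into a shared value of 0.5. The result is a step function, which is why the method needs a fair amount of data to give smooth-looking probabilities.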

🔹 6. Hybrid and Special Techniques

6.1 CombSUM

  • Simple sum of the normalized scores

6.2 CombMNZ

  • Sum of normalized scores × (number of systems contributing a nonzero score)
  • Boosts results that several models agree on
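
CombSUM and CombMNZ differ only by the multiplier, which the sketch below makes explicit. It assumes each system's output is a dict from document ID to an already normalized score, and that a missing document counts as a score of 0.

```python
def comb_sum(runs):
    """Add up each document's normalized scores across all runs."""
    docs = set().union(*runs)  # union of the dict keys
    return {d: sum(run.get(d, 0.0) for run in runs) for d in docs}


def comb_mnz(runs):
    """CombSUM multiplied by the number of runs with a nonzero score."""
    sums = comb_sum(runs)
    return {
        d: total * sum(1 for run in runs if run.get(d, 0.0) != 0.0)
        for d, total in sums.items()
    }


run_a = {"a": 0.9, "b": 0.5}  # hypothetical normalized scores
run_b = {"b": 0.5, "c": 1.0}
print(comb_sum([run_a, run_b]))
print(comb_mnz([run_a, run_b]))
```

Under CombSUM, "b" and "c" tie at 1.0. Under CombMNZ, "b" doubles to 2.0 because both systems returned it, while "c" stays at 1.0.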

6.3 Softmax Normalization

  • Formula: $x'_i = \frac{e^{x_i}}{\sum_j e^{x_j}}$
  • Converts scores into a probability distribution
  • Sensitive to outliers, however → a temperature parameter is often tuned
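
The sketch below adds the temperature mentioned in the last bullet by dividing the scores by `temperature` before exponentiating. Subtracting the maximum first is a standard stability trick added here; it does not change the result.

```python
import math


def softmax(values, temperature=1.0):
    """Softmax of values / temperature, computed in a numerically stable way."""
    scaled = [v / temperature for v in values]
    peak = max(scaled)
    exps = [math.exp(v - peak) for v in scaled]  # largest exponent is exp(0) = 1
    total = sum(exps)
    return [e / total for e in exps]


scores = [1.0, 2.0, 10.0]
sharp = softmax(scores)                   # the outlier takes almost all the mass
soft = softmax(scores, temperature=5.0)   # higher temperature flattens the output
print(sharp)
print(soft)
```

A temperature above 1 spreads probability back onto the smaller scores, which is why it is the usual remedy for the outlier sensitivity.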

✅ Summary

  • Distribution is skewed to one side → log, sqrt, sigmoid, tanh
  • Many outliers → robust scaling (median/IQR)
  • Fusing several models → rank-based (RRF, Borda)
  • Want a probabilistic interpretation → sigmoid, softmax, Platt, isotonic
