Key Papers Focusing on the Parameter-to-Sample Ratio

The representative studies below establish that the ratio of model parameters p to training samples n, i.e. p/n, rather than the absolute number of samples n, is the key quantity governing generalization performance.

| Paper | Authors, Venue, Year | Key contribution |
|---|---|---|
| Reconciling modern machine learning practice and the bias–variance trade-off [1] | Belkin et al., PNAS 2019 | Measures model capacity by the p/n ratio, shows experimentally that test error spikes when p ≈ n, and proposes the "double descent" generalization curve, in which performance recovers in the p ≫ n regime. |
| Two Models of Double Descent for Weak Features [2] | Belkin, Hsu & Xu, SIAM J. Math. Data Sci. 2020 | Gives a mathematically rigorous analysis of least-squares regression showing that the risk peaks at p/n ≈ 1 and decreases again as p/n grows. |
| Deep Double Descent: Where Bigger Models and More Data Hurt [3][53] | Nakkiran et al., ICLR 2020 (journal version in J. Stat. Mech. 2021) | Observes and defines p/n-driven double descent in deep networks along both the model-size and training-epoch axes, proposes effective model complexity, and shows that the generalization curve goes beyond the classical U shape, with the interpolation threshold (p/n ≈ 1) as the critical point. |
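
The pattern these papers describe is easy to reproduce numerically. The sketch below mimics the weak-features setting of [2] under illustrative assumptions of my own choosing (dimensions, noise level, trial count, not taken from the papers): the target depends on all d features, but the regression uses only the first p of them, and `np.linalg.lstsq` returns the minimum-norm solution once p > n.

```python
# Minimal double-descent simulation in the spirit of [1] and [2].
# All sizes and the noise level below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d, sigma, trials = 40, 200, 0.5, 50      # samples, total features, noise, repeats

for p in (5, 10, 20, 35, 40, 45, 60, 100, 200):
    mse = 0.0
    for _ in range(trials):
        beta = rng.normal(size=d) / np.sqrt(d)       # ground truth over all d features
        X = rng.normal(size=(n, d))                  # training inputs
        y = X @ beta + sigma * rng.normal(size=n)    # noisy training targets
        X_test = rng.normal(size=(500, d))           # fresh test inputs
        # Regress on the first p features only; for p > n, lstsq returns
        # the minimum-norm interpolating solution.
        b_hat, *_ = np.linalg.lstsq(X[:, :p], y, rcond=None)
        mse += np.mean((X_test[:, :p] @ b_hat - X_test @ beta) ** 2) / trials
    print(f"p/n = {p / n:5.2f}   avg test MSE = {mse:10.3g}")
```

Test error climbs as p/n approaches 1 and falls again past it; the p = n row can be enormous because the square design matrix is typically ill-conditioned at the interpolation threshold.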

All of these papers show that
1) generalization error peaks at the interpolation threshold p/n ≈ 1, and
2) performance recovers ("double descent") as the p/n ratio grows beyond that point,
which means the parameter-to-sample ratio, not the absolute number of samples, is the more telling predictor of generalization performance.
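
In the Gaussian weak-features model, the expected risk of the minimum-norm estimator even has a closed form that makes both the spike at p/n ≈ 1 and the subsequent descent explicit. The function below is a sketch in the spirit of [2], written under the simplifying assumption (mine, not the paper's) that the signal energy r² is spread evenly across all D features; treat it as illustrative rather than as a transcription of their theorem.

```python
# Closed-form expected risk of minimum-norm least squares in a Gaussian
# weak-features model: the truth uses D i.i.d. features, we fit the first p,
# and the unselected features act as extra label noise.
def min_norm_risk(p: int, n: int, D: int, r2: float, sigma2: float) -> float:
    s_in = r2 * p / D             # signal captured by the p selected features
    noise = (r2 - s_in) + sigma2  # leftover signal behaves like label noise
    if p <= n - 2:                # classical (underparameterized) regime
        return noise * (1 + p / (n - p - 1))
    if p >= n + 2:                # interpolating (overparameterized) regime
        return s_in * (1 - n / p) + noise * (1 + n / (p - n - 1))
    raise ValueError("expected risk diverges at the interpolation threshold p ~ n")

for p in (10, 20, 35, 45, 60, 100, 200):
    print(f"p/n = {p / 40:5.2f}   risk = {min_norm_risk(p, 40, 200, 1.0, 0.25):7.3f}")
```

The n/(p − n − 1) variance term is what blows up just past the interpolation threshold and shrinks as p/n grows, which is exactly the double-descent shape summarized in the table above.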

Sources
[1] Reconciling modern machine learning practice and the bias–variance trade-off https://arxiv.org/abs/1812.11118
[2] Two Models of Double Descent for Weak Features https://par.nsf.gov/servlets/purl/10290019
[3] Deep double descent: where bigger models and more data hurt https://www.semanticscholar.org/paper/Deep-double-descent:-where-bigger-models-and-more-Nakkiran-Kaplun/ea415809bf87ef4b99966c6c50de6cb996a02a97
[4] Meta-Learning for Relative Density-Ratio Estimation https://openreview.net/forum?id=NBpwZs6sm2
[5] Meta-Learning for Relative Density-Ratio Estimation https://papers.neurips.cc/paper_files/paper/2021/file/ff49cc40a8890e6a60f40ff3026d2730-Paper.pdf
[6] Published as a conference paper at ICLR 2025 http://arxiv.org/pdf/2503.04111.pdf
[7] What is the optimal ratio of sample size to the number of parameters in a multiple regression? https://stats.stackexchange.com/questions/60955/what-is-the-optimal-ratio-of-sample-size-to-the-number-of-parameters-in-a-multip
[8] On Tighter Generalization Bounds for Deep Neural Networks: CNNs, ResNets, and Beyond https://ar5iv.labs.arxiv.org/html/1806.05159
[9] An Introduction to Relative Distribution Methods http://www.stat.ucla.edu/~handcock/RelDist/csde01.pdf
[10] References on the generalization theory of neural networks https://www.reddit.com/r/MachineLearning/comments/jjni5b/r_references_on_the_generalization_theory_of/
[11] Parameter Estimation (lecture notes) http://users.stat.umn.edu/~helwig/notes/ParameterEstimation.pdf
[12] Generalization Bounds for Neural Networks https://courses.engr.illinois.edu/ece543/sp2019/projects/siqim2.pdf
[13] R: Calculating difference between proportions for categorical data https://stackoverflow.com/questions/66840036/r-calculating-difference-between-proportions-for-categorical-data
[14] Generalization in neural networks: a broad survey https://arxiv.org/html/2209.01610v3
[15] Train longer, generalize better: closing the generalization gap in large batch training of neural networks https://proceedings.neurips.cc/paper_files/paper/2017/file/a5e0ff62be0b08456fc7f1e88812af3d-Paper.pdf
[16] Published as a conference paper at ICLR 2025 https://openreview.net/pdf/a9db28432ea19d3164280c9f18a71e7040dbff33.pdf
[17] Benign Overfitting in Deep Neural Networks under Lazy Training https://proceedings.mlr.press/v202/zhu23h/zhu23h.pdf
[18] A U-turn on Double Descent: Rethinking Parameter Counting in … https://proceedings.neurips.cc/paper_files/paper/2023/file/aec5e2847c5ae90f939ab786774856cc-Paper-Conference.pdf
[19] How many parameters are appropriate for a neural network trained … https://www.reddit.com/r/learnmachinelearning/comments/1fq6513/how_many_parameters_are_appropriate_for_a_neural/
[20] Generalization in Deep Learning https://lis.csail.mit.edu/pubs/kawaguchi-techreport18.pdf
[21] Sample Complexity Bounds for Recurrent Neural Networks with Application to Combinatorial Graph Problems https://arxiv.org/pdf/1901.10289.pdf
[22] A Classical View on Benign Overfitting: The Role of Sample Size https://arxiv.org/pdf/2505.11621.pdf
[23] Exact expressions for double descent and implicit regularization https://arxiv.org/pdf/1912.04533.pdf
[24] Double Descent: new approach of bias-variance trade-off https://trivia-starage.tistory.com/239
[25] Power and Sample Size for RNA-seq Experiments https://www.ohsu.edu/sites/default/files/2023-11/Power%20and%20Sample%20Size%20for%20RNA-seq%20Experiments.pdf
[26] Generalization Bounds via Convex Analysis http://cs.bme.hu/~gergo/files/LN22.pdf
[27] Disentangle Sample Size and Initialization Effect on Perfect Generalization for Single-Neuron Target https://arxiv.org/html/2405.13787v1/
[28] An In-depth Analysis through the Lens of Learned Feature Space https://arxiv.org/html/2310.13572v2
[29] A Practical Guide to Analyzing Nucleic Acid Concentration and Purity with Microvolume Spectrophotometers https://www.neb.com/-/media/nebus/files/application-notes/technote_mvs_analysis_of_nucleic_acid_concentration_and_purity.pdf?rev=c24cea043416420d84fb6bf7b554dbbb
[30] Generalization Bounds and Stability https://ocw.mit.edu/courses/9-520-statistical-learning-theory-and-applications-spring-2006/9a5f87123d8e36531b5959b031920fa8_class14.pdf
[31] Understanding Why Neural Networks Generalize Well Through … https://openreview.net/forum?id=HyevIJStwH
[32] Double descent https://en.wikipedia.org/wiki/Double_descent
[33] Wide and deep neural networks achieve consistency for classification https://www.pnas.org/doi/10.1073/pnas.2208779120
[34] Investigating the Impact of Model Width and Density on … https://arxiv.org/html/2208.08003v5
[35] Rethinking density ratio estimation based hyper-parameter optimization https://pubmed.ncbi.nlm.nih.gov/39581044/
[36] Overparameterization Improves Robustness to Covariate Shift in … https://proceedings.neurips.cc/paper/2021/file/73fed7fd472e502d8908794430511f4d-Paper.pdf
[37] Generalization Bounds via Convex Analysis https://proceedings.mlr.press/v178/lugosi22a/lugosi22a.pdf
[38] The Double Descent Behavior in Two Layer Neural Network … https://arxiv.org/html/2504.19351v1
[39] Towards Sample-efficient Overparameterized https://par.nsf.gov/servlets/purl/10312039
[40] On the Benefits of Over-parameterization for Out-of-Distribution … https://arxiv.org/abs/2403.17592
[41] Introduction to Statistical Learning Theory – Lecture 10 https://www.wisdom.weizmann.ac.il/~/ethanf/teaching/ItSLT_16/lectures/lec10_no_anim.pdf
[42] The double descent phenomenon – Generalization of … https://geelon.github.io/assets/talks/double-descent-handout.pdf
[43] Not All Samples Are Created Equal: Deep Learning with Importance Sampling https://proceedings.mlr.press/v80/katharopoulos18a/katharopoulos18a.pdf
[44] Two models of double descent for weak features https://arxiv.org/pdf/1903.07571.pdf
[45] Two models of double descent for weak features https://arxiv.org/abs/1903.07571
[46] Two Models of Double Descent for Weak Features https://epubs.siam.org/doi/10.1137/20M1336072
[47] I tried this small wall charger from Belkin and it was super-useful, albeit a little underpowered https://www.techradar.com/phones/phone-accessories/belkin-boostcharge-pro-gan-dual-wall-charger-45w-review
[48] Belkin BoostCharge Pro 2-in-1 Review: Minimal MagSafe Made For StandBy https://www.howtogeek.com/belkin-boostcharge-pro-magsafe-2-in-1-review/
[49] Belkin Double N+ Wireless Router F6D6230au4 review https://www.cnet.com/reviews/belkin-double-n-plus-wireless-router-f6d6230au4-review/
[50] Belkin Power Bank Review: A Reliable Powerhouse with a Few Flaws https://www.youtube.com/watch?v=JMtZvuWTPF8
[51] Published as a conference paper at ICLR 2020 http://arxiv.org/pdf/2001.07384.pdf
[52] Reconciling modern machine learning practice and the bias–variance trade-off https://ar5iv.labs.arxiv.org/html/1812.11118
[53] Deep Double Descent: Where Bigger Models and More Data Hurt https://openreview.net/pdf/2313f8e4c1bbcb174b1e34904fbc5f638c589efa.pdf
[54] STATS 200: Introduction to Statistical Inference https://web.stanford.edu/class/archive/stats/stats200/stats200.1172/Lecture22.pdf
[55] Reconciling modern machine-learning practice and the classical bias–variance trade-off https://www.pnas.org/doi/10.1073/pnas.1903070116
[56] Published as a conference paper at ICLR 2020 https://openreview.net/pdf?id=B1g5sA4twr
[57] Great product – Belkin Dual USB Charger 24W (Dual USB Wall Charger for iPhone 13, 12, 11, Pro, Pro https://www.youtube.com/watch?v=GgKUdVuIkjM
[58] I Tested EVERY Belkin Charger And Was SHOCKED / MagSafe / Qi 2 Chargers https://www.youtube.com/watch?v=tzH0s0JBuQ0
[59] Reconciling modern machine-learning practice and the classical bias–variance trade-off https://www.pnas.org/doi/abs/10.1073/pnas.1903070116
