- Understanding Deep Learning Requires Rethinking Generalization
Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals. ICLR 2017.
Shows that large neural networks can fit the training data perfectly even when the labels are replaced with random ones, and argues empirically and theoretically that traditional generalization theory (Vapnik–Chervonenkis dimension, regularization, etc.) alone cannot explain why deep learning generalizes[1]. A minimal sketch of the randomization test follows.
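A minimal sketch of the randomization test, assuming PyTorch and a small MLP on synthetic data rather than the paper's CNNs on CIFAR-10; the point is only that an over-parameterized model drives training accuracy on random labels to roughly 100%.

```python
# Hedged sketch of the random-label experiment: a small MLP on synthetic data,
# not the paper's CIFAR-10 CNNs. The labels carry no signal, so any fit is pure
# memorization -- yet the over-parameterized net reaches ~100% training accuracy.
import torch
import torch.nn as nn

torch.manual_seed(0)
n, d, k = 512, 32, 10
X = torch.randn(n, d)
y = torch.randint(0, k, (n,))                     # labels drawn uniformly at random

model = nn.Sequential(nn.Linear(d, 512), nn.ReLU(),
                      nn.Linear(512, 512), nn.ReLU(),
                      nn.Linear(512, k))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(2000):                          # full-batch training
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

train_acc = (model(X).argmax(dim=1) == y).float().mean().item()
print(f"training accuracy on random labels: {train_acc:.3f}")  # typically ~1.0
```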
- Spectrally-normalized Margin Bounds for Neural Networks
Peter L. Bartlett, Dylan J. Foster, Matus J. Telgarsky. NeurIPS 2017.
Derives margin-based generalization bounds for deep networks in terms of a spectral complexity built from the product of the layers' spectral norms (largest singular values), and verifies empirically that the bound correlates with actual task difficulty for SGD-trained models[2]. A simplified form of the bound is recalled below.
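A simplified paraphrase of the bound, assuming 1-Lipschitz activations and omitting constants, logarithmic factors, the confidence term, and the reference matrices used in the paper; this is for orientation, not the exact theorem statement.

```latex
% Simplified spectrally-normalized margin bound (constants, log factors,
% confidence term and reference matrices omitted). A_1,...,A_L are the layer
% weights, \gamma the margin, n the sample size, X the data matrix.
R_A \;=\; \Big(\prod_{i=1}^{L} \lVert A_i \rVert_{\sigma}\Big)
          \Big(\sum_{i=1}^{L} \frac{\lVert A_i^{\top} \rVert_{2,1}^{2/3}}{\lVert A_i \rVert_{\sigma}^{2/3}}\Big)^{3/2},
\qquad
\text{test error} \;\lesssim\; \widehat{\mathcal{R}}_{\gamma}(f)
          \;+\; \widetilde{O}\!\Big(\frac{\lVert X \rVert_F \, R_A}{\gamma\, n}\Big).
```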
- Norm-Based Capacity Control in Neural Networks
Behnam Neyshabur, Ryota Tomioka, Nathan Srebro. COLT 2015.
Analyzes the Rademacher complexity of feedforward networks under ℓ_p/ℓ_q group-norm constraints, obtaining generalization bounds that depend on the weight norms rather than the number of units, and examines when the dependence on width and depth can be mitigated[3].
- Size-Independent Sample Complexity of Neural Networks
Noah Golowich, Alexander Rakhlin, Ohad Shamir. COLT 2018 (arXiv:1712.06541).
Proves Rademacher complexity bounds, under norm constraints on each layer's parameter matrix, that are independent of both width and depth (size-independent), showing that sample complexity can be controlled independently of network size[4].
- A Theoretical-Empirical Approach to Estimating Sample Complexity of DNNs
Devansh Bisla, Apoorva Nandini Saridena, Anna Choromanska. CVPRW 2021.
Models a DNN's generalization error through feature-space distances between test points and training points, avoiding infinite-dimensional capacity measures and yielding a sample-complexity estimate that can be validated experimentally[5]. A toy illustration of the distance ingredient follows.
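A toy numpy illustration of the underlying ingredient (distance in feature space from each test point to its nearest training point); this is only a schematic with a stand-in feature extractor `phi`, not the paper's actual estimator.

```python
# Toy illustration of the feature-space-distance ingredient: for each test point,
# how far is it (in feature space) from its nearest training point? The "feature
# extractor" here is a made-up random projection; in practice it would be the
# trained network's penultimate-layer representation.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_feat = 100, 32
W = rng.standard_normal((d_in, d_feat)) / np.sqrt(d_in)

def phi(x):
    # Stand-in feature extractor (random projection + ReLU).
    return np.maximum(x @ W, 0.0)

X_train = rng.standard_normal((1000, d_in))
X_test = rng.standard_normal((200, d_in))

F_train, F_test = phi(X_train), phi(X_test)
# Pairwise distances (200 x 1000), then nearest-neighbor distance per test point.
dists = np.linalg.norm(F_test[:, None, :] - F_train[None, :, :], axis=-1)
nn_dist = dists.min(axis=1)
print(f"mean nearest-training-point distance in feature space: {nn_dist.mean():.3f}")
```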
- Deep Double Descent: Where Bigger Models and More Data Hurt
Preetum Nakkiran, Gal Kaplun, Yamini Bansal, Tristan Yang, Boaz Barak, Ilya Sutskever. ICLR 2020.
Shows experimentally that generalization error, as a function of both model size and training epochs, does not follow only the classical U-shaped curve but exhibits double descent: beyond the interpolation threshold, test error decreases again as over-parameterization grows[6]. A toy demo of the curve follows.
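The same qualitative curve can be reproduced with minimum-norm least squares on random ReLU features: test error peaks near the interpolation threshold (number of features close to the number of samples) and falls again as the model grows. This is a toy stand-in, not a reproduction of the paper's CNN/transformer experiments.

```python
# Toy double-descent demo: minimum-norm least squares on random ReLU features.
# Test error rises sharply near the interpolation threshold (p close to n) and
# then decreases again as the number of features p grows well past n.
import numpy as np

rng = np.random.default_rng(0)
n, n_test, d = 100, 1000, 20
w_true = rng.standard_normal(d)
X, X_te = rng.standard_normal((n, d)), rng.standard_normal((n_test, d))
y = X @ w_true + 0.5 * rng.standard_normal(n)
y_te = X_te @ w_true + 0.5 * rng.standard_normal(n_test)

for p in [10, 50, 90, 100, 110, 200, 1000, 5000]:
    V = rng.standard_normal((d, p)) / np.sqrt(d)      # fixed random first layer
    F, F_te = np.maximum(X @ V, 0), np.maximum(X_te @ V, 0)
    beta = np.linalg.pinv(F) @ y                      # minimum-norm interpolating solution
    test_mse = np.mean((F_te @ beta - y_te) ** 2)
    print(f"p = {p:5d}  test MSE = {test_mse:8.3f}")
```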
- The Neural Tangent Kernel in High Dimensions: Triple Descent and a Multi-Scale Theory of Generalization
Ben Adlam, Jeffrey Pennington. ICML 2020.
Analyzes NTK (Neural Tangent Kernel) regression, which describes the infinite-width limit of wide networks, and characterizes additional descent-and-ascent behavior (triple descent) near the interpolation threshold through high-dimensional asymptotics[7]. The standard NTK objects involved are recalled below.
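For orientation, the standard NTK definitions this kind of analysis builds on (background facts, not specific to the paper): the kernel of a network f(x; θ) and the kernel-regression predictor that describes infinitely wide networks trained by gradient flow.

```latex
% Standard NTK definitions (background, not the paper's result).
\Theta(x, x') \;=\; \big\langle \nabla_{\theta} f(x;\theta),\; \nabla_{\theta} f(x';\theta) \big\rangle,
\qquad
\hat{f}(x) \;=\; \Theta(x, X)\,\Theta(X, X)^{-1} y,
```

where X and y collect the training inputs and targets; in the infinite-width limit the kernel Θ stays fixed at its initialization value throughout training.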
- Overparameterized ReLU Neural Networks Learn the Simplest Model: Neural Isometry and Phase Transitions
Yifei Wang, Yixuan Hua, Emmanuel J. Candès, Mert Pilanci. IEEE Trans. Inf. Theory, Jan 2025.
Shows, from an information-theoretic perspective, that a two-layer ReLU network trained with weight-decay regularization exactly recovers the underlying simple linear model once the sample size passes a critical ratio (roughly n > 2d), and formalizes the resulting phase transition[8]. The type of objective analyzed is sketched below.
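A schematic form of the training problem in question, namely a two-layer ReLU network with weight decay; the loss ℓ and constants are simplified relative to the paper.

```latex
% Two-layer ReLU network with weight decay (schematic; loss and constants simplified).
\min_{\{w_j,\,\alpha_j\}} \;\sum_{i=1}^{n} \ell\!\Big(\textstyle\sum_{j=1}^{m} \alpha_j\,(w_j^{\top} x_i)_{+},\; y_i\Big)
\;+\; \frac{\lambda}{2}\sum_{j=1}^{m}\big(\lVert w_j\rVert_2^{2} + \alpha_j^{2}\big).
```

The paper's result concerns when the global minimizers of such an objective coincide with the planted simple (e.g. linear) model.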
- Leveraging PAC-Bayes Theory and Gibbs Distributions for Generalization Bounds with Complexity Measures
Paul Viallard, Rémi Emonet, Amaury Habrard, Emilie Morvant, Valentina Zantedeschi. AISTATS 2024.
Develops disintegrated PAC-Bayes bounds that can incorporate arbitrary complexity measures, and uses Gibbs distributions to obtain generalization bounds tailored to the task and hypothesis class[9]. The classical bound this line of work refines is recalled below.
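For background, the classical McAllester-style PAC-Bayes bound that such works build on; the paper's disintegrated, complexity-measure-dependent bounds have a different but related shape.

```latex
% Classical PAC-Bayes bound (background, not the paper's result): with
% probability at least 1 - \delta over an i.i.d. sample of size n, for every
% posterior \rho over hypotheses and a prior \pi fixed in advance,
\mathbb{E}_{h \sim \rho}\big[R(h)\big] \;\le\; \mathbb{E}_{h \sim \rho}\big[\widehat{R}(h)\big]
\;+\; \sqrt{\frac{\mathrm{KL}(\rho \,\|\, \pi) + \ln\!\frac{2\sqrt{n}}{\delta}}{2n}}.
```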
- Deterministic PAC-Bayesian Generalization Bounds for Deep Networks via Generalizing Noise-Resilience
Vaishnavh Nagarajan, Zico Kolter. ICLR 2019.
Extends the noise-resilience of the flat minima explored by SGD from the training data to the test distribution, yielding PAC-Bayes generalization guarantees for deterministic, uncompressed networks[10].
- Data-dependent Sample Complexity of Deep Neural Networks via Lipschitz Augmentation
Colin Wei, Tengyu Ma. NeurIPS 2019.
Exploits additional data-dependent quantities, such as the norms of the Jacobians of the inter-layer compositions, to obtain sample-complexity bounds that mitigate the depth dependence unavoidable in classical Rademacher bounds[11]. A sketch of the Jacobian ingredient follows.
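A small PyTorch sketch of the kind of data-dependent quantity involved, the norm of a network's input-output Jacobian evaluated on training points; this illustrates the ingredient only, not the paper's full augmented complexity measure.

```python
# Illustration of a data-dependent quantity of the kind such bounds use: the
# Frobenius norm of the network's input-output Jacobian on training points.
# The architecture and data below are made up for the sketch.
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(20, 64), nn.ReLU(),
                    nn.Linear(64, 64), nn.ReLU(),
                    nn.Linear(64, 10))
X_train = torch.randn(128, 20)

def input_jacobian_norm(model, x):
    # Jacobian of the 10-dim output w.r.t. the 20-dim input at one point.
    J = torch.autograd.functional.jacobian(model, x.unsqueeze(0))  # (1, 10, 1, 20)
    return J.reshape(10, 20).norm().item()

norms = [input_jacobian_norm(net, x) for x in X_train]
print(f"mean input-Jacobian Frobenius norm over training data: {sum(norms) / len(norms):.3f}")
```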
- The Role of Over-Parametrization in Generalization of Neural Networks
Behnam Neyshabur, Zhiyuan Li, Srinadh Bhojanapalli, Yann LeCun, Nathan Srebro. ICLR 2019.
Proposes a new generalization measure based on unit-wise capacity and derives a bound that can account for the trend of test error improving as network size grows[12].
Source: Simons Institute (Berkeley) talk, 2019.
Shows, via matching upper and lower bounds, that the sample complexity of estimating linear CNNs and RNNs scales linearly with their intrinsic dimension, far below that of fully connected networks[13]. A parameter-count comparison illustrating the gap follows.
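A back-of-the-envelope illustration of why a convolutional layer's intrinsic dimension is so much smaller than that of a dense layer on the same input; the shapes below are made up for illustration and are independent of the talk's specific bounds.

```python
# Parameter-count comparison: a convolutional layer's intrinsic dimension
# (kernel_size^2 * in_channels * out_channels) is far smaller than a dense layer
# mapping the same flattened input to the same flattened output.
H = W = 32                     # spatial size
c_in, c_out, k = 3, 64, 3      # channels and kernel size

conv_params = k * k * c_in * c_out                  # weights shared across positions
dense_params = (H * W * c_in) * (H * W * c_out)     # fully connected equivalent

print(f"conv layer parameters : {conv_params:,}")        # 1,728
print(f"dense layer parameters: {dense_params:,}")       # 201,326,592
print(f"ratio                 : {dense_params / conv_params:,.0f}x")
```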
- On Size-Independent Sample Complexity of ReLU Networks
Mark Sellke. arXiv 2023.
Derives sample-complexity upper bounds for ReLU networks under Frobenius-norm constraints that do not depend on the network's depth, improving on classical Rademacher- and VC-dimension-based bounds[14].
Sources
[1] Understanding deep learning requires rethinking generalization https://openreview.net/forum?id=Sy8gdB9xx
[2] Spectrally-normalized margin bounds for neural networks – NIPS https://papers.nips.cc/paper/7204-spectrally-normalized-margin-bounds-for-neural-networks
[3] Norm-Based Capacity Control in Neural Networks – PMLR http://proceedings.mlr.press/v40/Neyshabur15.pdf
[4] Size-Independent Sample Complexity of Neural Networks – arXiv https://arxiv.org/pdf/1712.06541.pdf
[5] A Theoretical-Empirical Approach to Estimating Sample Complexity of DNNs https://openaccess.thecvf.com/content/CVPR2021W/TCV/papers/Bisla_A_Theoretical-Empirical_Approach_to_Estimating_Sample_Complexity_of_DNNs_CVPRW_2021_paper.pdf
[6] Deep Double Descent: Where Bigger Models and More Data Hurt – ICLR 2020 https://openreview.net/pdf?id=B1g5sA4twr
[7] The Neural Tangent Kernel in High Dimensions: Triple Descent and a Multi-Scale Theory of Generalization – ICML 2020 (PMLR) http://proceedings.mlr.press/v119/adlam20a/adlam20a.pdf
[8] Overparameterized ReLU Neural Networks Learn the Simplest Model: Neural Isometry and Phase Transitions – IEEE Trans. Inf. Theory, Jan 2025 https://web.stanford.edu/~pilanci/papers/Neural_Recovery.pdf
[9] Leveraging PAC-Bayes Theory and Gibbs Distributions for Generalization Bounds with Complexity Measures https://proceedings.mlr.press/v238/viallard24a.html
[10] Deterministic PAC-Bayesian generalization bounds for deep networks… https://openreview.net/forum?id=Hygn2o0qKX
[11] Data-dependent Sample Complexity of Deep Neural Networks via … https://papers.nips.cc/paper/9166-data-dependent-sample-complexity-of-deep-neural-networks-via-lipschitz-augmentation
[12] The role of over-parametrization in generalization of neural networks https://openreview.net/forum?id=BygfghAcYX
[13] Sample-complexity of Estimating Convolutional and Recurrent Neural Networks https://simons.berkeley.edu/talks/sample-complexity-estimating-convolutional-recurrent-neural-networks
[14] On Size-Independent Sample Complexity of ReLU Networks – arXiv https://arxiv.org/pdf/2306.01992.pdf
[15] A Theoretical-Empirical Approach to Estimating Sample Complexity … https://arxiv.org/abs/2105.01867
[16] Double descent: understanding deep learning’s curve – Telnyx https://telnyx.com/learn-ai/double-descent-deep-learning
[17] Machine Learning I, Lecture 28: Sample and Model Complexity – Purdue ECE595 https://engineering.purdue.edu/ChanGroup/ECE595/files/Lecture28_complexity.pdf
[18] Leveraging PAC-Bayes Theory and Gibbs Distributions for Generalization Bounds with Complexity Measures – PMLR https://proceedings.mlr.press/v238/viallard24a/viallard24a.pdf
[19] Lecture 21: Sample Complexity of Neural Networks – MIT 9.520, Fall 2019 https://www.mit.edu/~9.520/fall19/slides/Class21.pdf
[20] Spectrally-normalized margin bounds for neural networks https://dl.acm.org/doi/10.5555/3295222.3295372
[21] Generalization Bounds of Stochastic Gradient Descent for Wide and … – arXiv https://arxiv.org/pdf/1905.13210.pdf
[22] Journal of Machine Learning Research 17 (2016) 1-15 https://www.jmlr.org/papers/volume17/15-389/15-389.pdf
[23] Understanding deep learning requires re-thinking generalization – the morning paper (blog.acolyer.org) https://blog.acolyer.org/2017/05/11/understanding-deep-learning-requires-re-thinking-generalization/
[24] The Optimal Sample Complexity of PAC Learning https://web.ics.purdue.edu/~hanneke/docs/2015/opt-pac.pdf
[25] Spectrally-normalized margin bounds for neural networks https://papers.nips.cc/paper_files/paper/2017/hash/b22b257ad0519d4500539da3c8bcf4dd-Abstract.html
[26] VC Dimensions for Deep Neural Networks with Bounded-Rank … – OpenReview https://openreview.net/pdf?id=oV72wHuRNy
[27] Spectrally-normalized margin bounds for neural networks – arXiv https://arxiv.org/pdf/1706.08498.pdf
[28] arXiv:2411.05453v1 [stat.ML] 8 Nov 2024 http://www.arxiv.org/pdf/2411.05453.pdf
[29] Data-dependent Sample Complexity of Deep Neural Networks via … – NeurIPS http://papers.neurips.cc/paper/9166-data-dependent-sample-complexity-of-deep-neural-networks-via-lipschitz-augmentation.pdf
[30] Under review as submission to TMLR https://openreview.net/notes/edits/attachment?id=1zsvCaeGTD&name=pdf
[31] VC Dimensions for Deep Neural Networks with Bounded-Rank … https://dclibrary.mbzuai.ac.ae/mletd/9/
[32] [PDF] Learning One-hidden-layer ReLU Networks via Gradient Descent http://proceedings.mlr.press/v89/zhang19g/zhang19g.pdf
[33] The Shape of Generalization through the Lens of Norm-based … https://openreview.net/forum?id=mGVxHENfM6
[34] Size-Independent Sample Complexity of Neural Networks – MIT Open Access Articles (DSpace) https://dspace.mit.edu/bitstream/handle/1721.1/138309/1712.06541.pdf?sequence=2&isAllowed=y
[35] Norm-Based Generalisation Bounds for Deep Multi-Class Convolutional Neural Networks https://ojs.aaai.org/index.php/AAAI/article/view/17007
[36] Data-dependent Sample Complexity of Deep Neural Networks via Lipschitz Augmentation https://papers.nips.cc/paper_files/paper/2019/hash/0e79548081b4bd0df3c77c5ba2c23289-Abstract.html
[37] Size-Independent Sample Complexity of Neural Networks (arXiv:1712.06541v2) https://cbmm.mit.edu/sites/default/files/publications/1712.06541.pdf
[38] Deterministic PAC-Bayesian Generalization Bounds for Deep Networks via Generalizing Noise-Resilience – OpenReview https://openreview.net/pdf?id=Hygn2o0qKX