An Empirical Study of Data Scale, Model Complexity, and Input Modalities in Visual Generalization
Abstract:
Modern deep neural networks have achieved remarkable results in computer vision through very large parameter counts and deep, nonlinear hierarchical architectures, yet classical statistical learning theory struggles to explain why they generalize well. We focus on three fundamental, controllable variables that are likely to influence visual generalization: data scale, model complexity, and input modality. This paper presents an empirical investigation of how each of these factors affects generalization performance.
Our study begins with a preliminary experiment in which we fit polynomials to a one-dimensional nonlinear function, varying the number of training samples and the polynomial degree to isolate the effects of data scale and model complexity. The main experiments then evaluate models on the CIFAR-10 and CIFAR-100 datasets while varying the training-set size, the network architecture, and the type of input data.
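The preliminary experiment can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the target function sin(2πx), uniform sampling on [0, 1], the Gaussian noise level, and the helper names `target` and `run_trial` are all hypothetical, since the abstract does not specify them.

```python
import numpy as np


def target(x):
    # Hypothetical 1-D nonlinear target; the abstract does not name the function used.
    return np.sin(2 * np.pi * x)


def run_trial(n_train, degree, noise=0.1, n_test=1000, seed=0):
    """Fit a polynomial of the given degree to n_train noisy samples and
    return its mean squared error against the clean target on a test grid."""
    rng = np.random.default_rng(seed)
    x_train = rng.uniform(0.0, 1.0, n_train)
    y_train = target(x_train) + rng.normal(0.0, noise, n_train)
    # Least-squares fit; Polynomial.fit maps x to [-1, 1] internally for numerical stability.
    model = np.polynomial.Polynomial.fit(x_train, y_train, deg=degree)
    x_test = np.linspace(0.0, 1.0, n_test)
    return float(np.mean((model(x_test) - target(x_test)) ** 2))


if __name__ == "__main__":
    # Rows vary data scale (n_train); columns vary model complexity (degree).
    for n in (10, 100, 1000):
        row = [run_trial(n, d) for d in (1, 3, 9)]
        print(n, [f"{e:.4f}" for e in row])
```

Sweeping `n_train` at a fixed degree isolates data scale, while sweeping `degree` at a fixed `n_train` isolates model complexity.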
Our findings indicate that enlarging the training set consistently improves generalization, whereas changes in model complexity do not yield reliable improvements. Removing color information degrades performance, while adding explicit prior features (gradients, edges, and wavelets) produces mixed results that depend on the specific model architecture. This work offers a comprehensive empirical examination of how data scale, model complexity, and input modality interact to shape visual generalization.
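One way the input-modality variants might be constructed is sketched below. The abstract does not say how the gradient, edge, or wavelet features were computed, so everything here is an assumption for illustration: the helper names (`to_grayscale`, `gradient_channels`, `edge_channel`, `haar_channels`, `build_input`), the ITU-R BT.601 luma weights for color removal, finite-difference gradients, gradient magnitude as the edge map, and a single-level orthonormal Haar transform as the wavelet feature. The linked repository is the authoritative source for the actual pipeline.

```python
import numpy as np


def to_grayscale(rgb):
    """Remove color: collapse an HxWx3 image to HxWx1 using BT.601 luma weights (assumed choice)."""
    weights = np.array([0.299, 0.587, 0.114])
    return (np.asarray(rgb, dtype=np.float64) @ weights)[..., None]


def gradient_channels(gray):
    """Finite-difference gradients of a single-channel image, stacked as (d/dx, d/dy)."""
    gy, gx = np.gradient(gray[..., 0])  # np.gradient returns derivatives along axis 0, then axis 1
    return np.stack([gx, gy], axis=-1)


def edge_channel(gray):
    """Gradient magnitude as a simple edge map (a stand-in for, e.g., Sobel or Canny)."""
    g = gradient_channels(gray)
    return np.sqrt((g ** 2).sum(axis=-1, keepdims=True))


def haar_channels(gray):
    """One level of an orthonormal 2-D Haar transform over 2x2 blocks (H and W must be even).
    Returns the approximation band and three detail bands, each upsampled back to HxW."""
    x = gray[..., 0]
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    bands = [(a + b + c + d) / 2, (a - b + c - d) / 2, (a + b - c - d) / 2, (a - b - c + d) / 2]
    up = [np.repeat(np.repeat(s, 2, axis=0), 2, axis=1) for s in bands]
    return np.stack(up, axis=-1)


def build_input(rgb, use_color=True, priors=()):
    """Concatenate base channels (RGB or grayscale) with optional prior-feature channels."""
    rgb = np.asarray(rgb, dtype=np.float64)
    gray = to_grayscale(rgb)
    parts = [rgb if use_color else gray]
    extractors = {"gradient": gradient_channels, "edge": edge_channel, "wavelet": haar_channels}
    for name in priors:
        parts.append(extractors[name](gray))
    return np.concatenate(parts, axis=-1)
```

For a 32x32 CIFAR image, `build_input(img)` keeps the 3 RGB channels, `build_input(img, use_color=False)` yields 1 channel, and `build_input(img, priors=("gradient", "edge", "wavelet"))` yields 3 + 2 + 1 + 4 = 10 channels; the network's first layer would then need a matching number of input channels.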
Code and experimental logs are available at: https://github.com/zlyd-CV/DeepLearning-Empirical-Studies.
Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC