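In summary, model selection is chiefly concerned with two questions: the predictive accuracy of the selected model, also called efficiency, and the consistency of the selected model with the true model, which may also be called interpretability. Among penalty-based model selection methods, for example, the nonnegative garrote (NG), the LASSO, and the elastic net (EN) are prediction-oriented, whereas the adaptive LASSO and SCAD are interpretation-oriented. Methods for choosing the tuning parameter are mostly derived from the idea of cross-validation: leave-one-out cross-validation, generalized cross-validation, and generalized approximate cross-validation are all typical prediction-oriented methods. A few methods, however, such as B-type generalized approximate leave-one-out cross-validation and the stability path method, choose the tuning parameter from the standpoint of the consistency of the selected model and are therefore interpretation-oriented. When handling a practical problem, the method should be chosen according to whether prediction or interpretation is the actual goal.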
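As a rough illustration of a prediction-oriented way to choose the tuning parameter, the sketch below selects the LASSO penalty by K-fold cross-validation on simulated data. Everything in it is an assumption made for illustration rather than part of the methods reviewed here: the cyclic coordinate-descent solver, the simulated design with two nonzero coefficients, the penalty grid, the absence of an intercept (the data are generated with mean zero), and the function names `soft_threshold`, `lasso_cd`, `cv_error`, and `select_lambda`.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator: the closed-form one-coordinate LASSO update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta; beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (beta_j removed).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


def cv_error(X, y, lam, k=5):
    """Mean squared prediction error of the LASSO at penalty lam under k-fold CV."""
    n = len(X)
    total = 0.0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        beta = lasso_cd([X[i] for i in train], [y[i] for i in train], lam)
        for i in test:
            pred = sum(b * x for b, x in zip(beta, X[i]))
            total += (y[i] - pred) ** 2
    return total / n


def select_lambda(X, y, grid, k=5):
    """Return the penalty in grid with the smallest cross-validated error."""
    errors = [cv_error(X, y, lam, k) for lam in grid]
    best = min(range(len(grid)), key=lambda idx: errors[idx])
    return grid[best], errors


# Simulated example (hypothetical): only the first two predictors matter.
rng = random.Random(0)
n, p = 60, 6
true_beta = [2.0, -1.5, 0.0, 0.0, 0.0, 0.0]
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [sum(b * x for b, x in zip(true_beta, row)) + rng.gauss(0, 0.5) for row in X]

grid = [0.01, 0.05, 0.1, 0.5, 1.0]
best_lam, errors = select_lambda(X, y, grid)
beta_hat = lasso_cd(X, y, best_lam)
print("selected penalty:", best_lam)
print("coefficients:", [round(b, 3) for b in beta_hat])
```

Because the criterion is cross-validated prediction error, the selected penalty targets predictive accuracy and may retain some noise variables. An interpretation-oriented analysis would instead use a criterion aimed at recovering the true set of nonzero coefficients.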