Volume 1, Issue 4
Fall 2025
Predicting Credit Default Probability with a LightGBM Model
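Credit rating is at the core of the lending business, and a wide range of statistical modeling methods have been developed to support it. With the arrival of the big-data era, the scope of data collection has expanded markedly, and the number of features available for credit rating has grown with it. This growth brings a risk of feature redundancy, so feature selection is a critical step in the modeling process. This paper proposes a two-stage credit scoring method. In the first stage, every feature is subjected to an independence test based on the Mean Variance index as a preliminary screening. In the second stage, a LightGBM-based classification model is fitted to the retained features to obtain the final default-probability prediction model. In addition, we construct a dummy feature to detect whether redundant features remain in the model. Finally, the method is applied to data from a real online lending business to evaluate its effectiveness.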
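As an illustration of the first-stage screening, the sketch below computes a sample Mean Variance index between a numeric feature and a class label, and ranks features by it. It assumes the index takes the form commonly used in the model-free feature screening literature, $\widehat{MV}(X \mid Y) = \frac{1}{n}\sum_{r}\sum_{j=1}^{n} \hat p_r\,[\hat F_r(X_j) - \hat F(X_j)]^2$. The abstract does not state the exact statistic or threshold used. The names `mv_index` and `screen`, the simulated data, and the `keep` cutoff are hypothetical. The `dummy` column here is only a pure-noise reference for the ranking, and how the paper uses its dummy feature inside the fitted model is not specified. The LightGBM stage is omitted.

```python
import random
from bisect import bisect_right


def mv_index(x, y):
    """Sample Mean Variance index between feature values x and class labels y.

    Assumed form: (1/n) * sum_r sum_j p_r * (F_r(x_j) - F(x_j))**2, where F is
    the empirical CDF of x and F_r the empirical CDF of x within class r.
    """
    n = len(x)
    all_sorted = sorted(x)
    by_class = {}
    for xi, yi in zip(x, y):
        by_class.setdefault(yi, []).append(xi)

    total = 0.0
    for values in by_class.values():
        values.sort()
        p_r = len(values) / n  # class proportion
        for xj in x:
            f_all = bisect_right(all_sorted, xj) / n        # F(x_j)
            f_r = bisect_right(values, xj) / len(values)    # F_r(x_j)
            total += p_r * (f_r - f_all) ** 2
    return total / n


def screen(features, y, keep):
    """Rank features by MV index (largest first) and return the top `keep`."""
    scores = {name: mv_index(col, y) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:keep], scores


# Simulated data (hypothetical): one informative feature, two pure-noise ones.
rng = random.Random(0)
n = 400
y = [rng.randint(0, 1) for _ in range(n)]
features = {
    "signal": [rng.gauss(2.0 * yi, 1.0) for yi in y],  # mean shifts with label
    "noise": [rng.gauss(0.0, 1.0) for _ in range(n)],  # independent of label
    "dummy": [rng.random() for _ in range(n)],         # pure-noise reference
}
kept, scores = screen(features, y, keep=1)
print(kept, scores)
```

A larger index indicates stronger dependence between the feature and the default label, and an index near zero is consistent with independence. In practice the cutoff would be set by the test's critical value or by a screening rule, rather than by the fixed `keep` used here.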