Journal: Journal of Computer Science and Engineering Research (JCSER), Volume: 2, Issue: 1, Pages: 1-8
Authors: Deepa Shukla, Sunil Gupta
Date: 3-2025
Abstract: Credit scoring is a fundamental component of financial decision-making, enabling institutions to evaluate the creditworthiness of individuals and manage risk effectively. However, traditional credit scoring models, heavily reliant on historical credit data, often exclude thin-file consumers (individuals with little or no formal credit history), thereby limiting financial inclusion. This paper presents a comprehensive review of alternative datasets and machine learning (ML) techniques as innovative solutions to this challenge. Alternative datasets, such as social media activity, web browsing behaviours, digital footprints, telecom usage, and hybrid approaches, offer a broader perspective on consumer behaviours and financial reliability. When integrated with advanced ML algorithms, including neural networks, support vector machines, ensemble methods, and hybrid models, these datasets provide enhanced predictive capabilities, addressing data sparsity and capturing complex patterns in consumer behaviours. The findings underscore the potential of hybrid models that combine multiple datasets to achieve superior performance in credit risk assessment. This review also highlights critical challenges, such as data privacy, bias mitigation, and model interpretability, which remain significant barriers to the widespread adoption of alternative datasets and ML models. By synthesizing insights from over 75 studies spanning two decades (2000–2023), this research identifies key trends, evaluates the effectiveness of various approaches, and suggests actionable recommendations for future work. The implications of this review extend to financial institutions seeking to expand credit access to underserved populations, improve decision-making accuracy, and promote financial inclusion. Furthermore, it calls for the development of fairness-aware and transparent algorithms to ensure ethical and equitable credit scoring practices. Future research should focus on integrating emerging datasets, such as geolocation and behavioural analytics, and conducting longitudinal studies to validate the real-world impact of these advanced credit scoring methodologies.
Keywords: Alternative Datasets, Credit Scoring, Thin-File Consumers, Machine Learning, Financial Inclusion, Fairness, Interpretability.