Can machine learning be used for credit risk modelling?
Lending is a major line of business for banks, so it is important for them to predict whether a particular applicant can afford to repay a loan. Bad loans can be limited by ensuring that loans are originated only to strong borrowers: those who are likely to be able to repay and unlikely to become insolvent.
Beyond traditional financial institutions…
Internet finance, which combines the traditional financial industry with Internet technology, has developed rapidly and produced many network-based financial products, including P2P credit platforms, online funding platforms and payment platforms. However, because these activities lack regulation, they are likely to bring risks to investment institutions and financial enterprises, such as overdue repayment by borrowers, credit card fraud and operator fraud.
To manage these risks, we can establish an effective risk control system, for example by building credit scorecard models. Such a model learns from historical clients' behavioral information, such as income status, credit history, employment and debt condition, and uses machine learning methods to predict a credit score.
The model then groups clients into multiple score levels, each indicating the client's ability to repay the loan.
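A common way to turn a model's predicted probability of default into scorecard points is "points to double the odds" (PDO) scaling, followed by cutting the score into levels. The sketch below is illustrative only: the base score of 600, base odds of 50:1, PDO of 20, the level cutoffs and the level names are all hypothetical choices, not values from this article.

```python
import math
from bisect import bisect_right

# Hypothetical scaling parameters (assumptions for illustration):
# a client with good:bad odds of 50:1 scores 600 points, and every
# doubling of those odds adds 20 points (PDO = "points to double the odds").
BASE_SCORE = 600.0
BASE_ODDS = 50.0
PDO = 20.0

FACTOR = PDO / math.log(2)
OFFSET = BASE_SCORE - FACTOR * math.log(BASE_ODDS)

# Hypothetical level cutoffs: below 560 -> "E", 560-599 -> "D", ..., 680+ -> "A".
CUTOFFS = [560, 600, 640, 680]
LEVELS = ["E", "D", "C", "B", "A"]


def probability_to_score(p_default: float) -> float:
    """Map a predicted probability of default to scorecard points."""
    if not 0.0 < p_default < 1.0:
        raise ValueError("p_default must be strictly between 0 and 1")
    odds_good = (1.0 - p_default) / p_default  # good:bad odds
    return OFFSET + FACTOR * math.log(odds_good)


def score_to_level(score: float) -> str:
    """Group a score into a level; higher levels indicate better ability to repay."""
    return LEVELS[bisect_right(CUTOFFS, score)]
```

For example, a predicted default probability of 1/51 (odds of 50:1) maps to about 600 points and level "C", while 1/101 (odds of 100:1) maps to about 620 points. Because the score is linear in the log-odds, a logistic regression's output fits this scaling naturally, but any model that produces a probability can be scaled the same way.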
To improve prediction accuracy, data mining and feature selection are the key factors. The dataset is so large that it is difficult to decide which features are important for classification and how much weight to assign to each feature. Another problem in predicting credit risk is an imbalanced dataset: in most of the data we collect, the borrower repays the loan. We therefore need to figure out how to train models that still recognize the negative label, that is, borrowers who do not repay.
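Two widely used tools address these problems, although this article does not name them: Information Value (IV), built on Weight of Evidence (WoE), for screening features in scorecard models, and class weighting for the imbalance. The sketch below uses only the standard library. The label convention (1 = repaid, 0 = did not repay) and the smoothing count `eps` are assumptions; `balanced_class_weights` follows the same n_samples / (n_classes * class_count) formula as scikit-learn's `class_weight="balanced"` option.

```python
import math
from collections import Counter, defaultdict


def information_value(bins, labels, eps=0.5):
    """Information Value (IV) of one already-binned feature.

    bins   -- the bin or category of each client for this feature
    labels -- 1 if the client repaid (good), 0 if not (bad)
    eps    -- assumed smoothing count added to every cell to avoid log(0)
    """
    good = defaultdict(float)
    bad = defaultdict(float)
    for b, y in zip(bins, labels):
        if y == 1:
            good[b] += 1
        else:
            bad[b] += 1
    categories = set(good) | set(bad)
    total_good = sum(good.values()) + eps * len(categories)
    total_bad = sum(bad.values()) + eps * len(categories)
    iv = 0.0
    for c in categories:
        share_good = (good[c] + eps) / total_good  # share of all good clients in bin c
        share_bad = (bad[c] + eps) / total_bad     # share of all bad clients in bin c
        woe = math.log(share_good / share_bad)     # weight of evidence of bin c
        iv += (share_good - share_bad) * woe
    return iv


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count), so the rare
    class (clients who do not repay) carries the same total weight as the common one."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}
```

A feature whose bins all have the same repay rate gets an IV of zero, and larger values indicate stronger separation between good and bad clients. With 90 repaid and 10 unpaid loans, `balanced_class_weights` gives the unpaid class a weight of 5.0 and the repaid class about 0.56, so misclassifying a non-repayer costs the model roughly nine times as much during training.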
Tree-based models such as LightGBM are well suited to financial data because they are relatively interpretable and often more accurate. We also compare their accuracy with logistic regression to determine which model performs best. The assessment metrics we use to select a high-performing, efficient model are accuracy and AUC. The AUC score is the area under the ROC (receiver operating characteristic) curve, and the ROC curve plots the True Positive Rate against the False Positive Rate at different classification thresholds.
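To make the ROC and AUC definitions concrete, here is a minimal from-scratch sketch: it sweeps the threshold from the highest score to the lowest, records one (FPR, TPR) point per distinct score, and integrates the curve with the trapezoidal rule. It assumes `y_true` marks with 1 the class the score is meant to rank higher (for example defaulters, if the score is a predicted default probability). In practice, scikit-learn's `sklearn.metrics.roc_auc_score` computes the same quantity.

```python
def roc_curve_points(y_true, scores):
    """Return (FPR, TPR) points, lowering the threshold one distinct score at a time."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative samples")
    pairs = sorted(zip(scores, y_true), key=lambda t: t[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Every sample tied at this score crosses the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area
```

For labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, the curve passes through (0, 0.5), (0.5, 0.5) and (0.5, 1), giving an AUC of 0.75. A model that ranks every positive above every negative scores 1.0, and a constant score gives 0.5. Unlike accuracy, AUC does not depend on a single threshold, which makes it more informative on the imbalanced data described above.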