Annotated Bibliography for the Attached Articles
A Two-Stage Dynamic Credit Risk Assessment System
Rui Li, Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai, China
Shizhe Deng, Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai, China
Jianquan Zhang, Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai, China
Hao He, Shanghai Institute for Advanced Communication and Data Science, Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai, China
Yaohui Jin, MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China
Jiangang Duan, Seassoon Technology, Shanghai, China
ABSTRACT
Credit risk assessment has long been regarded as a critical factor for financial companies and banks in the history of economic development. Recently, there has been renewed interest in credit risk assessment using deep learning methods. However, previous studies have not handled static and dynamic features separately at a fine granularity, which limits their effectiveness. In this paper, we therefore present a two-stage model using a Feedforward Neural Network (FNN) and a Recurrent Neural Network (RNN). First, we design an aggregation layer to extract representative information from the static features at time T. Second, the representations at distinct moments constitute the dynamic features of a client, which are learned by the RNN layer. Experimental results on a real-world dataset show the model's superiority over various baselines.

CCS Concepts
• Applied computing ➝ Electronic commerce; • Computing methodologies ➝ Machine learning ➝ Machine learning approaches ➝ Neural networks

Keywords
credit risk assessment; machine learning; long short-term memory

ICDLT 2020, July 10–12, 2020, Beijing, China
© 2020 Association for Computing Machinery.
ACM ISBN 978-1-4503-7548-1/20/07...$15.00
https://doi.org/10.1145/3417188.3417193

1. INTRODUCTION
Credit risk assessment is essential to many financial institutions around the world. Notably, in light of the recent financial crisis, most financial institutions are exploring more advanced statistical models in place of traditional manual approaches [1]. The primary technique behind this task is default probability prediction. Many models proposed in this field are based on traditional classification techniques such as Logistic Regression (LR) [2], Decision Tree (DT) [3], or Support Vector Machine (SVM) [1]. Such approaches, however, fail to exploit sequential client transaction data, which limits their effectiveness and scalability.
Although many statistical models have been evaluated in practice, they have certain limitations. First, there is still very little scientific understanding of sequential transaction data in the credit risk assessment field. Second, existing methods do not process dynamic and static features separately at a fine granularity.
In this paper, a novel two-stage approach is proposed to learn the credit risk of a client dynamically. We design an aggregation layer to learn the static representation of a client at moment T, and the representations at distinct moments constitute the dynamic features of the client. A Recurrent Neural Network (RNN) is used to process these sequential dynamic features.
The rest of this paper is organized as follows. Section 2 summarizes related work. Section 3 formally defines our problem. Section 4 presents the approach to predicting default probability. Section 5 details the experimental plan and results. We conclude the paper in Section 6.

2. RELATED WORK
Research into credit risk assessment has a long history. More than seventy years ago, Durand [4] proposed the concept of a credit score. Since then, many studies have investigated the theory and application of credit risk assessment. Over the past decade, most research on default probability prediction has emphasized machine learning and deep learning, owing to the flourishing of data science and several essential advances in artificial intelligence.
Most researchers investigating credit risk assessment have used Logistic Regression (LR) because of its interpretability and efficiency. Chen et al. [2] showed that a more accurate credit score can be obtained by constructing a hybrid scoring model based on Logistic Regression with data augmentation. Despite its wide use, LR has limited ability to capture nonlinear relationships. To better understand the relationships behind the data, more sophisticated methods such as DT and SVM have been investigated. Kabari [3] used a Decision Tree and an Artificial Neural Network to build a credit model, and the results indicated that the approach is successful. Harris [1] proposed a novel credit model using the clustered support vector machine (CSVM) and demonstrated that CSVM can achieve high classification performance while remaining relatively cheap computationally. However, most research based on traditional machine learning methods to date cannot handle sequential input data.
In recent years, the use of deep learning techniques to build credit risk models has led to significant increases in reported accuracy on benchmark datasets [1]. Addo [7] compared the classification abilities of machine learning and deep learning models, and the results show that tree-based models are more stable than models based on multilayer artificial neural networks. Studies [5] and [6] use only an LSTM to predict default probability from sequential input data. Zhao [8] proposed a dynamic forecasting system that predicts the default probability of a company, using a long short-term memory model to incorporate daily news from social media. Babaev et al. [10] combined embedding techniques from NLP with an RNN to mine sequential transaction data.
3. PROBLEM STATEMENT
Suppose the input at time t can be represented by $\vec{x}_t = (x_t^1, x_t^2, \dots, x_t^m)$, where m is the number of features. The features can be either numeric or categorical. The input data of a client can therefore be expressed dynamically as $X = (\vec{x}_1, \vec{x}_2, \dots, \vec{x}_T)$. Each sample has a corresponding label $y \in \{0, 1\}$, where y = 1 denotes a default sample and y = 0 a non-default sample.
Our purpose is to predict the default probability of clients based on their characteristics, as described by Equation (1):
$$\tilde{y} = f(\vec{x}_1, \vec{x}_2, \dots, \vec{x}_T) \tag{1}$$
The function f is what we approximate with deep learning methods.
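4. METHODOLOGY
At time T, the output of the aggregation layer for a client is obtained through Equation (2):
$$\vec{a}_T = \mathrm{FNN}(\vec{x}_T) \tag{2}$$
where FNN denotes the aggregation layer described in Section 4.2. By analyzing the data of a client at different times, we predict the default probability y, as in Equation (3):
$$\tilde{y} = f(\vec{x}_1, \dots, \vec{x}_T) = \phi(\vec{a}_1, \vec{a}_2, \dots, \vec{a}_T) \tag{3}$$
where $\vec{a}_t$ is the aggregation-layer output at time t and $\phi$ denotes the RNN and predict layers.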
Figure 1. The overall architecture of the network.
We divide the assessment of default probability into two stages. First, we construct the aggregation layer to learn the static feature representation of the client at time T. Second, the RNN layer learns the dynamic features composed of the static features at different times. The overall system structure is shown in Figure 1. Each component of the model is described in detail below.
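The two-stage decomposition can be sketched as a composition of three functions. This is a minimal illustration under assumptions, not the authors' implementation: `two_stage_predict`, `aggregate`, `step`, and `predict` are hypothetical names standing in for the aggregation layer, the recurrent update, and the predict layer, and the recurrent state type is left generic so that it could hold an LSTM's (hidden, cell) pair.

```python
import math
from typing import Callable, Sequence, TypeVar

State = TypeVar("State")
Vector = list[float]


def two_stage_predict(
    x_seq: Sequence[Vector],
    aggregate: Callable[[Vector], Vector],
    step: Callable[[State, Vector], State],
    init_state: State,
    predict: Callable[[State], float],
) -> float:
    """Score one client's sequence x_1, ..., x_T (hypothetical helper)."""
    # Stage 1: the same aggregation function is applied at every moment t,
    # turning the raw features x_t into a representation a_t.
    a_seq = [aggregate(x_t) for x_t in x_seq]
    # Stage 2: the recurrent update consumes a_1, ..., a_T in time order.
    state = init_state
    for a_t in a_seq:
        state = step(state, a_t)
    # The predict layer maps the final recurrent state to a probability.
    return predict(state)


if __name__ == "__main__":
    # Toy stand-ins: sum the features, accumulate the sums, squash with a sigmoid.
    prob = two_stage_predict(
        x_seq=[[1.0, 2.0], [3.0, -6.0]],
        aggregate=lambda x: [sum(x)],
        step=lambda h, a: [h[0] + a[0]],
        init_state=[0.0],
        predict=lambda h: 1.0 / (1.0 + math.exp(-h[0])),
    )
    print(prob)  # 0.5
```

Keeping the aggregation function shared across time steps is what separates the two stages: temporal structure enters only through the recurrent update.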
4.1 Input Layer
The input layer receives the preprocessed data; preprocessing mainly includes variable encoding, normalization, and sample balancing. At a given time T, all features can be regarded as static, and the inputs at different times constitute the dynamic features. The specific data processing method is described in detail in Section 5.
Let $x_T^i$ denote the value of feature i at time T. The input data of each sample can then be expressed by Equation (4):
$$\vec{x}_T = (x_T^1, x_T^2, \dots, x_T^m), \qquad X = (\vec{x}_1, \vec{x}_2, \dots, \vec{x}_T) \tag{4}$$
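Section 4.1 names variable encoding and normalization as preprocessing steps without specifying the methods. A minimal sketch, assuming one-hot encoding for integer-coded categorical variables (such as X2 to X4 in Section 5.1) and min-max scaling for numeric ones; both choices and the helper names `one_hot` and `min_max` are assumptions, not the paper's stated method.

```python
from typing import Optional

import numpy as np


def one_hot(values: np.ndarray, categories: list[int]) -> np.ndarray:
    """Encode an integer-coded categorical column as one-hot rows (assumed encoding).

    Raises KeyError for a code that is not listed in `categories`.
    """
    index = {code: j for j, code in enumerate(categories)}
    encoded = np.zeros((len(values), len(categories)))
    for row, code in enumerate(values):
        encoded[row, index[int(code)]] = 1.0
    return encoded


def min_max(column: np.ndarray, lo: Optional[float] = None,
            hi: Optional[float] = None) -> np.ndarray:
    """Scale a numeric column to [0, 1] (assumed normalization).

    Pass `lo`/`hi` computed on the training split when transforming
    validation or test data, so that no test statistics leak into training.
    """
    column = column.astype(float)
    lo = column.min() if lo is None else lo
    hi = column.max() if hi is None else hi
    if hi == lo:
        return np.zeros_like(column)
    return (column - lo) / (hi - lo)


if __name__ == "__main__":
    gender = np.array([1, 2, 2, 1])                    # X2: 1 = male, 2 = female
    credit = np.array([20000, 120000, 90000, 50000])   # X1 in NT dollars
    print(one_hot(gender, categories=[1, 2]))
    print(min_max(credit))
```

Applying training-split bounds to test data can yield values slightly outside [0, 1], which is expected and usually harmless.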
4.2 Aggregation Layer
The primary function of the aggregation layer is to learn a representation of all features at time T. This layer is implemented as a Feedforward Neural Network (FNN) [11]. We construct a multi-layer FNN module to learn the representation space of clients at a particular moment.
At time T, the output of the aggregation layer is $\vec{a}_T = \mathrm{FNN}(\vec{x}_T)$, where FNN denotes this multi-layer network.
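The aggregation layer can be sketched as a stack of fully connected layers whose weights are shared across time steps, so every row $\vec{x}_t$ of a client's T × m input is mapped to a representation $\vec{a}_t$. This is a NumPy forward pass only, assuming ReLU activations, the hidden sizes shown, and a scaled Gaussian initialization; the paper states only that a 3-layer FNN is used (Section 5.2), and `init_fnn` and `aggregate` are hypothetical names.

```python
import numpy as np

Layer = tuple[np.ndarray, np.ndarray]


def init_fnn(sizes: list[int], rng: np.random.Generator) -> list[Layer]:
    """Create (W, b) for each layer, e.g. sizes=[m, 32, 32, 16] gives 3 layers."""
    return [
        (rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def aggregate(x_seq: np.ndarray, layers: list[Layer]) -> np.ndarray:
    """Map raw features of shape (T, m) to representations of shape (T, d).

    Every time step (row) passes through the same weights, so a_t depends
    only on x_t; the temporal modelling is left to the RNN layer.
    """
    h = x_seq
    for W, b in layers:
        h = np.maximum(h @ W + b, 0.0)  # ReLU after each layer (assumed)
    return h


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layers = init_fnn([8, 32, 32, 16], rng)  # 8 raw values per month (Section 5.1)
    x_seq = rng.normal(size=(6, 8))          # T = 6 months for one client
    print(aggregate(x_seq, layers).shape)    # (6, 16)
```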
4.3 RNN Layer
We choose a variant of the RNN, Long Short-Term Memory (LSTM) [12], to build the RNN layer. LSTM effectively alleviates the vanishing gradient problem during training, which makes the model better suited to practical deployment and application.
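The aggregation layer generates the representation of a client at each time t, so the sequential features can be sampled from different moments. The dynamic features can be expressed as $A = (\vec{a}_1, \vec{a}_2, \dots, \vec{a}_t)$, where $\vec{a}_t$ denotes the aggregation-layer output at time t.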
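The basic structure of an LSTM cell is shown in Figure 2. Using the sequential data A as input, the RNN layer produces its output at time t. The detailed calculation is given in Equations (5) to (10):
$$\vec{f}_t = \sigma(W_f [\vec{h}_{t-1}, \vec{a}_t] + \vec{b}_f) \tag{5}$$
$$\vec{i}_t = \sigma(W_i [\vec{h}_{t-1}, \vec{a}_t] + \vec{b}_i) \tag{6}$$
$$\tilde{\vec{c}}_t = \tanh(W_c [\vec{h}_{t-1}, \vec{a}_t] + \vec{b}_c) \tag{7}$$
$$\vec{c}_t = \vec{f}_t \odot \vec{c}_{t-1} + \vec{i}_t \odot \tilde{\vec{c}}_t \tag{8}$$
$$\vec{o}_t = \sigma(W_o [\vec{h}_{t-1}, \vec{a}_t] + \vec{b}_o) \tag{9}$$
$$\vec{h}_t = \vec{o}_t \odot \tanh(\vec{c}_t) \tag{10}$$
where $\vec{a}_t$ is the input at time t, σ is the sigmoid function, ⊙ denotes element-wise multiplication, $[\vec{h}_{t-1}, \vec{a}_t]$ is the concatenation of the previous hidden state and the current input, $\vec{f}_t$, $\vec{i}_t$, and $\vec{o}_t$ are the forget, input, and output gates, $\vec{c}_t$ is the cell state, and $\vec{h}_t$ is the hidden state.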
Figure 2. LSTM Cell.
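The cell computations can be sketched in NumPy as a forward pass over the sequence A. This sketch assumes the standard LSTM with a forget gate, gates that read the concatenation of the previous hidden state and the current input, zero initial states, and small random weights; `init_lstm`, `lstm_step`, and `lstm_forward` are hypothetical names, and training (backpropagation through time) is omitted.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def init_lstm(input_dim: int, hidden: int, rng: np.random.Generator) -> dict[str, np.ndarray]:
    """One weight matrix and bias per gate: f (forget), i (input), c (candidate), o (output)."""
    params = {}
    for gate in "fico":
        params[f"W_{gate}"] = rng.normal(0.0, 0.1, size=(hidden, hidden + input_dim))
        params[f"b_{gate}"] = np.zeros(hidden)
    return params


def lstm_step(a_t, h_prev, c_prev, p):
    """Advance the (assumed standard) LSTM cell by one step; return (h_t, c_t)."""
    z = np.concatenate([h_prev, a_t])          # [h_{t-1}, a_t]
    f = sigmoid(p["W_f"] @ z + p["b_f"])       # forget gate
    i = sigmoid(p["W_i"] @ z + p["b_i"])       # input gate
    c_hat = np.tanh(p["W_c"] @ z + p["b_c"])   # candidate cell state
    c = f * c_prev + i * c_hat                 # new cell state
    o = sigmoid(p["W_o"] @ z + p["b_o"])       # output gate
    h = o * np.tanh(c)                         # new hidden state
    return h, c


def lstm_forward(A: np.ndarray, p, hidden: int) -> np.ndarray:
    """Run the cell over A of shape (T, d) and return the last hidden state."""
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for a_t in A:
        h, c = lstm_step(a_t, h, c, p)
    return h


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = init_lstm(input_dim=16, hidden=8, rng=rng)
    A = rng.normal(size=(6, 16))               # six monthly representations
    print(lstm_forward(A, params, hidden=8))
```

Because the cell state is updated additively (new cell state = forget gate times old state plus input gate times candidate), gradients can flow across time steps without being repeatedly squashed, which is the property Section 4.3 appeals to.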
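4.4 Predict Layer
The predict layer uses logistic regression to obtain the default probability. Its input is the hidden state $\vec{h}_t$ generated by the RNN layer at time t, and the mathematical expression is as follows:
$$\tilde{y} = \sigma(\vec{w}^{\top} \vec{h}_t + b) \tag{11}$$
where $\vec{w}$ and b are learnable weights and bias, and σ is the sigmoid function.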
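The predict layer is a logistic regression on the final hidden state. A small plain-Python sketch with a numerically stable sigmoid follows; the binary cross-entropy loss is included as an assumption, since the paper does not name its training loss, and `predict_default` and `binary_cross_entropy` are hypothetical names.

```python
import math


def predict_default(h: list[float], w: list[float], b: float) -> float:
    """Default probability sigma(w . h + b) computed from the RNN output h."""
    z = sum(wi * hi for wi, hi in zip(w, h)) + b
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow in exp(-z) for very negative z
    return e / (1.0 + e)


def binary_cross_entropy(y: int, p: float, eps: float = 1e-12) -> float:
    """Loss for one sample with label y in {0, 1} (assumed training objective)."""
    p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


if __name__ == "__main__":
    p = predict_default(h=[0.2, -0.1, 0.4], w=[1.5, 0.3, -0.8], b=0.1)
    print(round(p, 4), round(binary_cross_entropy(1, p), 4))
```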
5. DATA AND RESULTS ANALYSIS
The original dataset we use is the default of credit card clients dataset from Taiwan [13], which we serialize according to its characteristics. The experiments compare our model with various traditional methods such as SVM and MLP, and the results show that our model achieves outstanding performance.
5.1 Data Description and Pre-processing
The dataset consists of payment data from October 2005 from an important bank in Taiwan, and the targets were the bank's credit cardholders [13]. The dataset uses the following 23 variables [13]:
X1: Amount of the given credit (NT dollar): it includes both the individual consumer credit and his/her family (supplementary) credit.
X2: Gender (1 = male; 2 = female).
X3: Education (1 = graduate school; 2 = university; 3 = high school; 4 = others).
X4: Marital status (1 = married; 2 = single; 3 = others).
X5: Age (year).
X6-X11: History of past payment. We tracked the past monthly payment records (from April to September, 2005) as follows: X6 = the repayment status in September, 2005; X7 = the repayment status in August, 2005; . . .; X11 = the repayment status in April, 2005. The measurement scale for the repayment status is: -1 = pay duly; 1 = payment delay for one month; 2 = payment delay for two months; . . .; 8 = payment delay for eight months; 9 = payment delay for nine months and above.
X12-X17: Amount of bill statement (NT dollar). X12 = amount of bill statement in September, 2005; X13 = amount of bill statement in August, 2005; . . .; X17 = amount of bill statement in April, 2005.
X18-X23: Amount of previous payment (NT dollar). X18 = amount paid in September, 2005; X19 = amount paid in August, 2005; . . .; X23 = amount paid in April, 2005.
As shown above, features X6-X23 are dynamic features; that is, they represent values at different times, so the original dataset requires additional processing.
The dataset contains six months of data in total. Each month's data can be regarded as composed of the static features X1-X5 and the dynamic features of that month. For example, the data of client A in September 2005 can be expressed as [X1, X2, X3, X4, X5, X6, X12, X18]. The format of the input data is shown in Figure 3.
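The serialization described above can be sketched as follows: one client's 23 raw values become six monthly vectors of 8 values each, that is, a T × m = 6 × 8 input. Ordering the months chronologically (April first, September last) is an assumption, as is the helper name `to_monthly_sequence`; the paper only gives the September vector as an example.

```python
MONTHS = ["Apr", "May", "Jun", "Jul", "Aug", "Sep"]  # chronological order (assumed)


def to_monthly_sequence(row: list[float]) -> list[list[float]]:
    """Split [X1, ..., X23] into six vectors [X1..X5, status, bill, payment].

    X6..X11, X12..X17 and X18..X23 each run from September back to April,
    so offset k = 0 is September and k = 5 is April.
    """
    if len(row) != 23:
        raise ValueError("expected the 23 variables X1..X23")
    static = list(row[0:5])                 # X1-X5 repeat at every step
    sequence = []
    for k in range(5, -1, -1):              # April (k=5) first, September (k=0) last
        status = row[5 + k]                 # X(6+k): repayment status
        bill = row[11 + k]                  # X(12+k): bill statement amount
        paid = row[17 + k]                  # X(18+k): amount paid
        sequence.append(static + [status, bill, paid])
    return sequence


if __name__ == "__main__":
    row = list(range(1, 24))  # value n stands for variable Xn
    for month, vec in zip(MONTHS, to_monthly_sequence(row)):
        print(month, vec)
```

With this placeholder row, the last vector printed is the September vector [1, 2, 3, 4, 5, 6, 12, 18], matching the example for client A.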
Table 1. The overall situation of the dataset
Default clients: 6636
Non-default clients: 23364
Ratio (default : non-default): 0.284 : 1
The overall composition of the data is described in Table 1. Clearly, the dataset suffers from class imbalance. The Synthetic Minority Oversampling Technique (SMOTE) [14], a popular method for improving classification performance on imbalanced data, adds generated minority samples to change the distribution of an imbalanced dataset. We use the SMOTE algorithm to obtain a dataset with a positive-to-negative sample ratio of 1:1.
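Table 1 implies that reaching a 1:1 ratio requires 23364 - 6636 = 16728 synthetic default samples. The core of SMOTE [14] can be sketched in NumPy: each synthetic point lies at a random position on the segment between a minority sample and one of its k nearest minority neighbours. This is a simplified illustration, not the implementation used in the paper (which does not state its SMOTE settings); it assumes Euclidean distance on numeric, already normalized and flattened feature vectors, and k = 5 is an assumed default. Interpolating integer-coded categories yields fractional codes, so the sketch suits numeric features.

```python
from typing import Optional

import numpy as np


def smote(minority: np.ndarray, n_new: int, k: int = 5,
          rng: Optional[np.random.Generator] = None) -> np.ndarray:
    """Return n_new synthetic rows interpolated between minority neighbours.

    k = 5 is an assumed default; the distance matrix needs O(n^2) memory,
    so a KD-tree or a library implementation is preferable at full scale.
    """
    rng = np.random.default_rng() if rng is None else rng
    minority = np.asarray(minority, dtype=float)
    n = len(minority)
    if n < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, n - 1)
    # Squared Euclidean distances via |a|^2 + |b|^2 - 2 a.b
    sq = (minority ** 2).sum(axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * minority @ minority.T
    np.fill_diagonal(dist, np.inf)                 # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]   # k nearest minority neighbours
    base = rng.integers(0, n, size=n_new)          # random seed sample per new row
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                   # position along the segment, in [0, 1)
    return minority[base] + gap * (minority[chosen] - minority[base])


if __name__ == "__main__":
    print(23364 - 6636)  # 16728 synthetic default samples for a 1:1 ratio
    rng = np.random.default_rng(0)
    corners = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    print(smote(corners, n_new=3, k=2, rng=rng))
```

Oversampling is normally applied to the training split only, so that synthetic points do not leak into evaluation.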
Figure 3. Input Data.
5.2 Experimental Setting
In the experiment, the Aggregation Layer uses a 3-layer FNN and the parameter k=2. The RNN Layer consists of one LSTM layer. To alleviate overfitting, we apply dropout [15] in the RNN layer. We train the model with the Adam optimization algorithm, with the learning rate set to 0.002.
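The two training ingredients named above can be sketched in NumPy: inverted dropout (the variant most frameworks use, which rescales kept activations during training rather than scaling weights at test time as originally described in [15]) and the Adam update with learning rate 0.002. The moment coefficients β1 = 0.9, β2 = 0.999 and ε = 1e-8 are commonly used defaults and an assumption here; the paper reports only the learning rate and no dropout rate. The names `dropout` and `Adam` are hypothetical.

```python
from typing import Optional

import numpy as np


def dropout(x: np.ndarray, rate: float, rng: np.random.Generator,
            training: bool = True) -> np.ndarray:
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return x
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)


class Adam:
    """Adam for a single parameter array (hypothetical minimal version)."""

    def __init__(self, lr: float = 0.002, beta1: float = 0.9,
                 beta2: float = 0.999, eps: float = 1e-8):
        # beta1, beta2, eps are assumed defaults; only lr is given in the paper.
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m: Optional[np.ndarray] = None  # first-moment estimate
        self.v: Optional[np.ndarray] = None  # second-moment estimate
        self.t = 0

    def step(self, param: np.ndarray, grad: np.ndarray) -> np.ndarray:
        """Return the updated parameter after one Adam step."""
        if self.m is None:
            self.m = np.zeros_like(param)
            self.v = np.zeros_like(param)
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        m_hat = self.m / (1 - self.beta1 ** self.t)   # bias correction
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return param - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)


if __name__ == "__main__":
    # Minimise f(x) = x^2 (gradient 2x) starting from x = 1.
    opt, x = Adam(lr=0.002), np.array([1.0])
    for _ in range(3000):
        x = opt.step(x, 2 * x)
    print(x)
```

On the first step the bias-corrected ratio is about ±1, so each parameter moves by roughly the learning rate, 0.002, regardless of the gradient's scale.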
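5.3 Evaluation Measures
We use AUC and ACC, metrics commonly used in binary classification tasks, as the evaluation criteria for model performance. AUC is the area under the Receiver Operating Characteristic (ROC) curve. AUC ranges from 0 to 1, where 0.5 corresponds to random guessing; for a useful classifier it lies between 0.5 and 1, and the closer it is to 1, the better the model separates default from non-default clients. Accuracy (ACC) is the proportion of predictions that the model gets right, computed with Equation (12):
$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN} \tag{12}$$
where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives.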
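Both measures can be computed in a few lines of plain Python. ACC follows the confusion-matrix formula after converting probabilities to labels; the 0.5 threshold is an assumption, since the paper does not state one. AUC is computed with the rank-sum (Mann-Whitney) identity, which equals the area under the ROC curve and gives tied scores half credit. This is an illustrative sketch, not the evaluation code behind Table 2; `accuracy` and `roc_auc` are hypothetical names.

```python
def accuracy(y_true: list[int], probs: list[float], threshold: float = 0.5) -> float:
    """ACC = (TP + TN) / (TP + TN + FP + FN); p >= threshold means default (assumed)."""
    preds = [1 if p >= threshold else 0 for p in probs]
    return sum(int(t == p) for t, p in zip(y_true, preds)) / len(y_true)


def roc_auc(y_true: list[int], scores: list[float]) -> float:
    """AUC = P(score of a random default > score of a random non-default)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied scores; ties share their average rank.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank
        i = j + 1
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs both classes")
    rank_sum = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


if __name__ == "__main__":
    y = [0, 0, 1, 1]
    p = [0.1, 0.4, 0.35, 0.8]
    print(accuracy(y, p), roc_auc(y, p))  # 0.75 0.75
```

The rank-based form runs in O(n log n), which matters for 30000 clients, whereas comparing every default/non-default pair directly would take roughly 155 million comparisons.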
5.4 Results Analysis
The experiment compares various statistical models for credit risk assessment. As Table 2 shows, our model achieves the highest AUC (0.841) and ACC (0.763) among the compared models. LR is a logistic regression model that is widely applied in default probability prediction because of its interpretability and practicality. The Multilayer Perceptron (MLP) is a feedforward artificial neural network that maps a set of input vectors to a set of output vectors. An MLP can be regarded as a directed graph composed of multiple node layers, each fully connected to the next. Except for the input nodes, each node is a neuron with a nonlinear activation function. The Support Vector Machine (SVM) is a generalized linear classifier that performs binary classification of data through supervised learning. SVMs can use different kernel functions: SVM-Linear uses a linear kernel, and SVM-RBF uses the nonlinear radial basis function (RBF) kernel.
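The two SVM baselines differ only in their kernel. A plain-Python sketch of both kernel functions follows; the RBF width parameter `gamma` is a hypothetical value, since the paper does not report the SVM hyperparameters.

```python
import math


def linear_kernel(x: list[float], z: list[float]) -> float:
    """K(x, z) = x . z; the decision boundary is a hyperplane in input space."""
    return sum(a * b for a, b in zip(x, z))


def rbf_kernel(x: list[float], z: list[float], gamma: float = 0.1) -> float:
    """K(x, z) = exp(-gamma * ||x - z||^2); gamma = 0.1 is a hypothetical default.

    Similarity decays with distance, which lets the SVM fit a nonlinear
    boundary in input space.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


if __name__ == "__main__":
    x, z = [1.0, 2.0], [3.0, 4.0]
    print(linear_kernel(x, z))          # 11.0
    print(rbf_kernel(x, z, gamma=0.5))  # exp(-4), about 0.0183
```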
Table 2. AUC and ACC of various models
Model AUC ACC
LR 0.731 0.731
MLP 0.767 0.759
SVM-Linear 0.725 0.729
SVM-RBF 0.763 0.760
Ours 0.841 0.763
Table 3 compares the results of experiments with different numbers of LSTM layers. The table shows that the shallow structure performs best in this application: a single LSTM layer achieves the highest AUC (0.841) and ACC (0.763). The most likely cause is the short length of the input sequences, which cover only six months.
Table 3. AUC and ACC with different numbers of LSTM layers
Model AUC ACC
LSTM-1 0.841 0.763
LSTM-2 0.826 0.749
LSTM-3 0.798 0.749
LSTM-4 0.815 0.751
Our model can be applied to other sequential datasets. For a new dataset, a subset of the data should first be used to train the model and to find an appropriate learning rate and parameter k.
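The tuning procedure described above can be sketched as a grid search on a held-out split. This is an illustration under assumptions: `grid_search` and `train_and_score` are hypothetical, the latter being a caller-supplied function that trains on one subset and returns a validation metric such as AUC; the 20% hold-out fraction and the candidate grids are invented; and k is treated simply as the paper's parameter k, whose role this section does not define further.

```python
import itertools
import random
from typing import Callable, Sequence


def grid_search(
    samples: Sequence,
    train_and_score: Callable[[list, list, float, int], float],
    learning_rates: Sequence[float],
    ks: Sequence[int],
    holdout: float = 0.2,
    seed: int = 0,
) -> tuple[float, int, float]:
    """Return (best learning rate, best k, its validation score)."""
    data = list(samples)
    random.Random(seed).shuffle(data)          # reproducible split
    n_valid = max(1, int(len(data) * holdout))
    valid, train = data[:n_valid], data[n_valid:]
    best = None
    for lr, k in itertools.product(learning_rates, ks):
        score = train_and_score(train, valid, lr, k)  # hypothetical callback
        if best is None or score > best[2]:
            best = (lr, k, score)
    return best


if __name__ == "__main__":
    # Toy scorer that peaks at lr = 0.002, k = 2, just to exercise the search.
    def toy(train, valid, lr, k):
        return -abs(lr - 0.002) - abs(k - 2)

    print(grid_search(range(100), toy, [0.01, 0.002, 0.0005], [1, 2, 3]))
```

With imbalanced labels, a stratified split that preserves the default ratio in both subsets would be a sensible refinement of the plain shuffle used here.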
6. CONCLUSION
The aim of this paper was to assess default probability from sequential data. We proposed a novel two-stage model based on LSTM and FNN, and results on a real-world dataset show that our system can mine valuable information from sequential data. This study has shown that suitable use of sequential data can improve the performance of credit risk assessment, and that an RNN can process such data efficiently. The findings of this investigation complement those of earlier studies and make several contributions to the current literature. First, this paper introduces a fine-grained feature processing method. Second, we propose a two-stage model to mine valuable information from sequential data. Third, we compare various baselines, and results on a real-world dataset show the superiority of our model.
7. ACKNOWLEDGMENTS
This work is supported by the National Key Research and Development Program of China (2018YFC0830400).
8. REFERENCES
[1] Terry Harris. 2015. Credit scoring using the clustered support vector machine. Expert Systems with Applications 42, 2: 741–750.
[2] Keqin Chen, Kun Zhu, Yixin Meng, Amit Yadav, and Asif Khan. 2020. Mixed Credit Scoring Model of Logistic Regression and Evidence Weight in the Background of Big Data. In Intelligent Systems Design and Applications. Springer International Publishing, Cham, 435–443.
[3] L. G. Kabari and E. O. Nwachukwu. 2013. Credit Risk Evaluating System Using Decision Tree – Neuro Based Model. International Journal of Engineering Research 2, 6: 8.
[4] D. Durand. 1941. Risk elements in consumer installment financing. National Bureau of Economic Research, New York.
[5] Chongren Wang, Dongmei Han, Qigang Liu, and Suyuan Luo. 2019. A Deep Learning Approach for Credit Scoring of Peer-to-Peer Lending Using Attention Mechanism LSTM. IEEE Access 7: 2161–2168.
[6] Yishen Zhang, Dong Wang, Yuehui Chen, Huijie Shang, and Qi Tian. 2017. Credit Risk Assessment Based on Long Short-Term Memory Model. In Intelligent Computing Theories and Application. Springer International Publishing, Cham, 700–712.
[7] Peter Addo, Dominique Guegan, and Bertrand Hassani. 2018. Credit Risk Analysis Using Machine and Deep Learning Models. Risks 6, 2: 38.
[8] Yi Zhao, Yanyan Shen, and Yong Huang. 2019. DMDP: A Dynamic Multi-source Default Probability Prediction Framework. Data Science and Engineering 4, 1: 3–13.
[9] Bing Zhu, Wenchuan Yang, Huaxuan Wang, and Yuan Yuan. 2018. A hybrid deep learning model for consumer credit scoring. In 2018 International Conference on Artificial Intelligence and Big Data (ICAIBD), 205–208.
[10] Dmitrii Babaev, Maxim Savchenko, Alexander Tuzhilin, and Dmitrii Umerenkov. 2019. E.T.-RNN: Applying Deep Learning to Credit Loan Applications. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD ’19), 2183–2190.
[11] K. Hornik, M. Stinchcombe, and H. White. 1989. Multilayer feedforward networks are universal approximators. Neural Networks 2, 5: 359–366.
[12] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9, 8: 1735–1780.
[13] I-Cheng Yeh and Che-hui Lien. 2009. The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Systems with Applications 36, 2: 2473–2480.
[14] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer. 2002. SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research 16: 321–357.
[15] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research 15, 1: 1929–1958.