A Two-Stage Dynamic Credit Risk Assessment System

Rui Li, Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai, China

Shizhe Deng, Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai, China

Jianquan Zhang, Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai, China

Hao He, Shanghai Institute for Advanced Communication and Data Science, Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai, China

Yaohui Jin, MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China

Jiangang Duan, Seassoon Technology, Shanghai, China

ABSTRACT
Credit risk assessment has long been regarded as a critical factor for financial companies and banks throughout the history of development economics. Recently, there has been renewed interest in credit risk assessment using deep learning methods. However, previous studies have not treated static and dynamic features separately at a fine-grained level, which limits their effectiveness. Thus, in this paper, we present a two-stage model using a Feedforward Neural Network (FNN) and a Recurrent Neural Network (RNN). First, we design the aggregation layer to extract representative information from the static features at time T. Second, the representations at distinct moments construct the dynamic features of a client, which are learned by the RNN layer. Experimental results on a real-world dataset show its superiority over various baselines.

CCS Concepts
• Applied computing ➝ Electronic commerce; • Computing methodologies ➝ Machine learning ➝ Machine learning approaches ➝ Neural networks

Keywords
credit risk assessment; machine learning; long short-term memory

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
ICDLT 2020, July 10–12, 2020, Beijing, China
© 2020 Association for Computing Machinery.
ACM ISBN 978-1-4503-7548-1/20/07...$15.00
https://doi.org/10.1145/3417188.3417193

1. INTRODUCTION
Credit risk assessment is essential to many financial institutions around the world. Notably, in light of the recent financial crisis, most financial institutions are exploring more advanced statistical models in place of traditional manual approaches[1]. The primary technique behind this task is default probability prediction. Many models proposed in this field are based on traditional classification techniques such as Logistic Regression (LR)[2], Decision Tree (DT)[3], or Support Vector Machine (SVM)[1]. Such approaches, however, fail to address sequential client transaction data, limiting their effectiveness and scalability.

Although many statistical models have been evaluated in practice, certain limitations remain. First, there is still very little scientific understanding of sequential transaction data in the credit risk assessment field. Second, existing methods do not process dynamic and static features separately at a fine-grained level.

In this paper, a novel two-stage approach is proposed to learn the credit risk of a client dynamically. We design the aggregation layer to learn the static representation of a client at moment T, and the representations at distinct moments construct the dynamic features of a client. To process these sequential features dynamically, a Recurrent Neural Network (RNN) is used.

The rest of this paper is organized as follows. In Section 2, we summarize the related work. Section 3 formally defines our problem. Section 4 presents the approach to predict default probability. In Section 5, we detail the experimental plan and results. We conclude this paper in Section 6.

2. RELATED WORK
Research into credit risk assessment has a long history. More than seventy years ago, Durand[4] proposed the concept of the credit score. To date, several studies have investigated the theory and application of credit risk assessment. Over the past decade, most research in default probability prediction has emphasized machine learning and deep learning, driven by the flourishing of data science and several essential advances in artificial intelligence.

Most researchers investigating credit risk assessment have utilized Logistic Regression (LR), owing to its interpretability and efficiency. Chen et al.[2] show that a hybrid scoring model based on Logistic Regression with data augmentation can yield a more accurate credit score. Although LR is widely used, it has limited ability to capture nonlinear relationships. To better understand the relationships behind the data, more sophisticated methods have been investigated, such as DT and SVM. Kabari[3] used a Decision Tree and an Artificial Neural Network to build a credit model, and the results indicated that it is a successful technique. Harris[1] proposed a novel credit model using the clustered support vector machine (CSVM) and demonstrated that CSVM could achieve high classification performance while remaining relatively cheap computationally. However, much of the research based on traditional machine learning methods to date could not address sequential input data.

In recent times, the use of deep learning techniques to build credit risk models has significantly increased the reported accuracy on benchmark datasets[1]. Addo[7] compared the classification abilities of machine learning and deep learning models, and the results show that tree-based models are more stable than models based on multilayer artificial neural networks[7]. The studies in [5] and [6] rely solely on LSTM to predict default probability from sequential input data. Zhao[8] proposed a dynamic forecasting system to predict the default probability of a company, which uses a long short-term memory model to incorporate daily news from social media. Babaev et al.[10] combined embedding techniques from NLP with an RNN to mine sequential transaction data.

3. PROBLEM STATEMENT
Suppose the input at time $t$ can be represented by $\vec{x}_t = (x_t^1, x_t^2, \dots, x_t^m)$, where $m$ represents the number of features. The features can be either numeric or categorical. Therefore, the input data of a client can be dynamically expressed as $X = (\vec{x}_1, \vec{x}_2, \dots, \vec{x}_T)$. Each sample has a corresponding label $y \in \{0, 1\}$, where $y = 1$ denotes a default sample and $y = 0$ a non-default sample.

Our purpose is to predict the default probability of clients based on their characteristics. We can use Equation (1) to describe this process.

$$\tilde{y} = f(\vec{x}_1, \vec{x}_2, \dots, \vec{x}_T) \tag{1}$$

The function $f$ is what we need to approximate through deep learning methods.

4. METHODOLOGY
At time T, the output for a client is obtained through Equation (2).

(2)

By analyzing the data of a client at different times, we could predict credit risk probability y.

(3)

Figure 1. The overall architecture of the network.

We decompose the assessment of the default probability into two stages. First, we construct the aggregation layer to learn the static feature representation of the client at time T. Second, the RNN layer learns the dynamic features composed of the static features at different times. The overall system structure is shown in Figure 1. Below we introduce each component of the model in detail.

4.1 Input Layer
The input layer receives the data after preprocessing, which mainly includes variable encoding, normalization, and sample balancing. At time $T$, all features can be regarded as static; inputs at different times constitute the dynamic features. The specific data processing method is described in Section 5.

Let $x_T^i$ denote the value corresponding to feature $i$ at time $T$; the input data of each sample can then be expressed by Equation (4).

$$\vec{x}_T = (x_T^1, x_T^2, \dots, x_T^m), \qquad X = (\vec{x}_1, \vec{x}_2, \dots, \vec{x}_T) \tag{4}$$

4.2 Aggregation Layer
The primary function of the aggregation layer is to learn the representation of all features at time T. This layer is implemented as a Feedforward Neural Network (FNN)[11]. We construct a multi-layer FNN module to learn the representation space of clients at a particular moment.
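The aggregation step can be sketched as a small feedforward pass. This is a minimal illustration in plain Python, not the authors' implementation: the ReLU activation, the layer widths, the random initialisation, and the names `dense`, `aggregate`, and `a_T` are all assumptions. Only the depth of three layers follows the experimental setting reported in Section 5.2.

```python
import random

def dense(x, weights, bias):
    """Affine map: one output per (weight row, bias) pair."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def relu(v):
    return [max(0.0, z) for z in v]

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (hypothetical initialisation)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def aggregate(x, layers):
    """Pass one month's feature vector through every FNN layer in turn."""
    h = x
    for weights, bias in layers:
        h = relu(dense(h, weights, bias))
    return h

rng = random.Random(0)
sizes = [8, 16, 16, 4]  # 8 inputs, two hidden widths, 4-d output (all assumed)
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]

x_T = [0.2, 1.0, 2.0, 1.0, 0.35, -1.0, 0.1, 0.05]  # made-up, already normalised
a_T = aggregate(x_T, layers)  # representation of the client at time T
print(len(a_T), a_T)
```

Applying `aggregate` to each month's vector in turn would give the sequence that the next stage consumes.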

At time $T$, the output of the aggregation layer is $\vec{a}_T = \mathrm{FNN}(\vec{x}_T)$.

4.3 RNN Layer
Here we choose Long Short-Term Memory (LSTM)[12], a variant of the RNN, to build the RNN layer. LSTM can effectively alleviate the vanishing gradient problem during training, which is conducive to the actual deployment and application of the model.

The aggregation layer generates the feature of a client at time $t$, and the sequential features can be sampled from different moments. The dynamic features can be expressed as $A = (\vec{a}_1, \vec{a}_2, \dots, \vec{a}_t)$, where $\vec{a}_t$ is the aggregation-layer output at time $t$.

The basic structure of the LSTM is shown in Figure 2. Feeding the sequential data A into the RNN layer yields the output at time t. The detailed calculation is given in Equations (5)~(10).

(5)

(6)

(7)

(8)

(9)

(10)

Figure 2. LSTM Cell.
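For reference, one step of a standard LSTM cell in the sense of [12] can be sketched as below. The six quantities computed in `lstm_step` (forget gate, input gate, candidate state, cell state, output gate, hidden state) follow the common textbook formulation; the parameter names `Wf`, `Uf`, `bf` and so on, the helper `constant_params`, and the toy inputs are assumptions and may differ from the notation of Equations (5)~(10).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gate(W, U, b, x, h, activation):
    """activation(W x + U h + b), computed one hidden unit at a time."""
    return [activation(sum(w * xi for w, xi in zip(W[j], x))
                       + sum(u * hi for u, hi in zip(U[j], h))
                       + b[j])
            for j in range(len(b))]

def lstm_step(x, h_prev, c_prev, p):
    f = gate(p["Wf"], p["Uf"], p["bf"], x, h_prev, sigmoid)    # forget gate
    i = gate(p["Wi"], p["Ui"], p["bi"], x, h_prev, sigmoid)    # input gate
    g = gate(p["Wc"], p["Uc"], p["bc"], x, h_prev, math.tanh)  # candidate state
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]
    o = gate(p["Wo"], p["Uo"], p["bo"], x, h_prev, sigmoid)    # output gate
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c

def constant_params(n_in, n_hidden, value):
    """Every weight and bias set to `value`; only for a deterministic demo."""
    p = {}
    for name in "fico":
        p["W" + name] = [[value] * n_in for _ in range(n_hidden)]
        p["U" + name] = [[value] * n_hidden for _ in range(n_hidden)]
        p["b" + name] = [value] * n_hidden
    return p

def run_lstm(sequence, p, n_hidden):
    """Fold the whole sequence through the cell and return the last hidden state."""
    h, c = [0.0] * n_hidden, [0.0] * n_hidden
    for x in sequence:
        h, c = lstm_step(x, h, c, p)
    return h

A = [[0.1, 0.2], [0.0, 0.4], [0.3, 0.1]]  # three made-up aggregated vectors
h_t = run_lstm(A, constant_params(2, 3, 0.1), 3)
print(h_t)
```

The additive update of the cell state `c` is the part that lets gradients pass across time steps with less attenuation than in a plain RNN.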

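4.4 Predict Layer
The predict layer uses logistic regression to obtain the default probability. The input is the hidden state $\vec{h}_t$ generated by the RNN layer at time $t$, and the mathematical expression is as follows.

$$\tilde{y} = \sigma(\vec{w}^{\top} \vec{h}_t + b) \tag{11}$$

Here $\sigma$ is the logistic (sigmoid) function, and $\vec{w}$ and $b$ are the learned weight vector and bias.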

5. DATA AND RESULTS ANALYSIS
The original dataset we use is the default of credit card clients dataset from Taiwan[13], which we serialize according to its characteristics. The experiment compares our model with various traditional methods such as SVM and MLP, and the results show that our model outperforms these baselines.

5.1 Data Description and Pre-processing
The dataset consists of payment data collected in October 2005 from an important bank in Taiwan, and the targets were credit cardholders of the bank[13]. The dataset uses the following 23 variables[13]:

• X1: Amount of the given credit (NT dollar): it includes both the individual consumer credit and his/her family (supplementary) credit.

• X2: Gender (1 = male; 2 = female).

• X3: Education (1 = graduate school; 2 = university; 3 = high school; 4 = others).

• X4: Marital status (1 = married; 2 = single; 3 = others).

• X5: Age (year).

• X6-X11: History of past payment. The past monthly payment records (from April to September, 2005) are tracked as follows: X6 = the repayment status in September, 2005; X7 = the repayment status in August, 2005; . . .; X11 = the repayment status in April, 2005. The measurement scale for the repayment status is: -1 = pay duly; 1 = payment delay for one month; 2 = payment delay for two months; . . .; 8 = payment delay for eight months; 9 = payment delay for nine months and above.

• X12-X17: Amount of bill statement (NT dollar). X12 = amount of bill statement in September, 2005; X13 = amount of bill statement in August, 2005; . . .; X17 = amount of bill statement in April, 2005.

• X18-X23: Amount of previous payment (NT dollar). X18 = amount paid in September, 2005; X19 = amount paid in August, 2005; . . .; X23 = amount paid in April, 2005.

It can be seen from the above that the features X6-X23 are dynamic features; that is, they take different values at different times, so the original dataset requires preprocessing.

The dataset contains a total of six months of data. Each month of data can be regarded as the static features X1-X5 together with the dynamic features of that month. For example, the data of client A in September 2005 can be expressed as [X1, X2, X3, X4, X5, X6, X12, X18]. The format of the input data is shown in Figure 3.
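The serialization described above can be sketched as follows. The function name `serialize_client`, the dict-based row format, and the choice to order the six steps chronologically (April first, September last) are assumptions made for illustration; the text only fixes which variables belong to each month.

```python
def serialize_client(row):
    """Turn one 23-variable record into six monthly steps of 8 values each.

    `row` maps the names 'X1'..'X23' to values. Each step is the static
    features X1-X5 followed by that month's repayment status, bill amount,
    and payment amount.
    """
    static = [row[f"X{i}"] for i in range(1, 6)]
    steps = []
    for k in range(6):  # k = 0 is September 2005, k = 5 is April 2005
        monthly = [row[f"X{6 + k}"], row[f"X{12 + k}"], row[f"X{18 + k}"]]
        steps.append(static + monthly)
    steps.reverse()  # assumed ordering: oldest month first
    return steps

# Toy record in which every variable simply holds its own index.
row = {f"X{i}": i for i in range(1, 24)}
steps = serialize_client(row)
print(steps[-1])  # the September step: X1-X5, then X6, X12, X18
```

With this toy record the last step reads `[1, 2, 3, 4, 5, 6, 12, 18]`, which matches the September example in the text.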

Table 1. The overall situation of the dataset

         Default client   Non-default client   Ratio
Number   6636             23364                0.284 : 1

The overall situation of the data is described in Table 1. Clearly, the dataset suffers from class imbalance. The synthetic minority oversampling technique (SMOTE)[14], a popular method for improving classification performance on imbalanced data, adds generated minority samples to change the distribution of imbalanced datasets. We use the SMOTE algorithm to obtain a dataset with a positive-to-negative sample ratio of 1:1.
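The core idea of SMOTE[14] is to create each synthetic sample on the line segment between a minority sample and one of its nearest minority neighbours. The sketch below illustrates that interpolation in plain Python; it is a simplified illustration rather than the implementation used in the experiments, and the function name `smote`, the default `k=5`, the seed, and the toy points are assumptions.

```python
import random

def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by neighbour interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Samples needed to reach a 1:1 ratio for the class counts in Table 1.
needed = 23364 - 6636

# Toy minority class: the four corners of the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=6, k=2)
print(needed, new_points)
```

Because every synthetic point is a convex combination of two real minority points, it stays inside the region the minority class already occupies. Oversampling is normally applied to the training split only, so that evaluation data contain no synthetic records.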

Figure 3. Input Data.

5.2 Experimental Setting
In the experiment, the aggregation layer uses a 3-layer FNN with the parameter k=2. The RNN layer is composed of one LSTM layer. To alleviate overfitting, we apply dropout[15] in the RNN layer. We train the model with the Adam optimization algorithm, with the learning rate set to 0.002.
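The Adam update with this learning rate can be sketched as below. Only the value 0.002 comes from the text; the remaining hyperparameters (`beta1=0.9`, `beta2=0.999`, `eps=1e-8`) are the commonly used defaults and are assumed here, as are the function name `adam_step` and the toy quadratic objective.

```python
import math

def adam_step(params, grads, state, lr=0.002, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; `state` holds the step count and both moment estimates."""
    state["t"] += 1
    t = state["t"]
    new_params = []
    for j, (p, g) in enumerate(zip(params, grads)):
        state["m"][j] = beta1 * state["m"][j] + (1 - beta1) * g      # first moment
        state["v"][j] = beta2 * state["v"][j] + (1 - beta2) * g * g  # second moment
        m_hat = state["m"][j] / (1 - beta1 ** t)  # bias correction
        v_hat = state["v"][j] / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_params

# Toy objective: minimise the sum of squares, whose gradient is 2 * p.
params = [1.0, -2.0]
state = {"t": 0, "m": [0.0, 0.0], "v": [0.0, 0.0]}
for _ in range(2000):
    grads = [2.0 * p for p in params]
    params = adam_step(params, grads, state)
print(params)
```

Since the update is normalised by the running gradient magnitude, each early step moves a parameter by roughly the learning rate, which is why the step size is chosen independently of the gradient scale.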

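5.3 Evaluation Measures
We use AUC and ACC, two indicators commonly used in binary classification tasks, as the evaluation criteria for model performance. AUC is the area under the receiver operating characteristic (ROC) curve. AUC ranges from 0 to 1, where 0.5 corresponds to random guessing, so a useful classifier scores between 0.5 and 1. The closer the AUC is to 1, the better the model separates default from non-default clients. Accuracy (ACC) refers to the proportion of results that the model predicts correctly and is computed by Equation (12).

$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN} \tag{12}$$

Here $TP$, $TN$, $FP$, and $FN$ denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.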
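Both measures can be computed directly from labels and predicted probabilities. The sketch below is illustrative only: the function names, the 0.5 decision threshold, and the toy data are assumptions. AUC is computed through its rank interpretation, the probability that a randomly chosen default sample receives a higher score than a randomly chosen non-default sample, with ties counted as one half.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Share of samples whose thresholded prediction matches the label."""
    correct = sum(1 for y, p in zip(y_true, y_prob)
                  if (p >= threshold) == (y == 1))
    return correct / len(y_true)

def auc(y_true, y_prob):
    """Fraction of (positive, negative) pairs ranked in the right order."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

y_true = [1, 0, 1, 0, 0]
y_prob = [0.9, 0.4, 0.35, 0.2, 0.6]
print(accuracy(y_true, y_prob))  # 3 of 5 correct at threshold 0.5
print(auc(y_true, y_prob))       # 4 of 6 pairs ordered correctly
```

ACC depends on the chosen threshold while AUC does not, which is one reason the two measures can rank models differently.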

5.4 Results Analysis
The experiment compares various statistical models in credit risk assessment. Table 2 gives the specific experimental results: our model attains the highest AUC (0.841, against 0.767 for the best baseline, MLP) and a marginally higher ACC (0.763, against 0.760 for SVM-RBF). LR is a logistic regression model that is widely applied in default probability prediction because of its interpretability and practicality. Multilayer Perceptron (MLP) is a feedforward artificial neural network that maps a set of input vectors to a set of output vectors. An MLP can be regarded as a directed graph composed of multiple layers of nodes, with each layer fully connected to the next. Except for the input nodes, each node is a neuron with a nonlinear activation function. Support Vector Machine (SVM) is a type of generalized linear classifier that performs binary classification of data by supervised learning. An SVM can be trained with different kernel functions: SVM-Linear denotes a linear kernel, and SVM-RBF denotes the nonlinear radial basis function (RBF) kernel.

Table 2. The AUC and ACC of various models

Model        AUC     ACC
LR           0.731   0.731
MLP          0.767   0.759
SVM-Linear   0.725   0.729
SVM-RBF      0.763   0.760
Ours         0.841   0.763

Table 3 compares the results obtained with different numbers of LSTM layers. It is apparent from this table that the shallowest structure, a single LSTM layer, performs best in this application. The most likely cause is that the input sequences are short.

Table 3. The AUC and ACC for various numbers of LSTM layers

Model    AUC     ACC
LSTM-1   0.841   0.763
LSTM-2   0.826   0.749
LSTM-3   0.798   0.749
LSTM-4   0.815   0.751

Our model can handle any sequential dataset. For other data, part of the data must be set aside to train the model and to find an appropriate learning rate and parameter k.

6. CONCLUSION
In this paper, the aim was to assess default probability from sequential data. We proposed a novel two-stage model based on LSTM and FNN, and results on a real-world dataset show that our system can mine valuable information from sequential data. This study has found that suitably utilizing sequential data can improve the performance of credit risk assessment, and that sequential data can be processed efficiently with an RNN. These findings complement those of earlier studies and make several contributions to the current literature. First, a fine-grained feature processing method is introduced. Second, we proposed a two-stage model to mine valuable information from sequential data. Third, we compared against various baselines, and results on a real-world dataset show the superiority of our model.

7. ACKNOWLEDGMENTS
This work is supported by the National Key Research and Development Program of China (2018YFC0830400).

8. REFERENCES
[1] Terry Harris. 2015. Credit scoring using the clustered support vector machine. Expert Systems with Applications 42, 2: 741–750.

[2] Keqin Chen, Kun Zhu, Yixin Meng, Amit Yadav, and Asif Khan. 2020. Mixed Credit Scoring Model of Logistic

Regression and Evidence Weight in the Background of Big

Data. In Intelligent Systems Design and Applications.

Springer International Publishing, Cham, 435–443.

[3] L G Kabari and E O Nwachukwu. 2013. Credit Risk

Evaluating System Using Decision Tree – Neuro Based Model. International Journal of Engineering Research 2, 6: 8.

[4] D. Durand. 1941. Risk elements in consumer installment financing. National Bureau of Economic Research, New York.

[5] Chongren Wang, Dongmei Han, Qigang Liu, and Suyuan Luo. 2019. A Deep Learning Approach for Credit Scoring of

Peer-to-Peer Lending Using Attention Mechanism LSTM.

IEEE Access 7: 2161–2168.

[6] Yishen Zhang, Dong Wang, Yuehui Chen, Huijie Shang, and Qi Tian. 2017. Credit Risk Assessment Based on Long

Short-Term Memory Model. In Intelligent Computing

Theories and Application. Springer International Publishing,

Cham, 700–712.

[7] Peter Addo, Dominique Guegan, and Bertrand Hassani. 2018. Credit Risk Analysis Using Machine and Deep Learning Models. Risks 6, 2: 38.

[8] Yi Zhao, Yanyan Shen, and Yong Huang. 2019. DMDP: A Dynamic Multi-source Default Probability Prediction

Framework. Data Science and Engineering 4, 1: 3–13.

[9] Bing Zhu, Wenchuan Yang, Huaxuan Wang, and Yuan Yuan. 2018. A hybrid deep learning model for consumer

credit scoring. In 2018 International Conference on Artificial

Intelligence and Big Data (ICAIBD), 205–208.

[10] Dmitrii Babaev, Maxim Savchenko, Alexander Tuzhilin, and Dmitrii Umerenkov. 2019. E.T.-RNN: Applying Deep

Learning to Credit Loan Applications. Proceedings of the

25th ACM SIGKDD International Conference on Knowledge

Discovery & Data Mining - KDD ’19: 2183–2190.

[11] K. Hornik, M. Stinchcombe, and H. White. 1989. Multilayer feedforward networks are universal approximators. Neural

Netw. 2, 5 (1989), 359–366.

[12] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-

term memory. Neural computation 9, 8 (1997), 1735–1780.

[13] I-Cheng Yeh and Che-hui Lien. 2009. The comparisons of data mining techniques for the predictive accuracy of

probability of default of credit card clients. Expert Systems

with Applications 36, 2: 2473–2480.

[14] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer. 2002. SMOTE: Synthetic Minority Over-

sampling Technique. Journal of Artificial Intelligence

Research 16: 321–357.

[15] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: a

simple way to prevent neural networks from overfitting. J.

Mach. Learn. Res. 15, 1 (January 2014), 1929–1958.
