Annotated Bibliography for the articles attached below


Received July 5, 2019, accepted August 7, 2019, date of publication August 12, 2019, date of current version August 21, 2019.

Digital Object Identifier 10.1109/ACCESS.2019.2934644

Improve Profiling Bank Customer’s Behavior Using Machine Learning

EMAD ABD ELAZIZ DAWOOD1, ESSAMEDEAN ELFAKHRANY2, AND FAHIMA A. MAGHRABY2

1Department of Information Systems, Arab Academy for Science, Technology and Maritime Transport, Cairo, Egypt
2Department of Computer Science, Arab Academy for Science, Technology and Maritime Transport, Cairo, Egypt

Corresponding author: Emad Abd Elaziz Dawood ([email protected])

ABSTRACT In the banking industry, the growth of credit cards is a noticeable phenomenon. Each banking system holds a huge dataset of its customers’ credit card transactions, so banks need customer profiling. Profiling bank customers informs the issuer’s decisions about whom to grant banking facilities and what credit limit to provide. It also helps issuers better understand their potential and current customers. In previous research, customer profiling mainly depended on either transaction data or demographic data; in this research, we merge both in order to get a more accurate result and minimize the risk. Finding the best technique improves accuracy and helps banks achieve higher profitability through customer satisfaction, by focusing on the valuable customers (companies), which are considered the main engine of the bank’s profitability. This study uses k-means, improved k-means, fuzzy c-means, and neural networks. The dataset used is labeled, and creating a new label as a target for neural network classification is the main aspect of this study; it helps to reduce the clustering execution time and obtain the best accuracy. Finally, a comparison of the accuracy ratios shows that the neural network is the best of the compared techniques.

INDEX TERMS Profiling, banking, machine learning, k-mean, fuzzy c-mean, neural network classifier.

I. INTRODUCTION

In the modern banking sector, banks hold large datasets containing customers’ information and their transaction histories. Banks therefore need to divide these large datasets into small clusters in order to analyze customer behavior and use it to choose a suitable strategy that attains the highest benefits and customer satisfaction and so increases profitability. Customer profiling, or customer segmentation, is used for this purpose. Profiling produces customer profiles, which give banks a full description of their customers based on a set of attributes. Customer segmentation refers to characterizing groups of customers based on either specific characteristics (e.g., region, age, and income for demographic segmentation) or their behavior (for behavioral segmentation). ‘Customer segmentation’ and ‘profiling’ are nevertheless considered two sides of the same coin.

Banks are confronting many challenges, such as default prediction, risk management, customer retention, and customer profiling for different purposes, in order to achieve higher profitability and reduce risk. Solving these challenges requires identifying customers well. Machine learning is the science of enabling computers to act without being explicitly programmed. Machine learning is so pervasive today that you probably use it dozens of times a day without knowing it [1]. It teaches the computer how to learn: how to find equations and functions that work not only on the examples it has seen but also on unknown future ones. Machine learning not only helps to improve the relationship with current customers, but also plays an important role in predicting customer behavior from a certain group of occurrences or patterns, which informs the bank’s future strategy and its planning for offering targeted credit products to customers. It has shifted the focus to the customer and modified the role played by banks in their current format. The four machine learning techniques used in this research are k-means, improved k-means, fuzzy c-means, and artificial neural networks. They are applied to a real dataset from a bank in Taiwan, and their accuracy ratios are then compared. The techniques are used to profile customer behavior into clusters.

The associate editor coordinating the review of this article and approving it for publication was Canbing Li.

109320 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/ VOLUME 7, 2019

E. A. E. Dawood et al.: Improve Profiling Bank Customer’s Behavior Using Machine Learning

The rest of this paper is organized as follows. Section II presents the related works, which focus on profiling customers using machine learning techniques. Section III explains the four machine learning techniques and the accuracy measures used in our research. Section IV describes the dataset and its attributes. Section V presents the proposed model and the application of the techniques to the dataset. Section VI shows the results of our experiment and compares them with the results of earlier research. Section VII presents the conclusion and future work.

II. LITERATURE SURVEY

Many researchers are working on the problem of profiling bank customers using different techniques and different datasets.

The following papers focus on profiling bank customers and on the machine learning techniques used:

In 2015, Sharahi and Aligholi [2] presented a classification model for the dataset of Sepah Bank branches in Tehran using the two-step and k-means clustering algorithms. The segmentation of 60 companies that were customers of Sepah Bank was both demographic and behavioral, and it helped to identify the loyal customers.

In 2016, Ayoubi [3] explained a customer segmentation model based on the two-step algorithm and the Kohonen neural network. Customers were segmented according to the factors that affect Customer Lifetime Value (CLV). A dataset of about 56000 customers of “Taavon bank” was used in this research. First, the optimum number of clusters was determined with the two-step approach. Then the Kohonen neural network was applied. The value of each cluster was calculated with the WRFM (weighted Recency, Frequency, and Monetary) model.

In 2017, Palaniappan et al. [4] presented a profiling model for the customers of a Portuguese retail bank within the duration of five years (2008 to 2013). This paper focused on helping banks to increase the accuracy of their customer profiling through classification as well as identifying a group of customers who had a high probability to subscribe to a long-term deposit. Three classification algorithms were used which were Naïve Bayes, Random Forest, and Decision Tree.

In 2017, Bansal et al. [5] presented a modification of the k-means clustering algorithm based on normalization. The researchers used the Cancer Dataset to obtain their results. The original data were high-dimensional, but only five attributes were finally considered, based on requirements. The paper reported an accuracy rate of 57.14% for the existing algorithm and 92.86% for the improved algorithm.

In 2017, Patil and Dharwadkar [6] produced a prediction and classification model for two datasets of bank customers’ data. They used an Artificial Neural Network (ANN) in this model and then weighted the results. Applying the ANN algorithm and the proposed model showed that the ANN works efficiently on both datasets, with an accuracy rate of 72% for dataset1 and 98% for dataset2.

In 2018, Yang and Zhang [7] presented a classification model for the credit card default data set from a bank in Taiwan using five data mining methods. 10-fold cross-validation was used to obtain the average area under the curve (AUC) and the correct rate of the model. LightGBM (a high-performance gradient boosting framework built by Microsoft) achieved the highest accuracy, with an F1-measure of 89.34%.

In 2018, Niloy and Navid [8] presented a classification model for the credit card default data set of a bank in Taiwan. A Naïve Bayesian classifier and decision trees were used to classify whether or not a client is a default credit card holder. The results of this paper showed that the Naïve Bayesian classifier achieved the best accuracy.

In 2019, Arshad et al. [9] presented a multi-class classification model for eighteen datasets from the UCI repository. Semi-Supervised Deep Fuzzy C-Mean (DFCM-MC) was used in this paper for clustering semi-supervised data. They introduced a new label for the unlabeled data with fuzzy c-means. They used the labeled (supervised) data and the unlabeled (unsupervised) data together with the new label, which extracted the discriminatory information used for classification. The accuracy rate of DFCM-MC was 80.82% and the F-measure was 78.16%.

The literature survey above shows that many authors have used various machine learning algorithms for predicting and clustering different datasets. All of them clustered the original datasets with the existing label, whereas in this work we create a new label using an unsupervised technique and use it as the target for the neural network algorithm. A profiling model was built for the dataset of bank customers using a supervised machine learning algorithm that takes the result of the unsupervised techniques as its input.

2.1. The impact of the dollar crisis on credit cards in Egyptian banks:

Some customers switch from one bank to another because their bank does not give them the best rating, which leaves them dissatisfied. Recently, because of the high price of the dollar against the Egyptian pound (the dollar crisis), customers have tended to use credit cards, which require a good rating, so that the customer is satisfied and the bank gets the best profit and reduces risk.

Egypt’s largest listed bank, Commercial International Bank (CIB), told customers in July 2016 that it was reducing the amount of foreign currency customers can spend and withdraw when using their debit and credit cards abroad. Egypt’s central bank wrote to bank chiefs asking that they “ensure that debit cards, including pre-paid cards, issued in local currency by Egyptian banks are only used within the country.” CIB did not specify which cards would be affected or give the new limits, but several bank staff told Reuters that the move would affect both credit and debit cards, with limits cut by about 50 percent. CIB cut Classic Card owners’ maximum purchases outside of Egypt to $2,500 a month from $5,000, and to $3,500 a month from $7,500 for Gold Card owners [10]. HSBC Egypt (The Hong Kong and Shanghai Banking Corporation) says that all credit and debit cards have a limit of $100 per month, though it does not specify whether this is for cash withdrawals or purchases, according to the bank’s website [11], [12]. Other Egyptian banks have put limits on debit and credit card purchases and ATM withdrawals abroad. Ahmed Aboul Dahab, head of retail at SAIB Bank (Arab International Banking Company) [13], says that the bank registered a 70-percent drop in credit card usage in January and February compared to the same period a year earlier.

Because of this crisis, many customers moved from their banks to others in search of a higher limit. Any bank may therefore lose a huge number of customers, so we suggest reprofiling bank customers to place them in suitable clusters, which increases customer retention and profitability.

III. METHODS

In a world of information explosion, individual banks produce and collect a huge volume of data every day. Machine learning is now an indispensable tool in decision support systems and plays a key role in customer segmentation, customer services, fraud detection, credit and behavior scoring, and benchmarking [14]. Machine learning allows you to take your segmentation to the next level. Machine learning segments are effective because they can update in real time. This makes it possible to automate personalization methods, which is necessary if you want to apply them widely.

The four machine learning techniques employed in this study are discussed below:

A. K-MEANS ALGORITHM

K-means clustering has been one of the most commonly used techniques for years because of its stability and simplicity. MacQueen proposed the K-means clustering algorithm in 1967; it is a partition-based cluster analysis method. K-means divides objects into clusters whose members are “similar” to one another and “dissimilar” to the objects belonging to other clusters. It is widely used in cluster analysis because it is efficient and scalable and converges fast when dealing with large data sets. K-means clustering is a type of unsupervised learning, which is used when you have unlabeled data (i.e., data without defined categories or groups). The goal of the algorithm is to find groups in the data, with the number of groups represented by the variable K. K-means is a fast and efficient method because the complexity of one iteration is k∗n∗d, where k is the number of clusters, n is the number of examples, and d is the time needed to compute the Euclidean distance between two points [15]. The following equation represents the objective function of the k-means clustering algorithm:

$$J(V) = \sum_{i=1}^{c} \sum_{j=1}^{n} \left( \lVert x_i - v_j \rVert \right)^2 \tag{1}$$

Here $\lVert x_i - v_j \rVert$ is the Euclidean distance between a point $x_i$ and a centroid $v_j$, iterated over all c points in the ith cluster, for all n clusters.
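As a rough illustration of the procedure behind Eq. (1), the following Python sketch runs the standard Lloyd iteration (assign each point to its nearest centroid, then recompute each centroid as the mean of its members). The function names, the fixed initial centroids, and the toy points are assumptions made for this example and are not taken from the paper. The objective is evaluated in its usual within-cluster form, in which each point contributes its squared distance to the centroid it is assigned to.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]

def update(points, labels, centroids):
    """Mean of the members of each cluster; an empty cluster keeps its centroid."""
    new = []
    for j, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            new.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            new.append(old)
    return new

def objective(points, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))

def kmeans(points, centroids, max_iter=100):
    """Lloyd iteration; stops when the assignment no longer changes."""
    centroids = [tuple(c) for c in centroids]
    labels = assign(points, centroids)
    for _ in range(max_iter):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids

# Hypothetical two-dimensional toy data with two obvious groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
labels, centroids = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
J = objective(points, labels, centroids)
```

On this toy data the two groups are recovered after one update, and `J` is the sum of the squared distances of the six points to their two group means.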

B. IMPROVED K-MEANS CLUSTERING ALGORITHM

An improved version of the k-means clustering algorithm was used because it can define the number of clusters automatically and assign un-clustered points to the required cluster. The proposed improvement achieves higher accuracy and reduces the time needed to assign members to clusters. The improved k-means clustering algorithm is based on dissimilarity: it selects the initial centroids using a Huffman tree built from the dissimilarity matrix. Many experiments confirm that the improved algorithm is efficient, with better clustering accuracy at the same algorithmic time complexity [16].

C. FUZZY C-MEANS CLUSTERING

Fuzzy clustering (also referred to as soft clustering) is a form of clustering in which each data point can belong to more than one cluster. One of the most widely used fuzzy clustering algorithms is the Fuzzy C-means (FCM) clustering algorithm. FCM clustering was developed by J. C. Dunn in 1973 and improved by J. C. Bezdek in 1981. The algorithm focuses on improving the clustering or centroid computation without considering noise and outliers [17].

1) ALGORITHMIC STEPS FOR FUZZY C-MEANS CLUSTERING

Let X = {x1, x2, x3, ..., xn} be the set of data points and V = {v1, v2, v3, ..., vc} be the set of centers.

1) Randomly select ‘c’ cluster centers.

2) Calculate the fuzzy membership ‘µij’ using:

$$\mu_{ij} = \frac{1}{\sum_{k=1}^{c} \left( d_{ij} / d_{ik} \right)^{2/(m-1)}} \tag{2}$$

3) Compute the fuzzy centers ‘vj’ using:

$$v_j = \frac{\sum_{i=1}^{n} (\mu_{ij})^{m} \, x_i}{\sum_{i=1}^{n} (\mu_{ij})^{m}}, \quad \forall j = 1, 2, \ldots, c \tag{3}$$

where ‘n’ is the number of data points, ‘vj’ represents the jth cluster center, ‘m’ is the fuzziness index with m ∈ [1, ∞] (Eq. (2) is defined only for m > 1), ‘c’ represents the number of cluster centers, ‘µij’ represents the membership of the ith data point in the jth cluster center, and ‘dij’ represents the Euclidean distance between the ith data point and the jth cluster center.
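The two update rules in Eqs. (2) and (3) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names, the choice m = 2, the fixed number of iterations, the one-dimensional toy data, and the convention for a point that coincides with a center are all choices made for this example, not details given in the paper.

```python
import math

def memberships(points, centers, m=2.0):
    """Eq. (2): mu[i][j] = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1))."""
    power = 2.0 / (m - 1.0)
    mu = []
    for x in points:
        d = [math.dist(x, v) for v in centers]
        if 0.0 in d:
            # Assumed convention: a point lying exactly on a center belongs
            # fully to it, because Eq. (2) is undefined for a zero distance.
            row = [1.0 if dj == 0.0 else 0.0 for dj in d]
            total = sum(row)
            row = [r / total for r in row]
        else:
            row = [1.0 / sum((dj / dk) ** power for dk in d) for dj in d]
        mu.append(row)
    return mu

def update_centers(points, mu, m=2.0):
    """Eq. (3): each center is the mean of the points weighted by mu_ij ** m."""
    dims = len(points[0])
    centers = []
    for j in range(len(mu[0])):
        weights = [row[j] ** m for row in mu]
        total = sum(weights)
        centers.append(tuple(
            sum(w * x[k] for w, x in zip(weights, points)) / total
            for k in range(dims)
        ))
    return centers

# Hypothetical one-dimensional data with two groups and two starting centers.
points = [(0.0,), (1.0,), (9.0,), (10.0,)]
centers = [(1.0,), (8.0,)]
for _ in range(20):
    mu = memberships(points, centers)
    centers = update_centers(points, mu)
```

Each row of `mu` sums to one, so a point near one center still carries a small membership in the other, which is the soft assignment that distinguishes FCM from k-means.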

D. ARTIFICIAL NEURAL NETWORKS (ANN)

A neural network is a simplified model of how the human brain processes information; it works by simulating the interconnections between neurons. Warren McCulloch and Walter Pitts (1943) created a computational model for neural networks based on mathematics and algorithms, called threshold logic. This model paved the way for neural network research to split into two approaches: one focused on biological processes in the brain, while the other focused on the application of neural networks to artificial intelligence. A common use of the phrase “ANN model” is really the definition of a class of such functions (where members of the class are obtained by varying parameters, connection weights, or specifics of the architecture such as the number of neurons or their connectivity) [18]. This methodology makes it possible to create a large combination of different structures based on:

• Number of layers
• Selection of activation function
• The number of perceptrons
• Normalization layers
• Dropout adjustments

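$$a^{l}_{j} = \sigma\left( \sum_{k} \omega^{l}_{jk} \, a^{l-1}_{k} + b^{l}_{j} \right) \tag{4}$$

where the activation $a^{l}_{j}$ of the jth neuron in the lth layer is related to the activations in the (l−1)th layer, and $b^{l}_{j}$ is the bias of that neuron. There is a weight matrix $w^{l}$ for each layer l; the entry in its jth row and kth column is $\omega^{l}_{jk}$.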
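To make Eq. (4) concrete, the sketch below computes the activations of one layer from those of the previous layer. The paper does not say which activation function σ was used, so the logistic sigmoid here is an assumption, as are the function names and the toy weights and biases.

```python
import math

def sigmoid(z):
    """Logistic sigmoid, used here as an assumed choice for sigma."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(weights, biases, prev_activations, activation=sigmoid):
    """Eq. (4): a_j = sigma(sum_k w_jk * a_prev_k + b_j) for every neuron j."""
    return [
        activation(sum(w * a for w, a in zip(row, prev_activations)) + b)
        for row, b in zip(weights, biases)
    ]

# Hypothetical layer: 2 neurons fed by 3 activations from the previous layer.
weights = [[0.5, -0.5, 0.0],   # row j holds the weights w_jk of neuron j
           [1.0, 1.0, 1.0]]
biases = [0.0, -1.0]
a_prev = [1.0, 1.0, 1.0]
a = layer_forward(weights, biases, a_prev)
```

Stacking several such calls, each feeding its output to the next, gives the forward pass of a multi-layer network; the number of layers and neurons are the structural choices listed above.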

E. EVALUATION METRICS

After a machine learning profiling model is built, its performance should be measured with different accuracy measures. This paper uses different techniques (supervised and unsupervised), so their classification performance was evaluated with the measures shown in equations (5)–(11).

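$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{5}$$

$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{6}$$

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{7}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{8}$$

$$\text{Recall} = \text{Sensitivity} = \frac{TP}{TP + FN} \tag{9}$$

$$\text{F-measure} = \frac{2 \cdot (\text{Precision} \cdot \text{Recall})}{\text{Precision} + \text{Recall}} \tag{10}$$

$$\text{G-mean} = \sqrt{\text{Sensitivity} \cdot \text{Specificity}} \tag{11}$$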

where

• True Positive (TP): Observation is positive, and it is predicted to be positive.

• False Negative (FN): Observation is positive, but it is predicted negative.

• True Negative (TN): Observation is negative, and it is predicted to be negative.

• False Positive (FP): Observation is negative, but it is predicted positive.
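The measures in Eqs. (5)–(11) follow directly from the four confusion-matrix counts. The short Python sketch below computes them for one binary case; the function name and the example counts are hypothetical and are not results from the paper.

```python
import math

def binary_metrics(tp, tn, fp, fn):
    """Compute the measures of Eqs. (5)-(11) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)           # Eq. (6), identical to recall, Eq. (9)
    specificity = tn / (tn + fp)           # Eq. (7)
    precision = tp / (tp + fp)             # Eq. (8)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),                          # Eq. (5)
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "recall": sensitivity,
        "f_measure": 2 * precision * sensitivity / (precision + sensitivity),  # Eq. (10)
        "g_mean": math.sqrt(sensitivity * specificity),                        # Eq. (11)
    }

# Hypothetical counts for 100 test observations.
m = binary_metrics(tp=40, tn=45, fp=5, fn=10)
```

With these counts the accuracy is 0.85, the sensitivity 0.80, and the specificity 0.90. For a multi-class problem such as the five clusters used later, one common convention (assumed here, not stated in the paper) is to compute the counts per class in a one-versus-rest fashion.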

TABLE 1. Description of the attributes in the dataset.

IV. DATA SET

The data set (‘default of credit card clients’) was obtained from the UCI (University of California, Irvine) Machine Learning Repository [19]. It is a recently published dataset (obtained in 2015). The attribute details are given in Table 1. The data set contains 30000 observations and 23 variables, and it has no missing data. All explanatory variables were normalized. Standardizing data is a data pre-processing step applied to variables to scale them to a similar range. This research addresses the case of customers’ default payments in Taiwan and compares the accuracy of profiling customers among four machine learning techniques. Among these four techniques, the artificial neural network is the only one that can accurately profile the data set.

V. PROPOSED FRAMEWORK

The main idea of our proposed model, shown in Figure 1, is to improve the profiling of bank customers’ behavior using different machine learning techniques. The model starts with the data set, which was obtained from the UCI Machine Learning Repository. The data then goes through the data preprocessing step. After that, the machine learning techniques are applied to build the customer profile. In machine learning, the profiling phase recognizes the items in a group and places them under target categories. In this paper, the accuracy of the unsupervised techniques is evaluated with the Gini coefficient; their results are then used as input for the supervised technique, an artificial neural network (ANN), whose results are evaluated and compared to find the best technique.

FIGURE 1. The proposed model for profiling bank customers.

1) DATA PREPROCESSING

Data preprocessing is the first important step in the data mining process. If much irrelevant and superfluous information or noisy and unreliable data is present, analyzing data that has not been carefully checked for such problems can produce inaccurate results. The quality and representation of the data therefore come first, before any analysis is applied. Data preprocessing has often been the most important phase in our machine learning project. First, the normalization process is applied to the database. In most problems, normalizing the data starts by eliminating the units of measurement so that data from different sources can easily be compared. One of the most common ways to normalize data is to re-scale it to have values between 0 and 1, which is usually called feature scaling. One possible formula to achieve this is [20]:

$$X_{\text{new}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}} \tag{12}$$
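Eq. (12) is applied to each attribute separately. A minimal Python sketch is given below; the function name, the handling of a constant column, and the example values are assumptions made for illustration and are not taken from the data set.

```python
def min_max_scale(values):
    """Eq. (12): rescale one attribute to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumed convention: a constant attribute carries no information,
        # so map it to 0 instead of dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical credit limits in one column of the data.
limits = [10000, 50000, 200000, 500000]
scaled = min_max_scale(limits)
```

After scaling, the smallest value maps to 0 and the largest to 1, so attributes measured in different units (for example amounts and ages) contribute on a comparable scale to the Euclidean distances used by the clustering algorithms.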

2) CLASSIFICATION USING MACHINE LEARNING ALGORITHMS

The result of data preprocessing is the final training set, to which the four machine learning techniques are then applied. The first technique applied is the K-means algorithm. The number of clusters is determined from the researcher’s prior knowledge; in this paper, the researcher set the number of clusters to three.

The second classifier is improved k-means, which determines the number of clusters to be five through the following steps [21]:

FIGURE 2. Proposed model pseudo code steps.

1. Use the intra-cluster distance measure, which is simply the distance between a point and its cluster center, averaged over all points, defined as

$$\text{intra} = \frac{1}{N} \sum_{i=1}^{K} \sum_{x \in C_i} \lVert x - z_i \rVert^2 \tag{13}$$

where N is the number of data points (pixels in the image, in the original image segmentation setting [21]), K is the number of clusters, and $z_i$ is the cluster center of cluster $C_i$. We obviously want to minimize this measure.

2. Next, measure the inter-cluster distance, i.e., the distance between clusters, which should be as large as possible. It is calculated as the distance between cluster centers, taking the minimum of these values, defined as

$$\text{inter} = \min\left( \lVert z_i - z_j \rVert^2 \right), \quad i = 1, 2, \ldots, K-1, \quad j = i+1, \ldots, K \tag{14}$$

where $z_i$ and $z_j$ are cluster centers and K is the number of clusters.

3. Only the minimum of these values is taken, because it is the smallest distance that needs to be maximized; all the other, larger values are automatically bigger than it.

4. Finally, calculate the ratio of intra to inter, which is defined as the validity:

$$\text{validity} = \frac{\text{intra}}{\text{inter}} \tag{15}$$

5. The clustering that gives the minimum value of the validity measure therefore tells us the ideal value of K (the number of clusters).
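The steps above can be sketched as a small function that scores a given clustering with Eqs. (13)–(15) and then picks the K with the smallest score. The function names, the toy points, and the two candidate clusterings are hypothetical; in practice each candidate would come from running the clustering algorithm with that value of K.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def validity(points, labels, centers):
    """Eq. (15): intra (Eq. 13) divided by inter (Eq. 14); needs >= 2 clusters."""
    intra = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels)) / len(points)
    inter = min(
        sq_dist(centers[i], centers[j])
        for i in range(len(centers) - 1)
        for j in range(i + 1, len(centers))
    )
    return intra / inter

# Hypothetical data with two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]

# Candidate clusterings for K = 2 and K = 3 (labels and centers assumed given).
candidates = {
    2: ([0, 0, 0, 1, 1, 1], [(1 / 3, 1 / 3), (28 / 3, 28 / 3)]),
    3: ([0, 0, 1, 2, 2, 2], [(0.0, 0.5), (1.0, 0.0), (28 / 3, 28 / 3)]),
}
scores = {k: validity(points, labels, centers)
          for k, (labels, centers) in candidates.items()}
best_k = min(scores, key=scores.get)
```

Splitting the first group into two clusters lowers the intra measure slightly but places two centers very close together, so the inter measure collapses and the validity rises; the K = 2 clustering therefore wins on this toy data.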

The third classifier is fuzzy c-means, which was applied to the data set with the number of clusters set to five.

TABLE 2. The results of Gini co-efficient for unsupervised techniques.

In the next step, the Gini coefficient is calculated for each of the three unsupervised algorithms to find the one with the best accuracy for profiling the dataset.

Finally, the ANN is applied. We take the results of the unsupervised techniques as the target for a neural network in order to measure its accuracy. By taking the results of k-means, improved k-means, and fuzzy c-means as targets, we introduce a new label for the dataset. We then try each of them and measure the accuracy with seven accuracy measures. The classifier that can best improve the profiling of bank customers is the one with the highest accuracy.

VI. EXPERIMENTS AND ANALYSIS

The experiment was run on the MATLAB platform (R2015b) on a PC with the following specifications: Intel(R) Core(TM) i7-2400 CPU @ 3.10 GHz, 6.00 GB RAM, and a 64-bit Windows operating system.

A. ANALYSIS AND COMPARISON

The results in Table 2 show the classification performance of the different unsupervised machine learning classifiers, with accuracy measured by the Gini coefficient. These results are taken as a new label for the dataset, replacing the old label, for the next step of our experiment, in which the new label is the target for the artificial neural network algorithm.

Table 2 describes the results of applying the three unsupervised techniques to the dataset, evaluated with the Gini coefficient. It shows that improved k-means is the most accurate technique, at 37.61%.

1) NEURAL NETWORK EVALUATION

In this phase, the result of the improved k-means clustering algorithm, which had the highest accuracy, is taken as the target for the neural network algorithm. The results in Table 3 show that the neural network achieved the best accuracy in classifying the dataset. We therefore achieved the aim of this experiment: to improve the profiling of bank customers’ behavior by creating a new label with unsupervised machine learning techniques.

Figure 3 shows the variation in the gradient coefficient with respect to the number of epochs. As the figure shows, after epoch 170 errors occurred six times and the run was stopped at epoch 176. The final value of the gradient coefficient at epoch 176 is 0.073403, which is close to zero. The smaller the gradient coefficient, the better the training and testing of the network.
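The stopping pattern described here (six error events after epoch 170, then a stop at epoch 176) resembles patience-based early stopping, in which training halts once a monitored error has failed to improve for a fixed number of consecutive epochs. The Python sketch below illustrates that general rule only. Reading the paper’s run this way is an assumption, and the function name, the `patience` parameter, and the synthetic error curve are hypothetical.

```python
def stopping_epoch(errors, patience=6):
    """Return the 1-based epoch at which training stops, or None if it never does.

    Training stops once the error has failed to improve on its best value
    for `patience` consecutive epochs.
    """
    best = float("inf")
    fails = 0
    for epoch, err in enumerate(errors, start=1):
        if err < best:
            best = err
            fails = 0
        else:
            fails += 1
            if fails >= patience:
                return epoch
    return None

# Synthetic error curve: improves up to epoch 170, then gets worse.
errors = [1.0 / e for e in range(1, 171)] + [0.01 + 0.001 * k for k in range(1, 11)]
stop = stopping_epoch(errors)
```

With a patience of six, the best error is reached at epoch 170 and the sixth consecutive failure occurs at epoch 176, which is where this sketch stops.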

TABLE 3. The evaluation of the proposed neural network model.

FIGURE 3. The training state plot of the proposed model.

FIGURE 4. The confusion matrix of the neural network classifier.

Table 3 shows the accuracy measures obtained by applying the proposed ANN algorithm to the dataset in MATLAB. It achieved a high accuracy ratio on different measures: an accuracy rate of 98.08%, an F-measure of 95.19%, and a G-mean of 97.96%.

The confusion matrix shown in Fig. 4 is a table used to represent the performance of our classification model (the “ANN classifier”) on a set of test data for which the true values are known. The matrix visualizes the performance of the algorithm and makes confusion between classes easy to identify. The performance measures are calculated from this confusion matrix.

TABLE 4. The clusters result from the proposed neural network model.

TABLE 5. The results of the earlier founding researches.

Reading the confusion matrix gives an accuracy rate of 98.08% for the neural network in MATLAB.

This confusion matrix shows that there are five clusters with different numbers of customers, which we can use to profile them. Table 4 shows the five clusters and the number of customers in each. Analyzing the results against the dataset showed that the platinum cluster, with 5765 customers, is the best, followed by the golden, bronze, silver, and classic clusters with 5580, 5171, 6832, and 5858 customers, respectively.

2) RESULTS OF THE EARLIER FOUNDING RESEARCHES

Table 5 compares our results with those of papers [7], [22]–[24] and shows that our proposed model achieved the best result on the accuracy measures. The earlier studies we found used the same dataset and the same technique (ANN).

VII. CONCLUSION AND FUTURE WORK

Profiling has allowed banks to build an interactive relationship based on humanistic experience and trust.

Clustering techniques are used to divide large datasets into clusters. The proposed modification of K-means clustering removed the two major drawbacks of K-means clustering, namely the accuracy level and the calculation time consumed in clustering the dataset. The profiling environment should be analyzed carefully to ensure effective and efficient segmentation of the bank’s customer pool, which helps the bank design its service and product offering to win customer loyalty and satisfaction. By creating a new label as the target for the dataset, supervised machine learning produced more accurate profiling results than the unsupervised techniques. The artificial neural network showed the highest accuracy on seven different measures. Any bank can therefore use this model and technique in the future to improve the profiling of its customers, increase profitability, and reduce risk.

In future work, we will try to improve the effectiveness and performance of our proposed approach by applying some deep learning algorithms in medical informatics.

REFERENCES

[1] S. Shalev-Shwartz and S. Ben-David, Understanding Machine Learning: From Theory to Algorithms. Cambridge, U.K.: Cambridge Univ. Press, 2014.
[2] M. Sharahi and M. Aligholi, “Classify the data of bank customers using data mining and clustering techniques (case study: Sepah bank branches Tehran),” J. Appl. Environ. Biol. Sci., vol. 5, no. 5, pp. 458–464, 2015.
[3] M. Ayoubi, “Customer segmentation based on CLV model and neural network,” Int. J. Comput. Sci., vol. 13, no. 2, p. 31, 2016.
[4] S. Palaniappan, A. Mustapha, C. F. M. Foozy, and R. Atan, “Customer profiling using classification approach for bank telemarketing,” Int. J. Inform. Vis., vol. 1, nos. 2–4, pp. 214–217, 2017.
[5] A. Bansal, M. Sharma, and S. Goel, “Improved k-mean clustering algorithm for prediction analysis using classification technique in data mining,” Int. J. Comput. Appl., vol. 157, no. 6, pp. 1–7, 2017.
[6] P. S. Patil and N. V. Dharwadkar, “Analysis of banking data using machine learning,” in Proc. Int. Conf. IoT Social, Mobile, Analytics, Cloud (I-SMAC), Palladam, India, Feb. 2017, pp. 876–881.
[7] S. Yang and H. Zhang, “Comparison of several data mining methods in credit card default prediction,” Intell. Inf. Manage., vol. 10, no. 5, p. 115, 2018.
[8] N. H. Niloy and M. A. I. Navid, “Naïve Bayesian classifier and classification trees for the predictive accuracy of probability of default credit card clients,” Amer. J. Data Mining Knowl. Discovery, vol. 3, no. 1, p. 1, 2018.
[9] A. Arshad, S. Riaz, and L. Jiao, “Semi-supervised deep fuzzy c-mean clustering for imbalanced multi-class classification,” IEEE Access, vol. 7, pp. 28100–28112, 2019.
[10] Ahram Online. Egypt’s Banque Misr Conditionally Suspends Card Usage Abroad Amid Currency Crisis. Accessed: Apr. 10, 2019. [Online]. Available: http://english.ahram.org.eg/News/246079.aspx
[11] N. M. El Agroudy, F. A. Shafiq, and S. Mokhtar, “The effect of the rise in the dollar rate on the Egyptian economy,” Sciences, vol. 5, no. 2, pp. 509–514, 2015.
[12] H. Hassan and A. Jreisat, “Does bank efficiency matter? A case of Egypt,” Int. J. Econ. Financial Issues, vol. 6, no. 2, pp. 473–478, Apr. 2016.
[13] T. Hafez. IN DEPTH-The UPS and Downs of the Egyptian Pound. Accessed: Apr. 9, 2019. [Online]. Available: https://www.amcham.org.eg/publications/business-monthly/issues/256/April-2017/3568/the-ups-and-downs-of-the-egyptian-pound
[14] T. Perraju, “Artificial intelligence and decision support systems,” Int. J. Adv. Res. IT Eng., vol. 2, no. 4, pp. 17–26, Apr. 2013.
[15] M. Kaur, N. Kaur, and H. Singh, “Adaptive K-means clustering techniques for data clustering,” Int. J. Innov. Res. Sci., Eng., Technol., to be published.
[16] H. Chen, X. Wu, and J. Hu, “An improved K-means clustering algorithm,” in Proc. IEEE 3rd Int. Conf. Commun. Softw. Netw., May 2011, pp. 44–46.
[17] F. Baser, S. Gokten, and P. O. Gokten, “Using fuzzy c-means clustering algorithm in financial health scoring,” Audit Financiar J., vol. 15, no. 147, pp. 385–394, 2017.
[18] S. Deb, “Application of artificial neural networks (ANN)-in designing SODEPUS (study of dynamic earth processes using software),” Tech. Rep.
[19] Default of Credit Card Clients Data Set, UCI Mach. Learn. Repository, 2016. [Online]. Available: https://archive.ics.uci.edu/
[20] B. K. Singh, K. Verma, and A. S. Thoke, “Investigations on impact of feature normalization techniques on classifier’s performance in breast tumor classification,” Int. J. Comput. Appl., vol. 116, no. 19, pp. 1–5, 2015.
[21] S. Ray and R. H. Turi, “Determination of number of clusters in k-means clustering and application in colour image segmentation,” in Proc. 4th Int. Conf. Adv. Pattern Recognit. Digit. Techn., 1999, pp. 137–143.
[22] S. Imtiaz and A. J. Brimicombe, “A better comparison summary of credit scoring classification,” Int. J. Adv. Comput. Sci. Appl., vol. 8, no. 7, pp. 1–4, 2017.
[23] M. Pasha, M. Fatima, A. M. Dogar, and F. Shahzad, “Performance comparison of data mining algorithms for the predictive accuracy of credit card defaulters,” Int. J. Comput. Sci. Netw. Secur., vol. 17, no. 3, pp. 178–183, 2017.
[24] V. Pyzhov and S. Pyzhov, “Comparison of methods of data mining techniques for the predictive accuracy,” Tech. Rep., 2017.

EMAD ABD ELAZIZ DAWOOD was born in Sharkia, Egypt, in 1989. He received the bachelor’s degree in information systems from the Science Valley Academy, in 2010. He is currently pursuing the master’s degree in information systems with the Arab Academy for Science, Technology and Maritime Transport (AASTMT), Cairo, Egypt, where he is a Research Scholar with the Department of Computing and Information Technology. He is a Teaching Assistant with the higher valley institute of information systems. He is also the Head of the Youth Welfare Authority, Science Valley Academy. His main fields of research interests include data mining and machine learning.

ESSAMEDEAN ELFAKHRANY received the B.S. and M.S. degrees from the Military Technical College (MTC), Cairo, Egypt, in 1986 and 1991, respectively, and the Ph.D. degree in system engineering from The Ohio State University, in December 1999. He is currently an Associate Professor with the Computer Science Department, Arab Academy for Science, Technology and Maritime Transport. His research interests include data science, ontological knowledge representation, semantic Web, and the IoT streaming data analytics. He is interested in teaching artificial intelligence, knowledge management, decision support systems, and theory of computation.

FAHIMA A. MAGHRABY received the B.S., M.S., and Ph.D. degrees in computer science from Ain Shams University, Cairo, Egypt, in 2003, 2008, and 2014, respectively. From 2004 to 2014, she was a Lecturer Assistant with the Institute of Computer Science, Shorouk Academy, Cairo. Since 2014, she has been a Lecturer with the Faculty of Computing and Information Technology, Arab Academy for Science, Technology and Maritime Transport (AASTMT), Cairo. Her research interests include bioinformatics, image processing, artificial intelligence, and blockchain.


  • INTRODUCTION
  • LITERATURE SURVEY
  • METHODS
    • K-MEANS ALGORITHM
    • IMPROVED K-MEANS CLUSTERING ALGORITHM
    • FUZZY C-MEANS CLUSTERING
      • ALGORITHMIC STEPS FOR FUZZY C-MEANS CLUSTERING
    • ARTIFICIAL NEURAL NETWORKS (ANN)
    • EVALUATION METRICS
  • DATA SET
  • PROPOSED FRAMEWORK
    • DATA PREPROCESSING
      • CLASSIFICATION USING MACHINE LEARNING ALGORITHMS
  • EXPERIMENTS AND ANALYSIS
    • ANALYSIS AND COMPARISON
      • NEURAL NETWORK EVALUATION
      • RESULTS OF THE EARLIER FOUNDING RESEARCHES
  • CONCLUSION AND FUTURE WORK
  • REFERENCES
  • Biographies
    • EMAD ABD ELAZIZ DAWOOD
    • ESSAMEDEAN ELFAKHRANY
    • FAHIMA A. MAGHRABY