Crime Reduction Policy
ISSN: 2229-6959 (ONLINE) ICTACT JOURNAL ON SOFT COMPUTING, APRIL 2017, VOLUME: 07, ISSUE: 03 DOI: 10.21917/ijsc.2017.0202
1459
SURVEY ON CRIME ANALYSIS AND PREDICTION USING DATA MINING
TECHNIQUES
H. Benjamin Fredrick David1 and A. Suruliandi2 Department of Computer Science and Engineering, Manonmaniam Sundaranar University, India
Abstract
Data Mining is the procedure which includes evaluating and
examining large pre-existing databases in order to generate new
information which may be essential to the organization. The extraction
of new information is predicted using the existing datasets. Many
approaches for analysis and prediction in data mining had been
performed. But, many few efforts has made in the criminology field.
Many few have taken efforts for comparing the information all these
approaches produce. The police stations and other similar criminal
justice agencies hold many large databases of information which can
be used to predict or analyze the criminal movements and criminal
activity involvement in the society. The criminals can also be predicted
based on the crime data. The main aim of this work is to perform a
survey on the supervised learning and unsupervised learning
techniques that has been applied towards criminal identification. This
paper presents the survey on the Crime analysis and crime prediction
using several Data Mining techniques.
Keywords: Criminology, Crime Analysis, Crime Prediction, Data Mining
1. INTRODUCTION
Historically solving crimes has been the right of the criminal
justice and law enforcement specialists. With the increase in the
use of the computerized systems to track crimes and trace
criminals, computer data analysts have started lending their hands
in helping the law enforcement officers and detectives to speed up
the process of solving crimes. Criminology is process that is used
to identify crime and criminal characteristics. The criminals and
the crime occurrence possibility can be assessed with the help of
criminology techniques. The criminology aids the police
department, the detective agencies and crime branches in
identifying the true characteristics of a criminal. The criminology
department has been used in the proceedings of crime tracking
ever since 1800. Crimes are a social nuisance and cost our society
dearly in several ways. Even, the Indian Government has taken
steps to develop applications and software for the use of State and
Central Police in relation with the National Crime Records Bureau
(NCRB) [27]. Any research that can help in solving crimes faster
will pay for itself. About 10% of the criminals commit about 50%
of the crimes [15]. People who study criminology will be able to
identify the criminals based on the traces, characteristics and
methods of crime which can be collected from the crime scene. In
the middle of 1990s, data mining came into existence as a strong
tool to extract useful information from large datasets and find the
relationship between the attributes of the data [11]. Data mining
originally came from statistics and machine learning as an
interdisciplinary field, but then it was grown a lot that in 2001 it
was considered as one of the top 10 leading technologies which
will change the world [12]. According to many researchers such
as Nath [23], solving crimes is a difficult and time consuming task
that requires human intelligence and experience and data mining
is one technique that can help us with crime detection problems.
For solving crimes faster we have to develop a data mining
paradigm that performs an interdisciplinary approach between
computer science and criminal justice. As said earlier, the
Criminology is a process that aims to identify crime
characteristics and it is one of the most important fields for
applying data mining. By using this, data mining algorithms will
be able to produce crime reports and help in the identification of
criminals much faster than any human could. Because of this
remarkable feature, there is a growing demand for data mining in
criminology. Actually, Crime analysis is a process which includes
exploring the behavior of the crimes, detecting crimes and their
relationships with criminals. The huge volume of crime and
criminal datasets and the complexity of relationships between
these kinds of information have made criminology an appropriate
field for applying data mining techniques. Identifying crime
characteristics is the first step for proceeding with any further
analysis. The quality of data analysis depends greatly on
background knowledge of analyst. A criminal can range from
civil infractions such as illegal driving to terrorism mass murder
such as the 9/11 attacks, therefore it is difficult to model the
perfect algorithm to cover all of them [21]. The knowledge that is
gained from Data Mining approaches is a very useful and this can
help and support, the police. More specifically, we can use
classification and clustering based models to help in identification
of crime patterns and criminals. The wide range of data mining
applications in the criminology has made it an important field of
research. Data mining systems have played as a key role in
assisting humans in this forensic domain and criminology domain.
This makes it one of the most challenging decision-making environments for research.
The motivation for proceeding with this survey work is to aid
a helping hand to the young researchers who are performing their
research in criminal analysis and crime prediction areas. The
paper is organized in such a manner to provide insights about the
crime analysis procedure and then produce different types of
crime analysis operations and those which can be applied together
for producing an end user product which can be applied to the
crime analysis in any police stations and detective agencies. This
work will be a valuable reference to those who precede their
research work in the crime analysis and Crime prediction using
data mining techniques.
This survey paper is organized in such a manner for easy
understanding of the concepts. The general crime analysis
procedure is discussed in section 2. The Criminal analysis
methods are discussed in the section 3 which will include all the
different types of methods grouped under their own categories.
Finally, section 4 gives the Qualitative analysis of the Crime
Analysis and Prediction techniques and section 5 gives the
H. BENJAMIN FREDRICK DAVID AND A. SURULIANDI: SURVEY ON CRIME ANALYSIS AND PREDICTION USING DATA MINING TECHNIQUES
1460
Quantitative analysis of the Crime Analysis and Prediction
techniques.
2. CRIME ANALYSIS PROCEDURE
Usually, the crime analysis tasks can be a tedious process for
the police or the investigation team to work with. The criminals
when leaving the crime scene does leave some traces which can
be used as a clue to identify the criminals. The crime sequence
and the patterns which several criminals follow when committing
a crime make it easy for analyzing the crime. This process
includes several procedures to be followed in order to identify the
criminals and getting more information based only on the clues or
information given by the local people. The criminal can be
analyzed based on the information from the crime scene which is
tested against the previous crime patterns and judging by the
method which is implied to test and proceed with the information
that can affect the prediction results. The prediction can be further
made useful for detecting the crimes in advance or by adding more
cops to the sensitive areas which are identified by the system .The
police stations can put up special force when there are chances for
crime ahead of time. This type of the system will ensure there are
peace and prosperity among the citizens.
The crime analysis can be performed procedure which is similar
to figure Fig.1 which specifies each module which is used for
machine learning to predict the crime or form group of clusters of
criminals according to crime records. The criminals can hold
certain properties and their crime characteristics and crime careers
may vary from one criminal to another. Such a type of information
can be taken as the input dataset. The input dataset is given to a pre-
processor which performs the preprocessing based on the
requirements. Once the pre processing is completed the features or
attributes from those information are extracted which may be in the
form of text content from emails, the crime factors for a day,
criminal characteristics, geo-location of the criminal, etc,. The pre
processed result is further given to the classification algorithm or
the clustering algorithm based on the requirements. The
requirements may be anything from selecting the crime prone areas
to predicting the criminal based on the previous crime records.
Fig.1. Crime Prediction and Crime clustering based on the input
dataset
The classification algorithm works in a supervised learning
manner in which the training and testing phase is required in order
to train the classifier to identify the new unknown crime record.
This is known as prediction. Whereas the clustering algorithm
works in an un-supervised learning manner which automatically
separates the crime records based on the number of groups to be
created. The groups created in such a manner are known as
clusters. Such a type of design can be a general template for
applying crime prediction and crime analysis based on data
mining algorithms.
3. CRIMINAL ANALYSIS METHODS
3.1 TEXT, CONTENT AND NLP-BASED METHODS
Sharma [1] proposed a concept which depicts zero crime in
the society. For detecting the suspicious criminal activities, he has
concentrated on the importance of data mining technology and
designed a proactive application for that purpose. In his paper, he
proposed a tool which applies an enhanced Decision Tree
Algorithm to detect the suspicious e-mails about the criminal
activities. An improved ID3 Algorithm with an enhanced feature
selection method and attribute-importance factor is applied to
produce a better and faster Decision Tree based on the
information entropy which is explicitly derived from a series of
training data sets from several classes. He proposed a new
algorithm which is a combination of Advanced ID3 classification
algorithm and enhanced feature selection method for the better
efficiency of the algorithm.
Hamdy et al. [8] described an approach based on the people’s
interaction with social networks and mobile usage such as
location markers and call logs. Their work also introduced a
model for detecting suspicious behavior based on social network
feeds and it not only describes a new method using the social
interaction of people but, their work proposes a new system to
help crime analysis create faster and precise decisions. The
suspicious movement of the entity can be determined using the
sequence of inference rules. Their constructed model is able to
predict and characterize human behavior from reality data sources
3.2 CRIME PATTERNS AND EVIDENCE-BASED METHODS
Bogahawatte and Adikari [2] proposed an approach in which
they highlighted the usage of data mining techniques, clustering
and classification for effective investigation of crimes and criminal
identification by developing a system named Intelligent Crime
Investigation System (ICSIS) that could identify a criminal based
up on the evidence collected from the crime location. They used
clustering to identify the crime patterns which are used to commit
crimes knowing the fact that each crime has certain patterns. The
database is trained with a supervised learning algorithm, Naïve
Bayes to predict possible suspects from the criminal records. His
approach includes developing a multi-agent for crime pattern
identification. There are agents for the place, time, role trademark
and substance of criminals which separates the role of the
criminals in components. The system is a multi-agent system and
made with managed Java Beans. It makes it easy to encapsulate
the requested entities in the work into objects and returns it to the
bean for exposing properties. Classifying the criminals/ suspects is
based on the Naïve Bayes classifier for identifying most possible
suspects from crime data. Clustering the criminals is based on the
model to help to identify patterns of committing crimes.
Agarwal et al. [3] used the rapid miner tool for analyzing the
crime rates and anticipation of crime rate using different data
Input dataset
Preprocessing
Feature Selection
Classification Clustering
Crime Prediction Crime clusters
ISSN: 2229-6959 (ONLINE) ICTACT JOURNAL ON SOFT COMPUTING, APRIL 2017, VOLUME: 07, ISSUE: 03
1461
mining techniques. Their work done is for crime analysis using
the K-Means Clustering algorithm. The main objective of their
crime analysis work is to extract the crime patterns, predict the
crime based on the spatial distribution of existing data and
detection of crime. Their analysis includes the tracking homicide
crime rates from one year to the next
Kiani et al. [4] performed a crime analysis work based on the
clustering and classification techniques. Their work includes the
extraction of crime patterns by crime analysis based on available
criminal information, prediction of crimes based on the spatial
distribution of existing data and crime recognition. They proposed
a model in which the analysis and prediction of crimes are done
through the optimization of outlier detection operator parameters
which is performed through the Genetic Algorithm. The features
are weighted in this model and the low-value features were
deleted through selecting a suitable threshold. After which the
clusters are clustered by the k-means clustering algorithm for
classification of crime dataset.
Satyadevan et al. [5] has done a work which will display high
probability for crime occurrence and can visualize crime prone
areas. Instead of just focusing on the crime occurrences, they are
focusing mainly on the crime factors of each day. They used the
Naïve Bayes, Logistic Regression and SVM classifiers for
classification of crime patterns and crime factors of each day.
Their method consists of a pattern identification phase which can
identify the trends and patterns in crime using the Apriori
Algorithm. The prediction of crime spots is done with the help of
Decision Tree algorithm which will detect the crime possible
areas and their patterns.
Bruin et al. [7] proposed a technique which is used to
determine the clustering of criminals based on the criminal
careers. The criminal profile per offense per year is extracted from
the database and a profile distance is calculated. After that, the
distance matrix in profile per year is created. The distance matrix
including the frequency value is made to form clusters by using
naïve clustering algorithm. They made a criminal profile which is
established in a way of representing the crime profile of an
offender for a single year. With this information, the large group
of criminals is easily analyzed and they predicted the future
behavior of individual suspects. It will be useful for establishing
the clear picture on different existing types of criminal careers
They tested the tool on actual Dutch National Criminal Record
Database for extracting the factors for identifying the criminal
careers of a person.
3.3 SPATIAL AND GEO-LOCATION BASED METHODS
Huang et al. [6] focused on a different approach for criminal
activity prediction based on mining location based Social
Network interactions. By using these interactions, they can collect
information using the geographical interactions and data
collections from the people. They devised a working procedure in
which a series of features are categorized from the Foursquare and
Gowalla used in the San Francisco Bay area. The crime patterns
and the crime occurrences are tracked with the geographical
features which are extracted from the map and they are analyzed
to detect the urban areas with high crime activities. Their work
aims at exploiting the location-based social network data to
investigate the criminal activities in urban areas. By using the
Haversine formula the distance between the two points i.e. the
crime location and venue location is calculated and shown in the
Google Maps API and OpenStreetMap.
Chen [19] have presented a general framework for crime data
mining that draws on experience gained with the Coplink project
with the researchers at Arizona and their work mainly focuses on
showing the relationships between crime types and the link
between the criminal organizations. They used a concept space
approach which will extract criminal from the incident summaries.
Yu [20] have discussed the preliminary results of a crime
forecasting model developed in collaboration with the police
department of a United States city in the Northeast. Their approach
is to architect datasets from original crime records. The datasets
contain aggregated counts of crime and crime-related events
categorized by the police department. The location and time of
these events is embedded in the data. Additional spatial and
temporal features are harvested from the raw data set. Second, an
ensemble of data mining classification techniques is employed to
perform the crime forecasting. Then they analyzed a variety of
classification methods to determine which is best for predicting
crime “hotspots”. They even investigated classification on increase
or emergence. Last, they have proposed the best forecasting
approach which is aimed at achieving the most stable outcomes.
Rizwan et al. [22] have performed classification of crime
dataset to predict Crime Category for different states of the United
States of America. The crime dataset that they used in this research
is real in nature. That is, it was collected from socio-economic data
from 1990 US Census, law enforcement data from the 1990 US
LEMAS survey, and crime data from the 1995 FBI UCR. Their
work compared the two different classification algorithms namely,
Naïve Bayesian and Decision Tree for predicting Crime Category
for different states in USA. The results from their experiment
showed that, Decision Tree algorithm out performed Naïve
Bayesian algorithm and achieved 83.9519% Accuracy in
predicting Crime Category for different states of USA.
Donald [24] have proposed a system for Crime Analysis which
was named by them as The Regional Crime Analysis Program
(ReCAP) system. It was designed by them as a computer
application designed to aid local police forces (e.g. University of
Virginia (UVA), City of Charlottesville, and Albemarle County)
in the analysis and prevention of crime. ReCAP works in
cooperation with the Pistol 2000 records management system,
which aggregated and housed all of the crime information from a
region. Their research and development was primarily focused on
the individual components of the system which includes a
database, geographic information system (GIS), and data mining
tools which consisted of data mining alrogithms which produced
spatial mining results over the crime hotspots. Their system
consists of the seamless integration of all the components in the
system.
3.4 PRISONER BASED METHODS
Sheehy et al. [10] came up with a research idea which was
geared towards the treatment of the mentally ill people inside the
prison. According to their work, the mentally ill criminals are
identified using their Social Security Number (SSN) with all the
criminal personal records and their crime career records attached.
As the outcome, the Criminals are classified into “high”,
“medium” and “low” levels of recidivism risk potential according
H. BENJAMIN FREDRICK DAVID AND A. SURULIANDI: SURVEY ON CRIME ANALYSIS AND PREDICTION USING DATA MINING TECHNIQUES
1462
to their mental health. Their objective was to describe and classify
the criminals into a misdemeanor and a felony which can be
referred and not referred based on the mental health of the
criminals. Their ill activities are monitored and data collection is
continuous. By these, the criminals can be separated from other
criminals who are hazardous and those who can cause damage to
other inmates along with them. Further, their study also involves
the classification of the mental health of the criminals into two
categories i.e. “referred” and “not-referred”. This helps the guards
to identify the prisoners who are referred for the mental health
check-up. The research work they had undergone will provide a
summary of the inmates who are seriously mentally ill and those
who are to be separated from the other inmates.
3.5 COMMUNICATION BASED METHODS
Taha et al. [9] has developed a forensic investigation tool for
identifying the influential members who create an impact in a
criminal organization. The immediate leaders can also be
identified in a criminal organization. Removing these influential
members can weaken the strength of the criminal organization.
Their work is based on this methodology. They proposed a new
work which is known as SIIMCO which first constructs the graph
representing the criminal group or organization as a network from
either mobile communication data of the criminal organization or
based on the crime records. The system works on the basis of the
created networks. These networks represent the criminal
organization or crime incident reports. The vertex represents the
individual criminals and the link represents the relationships or
communication link between those two criminals. They employed
certain formulas that quantify the degree of influence/ importance
of each vertex in the network relative to all other vertices i.e.
criminals in the graph. Based on this their system identifies the
immediate leaders with the weighted graph which connects the
criminals and identify them for further processing.
4. QUALITATIVE ANALYSIS OF CRIME ANALYSIS AND PREDICTION
APPROACHES
The prediction can be made based on the Textual information
or the Geospatial information or even the prisoner records which
were manually recorded. By using the real open data such as
internet, social feeds and messages the researcher can use the text
processing or NLP techniques to mine information from the data
and categorize the e-mails, messages or posts into a suspicious or
a non-suspecious record [1]. Whereas in the Spatial mining area,
the extraction of features from SNAP Gowalla dataset, DataSF
criminal dataset up to February 2015 provides the way to plot the
crime occurences on the Google Map which is interpreted easily.
The communication based methods describe the identification of
the leaders in a criminal organization may be a tedious process.
Kamal Taha et al. [9] produced an approach through the phone
calls and other communication data such as call logs and records,
the influential members on a crime organization can be tracked.
Kevin Sheehy, Thomas Rehbreger, Andrew O’Shea, William
Hammond, Charlotte Blais, Michael Smith K., Preston White, Jr.,
Neal Goodloe [10] introduced an approach to categorize and
identify the mentally ill prisoners among the prisoners and keep
them separate from other prisoners to avoid conflict and injuries
between them. Even though there are many methods for analyzing
the crimes , this paper concludes many results based on the
qualitative analysis When considering the Text/NLP based
methods, Hamdy et al. [8] overcame the defects from the work of
Sharma [1] based on many factors such as implementation of
preprocessing for the data and extraction of relevant features.
Both the paper labels the outcome based on suspicious activity.
Mugdha Sharma [1] used an enhanced ID3 algorithm whereas the
work produced by Ehab Hamdy, Ammar adl, Aboul Ella
Hassanien, Osman Hegazy and Tai-Hoon Kim. [8] does not
specifies the classification algorithm. The weakned of this paper
is mostly about not giving the clear view of the pre processing and
classification alogorithm. When considering crime patterns and
evidence based methods, there are clustering and classification
based papers. Bogahawatte and Adikari [2] concentrated on using
the Naïve Bayes for finding out most possible suspect. Jyoti
Agarwal et al. [3] on the other hand focused on crime analysis by
implementing the K-Means clustering algorithm on crime dataset
using rapid miner tool and the author had performed the crime
analysis by considering the homicide crimes and plotting it with
respect to year. Kiani et al. [4] concentrated on using the Genetic
Algorithm to optimize the distance operator parameter of the
decision tree using GINI index. The clustering of the criminal
careers has been effectively done in the work [7]. Whereas, Shiju
Satyadevan, et al. [5] have performed a comparison of the Naïve
Bayes, SVM, Logistic Regresion and Decision Tree. This paper
presents the crime prone regions and represented as heatmaps
which indicate the level of heat. When considering the Spatial and
Geolocation based methods, all these methods are analysed based
on qualitative manner and the analysis information is described in
the below mentioned table Table.1.
Table.1. Qualitative Analysis of Crime Analysis and Prediction
METHOD INPUT DATASET
USED
PRE
PROCESSING
FEATURE
EXTRACTION
CLASSIFICATION/
CLUSTERING STRENGTH WEAKNESS OUTCOME
Text/
NLP-
based
methods
[1] E-mail
messages
Real and
open emails
sent by
terrorists
and some
are dummy
emails
Nil
Selection of a
subset of the
original text
containing “kill”,
“death”, “bomb”,
“guns”, “blasts”
Enhanced ID3
Decision Tree
algorithm
Introducing
attribute
importance as
a factor before
information
gain in the
decision tree
Nil
Labeling email
as Suspicious,
Non-
suspicious, and
May be
suspicious
[8]
Crime
history, age,
previous
Device
sensors,
Security
Structuring
collective data
into {Time,
Similarity
matching for
sensory images
A trained
classification model
is used to predict the
Consideration
of location
feeds and
Not giving a
clear view of
the processing
Suspicious
behavior to
three levels
ISSN: 2229-6959 (ONLINE) ICTACT JOURNAL ON SOFT COMPUTING, APRIL 2017, VOLUME: 07, ISSUE: 03
1463
arrests,
Modus
Operandi,
countries
visited,
place of
birth,
Average use
of ATM,
Types of
crimes,
Entrance
with respect
to Time of
Day, Crime
areas,
Victims’
mistakes
camera
information,
Messages,
Audio feeds,
Social
network
posts and
messages
Final
Movement,
Frequency rate,
Video, Images,
Audio }
using sliding
window. Text
semantic Analysis
of the text
information
performed using
Lexical processing,
Natural Language
Processing (NLP).
similarity of a given
input to the
suspicious item or
location.
mobile usage
information
and
comparison of
criminal
behavior.
such as
“High”,
“Medium” and
“Low”
Crime
patterns
and
Evidenc
e-based
methods
[2]
Crime
evidences
including
many
attributes
like crime
scene, day,
month,
offense,
resources
used, time,
role in
crime,
transportatio
n etc.,
Colombo
crime and
criminal
records
Nil Extraction of
evidence
Clustering based
model to identify
patterns of
committing crimes.
Naïve Bayes
classifier applied to
find most possible
suspect
Uses Naïve
Bayes so this
can be even
suitable for
small datasets.
No clear view
of clustering
method and
Prisoner
verification
Finding
Categories as
robbery,
burglary, and
theft
Classifying
person as
“suspect” and
after judgment
“criminal”
[3]
Homicide
crimes and
their
occurrences
Crime
dataset for
crime
analysis by
polices in
England and
Wales from
1990 –
2011-12
Nil
Extraction of crime
patterns based on
the available crime
and criminal data
K-means clustering
algorithm
Produces year
wise clusters
of homicide
crimes
committed
Concentration
is only on
clustering of
homicide
crimes
Year and
analysis of
variation in
clusters formed
[4]
Burglary,
Robbery,
and
Homicide
Crime
dataset for
crime
analysis by
polices in
England and
Wales from
1990 – 2011
Nil
Filtering of
dataset, Outlier
detection using
distance operator
(k-NN), Genetic
Algorithm used for
optimizing of
outlier detection
operator
parameters
Classification was
done using Decision
Tree using GINI
index and the testing
and training done
using Sample
Stratified
Use of GA to
optimize the
distance
operator
parameters in
Clustering and
Predict the
cluster’s
members
based on
classification
using Decision
Tree
The number of
clusters in the
clustering
process needs
to be
optimized and
further
optimization
of the
technique
needs to be
done
The results for
the optimized
and non-
optimized
parameters
were compared
to show the
difference in
quality and
effectiveness
[5]
location,
date, type of
crime data
extracted
from
Websites,
Blogs,
Social
Media, RSS
Feeds
Websites,
Spatial
Information,
and date
about crimes
Nil
Extraction of the
following crime
data related to
“vandalism”,
“murder”,
“robbery”,
“burglary”, “sex
abuse”, “gang
rape”, “arson”,
“armed robbery”,
Naïve Bayes, SVM,
Logistic regression
Crime prediction was
done using decision
tree which is done
using sample police
complaints
Comparison of
Naïve Bayes
with SVM.
Decision Tree
is easy to
interpret and
understand for
crime spot
identification.
Not predicting
the time in
which the
crime is
happening.
The crime-
prone areas
(regions) are
graphically
represented
using a heat
map which
indicates the
level of crimes
H. BENJAMIN FREDRICK DAVID AND A. SURULIANDI: SURVEY ON CRIME ANALYSIS AND PREDICTION USING DATA MINING TECHNIQUES
1464
“highway
robbery”,
“snatching”
[7]
Crime
database and
criminal
information
National
Crime
Record
Database
Nil
Crime nature,
frequency,
duration, severity
Crime profile of
offender for single
year is determined for
comparison and he
Development
of new
distance
measures with
combination
of profile
distance with
crime
frequency of
criminals
The runtime
of the chosen
approach is
not optimal
Clustering of
criminal
careers based
on the nature.
One time
criminal,
severe
criminals and
minor career
criminals
Spatial
and
Geo-
location
based
methods
[6]
Geo-
location and
Crime Type
SNAP
Gowalla
dataset,
DataSF
criminal
dataset up to
February
2015
Extraction of
crime type like
Assault,
Robbery, Theft,
Vandalism,
Drug
Geographical
features,
Popularity,
Location category,
Neighbor entropy,
Social Tightness
density, crime
location, venue
from Foursquare
Random Forest(RF),
Linear Regression
(LR) and Support
Vector Machine
(SVM)
Random Split
method
utilized with
80% for
training and
20% for
testing in
classification
Nil
Crime Areas
plotted using
Google Map
API and
OpenStreetMa
p in San
Francisco Bay
area and
Criminal
pattern
discovery
according to
the context of
user activity
and location-
based social
networks.
Predict crime
frequency and
find which
crime is to be
more difficult
or easier to be
predicted
Commu
nication
based
methods
[9]
Flow of
communicat
ions/informa
tion links
between two
criminals
(e.g., phone
call records,
messages,
etc.), names
of
criminals/su
spects, the
type of
crime,
location and
date of the
crime.
Real-world
communicat
ion records
(DBLP,
Enron email
dataset,
Nodobo
mobile
phone
records
dataset)
Creating the
graph based on
the data and
then assigning
weight to a
vertex based on
its number of
communication
attempts in the
criminal graph
The immediate
leaders of lower-
level criminals and
the lower-level
criminals
themselves are
extracted.
Evaluation of the
accuracy of the three
systems by measuring
their Recall,
Precision, and
Euclidean Distance.
Evaluated
SIIMCO by
comparing it
experimentall
y with
CrimeNet
Explorer and
LogAnalysis
Nil
System can
identify the
influential
members of a
criminal
organization
and the
immediate
leaders of a
given list of
lower-level
criminals
Prisoner
based
methods
[10
]
The Social
Security
Number
(SSN) with
all the
criminal
personal and
crime career
records.
Albemarle-
Charlottesvil
le Regional
Jail (ACRJ),
Jefferson
Area
Community
Corrections
(JACC) and
A combination
which includes
the Social
Security
Number (SSN)
and date was
used to link the
databases
together.
age, criminal
history,
employment
history, crime
type := “assault”,
“larceny”,
“supervision
violations”,
“narcotics
Offenders are
classified into three
classes namely
“high”, “medium”,
and “low” as levels of
recidivism risk
potential. Further, the
mental health status
of the inmates is
Analysis for
the
identification
of the
mentally ill
felony.
Statistical
classification
of criminals
missing.
Could have
taken more
features
“Referred”
individuals can
be made to
have a longer
stay in jail
longer than
“not-referred”
individuals.
ISSN: 2229-6959 (ONLINE) ICTACT JOURNAL ON SOFT COMPUTING, APRIL 2017, VOLUME: 07, ISSUE: 03
1465
Region Ten
Community
Services
Board.
charges”, “traffic
violations”,
“driving while
intoxicated”,
categorized into two
categories “referred.”
and “not-referred.”
5. QUANTITATIVE ANALYSIS OF CRIME ANALYSIS AND PREDICTION
APPROACHES
For performing the quantitative analysis of the methods taken,
the performance metric value needed to be computed and they are
to be compared with the other. Hence, for performing the
calculations of the performance metric there are a few formulas
which can be utilized for achieving the performance value from
the dataset. The formulae for the calculation of the performance
metrics are given below in Table.2.
Table.2. Metrics and their formula
METRIC FORMULA
Accuracy FNFPTNTP
TNTP
Error Rate 100 - Accuracy
Precision FPTP
TP
Recall FNTP
TP
F-value
RP
11
2
where, P is the Precision and R is the Recall
Although many papers were studied in the literature review all
the papers were irrevalent to the crime prediction and criminal
analysis domain. Hence a few papers in the crime analysis and
prediction domain has been taken and their results have been
reproduced as given originally in the reference papers. The below
given table Table 3 provides quantitative analysis of the three tools
and the Decision Tre algorithm which is suppored with the Genetic
Algorithm for optimization of the parameters. When the parameters
are optimized, the classification accuracy of the Decision Tree is
increased a bit further. This shows that although the Decision tree
performs well, when it used with the Genetic Algorithm for
optimization of the decision tree paramters, the results shown show
significant improvement in the accuracy and further more the tools
given below have the metric value, which is purely based on the
dataset and the records and the performance values are taken as it
is in the reference paper. The quantitative analysis produced results
which show the increase in the accuracy level of classification
because of using the GA to optimize the parameters. This occurs
because of the ability of the GA to learn the optimal values and then
it is applied to set the parameter to optimal value when performing
calculation. Also, the Precision, Recall and F-value varies from the
dataset and the system. This shows the SIIMCO performing well
when defined in terms of the metrics.
Table.3. Quantitative Analysis of Crime Analysis & Prediction
NAME OF
THE
METHOD/
SYSTEM
PERFORMANCE
METRIC
PERFORMANCE
VALUE
[4]
Decision
Tree
classification
with GA for
optimizing
the the
parameters
Accuracy of
Prediction
Optimized
parameter 91.64%
Non-
Optimized
parameter
85.74%
Clasification Error
Optimized
parameter 8.36%
Non-
Optimized
parameter
13.26%
Fitness Function
Optimized
parameter 72.28%
Non-
Optimized
parameter
72.48%
[9]
SIIMCO
Recall 0.62
Precision 0.56
F-Value 0.59
CrimeNet
Explorer
Recall 0.36
Precision 0.41
F-value 0.38
Log Analysis
Recall 0.53
Precision 0.51
F-value 0.52
6. CONCLUSION
In this paper, we have studied some known approaches for
crime analysis and prediction concerned with data mining.
Although many papers have been studied, only those papers with
background in the crme prediction and criminal identification
papers are compared with a theoretical study. Each paper has their
own advantages and disadvantages. Each paper has its own
individual approach for solving the crimes and criminal
prediction. This is a theoretical study for several methods in
identification of crime and criminals which includes Text/ NLP
based methods, crime patterns and crime evidence based methods,
spatial and geo location based methods, communication based
methods and finally Prisoner based methods. The data mining
techniques studied from this survey can be applied for identifying
the criminals in the society and also for providing a better future
to live in.
H. BENJAMIN FREDRICK DAVID AND A. SURULIANDI: SURVEY ON CRIME ANALYSIS AND PREDICTION USING DATA MINING TECHNIQUES
1466
REFERENCES
[1] Mugdha Sharma, “Z-Crime: A Data Mining Tool for the Detection of Suspicious Criminal Activities based on the
Decision Tree”, International Conference on Data Mining
and Intelligent Computing, pp. 1-6, 2014.
[2] Kaumalee Bogahawatte and Shalinda Adikari, “Intelligent Criminal Identification System”, Proceedings of 8th IEEE
International Conference on Computer Science and
Education, pp. 633-638, 2013.
[3] Jyoti Agarwal, Renuka Nagpal and Rajni Sehgal, “Crime Analysis using K-Means Clustering”, International Journal
of Computer Applications, Vol. 83, No. 4, pp. 1-4, 2013.
[4] Rasoul Kiani, Siamak Mahdavi and Amin Keshavarzi, “Analysis and Prediction of Crimes by Clustering and
Classification”, International Journal of Advanced
Research in Artificial Intelligence, Vol. 4, No. 8, pp. 11-17,
2015.
[5] Shiju Sathyadevan, M.S. Devan and S. Surya Gangadharan, “Crime Analysis and Prediction using Data Mining”,
Proceedings of IEEE 1st International Conference on
Networks and Soft Computing, pp. 406-412, 2014.
[6] Yu-Yueh Huang, Cheng-Te Li and Shyh-Kang Jeng, “Mining Location-based Social Networks for Criminal
Activity Prediction”, Proceedings of 24th IEEE International
Conference on Wireless and Optical Communication, pp.
185-190, 2015.
[7] Jeroen S. De Bruin, Tim K. Cocx, Walter A. Kosters, Jeroen F. J. Laros and Joost N. Kok, “Data Mining Approaches to
Criminal Career Analysis”, Proceedings of 6th IEEE
International Conference on Data Mining, pp. 1-7, 2006.
[8] Ehab Hamdy, Ammar Adl, Aboul Ella Hassanien, Osman Hegazy and Tai-Hoon Kim, “Criminal Act Detection and
Identification Model”, Proceedings of 7th International
Conference on Advanced Communication and Networking,
pp. 79-83, 2015.
[9] Kamal Taha and Paul D. Yoo, “SIIMCO: A Forensic Investigation Tool for Identifying the Influential Members
of a Criminal Organization”, IEEE Transactions on
Information Forensics and Security, Vol. 11, No. 4, pp. 811-
822, 2016.
[10] Kevin Sheehy et al., “Evidence-based Analysis of Mentally 111 Individuals in the Criminal Justice System”,
Proceedings of IEEE Systems and Information Engineering
Design Symposium, pp. 250-254, 2016.
[11] David J. Hand, Heikki Mannila and Padhraic Smyth, “Principles of Data Mining”, MIT Press, 2001.
[12] 10 Emerging Technologies That Will Change Your World, Available at: http://www.rle.mit.edu/thz/documents/10_emerging_tech.p
df.
[13] U. Fayyad, G. Piatetsky-Shapiro and P. Smyth, “The KDD Process for Extracting Useful Knowledge from Volumes of
Data”, Communications of the ACM, Vol. 39, No. 11, pp.
27-34, 1996.
[14] Illhoi Yoo, Patricia Alafaireet, Miroslav Marinov, Keila Pena-Hernandez, Rajitha Gopidi, Jia-Fu Chang and Lei Hua,
“Data Mining in Healthcare and Biomedicine: A Survey of
the Literature”, Journal of Medical Systems, Vol. 36, No. 4,
pp. 2431-2448, 2011.
[15] Shyam Varan Nath, “Crime Pattern Detection using Data Mining”, Proceedings of IEEE/WIC/ACM International
Conference on Web Intelligence and Intelligent Agent
Technology Workshops, pp. 1-4, 2006.
[16] Hsinchun Chen, Wingyan Chung, Yi Qin, Michael Chau, Jennifer Jie Xu, Gang Wang, Rong Zheng and Homa
Atabakhsh, “Crime Data Mining: An Overview and Case
Studies”, Proceedings National Conference on Digital
Government Research, pp. 1-5, 2003.
[17] Tong Wang et al., “Learning to Detect Patterns of Crime”, Joint European Conference on Machine Learning and
Knowledge Discovery in Databases, pp. 515-530, 2013.
[18] Karl F. Schuessler and Donald R. Cressey, “Personality Characteristics of Criminals”, American Journal of
Sociology, Vol. 55, No. 5, pp. 476-484, 1950.
[19] H. Chen, W. Chung, J.J. Xu, G. Wang, Y. Qin and M. Chau, “Crime Data Mining: a General Framework and Some
Examples”, Computer, Vol. 37, No. 4, pp. 50-56, 2004.
[20] Chung-Hsien Yu, Max W. Ward, Melissa Morabito and Wei Ding, “Crime Forecasting using Data Mining Techniques”,
Proceedings of 11th IEEE International Conference on Data
Mining Workshops, pp. 779-786, 2011.
[21] P. Thongtae and S. Srisuk, “An Analysis of Data Mining Applications in Crime Domain”, Proceedings of IEEE 8th
International Conference on Computer and Information
Technology Workshops, pp. 122-126, 2008.
[22] Rizwan Iqbal, Masrah Azrifah Azmi Murad, Aida Mustapha, Payam Hassany Shariat Panahy and Nasim
Khanahmadliravi, “An Experimental Study of Classification
Algorithms for Crime Prediction”, Indian Journal of Science
and Technology, Vol. 6, No. 3, pp. 4219-4225, 2013.
[23] Shyam Varan Nath, “Crime Data Mining”, Proceedings of Advances and Innovations in Systems, Computing Sciences
and Software Engineering, pp. 405-409, 2007.
[24] Donald E.Brown, “The Regional Crime Analysis Program (RECAP): A Framework for Mining Data to Catch
Criminals”, Proceedings of IEEE International Conference
on Systems, Man, and Cybernetics, pp. 2848-2853, 1998.
[25] Colleen McCue, “Data Mining and Predictive Analysis: Intelligence Gathering and Crime Analysis”, Butterworth-
Heinemann, 2014.
[26] Arunima S. Kumar and Raju K. Gopal, “Data Mining based Crime Investigation Systems: Taxonomy and Relevance”,
Proceedings of Global Conference on IEEE Communication
Technologies, pp. 850-853, 2015.
[27] Manish Gupta, B. Chandra and M.P. Gupta, “Crime Data Mining for Indian Police Information System”, Journal of
Crime, Vol. 2, No. 6, pp. 43-54, 2006.