
Artificial Intelligence in Cybersecurity: Concentration on the Effectiveness of Machine Learning

By: Preston Pham

December 7, 2021

Submitted in Partial Fulfillment of the Requirements for the Doctor of Education degree.

St. Thomas University

Miami Gardens, Florida


Copyright © 2021 by Preston Pham

All Rights Reserved


Copyright Acknowledgement Form

St. Thomas University

I, Preston Pham, understand that I am solely responsible for the content of this dissertation and its use of copyrighted materials. All copyright infringements and issues are solely the responsibility of myself as the author of this dissertation and not St. Thomas University, its programs, or libraries.

________________________ _____________________

Signature of Author Date

________________________ _____________________

Witness (Martin Nguyen) Date



St. Thomas University Library Release Form

Artificial Intelligence in Cybersecurity – Concentration on the Effectiveness of Machine Learning

Preston Pham

I understand that US Copyright Law protects this dissertation against unauthorized use. By my signature below, I am giving permission to St. Thomas University Library to place this dissertation in its collections in both print and digital forms for open access to the wider academic community. I am also allowing the Library to photocopy and provide a copy of this dissertation for interlibrary loans for scholarly purposes and to migrate it to other forms of media for archival purposes.

________________________ _____________________

Signature of Author Date

________________________ _____________________

Witness (Martin Nguyen) Date



St. Thomas University Dissertation Manual Acknowledgement Form

Artificial Intelligence in Cybersecurity – Concentration on the Effectiveness of Machine Learning

Preston Pham

By my signature below, I, Preston Pham, assert that I have read the dissertation publication manual, that my dissertation complies with the University’s published dissertation standards and guidelines, and that I am solely responsible for any discrepancies between my dissertation and the publication manual that may result in my dissertation being returned by the library for failure to adhere to the published standards and guidelines within the dissertation manual. The Dissertation Publication Manual may be found at:

https://www.stu.edu/library/How-To/Publish-Your-Thesis-or-Dissertation

________________________ _____________________

Signature of Author Date

________________________ _____________________

Signature of Chair Date



Abstract

Modern networks drive ubiquitous connectivity and digitalization in support of globalization but also, inadvertently and unavoidably, create fertile ground for a rise in the scale and volume of cyberattacks. Countermeasures to these advanced attacks have never been more crucial than they are today, and Artificial Intelligence (AI) is a technological breakthrough that can help augment protective techniques on the defensive side of cybersecurity. AI improves its knowledge by detecting patterns and relationships in data and learns from that data to build self-learning algorithms. It can analyze relationships between threats such as malicious network traffic, suspicious Internet Protocol (IP) addresses, or malware files within minutes or even seconds, giving an organization the intelligence to respond to a threat event more quickly than traditional labor-intensive methods allow. This paper explores the phenomenon of AI in cybersecurity and seeks to determine whether the present stage of AI technology, and in particular Machine Learning, can help improve cybersecurity. The paper has two main objectives: testing AI’s ability to classify threats against that of a human cybersecurity analyst, and testing AI’s ability to predict future threat events against a renowned time-series forecasting model, the autoregressive integrated moving average (ARIMA) statistical model.

Keywords: cybersecurity, artificial intelligence, classification, prediction, ARIMA


Acknowledgments

It is with genuine pleasure that I express my deep sense of gratitude and give my warmest thanks to my former professors and committee members: Dr. Lisa J. Knowles, Dr. Joseph M. Pogodzinski, and Dr. Jose G. Rocha. Their dedication, guidance, meticulous scrutiny, and scholarly advice helped me complete my dissertation.

I would like to profoundly acknowledge Dr. Knowles, my dissertation chair, for her kindness, enthusiasm, positivity, and dynamism. Dr. Knowles relentlessly helped me manage every step along the way to ensure I completed all my chapters and the work overall during the global pandemic of 2020-2021. I also thank my former manager, Mr. Raymond Lee, Director of Information Security, for offering valuable technical advice while I researched and wrote about cybersecurity.


Dedication

The study serves as time-capsule literature, showing how a doctoral student at St. Thomas University performed a study on Artificial Intelligence within the domain of cybersecurity using the technologies of the era. The study seeks to be a reference guide for both academia and industry leaders to further extend the research and applications of Artificial Intelligence in cybersecurity. The paper is also dedicated to the men and women in the cybersecurity industry around the globe who actively fight against cybercriminals to protect their organizations, institutions, or government agencies.


Table of Contents

Copyright Acknowledgement Form
St. Thomas University Library Release Form
St. Thomas University Dissertation Manual Acknowledgement Form
List of Tables
List of Figures
List of Formulas

CHAPTER ONE. INTRODUCTION
Introduction to the Problem
Background, Context, and Theoretical Framework
Statement of the Problem
Purpose of the Study
Research Question
Rationale, Relevance, and Significance of the Study
Nature of the Study
Definition of Terms
Assumptions, Limitations, and Delimitations
Organization of the Remainder of the Study
Chapter One Summary

CHAPTER TWO. LITERATURE REVIEW
Introduction to the Literature Review
Review of Research Literature
Chapter Two Summary

CHAPTER THREE METHODOLOGY
Introduction to Methodology
Purpose of Study
Research Questions
Research Design
Data Collection and Data Analysis Procedures
Target Population, Sampling Method, and Related Procedures
Instrumentation
Limitations of the Research Design
Data Validity Test
Expected Findings
Ethical Issues
Conflict of Interest Assessment
Chapter Three Summary

CHAPTER FOUR DATA ANALYSIS AND RESULTS
Introduction to Data Analysis and Results
AI vs. Human Analysis in Classification of Threat Events
Detailed Analysis (AI vs. Human Analysis in Classification of Threat Events)
AI vs. ARIMA Statistical Computation in Prediction of Threat Events
Detailed Analysis (AI vs. ARIMA Statistical Computation to Predict Threat Events)
Chapter Four Summary

CHAPTER FIVE. CONCLUSIONS AND DISCUSSION
Introduction to Conclusions and Discussion
Summary of the Results
Discussion of the Results
Discussion of the Results in Relation to the Literature
Limitations
Implication of the Results for Practice
Recommendations for Further Research
Conclusion

APPENDIX A. INSTITUTIONAL REVIEW BOARD (IRB)

REFERENCES


List of Tables

Table 1. 10 Intrusion Categories with Depiction of Training and Testing Samples
Table 2. Results of Precision, Recall, and F1-Score for Classifiers
Table 3. Results of P-Values Stationary Test
Table 4. Ordinary Least Squares Regression: AI vs. Cybersecurity Analyst Results
Table 5. Number of Intrusion Events Detected and Average Time of AI vs. Cybersecurity Analyst
Table 6. Spearman’s Rank Correlation Estimation Results
Table 7. Trend to Month Translog Estimation Results


List of Figures

Figure 1. Branches of Artificial Intelligence
Figure 2. Methodological Literature Used
Figure 3. Honeypot Network Architecture Diagram
Figure 4. Data Collection and Analysis Workflow Diagram
Figure 5. Configuration of Log and Netflow Forwarding
Figure 6. Example of a Firewall Log
Figure 7. Example of Raw Netflow Data
Figure 8. Display of Log Highlights when Unstructured in Corelight
Figure 9. Display of Log Parsing when Structured in Elastic
Figure 10. AI vs. Cybersecurity Analyst Intrusion Detection Regression Graph
Figure 11. AI vs. Cybersecurity Analyst Intrusion Prediction Regression Graph


List of Formulas

Formula 1. Precision, Recall, and F1-Score Calculation Model
Formula 2. Ordinary Least Squares Regression for Severity Score
Formula 3. Spearman’s Rank Correlation Estimation Model
Formula 4. Trend to Month Translog Estimation Model

CHAPTER ONE. INTRODUCTION

Introduction to the Problem

Today’s world is highly interconnected through networks, with pervasive small personal devices (e.g., smartphones) as well as large computing devices and services (e.g., cloud computing). Every day, vast quantities of data are generated, processed, exchanged, and consumed by applications in cyberspace. Securing that data and users’ privacy on the World Wide Web has therefore become a paramount concern for individuals, business organizations, and national governments (Benavente-Peces & Bartolini, 2019). The massive amount of data traveling over the Internet also gives cybercriminals a great opportunity to attack organizations’ networks. An ever-growing percentage of cyberattacks explicitly targets specific organizations in order to steal intellectual property or sensitive data, conduct espionage, or carry out industrial sabotage or denial of service (Apruzzese et al., 2018). Although organizations can employ human analysts to detect threat agents on their networks, triaging malicious activity can take a human analyst hours, days, or even months of correlating multiple data points to identify true positive threat events (Benavente-Peces & Bartolini, 2019). Organizations are therefore turning to a newer technological discipline, Artificial Intelligence (AI), which can gather knowledge by detecting patterns and relationships in data and then learn through data architectures to build self-learning algorithms (Virmani et al., 2020). AI can analyze relationships between threats such as malicious network traffic, suspicious IP addresses, or malware files in seconds or minutes and provide organizations with the intelligence to respond more quickly to threat events (Apruzzese et al., 2018).

Background, Context, and Theoretical Framework

Expanding rapidly in support of globalization, modern networks drive ubiquitous connectivity and digitalization but also, simultaneously and unavoidably, create fertile ground for a rise in the scale and volume of cyberattacks. Using increasingly diverse and sophisticated tactics, cybercriminals and nation-state attackers target the systems that run our day-to-day lives as well as easily exposed targets (Al Qahtani, 2020). Countermeasures to these advanced attacks have never been more crucial than they are today; by learning new cyberattack vectors, AI can help augment protective techniques on the defensive side of cybersecurity. Cybersecurity defense can be defined as a set of technologies and processes designed to protect systems, networks, applications, and data from unauthorized access, alteration, or destruction (Tyugu, 2011). A cybersecurity defense system consists of a network-based security system and a host (computer-based) security system. Each of these includes firewalls, antivirus software, and intrusion detection and prevention systems (Al Qahtani, 2020). These systems are intended to block unwanted traffic; identify unauthorized system or user behavior; distinguish everyday baseline activity from anomalous events; and, finally, contain or eradicate the malicious agent to prevent further execution.
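The step of distinguishing everyday baseline activity from an anomalous event can be illustrated with a simple statistical rule. The sketch below is not the study's detection method; it assumes hypothetical hourly connection counts for a single host and flags any observation whose z-score against the baseline exceeds a threshold (3.0 here, an arbitrary illustrative choice). Production intrusion detection systems use far richer features and models.

```python
from statistics import mean, stdev


def flag_anomalies(baseline, observations, threshold=3.0):
    """Return observations whose z-score against the baseline exceeds the threshold.

    `baseline` is a list of "normal" measurements (at least two values);
    `threshold` is a hypothetical cut-off expressed in standard deviations.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation of normal activity
    if sigma == 0:
        # A perfectly flat baseline: treat any deviation at all as anomalous.
        return [x for x in observations if x != mu]
    return [x for x in observations if abs(x - mu) / sigma > threshold]


# Hypothetical hourly connection counts for one host during a quiet week.
baseline = [120, 118, 125, 130, 122, 119, 127, 124]
# Hypothetical counts observed today; 410 simulates a burst of scanning traffic.
today = [121, 126, 410, 118]

print(flag_anomalies(baseline, today))  # [410]
```

A fixed threshold is the simplest possible baseline model; it is shown only to make the baseline-versus-anomaly idea concrete, whereas an AI-based system would learn what "normal" looks like from many features at once.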

Calderon and Floridi (2019) believe that AI can improve cybersecurity and defense measures, allowing for greater system robustness, resilience, and recognition. First, AI can improve system robustness, that is, the ability of a system to maintain a stable configuration and settings even after processing erroneous inputs. Second, AI can strengthen system resilience, that is, the ability of a system to resist and tolerate an attack without fatal failure or shutdown. Third, AI can enhance system recognition, or detection, that is, the capacity of a system to autonomously discover intrusion behaviors and identify its own vulnerabilities (Calderon & Floridi, 2019). According to Banoth et al. (2017), the driving forces boosting the use of AI in cybersecurity are the following. (1) Speed of impact: in some major attacks, the time of impact on an organization is unpredictable. Today’s attacks do not target just one specific system or vulnerability; attackers can maneuver and change their targets once they have penetrated the network. These attacks unfold so quickly that human responses can rarely keep pace with them. (2) Operational complexity: with the proliferation of cloud computing, platforms and services are operationalized and delivered within milliseconds. This level of complexity overwhelms human operators, so such actions can only be handled by machines capable of matching the prowess of other machines. (3) Skills gaps: the shortage of cybersecurity experts worldwide remains an ongoing challenge, and this scarcity has pushed the industry to automate processes at a faster pace (Banoth et al., 2017). Given AI’s crucial impact today, AI, and in particular Machine Learning in cybersecurity, became the focus of this research.

AI is the science that enables computers and machines to learn, judge, and predict based on their own logic (Virmani et al., 2020). As technology becomes more sophisticated, the demand for AI is growing because of its ability to solve complex problems within a limited amount of time. AI equips machines with the technical expertise to learn and deploy new theories, methods, and techniques that aim to simulate and extend human intelligence (Conner-Simons, 2016). The field of AI has seen major breakthroughs due to advances in big data and graphics processing units (GPUs), which have helped AI grow exponentially in the last two decades (Sarker, 2020). Organizations can now benefit from AI’s cognitive ability to become a subject-matter expert in a relatively short time through self-training. Through repeated use, the system provides increasingly accurate responses, eventually eclipsing the accuracy of human expertise (Mittal et al., 2019). As machine intelligence and the use of digital sensor data improve, various fields of science can use AI to understand a wide range of collective information (Hussain et al., 2020). AI is now being applied across a variety of industries through technological subsets such as natural language processing, robotics, and computer vision. With particular regard to cybersecurity, Truve (2017) considers AI techniques most useful for their ability to classify and predict entities and events. Automated classification of events helps analysts prioritize where to focus their attention. Instead of spending significant amounts of time deciding what topics to focus on, cybersecurity analysts can improve their forensic work with already categorized and sorted threats. In addition, Truve believes cyber defenders today are almost always one step behind, trying to defend or patch systems where attacks and threats already exist. With predictive information, defenders could instead become proactive and protect their systems against future threats. Predictive threat intelligence, built on AI’s capability to predict future events from historical and current data, is therefore important. Prediction generation is an example of a task that is hard or even impossible for a human analyst to carry out, due to the complexity and volume of the data involved. AI algorithms running at machine scale can generate predictive models that forecast events and thereby address such problems (Williams & McGregor, 2020).
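To make the idea of automated classification feeding analyst triage concrete, the following sketch uses a nearest-centroid classifier, one of the simplest supervised learning techniques, written in plain Python. The two features (packets per second and distinct destination ports), the three categories, their training values, and the priority ranking are all hypothetical and chosen only for illustration; they are not the intrusion categories or the Machine Learning engine used in this study.

```python
from math import dist

# Hypothetical labelled training events: (packets_per_second, distinct_dst_ports).
TRAINING = {
    "benign": [(5.0, 1.0), (8.0, 2.0), (6.0, 1.0)],
    "port_scan": [(40.0, 200.0), (55.0, 350.0), (35.0, 180.0)],
    "dos": [(900.0, 1.0), (1200.0, 2.0), (1000.0, 1.0)],
}

# Hypothetical urgency per category (higher = shown to the analyst first).
PRIORITY = {"benign": 0, "port_scan": 2, "dos": 3}


def centroids(training):
    """Average the feature vectors of each category."""
    result = {}
    for label, points in training.items():
        n = len(points)
        result[label] = tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))
    return result


def classify(event, cents):
    """Assign the category whose centroid is closest in Euclidean distance."""
    return min(cents, key=lambda label: dist(event, cents[label]))


def triage(events, cents):
    """Classify every event, then sort so the most urgent categories come first."""
    labelled = [(classify(e, cents), e) for e in events]
    return sorted(labelled, key=lambda item: PRIORITY[item[0]], reverse=True)


cents = centroids(TRAINING)
queue = triage([(7.0, 1.0), (1100.0, 1.0), (45.0, 250.0)], cents)
for label, event in queue:
    print(label, event)
```

Because the two features are on very different scales, a real model would normally standardize them before computing distances; the sketch skips that step for brevity.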

In this paper, the first task is to determine which branch of AI best applies to cybersecurity. The overall objective is then to apply the most popular branch of AI, Machine Learning, to classify cybersecurity intrusion events and compare its performance against that of a human cybersecurity analyst. Next, AI is tested on predicting future cybersecurity intrusion events from time-series datasets to determine its effectiveness in comparison with a popular time-series prediction model, the autoregressive integrated moving average (ARIMA) statistical model.
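For readers unfamiliar with ARIMA, the sketch below implements the simplest non-trivial member of the family, ARIMA(1,1,0). The series is differenced once (d = 1), an autoregressive coefficient of order one (p = 1) is fitted to the differences by least squares, and there are no moving-average terms (q = 0). The model order, the absence of a constant term, and the example counts are assumptions for illustration; this section does not state which ARIMA order or software the study used. In practice, a library implementation such as the `ARIMA` class in statsmodels would normally be used.

```python
def fit_arima_110(series):
    """Fit ARIMA(1,1,0) with no constant term.

    d = 1: work with first differences y_t = x_t - x_{t-1}.
    p = 1: model y_t = phi * y_{t-1} + e_t and estimate phi by least squares.
    q = 0: no moving-average terms.
    Returns (phi, differences).
    """
    if len(series) < 3:
        raise ValueError("need at least three observations")
    diffs = [b - a for a, b in zip(series, series[1:])]
    num = sum(diffs[t] * diffs[t - 1] for t in range(1, len(diffs)))
    den = sum(diffs[t - 1] ** 2 for t in range(1, len(diffs)))
    phi = num / den if den else 0.0
    return phi, diffs


def forecast_arima_110(series, steps):
    """Forecast `steps` future values.

    The AR(1) recursion is iterated on the differences, and each forecast
    difference is added back onto the previous level (the "integrated" part).
    """
    phi, diffs = fit_arima_110(series)
    level, last_diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        last_diff = phi * last_diff
        level = level + last_diff
        forecasts.append(level)
    return forecasts


# Hypothetical monthly intrusion counts with a steady upward trend.
monthly_intrusions = [100, 110, 120, 130, 140]
print(forecast_arima_110(monthly_intrusions, 3))  # [150.0, 160.0, 170.0]
```

Because ARIMA is a linear model of past values and past differences, it serves as a natural statistical baseline against which a learned, potentially non-linear predictor can be compared.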

Statement of the Problem

AI has been adopted in a wide range of domains, where it has shown superiority over traditional rule-based algorithms and manual human analysis (Benavente-Peces & Bartolini, 2019). Completely automating the detection and analysis of cybersecurity threats, and the prediction of future attacks, is an enticing goal. Yet the efficacy and accuracy of AI in cybersecurity must be evaluated with due diligence on real-life data.

Purpose of the Study

With the cognitive data-processing capability of Machine Learning, AI is a strong complement to defensive cybersecurity systems, helping them better detect and defend against modern cyberwarfare (Truve, 2017). The purpose of the study is to examine the phenomenon of AI in cybersecurity, research its implications in the business world, and determine whether the present stage of AI technology, in particular Machine Learning, can help improve cybersecurity.

Research Question

This research paper focuses on a basic question: What branch of AI is most applicable to cybersecurity? From this main question, the following sub-questions emerge: How accurate is Machine Learning currently, and can it benefit cybersecurity? What is the accuracy rate of AI in classifying intrusion events compared with a human cybersecurity analyst? When AI is used to predict future intrusion events, how accurate is it compared with an ARIMA time-series statistical model?
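Answering these accuracy questions requires agreed-upon metrics. The study reports precision, recall, and F1-score for its classifiers; the sketch below shows how those metrics, plus overall accuracy, are computed from a list of ground-truth labels and a list of predicted labels. The binary labels ("intrusion" versus "benign") and both sets of predictions are hypothetical, invented only to show how an AI engine and a human analyst could be scored against the same ground truth.

```python
def classification_scores(actual, predicted, positive="intrusion"):
    """Accuracy, precision, recall, and F1-score for one positive class."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)  # true positives
    fp = sum(a != positive and p == positive for a, p in pairs)  # false alarms
    fn = sum(a == positive and p != positive for a, p in pairs)  # missed intrusions
    accuracy = sum(a == p for a, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical ground truth for six events and two sets of predicted labels.
truth = ["intrusion", "intrusion", "benign", "benign", "intrusion", "benign"]
ai = ["intrusion", "benign", "benign", "intrusion", "intrusion", "benign"]
analyst = ["intrusion", "intrusion", "benign", "benign", "benign", "benign"]

for name, labels in (("AI", ai), ("analyst", analyst)):
    scores = classification_scores(truth, labels)
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

Precision penalizes false alarms and recall penalizes missed intrusions, so reporting both (and their harmonic mean, F1) gives a fuller picture than accuracy alone when one class is much rarer than the other.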

Rationale, Relevance, and Significance of the Study

In recent years, information technology has undergone massive shifts, driven by the power of high-performance computers and big data analytics (Sarker, 2020). With cyberattacks trending upward at the frontiers of many organizations, cybersecurity is arguably the discipline that could benefit most from the introduction of AI (Calderon & Floridi, 2019). Hence, this research is significant and relevant to the cybersecurity community. This technical study designed a small, purpose-built Machine Learning engine with threat-detection algorithms based on data collected from a dedicated honeypot network environment. Additionally, the Machine Learning engine was trained with a large amount of data and integrated with threat intelligence feeds, from which the machine self-learns and then provides analysis and predictive results.

7

Nature of the Study

AI has become a hot topic and keyword in recent years; it is being adopted and widely used in various fields of science (Parrend et al., 2018). Because AI comprises many technological subsets, literature on multiple past AI-related case studies was reviewed to determine which AI branch was most widely used and most applicable to cybersecurity. From there, AI’s ability to classify network threats was compared against a human cybersecurity analyst, and AI’s ability to forecast future threats was compared against the well-known ARIMA statistical model. To accomplish this, a dedicated honeypot network environment was set up to collect firewall logs and netflow data, which served as real-life examples to train and test the Machine Learning engine hosted on Microsoft’s Azure Artificial Intelligence Web Service.
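Before firewall logs and netflow records can train a Machine Learning engine, raw text must be parsed into structured fields and summarized into features. The sketch below assumes a hypothetical `key=value` log format (the study's actual log layout is not given in this section) and counts denied connections per source IP address, one simple feature a model could learn from. The IP addresses come from ranges reserved for documentation, and the field names `action` and `src` are illustrative assumptions.

```python
from collections import Counter


def parse_log_line(line):
    """Split a 'key=value key=value ...' line into a dict (hypothetical format).

    Tokens without an '=' are ignored.
    """
    record = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:
            record[key] = value
    return record


def denied_by_source(lines):
    """Count denied connections per source IP, a simple per-host feature."""
    counts = Counter()
    for line in lines:
        rec = parse_log_line(line)
        if rec.get("action") == "deny" and "src" in rec:
            counts[rec["src"]] += 1
    return counts


logs = [
    "ts=2021-03-01T10:00:01 action=deny src=203.0.113.7 dst=10.0.0.5 dport=22 proto=tcp",
    "ts=2021-03-01T10:00:02 action=deny src=203.0.113.7 dst=10.0.0.5 dport=23 proto=tcp",
    "ts=2021-03-01T10:00:05 action=allow src=198.51.100.4 dst=10.0.0.5 dport=443 proto=tcp",
    "ts=2021-03-01T10:00:09 action=deny src=198.51.100.9 dst=10.0.0.5 dport=3389 proto=tcp",
]
print(denied_by_source(logs).most_common())
```

Structuring logs this way is what lets per-host aggregates such as denial counts, distinct ports, or bytes transferred be fed to a classifier as numeric features.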

Definition of Terms

Anomaly: An activity that deviates from what is standard, normal, or expected in the behavior of systems, network traffic, and system resources (National Initiative for Cybersecurity Careers and Studies [NICCS], 2018).

Big Data: Extremely large data sets or data points that may be analyzed computationally to gain insights on patterns, trends, behaviors, and interactions (NICCS, 2018).

Cloud: On-demand availability of computer system resources accessible over the Internet, especially networking or computing power, without direct or physical management by the user (NICCS, 2018).

Cloud Services: Software or program services that are accessible over the Internet (NICCS, 2018).

Cyberattack: An act of assault by which an entity intends to evade security services in order to damage or destroy a computer network or system (NICCS, 2018).

Intelligence Source: A reliable information source from which cybersecurity defensive systems can absorb information about the latest malware algorithms, attack patterns, etc. (NICCS, 2018).

Intrusion: A security incident in which an entity attempts to circumvent security services in order to gain access to a system or system resource without proper authorization (NICCS, 2018).

Netflow: Data on network protocols and IP traffic, collected by a network device as packets enter or exit an interface (NICCS, 2018).

Network Traffic: Data transmissions in the form of packets sent over the network from a sender host to a recipient host (NICCS, 2018).

System Log: Data on informational, error, or warning events related to the behavior of a computer system and its resources (NICCS, 2018).

Threat: A potential entity that intends to cause adverse effects through unauthorized access in order to disrupt, damage, or steal from a network, computer system, or resources (NICCS, 2018).

Assumptions, Limitations, and Delimitations

The study of AI is complicated by a lack of available data: first, the phenomenon is very new, and second, companies that use this technology are often privately held and are not obliged to disclose their results to the public. Even if some companies that employ AI were willing to publish their results, the small number of such companies worldwide means that quantitative data on this topic would be limited (Patel, 2017). Existing public datasets found on the Internet also suffer from uneven data and outdated content, from the technology they capture to their data collection methods. In addition, most of these data have already been aggregated. Hence, it is imperative to collect sufficient and proper raw security data in order to build an AI engine (Bhuyan et al., 2014). Furthermore, the study’s budget constraints meant that it was not possible to implement a best-of-breed AI system from the market.

For these reasons, this study is limited to investigating the performance of Azure Artificial Intelligence Web Service. It is important to note that the results of this research may not be extrapolated to AI broadly, as AI types differ from one another. The research is also limited by the linear methods used for comparison, such as the ARIMA statistical model. Thus, conclusions can be drawn for only one branch of AI, since this study concentrates on the classification and prediction abilities of Machine Learning. Machine Learning is one subset of AI; this study excludes other subsets such as natural language processing, computer vision, robotics, pattern recognition, and convolutional neural networks, leaving for further research the question of how those subsets may apply to cybersecurity. Lastly, this study is limited to the cybersecurity market in the United States; the results might differ if a similar study were conducted in a different country or region.

Organization of the Remainder of the Study

The paper is structured as follows. Chapter One serves as a prelude, introducing readers to the contextual background of AI and cybersecurity along with the theoretical framework for this study. Chapter Two takes readers through a comprehensive overview of AI in various fields of science and highlights the AI branch that appears most commonly in previous literature. With that branch identified in Chapter Two, the study uses Machine Learning, a subset of AI, as the focal point for how AI can be applied to cybersecurity. Chapter Three provides a concrete description of the data collection and analysis methods used to conduct the study. Chapter Four presents the results and a detailed analysis based on the methodology discussed in Chapter Three. Chapter Five summarizes the study’s results and discusses implications for further research.

Chapter One Summary

In recent years, cybersecurity has been a fast-growing field demanding a great deal of attention from individuals, business organizations, and national governments as cybercriminals increasingly seek to steal intellectual property and sensitive data (Virmani et al., 2020). The remarkable progress in web technologies such as cloud computing and big data analytics has also fostered the growth of AI. AI techniques have been applied to many areas of science due to their distinctive properties of adaptability, scalability, and potential to adjust rapidly to new and unknown challenges (Rajbanshi et al., 2017). It has been hypothesized that Machine Learning could be deployed in cybersecurity to address such wide-ranging problems (Truve, 2017). Therefore, this paper attempts to bring AI and cybersecurity together, discussing and highlighting AI’s applicability in cybersecurity. The objective of this paper is twofold: first, to identify which branch of AI is most pertinent to cybersecurity through an extensive review of past research; and second, to assess the current maturity of AI, in particular Machine Learning, for cyber detection schemes. To achieve the second objective, AI’s ability to classify network threats was compared against a human cybersecurity analyst, and AI’s ability to forecast future threats was juxtaposed against a prominent ARIMA statistical model.


CHAPTER TWO. LITERATURE REVIEW

Introduction to the Literature Review

AI is progressing rapidly, with advanced innovations emerging day in and day out, and its development has started to change many business landscapes (Mittal, et al., 2019). Companies in industries ranging from healthcare and finance to manufacturing are applying AI to automate processes and reach new heights of efficiency and quality (Benavente-Peces & Bartolini, 2019). According to Frank (2014), one crucial aspect that may have been overlooked is cybersecurity, an important factor for many businesses because it protects their sensitive and proprietary data from being breached. Cybersecurity can therefore be enhanced with AI to form a strong combination (Khisimova, et al., 2019); used in conjunction with cybersecurity, AI can be a powerful tool for protecting against cyberattacks. Furthermore, in the Internet Age, hackers can commit theft or cause harm remotely while masking their own operations, which makes it more difficult than ever for organizations to shield themselves from these attackers (Al Qahtani, 2020). With ever-increasing threats and a staggering number of breaches, many organizations need help on their cybersecurity fronts. AI may help organizations solve this problem and strengthen their cybersecurity defense postures (Demertzis & Iliadis, 2015).

As organizations look toward automation to reduce manual processes, AI can help make cybersecurity more manageable, efficient, and effective, and ultimately lower their cyber threat risk (Parrend, et al., 2018). Typical AI capabilities today include speech, image, and video recognition; autonomous objects; conversational agents; prescriptive modeling; augmented creativity; smart automation; advanced simulation; and complex analytics and predictions (Conner-Simons, 2016). One of the driving forces of AI is Machine Learning, the science of getting computers to act without being explicitly programmed (Schuurmans, 1995). Some Machine Learning methods use algorithms inspired by the structure and function of the human brain to create artificial neural networks (Becue, et al., 2021). Combined, AI and cybersecurity can produce security systems with capabilities that allow organizations to detect, predict, and respond to cyber threats in real time (Collins, 2019).

The literature review first evaluates prior studies of AI in the domain of cybersecurity. To avoid a parochial or narrow scope, it also surveys studies of AI across the business sectors of manufacturing, product design, healthcare, customer communication, environmental science, higher education, and finance. The review starts from the question of whether Machine Learning is the right branch of AI to focus on, then analyzes the branches within AI, such as machine learning, natural language processing, vision, and robotics (see Figure 1), to find out which branch is most commonly used in industry and could be applicable to cybersecurity.


Figure 1

Branches of Artificial Intelligence

Note: Model re-designed from Kabbas, A., & Munshi, A. (2020), p. 120.

Review of Research Literature

AI in Cybersecurity

Various researchers have previously studied AI's relationship to computer security. Ghosh, et al. (1998) proposed training a support vector machine model to detect anomalous Windows registry accesses, using the Knowledge Discovery and Data Mining (KDD99) benchmark dataset to evaluate the model's performance. In other research, Kozik, et al. (2014) used the CSIC 2010 Hypertext Transfer Protocol (HTTP) dataset, published by the Spanish National Research Council (Consejo Superior de Investigaciones Científicas, CSIC), to assess the classification of internet traffic. Their study focuses specifically on HTTP traffic between clients and servers. The techniques described therein look for well-known port numbers in IP flows that are statistically abnormal, but they do not characterize the traffic itself. Bhuyan, et al. (2014) introduce a new approach to creating unbiased, real-life network intrusion datasets in order to compensate for the lack of available datasets. In developing a detection system, they create a large intrusion dataset, the Tezpur University Intrusion Detection System (TUIDS) distributed denial of service (DDoS) dataset, and test it against an older DDoS dataset from the Center for Applied Internet Data Analysis (CAIDA). Bhuyan, et al. also present an empirical study using the K-Nearest Neighbors model to address important security metrics, such as detection of both low-rate and high-rate DDoS attacks. They conduct several experiments using entropy measures to distinguish DDoS attack traffic from normal traffic. This methodology, known as Feature Score, uses three features of the network traffic: the source IPs, the variation of source IPs, and the packet flow rate. The experimental results show that the proposed model yields 65% detection accuracy on the normalized CAIDA dataset. However, the paper focuses only on DDoS detection in wired networks and leaves out wireless networks, which are another noteworthy DDoS vector. Together, these three articles investigate Machine Learning-based solutions (an early stage of AI), but their impact has been limited. In their conclusions and footnotes, the authors call for future research on Machine Learning methods that can detect anomalous traffic, possible attacks, and misuse by analyzing the data on their own. In “A Survey of Data Mining and Machine Learning Methods for Cyber Security Intrusion Detection,” Buczak and Guven (2016) survey numerous articles on AI in cybersecurity. Their results indicate that AI could be very useful in three main areas of cybersecurity: intrusion detection, malware analysis, and spam detection. Taken together, these studies strongly encourage further research on AI in cybersecurity.
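The entropy-plus-nearest-neighbor idea attributed to Bhuyan, et al. (2014) can be sketched as follows. This is a minimal illustration, not their implementation: it assumes each traffic window is summarized by three hypothetical features (Shannon entropy of source IPs, the number of distinct source IPs as a stand-in for "variation of source IPs," and packet rate), min-max scales them, and labels a new window by majority vote among its k nearest labeled windows.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def window_features(src_ips, packet_count, window_seconds):
    """Hypothetical feature vector for one traffic window:
    (source-IP entropy, distinct source IPs, packets per second)."""
    return (
        shannon_entropy(src_ips),
        float(len(set(src_ips))),
        packet_count / window_seconds,
    )


def fit_min_max(vectors):
    """Return a function that rescales each feature to [0, 1] using the
    minimum and maximum observed in `vectors` (the training windows)."""
    lows = [min(col) for col in zip(*vectors)]
    highs = [max(col) for col in zip(*vectors)]

    def scale(vector):
        return tuple(
            (x - lo) / (hi - lo) if hi > lo else 0.0
            for x, lo, hi in zip(vector, lows, highs)
        )

    return scale


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest neighbors.
    `train` is a list of (feature_tuple, label) pairs; distance is Euclidean."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

In use, one would compute `window_features` for labeled windows, fit the scaler on those training vectors, scale both training and query vectors with the same function, and then call `knn_predict`. Scaling matters because the raw packet rate would otherwise dominate the Euclidean distance.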


AI in Smart Manufacturing

AI applications and robotics in smart production are generating new industrial paradigms. Cioffi, et al. (2019) provide a systematic literature review of research from 1999 to 2019 on AI and Machine Learning techniques, using a mix of bibliometric, content analysis, and social network techniques. Through this research and classification process, the paper reviews, classifies, and analyzes 82 articles from the Web of Science and SCOPUS databases. Greater innovation, process optimization, resource optimization, and improved quality are the most significant benefits of using AI that leverages Machine Learning and Robotics in the industrial sectors. The results also highlight emerging trends of AI with Machine Learning and Robotics in sustainable manufacturing: intelligent use of materials and energy; inventory and supply chain management; predictive maintenance; and production. Moreover, AI with Machine Learning and Robotics improves quality control in manufacturing systems. In addition, by consuming test and manufacturing data from various systems, AI can help factories make their manufacturing processes more efficient, systematic, and smart.

AI in Product Design

According to Min (2015), AI improves the ways companies innovate and develop new products, speeding up development and maintenance processes and consolidating companies' support functions. Min lists several standards that address product life cycle, management, and quality control, such as those from the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC), and compares them with manufacturing data, such as those from the Institute for Supply Management. The paper lists 30 processes that can benefit from AI and compares them with the technical processes that data engineers struggle with. The research indicates that AI can assist engineers, through Machine Learning and AI's vision capability, in creating more accurate product specifications while also helping them in the design-defining phases. In one example of an AI-assisted awareness learning cycle, engineers develop product quality systems with certain quality assurance criteria in mind. An anomaly detector analyzes usage of the system to find abnormal events or patterns across various quality assurance tests. AI absorbs those data and computes a result table of test flaws and product quality degradations, showing where they occurred. AI thus helps engineers validate which quality assurance test is most applicable to the product and builds a risk matrix of product malfunctions. In addition, AI helps engineers assess the flaw probabilities in their design tests so they can create more accurate product designs based on the test outcomes.

AI in Healthcare

Data and information play an important role in healthcare decision-making and provision. A tremendous volume of data from patients, doctors, hospitals and healthcare providers, medical insurance, medical equipment, and medical research could be consumed and used to improve the delivery of healthcare. The objective of Davenport and Kalakota's (2019) research is to identify the evidence base for big data analytics, machine learning, and AI in healthcare. The authors conduct a systematic literature mapping study, evaluating and analyzing 2,421 articles published between 2013 and 2019 on big data analytics and AI within the healthcare sector. Research type facet, contribution facet, and publication year are used to focus on previous studies' research dimensions and topic-specific schemes, and five facets show different perspectives on research into big data analytics and AI in healthcare. The paper also summarizes existing research in the field. The study discusses barriers to the rapid implementation of AI in healthcare and AI's potential to automate aspects of care. The development of AI in healthcare can improve diagnosis and treatment recommendations and transform many aspects of patient care and administrative processes. The authors show that these AI tasks can be accomplished by neural networks and deep learning, natural language processing, rule-based expert systems, physical robots, and robotic process automation. The paper also uses Electronic Health Record data as supporting evidence for the applicability of AI to diagnosis and treatment, patient engagement and adherence applications, and administrative applications, and it discusses various implications for the healthcare workforce. Overall, the paper highlights the important role of AI in healthcare. Finally, the research recommends that future software integrate healthcare systems, biosensors, watches, smartphones, conversational interfaces, and other instrument data with the patient's diagnostic data to identify effective treatment pathways. The resulting recommendations can then be used by healthcare providers and frontline staff such as nurses, call-center agents, or care delivery coordinators.

AI in Customer Communication

Online chatbots create a new and more efficient support platform for customers. They leverage AI's natural language processing and speech recognition capabilities to mimic human conversation and provide more realistic customer-support experiences. Recent technological advances in AI allow chatbots to assist with increasingly complicated and multifaceted tasks. The objective of Pantano and Pizzi's (2020) article is to provide a comprehensive understanding of actual progress in AI, focusing on online chatbots. To provide a good overview of patent development, the paper uses 668 patents that include the word “chatbot” in the title and/or abstract, from 1998 to 2018. Using occurrence analysis, topic and phrase extraction with the Cogito software, hierarchical cluster analysis, and multidimensional scaling, the study shows that adoption of new conversational agents based on natural language has increased tremendously in recent years. Its findings highlight that chatbot systems are increasingly characterized by different abilities through the incorporation of AI. The paper emphasizes the strong connection between digital assistants' analytical skills and their ability to interact automatically with users. Lastly, the study draws inferences about consumers from different data points to automate and improve chatbot abilities, and it proposes more customized chatbot solutions that use consumers' knowledge.

AI in Higher Education

Self-exploration education and self-determined (heutagogical) learning techniques are examples of AI in higher education. These systems interact with, assist, and guide students through semi-automatic learning methods via AI's natural language processing. To stay competitive and fulfill stakeholder needs, higher-education providers in the Malaysian school system are forced to adapt to technological innovations in education. Fazil, et al. (2019) examine the relationship between AI technology and the education industry through self-determined learning platforms. The paper uses multiple case studies designed around the research's heutagogical theme. The research is both quantitative and qualitative: it draws on the opinions of expert consultants, educational suppliers, and educational providers on the importance of Self-Determined Learning platforms, and it leverages massive open online course (MOOC) data from Malaysia's education industry to examine the validity of AI in education. The framework is applied to determine precisely how AI leverages Machine Learning and natural language processing to attract self-determined learners and promote their interest. The results depict a positive outlook for AI-enabled technology in education, strengthening the value proposition for education services, with more value-centric propositions fostered by continuous interactions between AI and students in self-exploration education.

AI in Environmental Science

Rainfall prediction, although addressed in many prior studies, is extremely challenging because of stochastic meteorological parameters such as temperature, humidity, and wind, in addition to time and space. To provide a prompt estimation method, Prakash, et al. (2020) introduce a newly developed model, an Adaptive Network based Fuzzy Inference System optimized with Particle Swarm Optimization and Machine Learning, proposed as one of the most effective methodologies for predicting daily rainfall. The study uses 3,653 data samples collected in Hoa Binh province, Vietnam, from January 2004 to December 2013. In each model, rainfall is the output parameter, while the input parameters are maximum temperature, minimum temperature, wind speed, relative humidity, and solar radiation. The models are validated with the correlation coefficient, mean absolute error, skill score, probability of detection, critical success index, and false alarm ratio, which show a plausible range for forecasting daily rainfall, even when a Monte Carlo approach is used. The results show that Machine Learning appears to be the best performer, and the paper demonstrates the usefulness of AI-based approaches to the existing rainfall-prediction literature. Beyond forecasting, AI can also support corporate decision-making by helping identify stakeholders' wants and needs.
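Several of the verification measures named above have standard definitions that can be computed directly. The sketch below (Python, standard library only) computes mean absolute error, the Pearson correlation coefficient, and the categorical scores probability of detection (POD), false alarm ratio (FAR), and critical success index (CSI) from a rain/no-rain contingency table. The 0.1 mm rain threshold is an assumption for illustration, not a value from Prakash, et al. (2020), and the skill score and Monte Carlo analysis are omitted.

```python
import math


def mean_absolute_error(observed, forecast):
    """Average absolute difference between observed and forecast rainfall."""
    return sum(abs(o - f) for o, f in zip(observed, forecast)) / len(observed)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def contingency_scores(observed, forecast, threshold=0.1):
    """POD, FAR and CSI for rain/no-rain events.
    A day counts as rainy when its amount is >= threshold (assumed 0.1 mm)."""
    hits = misses = false_alarms = 0
    for obs, fc in zip(observed, forecast):
        obs_rain, fc_rain = obs >= threshold, fc >= threshold
        if obs_rain and fc_rain:
            hits += 1
        elif obs_rain:
            misses += 1
        elif fc_rain:
            false_alarms += 1

    def ratio(num, den):
        # A score is undefined (NaN) when its denominator is zero.
        return num / den if den else float("nan")

    return {
        "POD": ratio(hits, hits + misses),
        "FAR": ratio(false_alarms, hits + false_alarms),
        "CSI": ratio(hits, hits + misses + false_alarms),
    }
```

POD rewards catching rainy days, FAR penalizes predicting rain that does not fall, and CSI combines both, which is why they are usually reported together alongside continuous errors such as MAE.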

AI in Finance

Zavadskaya (2017) studies stock-market prediction, asking whether AI could offer an investor more accurate forecasts. The study uses two datasets: monthly S&P 500 index returns over the period 1968-2016 and daily S&P 500 returns over the period 2007-2017. Both datasets are tested in univariate and multivariate settings with 12 explanatory forecasting variables. The tests apply recurrent dynamic Machine Learning techniques and compare their performance with ARIMA and vector autoregression (VAR) models, using both statistical measures of forecast accuracy, such as mean squared prediction error (MSPE) and mean absolute prediction error (MAPE), and economic measures, namely the Success Ratio and direction prediction. Further, because AI may produce different results in each iteration, the study also performs a sensitivity analysis, checking the robustness of the results under different network configurations, such as training algorithms and numbers of lags. Although some networks outperform certain linear models, the overall result is mixed: ARIMA models appear best at minimizing forecast errors, while Machine Learning often displays better accuracy in sign or direction predictions. Under the MSPE and MAPE forecast-accuracy tests, Machine Learning seems to outperform the respective ARIMA models in many parameters, but the difference is not significant. Overall, all models in the study achieve significant accuracy of 60-65% in predicting the direction of the stock index and changes in S&P 500 returns. Figure 2 lists the studies reviewed.

Figure 2

Methodological Literature Used

Authors                       Research Area                  AI Sub-Type Used
Davenport & Kalakota (2019)   AI in Healthcare               Machine Learning
Pantano & Pizzi (2020)        AI in Customer Communication   Machine Learning and Natural Language Processing
Cioffi, et al. (2019)         AI in Smart Manufacturing      Machine Learning and Robotics
Hockey (2015)                 AI in Product Design           Machine Learning and Computer Vision
Fazil, et al. (2019)          AI in Higher Education         Machine Learning and Natural Language Processing
Prakash, et al. (2020)        AI in Environmental Science    Machine Learning (Subset: Adaptive Fuzzy Inference System)
Zavadskaya (2017)             AI in Finance                  Machine Learning (Subset: Artificial Neural Networks)
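The forecast-accuracy measures used by Zavadskaya (2017), reviewed above, can be illustrated with a short sketch. It follows this chapter's expansion of MSPE (mean squared prediction error) and MAPE (mean absolute prediction error; elsewhere MAPE often denotes mean absolute percentage error), and computes a simple success ratio as the share of periods whose predicted return has the same sign as the actual return. The exact definitions in the original study may differ, and the sample returns in the tests are made up.

```python
def mspe(actual, predicted):
    """Mean squared prediction error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute prediction error (the expansion used in this chapter)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def success_ratio(actual, predicted):
    """Share of periods where the predicted direction matches the actual one.
    A return of exactly zero is treated as non-positive."""
    matches = sum(1 for a, p in zip(actual, predicted) if (a > 0) == (p > 0))
    return matches / len(actual)
```

The two error measures and the success ratio can disagree, which is consistent with the mixed result described above: a model can have larger average errors yet still call the direction of the market correctly more often.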

Chapter Two Summary

In summary, previous research shows that Machine Learning is frequently superior and is widely adopted as the back-end processing component of AI. Cioffi, et al. (2019) performed an in-depth literature review of 82 articles on AI from the Web of Science and SCOPUS databases and found that Machine Learning and Robotics can help improve quality control in manufacturing systems; by using test and manufacturing data from various systems, AI can help factories make their manufacturing processes smarter and more efficient. Hockey's (2015) paper lists the top technical processes that data engineers struggle with and, through the author's quantitative evaluation, identifies the areas where AI can help the most. The research indicates that AI could assist engineers, through Machine Learning and AI's vision capability, in creating more accurate product specifications and during the architecture and design phases. Davenport and Kalakota (2019) study how AI could benefit the healthcare industry. Their study draws on Electronic Health Record evidence to argue that applications of AI, in particular Machine Learning, have made tremendous progress and achieved success in diagnosis and treatment applications, patient engagement and adherence applications, and administrative applications. Pantano and Pizzi (2020) focus on online chatbots, which have created a new and more efficient channel for customer support by using AI's natural language processing and speech capabilities to mimic human conversation. Their objective is to provide a comprehensive understanding of the actual AI mechanisms behind online chatbots. To that end, the paper examines 668 patents that include the word “chatbot” in the title and/or abstract, from 1998 to 2018. By analyzing occurrences, extracting topics and phrases with software, and applying hierarchical cluster analysis and multidimensional scaling, the research shows that new conversational agents are based on the Machine Learning and Natural Language Processing components of AI. Rainfall prediction is extremely challenging because of stochastic meteorological parameters such as temperature, humidity, and wind, in addition to time and space. To provide a prompt estimation method, Prakash, et al. (2020) introduce a newly developed model, the Adaptive Network based Fuzzy Inference System, an advanced model from the Machine Learning branch of AI. Based on computations over the 3,653 data samples collected, the study shows the model to be the most effective at predicting daily rainfall, and the paper demonstrates the usefulness of AI-based approaches to the rainfall-prediction literature. Zavadskaya (2017) focuses on stock-market prediction, asking whether AI could offer an investor more accurate forecasts. The study uses two datasets: monthly Standard & Poor's (S&P) 500 index returns over the period 1968-2016 and daily S&P 500 returns over the period 2007-2017. Both datasets are tested in univariate and multivariate settings with 12 explanatory forecasting variables. The tests compare dynamic Machine Learning with ARIMA and VAR models, and the study shows Machine Learning outperforming the respective ARIMA models in many parameters, although not significantly. Moreover, all models achieve significant accuracy of 60-65% in predicting the direction of the stock index and changes in S&P 500 returns. The study suggests that AI could substantially influence business decision-making in the financial sector.

All in all, previous literature suggests that most of AI's computational capability relies on Machine Learning, a family of algorithms that become smarter as they absorb data. Machine Learning can learn on its own to produce more accurate predictive results. The reviewed uses of AI in customer communication, smart manufacturing, product design, finance, healthcare, higher education, and environmental science support choosing Machine Learning, a branch of AI ingrained in many industries (Becue, et al., 2021), as the focus for examining AI's applicability to cybersecurity.


CHAPTER THREE. METHODOLOGY

Introduction to Methodology

In this research, the main goal is to show the accuracy of Machine Learning in classifying cybersecurity intrusions and predicting future attacks based on a time-series dataset. To that end, the research compares the intrusion classification effectiveness of Machine Learning with that of a human cybersecurity analyst, and the intrusion event prediction ability of Machine Learning with that of a popular ARIMA statistical model. Note that in the remaining chapters, the terms AI and Machine Learning are used interchangeably: Machine Learning is a subset or branch of AI, and the research relies purely on Machine Learning for the technical computation and analysis.
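This section does not state which ARIMA order or estimation method serves as the statistical baseline. As a minimal sketch under those unknowns, the code below assumes an ARIMA(1,1,0) model without a constant, d_t = phi * d_(t-1) + e_t where d_t = y_t - y_(t-1), estimates phi by conditional least squares, and iterates the model forward to forecast, for example, future daily intrusion counts. A production analysis would more likely use a statistics library (such as the `ARIMA` class in statsmodels) with order selection and residual diagnostics.

```python
def fit_arima_110(series):
    """Conditional least-squares estimate of phi for ARIMA(1,1,0) with no
    constant: d_t = phi * d_{t-1} + e_t, where d_t = y_t - y_{t-1}.
    Requires at least three observations."""
    if len(series) < 3:
        raise ValueError("need at least three observations")
    diffs = [b - a for a, b in zip(series, series[1:])]
    lagged, current = diffs[:-1], diffs[1:]
    denom = sum(v * v for v in lagged)
    if denom == 0:
        return 0.0  # no variation in the differences to explain
    return sum(x * y for x, y in zip(lagged, current)) / denom


def forecast_arima_110(series, phi, steps):
    """Forecast `steps` future levels by projecting the last difference
    forward with phi and undoing the differencing (cumulative sum)."""
    level = series[-1]
    last_diff = series[-1] - series[-2]
    forecasts = []
    for _ in range(steps):
        last_diff = phi * last_diff
        level += last_diff
        forecasts.append(level)
    return forecasts
```

Differencing once (the "1" in the middle of the order) removes a trend in the level of events, and the single autoregressive term lets each day's change depend on the previous day's change.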

Machine-learning-based approaches rely on identifying anomalies, a process that can also produce false positive results. So-called analytical solutions, by contrast, are based on rules created by design experts to detect outlying events that do not match the established rules (Schuurmans, 1995). Machine Learning is a branch of AI that is closely related to (and often overlaps with) computational statistics, which focuses on making predictions through the use of computers. It has strong ties to mathematical optimization, which delivers methods, theory, and application domains to the field (Butler & Kazakov, 2010). Machine Learning leverages data mining and exploratory data analysis, and its techniques have been applied in many areas of science because of their adaptability, scalability, and potential to adjust rapidly to new and unknown challenges (Palmer, 2017). Machine Learning techniques offer potential solutions for such challenging and complex situations because of their ability to adapt quickly to new and unknown circumstances (Kabbas & Munshi, 2020). Machine Learning can also be unsupervised, used first to learn and establish baseline behavioral profiles for various entities and then to find meaningful anomalies. Arthur Samuel, a pioneer of Machine Learning, defined it as a “field of study that gives computers the ability to learn without being explicitly programmed” (Dasgupta, et al., 2020, p. 8). It primarily focuses on classification and regression based on known features learned from training data. Some Machine Learning methods, notably artificial neural networks, are loosely inspired by the way the human brain interprets incoming data and learns from it; their motivation lies in building networks that simulate the brain's analytical learning (Bhatele, et al., 2019).
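The unsupervised baseline-profiling idea described above can be sketched with a simple statistical stand-in: learn each entity's mean and standard deviation from past observations (for example, hypothetical hourly connection counts per host), then flag current values whose z-score exceeds a threshold. This illustrates the concept only; it is not a Machine Learning model used in this study, and the threshold of 3 and the choice to flag entities without a baseline are assumptions.

```python
import statistics


def build_baselines(history):
    """Learn a (mean, standard deviation) profile for each entity from its
    past observations, e.g. hypothetical hourly connection counts per host."""
    return {
        entity: (statistics.fmean(values), statistics.pstdev(values))
        for entity, values in history.items()
    }


def find_anomalies(baselines, current, z_threshold=3.0):
    """Return (entity, z_score) pairs whose current value deviates from the
    entity's own baseline by more than z_threshold standard deviations."""
    flagged = []
    for entity, value in current.items():
        if entity not in baselines:
            # Assumption: an entity with no baseline is treated as anomalous.
            flagged.append((entity, float("inf")))
            continue
        mean, std = baselines[entity]
        if std == 0:
            # A constant baseline: any change at all counts as a deviation.
            z = 0.0 if value == mean else float("inf")
        else:
            z = (value - mean) / std
        if abs(z) > z_threshold:
            flagged.append((entity, z))
    return flagged
```

Because each entity is compared only with its own history, a busy web server and a quiet database host can share the same rule without the busy host drowning out the quiet one.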

In cybersecurity, security breaches include external and internal intrusions. There are three main types of network analysis for threat detection: misuse-based (also known as signature-based), anomaly-based, and hybrid. Misuse-based techniques aim to detect known attacks by using the signatures of those attacks. They detect known types of attacks without generating a large number of false alarms; however, administrators often must update the rule and signature databases manually. Anomaly-based techniques model normal network and system behavior and identify anomalies as deviations from it. New (zero-day) attacks have no known signature, so misuse-based algorithms cannot detect them, and such attacks can go undetected in their early stages; anomaly-based techniques are appealing because they can flag them (Goyal & Sharma, 2019). The data on which anomaly-based techniques alert (novel attacks) can then be used to define signatures for misuse detectors. The main disadvantage of anomaly-based techniques is a potentially high false alarm rate, because previously unseen system behaviors may be categorized as anomalies (Cylance, 2020). Hybrid detection combines misuse and anomaly detection to increase the detection rate of known intrusions and reduce the false positive rate for unknown attacks. Because Machine Learning can support both signature-based and anomaly-based detection of threat events, it can discover a wide range of issues, such as malware attacks, ransomware, denial of service (DoS), phishing or social engineering, Structured Query Language (SQL) injection, man-in-the-middle attacks, vulnerability discovery, deception, and insider threats (Devakunchari & Sourabh, 2019).
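The hybrid scheme described above, signatures for known attacks plus deviation checks for unknown ones, can be sketched as a two-stage check. The signature patterns, the request-rate feature, and the fivefold tolerance below are hypothetical placeholders chosen for illustration; they are not real IDS rules or parameters from this study.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """A simplified, hypothetical network event."""
    src_ip: str
    payload: str
    requests_per_min: float


# Hypothetical signatures for illustration only (not real IDS rules).
SIGNATURES = {
    "sql_injection": ("' OR '1'='1", "UNION SELECT"),
    "path_traversal": ("../",),
}


def misuse_match(event):
    """Misuse (signature) stage: return the first matching signature name."""
    text = event.payload.upper()
    for name, patterns in SIGNATURES.items():
        if any(pattern.upper() in text for pattern in patterns):
            return name
    return None


def anomaly_flag(event, baseline_rate, tolerance=5.0):
    """Anomaly stage: flag request rates far above the normal baseline."""
    return event.requests_per_min > baseline_rate * tolerance


def hybrid_detect(event, baseline_rate):
    """Check known signatures first, then fall back to anomaly detection."""
    signature = misuse_match(event)
    if signature is not None:
        return ("misuse", signature)
    if anomaly_flag(event, baseline_rate):
        return ("anomaly", "rate_deviation")
    return ("benign", None)
```

Checking signatures first keeps false alarms low for known attacks, while the anomaly stage catches novel behavior; events it flags could later be reviewed and turned into new signatures, as the text describes.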

Purpose of Study

This study investigates the application of AI in cybersecurity and researches its effectiveness, asking whether AI technology at its present stage, and Machine Learning in particular, can help improve cybersecurity.

Research Questions

The research paper starts at the highest level with the general question: Which branch of AI is most applicable to cybersecurity? From this main question, it derives the sub-questions: Is Machine Learning, at its present stage, accurate enough to benefit cybersecurity? What is AI's accuracy rate in classifying intrusion events compared with that of a human cybersecurity analyst? When AI is used to predict future intrusion events, what is its accuracy rate compared with a time-series ARIMA statistical model?
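The classification sub-question reduces to comparing label accuracy on the same set of events. A minimal sketch, assuming both the model and the analyst assign one category label per event and the ground-truth labels are known; the labels in the tests are made up. Per-class recall is included because overall accuracy can hide poor performance on rare attack categories.

```python
from collections import Counter


def accuracy(truth, predicted):
    """Fraction of events whose predicted label equals the true label."""
    if not truth or len(truth) != len(predicted):
        raise ValueError("need two non-empty label lists of equal length")
    return sum(t == p for t, p in zip(truth, predicted)) / len(truth)


def per_class_recall(truth, predicted):
    """For each true category, the share of its events labeled correctly."""
    totals = Counter(truth)
    correct = Counter(t for t, p in zip(truth, predicted) if t == p)
    return {label: correct[label] / n for label, n in totals.items()}
```

Scoring the model and the analyst with the same functions on the same events keeps the comparison fair; for example, a model with higher overall accuracy may still miss every phishing event, which only the per-class view reveals.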

Research Design

The research uses a honeypot hosted on the Microsoft Azure cloud (Figure 3). Internally, the honeypot uses the weak IP address 192.168.1.1, a private address widely known as the out-of-the-box default for networking devices. Azure provides a static public IP address, which resolves to the domain Goooogle.com, purchased specifically for this project. “Goooogle.com” was chosen because it attracts internet users worldwide who mistype the popular search engine domain “Google.com” with extra “o” characters. The site also receives heavy traffic from botnets and hackers because a workstation and a server connected within the network actively visit malicious websites, download malware, and click on malicious phishing emails (see Figure 3).

Figure 3

Honeypot Network Architecture Diagram

The site was live in production for 274 calendar days (about 9 months). The logs and netflow data captured from the devices within the honeypot are unambiguous representations of instants in time: they contain timestamps that uniquely bind to each event, so they are considered time-oriented data. The dataset contains 110,516 traffic flows in total, 42,871 benign and 67,645 malicious, reflecting 10 categories of cybersecurity attacks. The raw security data are used to analyze the various patterns of security incidents and malicious behavior, to build a data-driven security model, and to achieve the study's objectives. The raw security data (netflows and logs) are captured at the firewall level in the instance hosted on Azure and then loaded into Corelight, which converts the unstructured data into structured data so that Elastic can interpret the data and sort them into their respective schema and data structure. The data are then analyzed and loaded into Azure SQL Database, where the Azure Artificial Intelligence Web Service picks up the data and develops a classification model that differentiates among the various intrusion event types (see Figure 4 for a visualization).
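Because every event carries a timestamp, the flows can be aggregated into a regular time series, which is the form a forecasting model such as ARIMA expects. A minimal sketch, assuming ISO 8601 timestamps that already share a single time zone (e.g., UTC); the function name and sample timestamps are hypothetical. Days with no events are filled with zero so the series has no gaps.

```python
from collections import Counter
from datetime import date, datetime, timedelta


def daily_counts(timestamps, start: date, end: date):
    """Count events per calendar day from `start` to `end` (inclusive).
    Timestamps are ISO 8601 strings assumed to share one time zone."""
    per_day = Counter(datetime.fromisoformat(ts).date() for ts in timestamps)
    n_days = (end - start).days + 1
    series = []
    for offset in range(n_days):
        day = start + timedelta(days=offset)
        series.append((day, per_day.get(day, 0)))  # zero-fill quiet days
    return series
```

Zero-filling matters for forecasting: dropping quiet days would silently shorten the gaps between events and distort any lagged relationship the model tries to learn.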

Figure 4

Data Collection and Analysis Workflow Diagram

Table 1 shows the attacks classified into 10 categories, with the training and testing samples used. Of the partitioned dataset, 80% is used to train the Machine Learning algorithm within Azure Artificial Intelligence Web Service, and the remaining 20% is used for testing, as depicted in Table 1. Existing datasets were not used because they suffer from the defects of old data: no new public AI testing data on cybersecurity had been released since 2012. Such data are prone to being outdated and unbalanced, because technologies and attack vectors have changed since then. Legacy data also tend to reflect aggregated measurements of information technology components, such as mainframe computers, software or programs, and communication equipment, clustered together. There is also a problem of insufficient data volume and raw data integrity for building the AI (Machine Learning) engine. Therefore, establishing a network intrusion detection dataset with a large amount of data, wide coverage of attack types, and balanced sample numbers across attack categories was a top priority in this research.

Data Collection and Data Analysis Procedures

To expand on Figure 4, the processes within the Data Collection and Analysis Workflow are described below in order. Step 1) The netflow and log data from the Azure Firewall management console are forwarded to a centralized log management console, Corelight, which acts as a syslog server (Figure 5). Syslog messages are sent over dedicated port 5985 and netflows over port 2055, both with SSL encryption.

Figure 5

Configuration of Log and Netflow Forwarding

Note: Screenshot taken from Azure Firewall console


Figure 6 displays a raw log in unstructured data format, which Corelight must parse.

Figure 6

Example of a Firewall Log

Note: Screenshot taken from Azure Log Analytics console

Figure 7 provides an example of raw netflow data.

Figure 7

Example of Raw Netflow Data

Note: Screenshot taken from Wireshark console

Step 2) Corelight parses the data, identifying key words associated with the various events in the netflow and log data. It also converts the unstructured data into structured formats, labeling each data column with the correct header information (Figure 8). The structured data are then loaded into Elastic for further analysis.
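Corelight’s parser is proprietary, so the following is only a minimal sketch of what Step 2 does: it turns an unstructured log line into named columns and writes the result out with a header row. The log layout, the field names, and the helpers `parse_firewall_line` and `to_structured_csv` are hypothetical assumptions made for illustration; they do not reproduce the Azure Firewall log format or Corelight’s output schema.

```python
import csv
import io
import re

# Hypothetical layout of one raw firewall log line (an assumption for illustration):
# <time> <protocol> <src_ip>:<src_port> -> <dst_ip>:<dst_port> <action> event=<type> host=<name>
LOG_PATTERN = re.compile(
    r"(?P<time>\S+)\s+(?P<protocol>TCP|UDP)\s+"
    r"(?P<src_ip>[\d.]+):(?P<src_port>\d+)\s+->\s+"
    r"(?P<dst_ip>[\d.]+):(?P<dst_port>\d+)\s+"
    r"(?P<action>\w+)\s+event=(?P<event_type>\S+)\s+host=(?P<hostname>\S+)"
)
COLUMNS = ["time", "protocol", "src_ip", "src_port", "dst_ip",
           "dst_port", "action", "event_type", "hostname"]


def parse_firewall_line(line):
    """Return a dict keyed by column header, or None if the line does not match."""
    match = LOG_PATTERN.search(line)
    return match.groupdict() if match else None


def to_structured_csv(lines):
    """Parse raw lines and emit CSV text with a header row; unparseable lines are skipped."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    for line in lines:
        record = parse_firewall_line(line)
        if record is not None:
            writer.writerow(record)
    return buffer.getvalue()


raw = [
    "2021-03-14T10:22:31Z TCP 203.0.113.5:51522 -> 10.0.0.4:22 DENY event=ssh_login_failed host=web-01",
    "malformed line that the parser cannot read",
]
print(to_structured_csv(raw))
```

Lines that do not match the expected layout are skipped rather than guessed at, so only correctly parsed records would move on to the next stage.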


Figure 8

Display of Log Highlights when Unstructured in Corelight

Note: Screenshot taken from Corelight console

Step 3) Once the structured data are loaded into Elastic, the system can read each column by its header, such as time, TCP/UDP description, error code, event type, and hostname (Figure 9). The data are then mapped to an open-source library that correlates them with specific malicious events using Cisco Talos Threat Intelligence Rules.
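As a rough illustration of Step 3, and of Elastic adding a column that tags each event with a threat label, the sketch below checks each structured record against a small rule table and appends a `threat_label` field. The rules, the field names, and the choice to label unmatched records as benign are hypothetical assumptions; the actual Cisco Talos rule set is far larger and is not reproduced here.

```python
# Hypothetical rule table: (threat label, condition on a structured record).
# Real threat-intelligence rules are far richer; these are illustrative only.
THREAT_RULES = [
    ("Password Brute-Force", lambda r: r.get("event_type") == "ssh_login_failed"),
    ("Port Scanning", lambda r: r.get("event_type") == "syn_no_ack" and r.get("action") == "DENY"),
    ("SQL Injection", lambda r: "union select" in r.get("payload", "").lower()),
]


def tag_record(record, rules=THREAT_RULES, default="Benign Traffic"):
    """Return a copy of the record with an added 'threat_label' column.

    The first rule whose condition matches supplies the label; records that match
    no rule receive the default label (an assumption made for this sketch).
    """
    tagged = dict(record)
    tagged["threat_label"] = next(
        (label for label, condition in rules if condition(record)), default)
    return tagged


records = [
    {"event_type": "ssh_login_failed", "action": "DENY"},
    {"event_type": "http_request", "action": "ALLOW", "payload": "id=1 UNION SELECT pw"},
    {"event_type": "http_request", "action": "ALLOW"},
]
for record in records:
    print(tag_record(record)["threat_label"])
```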


Figure 9

Display of Log Parsing when Structured in Elastic

Note: Screenshot taken from Elastic console

Target Population, Sampling Method, and Related Procedures

The dataset contains roughly 110,516 total traffic flows, with 42,871 benign and 67,645 malicious traffic flows that reflect 10 categories of cybersecurity attacks. As mentioned above, 80% of the dataset is used to train the Machine Learning engine. Once the data are loaded into the Elastic software, this 80% of the dataset is mapped to an open-source threat-intelligence library that can correlate the data points to specific threat events based on Cisco Talos Threat Intelligence Rules. In this process, Elastic adds another column that tags each event with the specific threat labeled by Talos. Once the dataset is ready, it is fed to the Machine Learning engine hosted on Azure Artificial Intelligence Web Service, which inspects all the data. The engine took around 16 hours to sort through and analyze the data and build its Machine Learning algorithms for detecting cybersecurity threats. The remaining 20% of the data, which are unknown to the Machine Learning engine, are used for testing in this study. The dataset and its respective threat categories are depicted in Table 1.

Table 1

Ten Intrusion Categories with Depiction of Training and Testing Samples

Category Training Samples (N) Testing Samples (N) Total (N)

Benign Traffic 34,296 8,575 42,871

Cross-Site Scripting 7,014 1,754 8,768

SQL Injection 6,075 1,519 7,594

Email Spam 9,254 2,313 11,567

Password Brute-Force 4,434 1,109 5,543

Port Scanning 10,288 2,572 12,860

Registry Takeover 2,968 743 3,711

Denial-of-Service (DoS) 6,138 1,535 7,673

Shellcode Execution 1,671 418 2,089

Malware Exploit 6,992 1,748 8,740

Note: Table 1 shows the attacks classified into 10 categories, with the training and testing samples used for each. Of the partitioned dataset, 80% is used to train the Machine Learning algorithm within Azure Artificial Intelligence Web Service, and the remaining 20% is used for testing.
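The text does not say how the 80/20 partition was drawn. One common approach that keeps every category at roughly the same 80/20 ratio, as in each row of Table 1, is a stratified split, sketched below. The helper `stratified_split`, the fixed seed, and the toy labels are assumptions for illustration, not the study’s procedure.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Split record indices into train and test lists separately within each label.

    Each category contributes round(n * test_fraction) of its records to the test
    set, so every category keeps roughly the same 80/20 ratio.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    train, test = [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Toy example: 50 benign flows and 10 port-scanning flows.
labels = ["Benign Traffic"] * 50 + ["Port Scanning"] * 10
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 48 12
```

Applied to a Table 1 total, the same rule gives, for example, round(12,860 × 0.2) = 2,572 Port Scanning test samples, which matches the table.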

Instrumentation

Parrend, et al. (2018) quote Bernard Marr’s definitions: “Artificial Intelligence is the broader concept of machines being able to carry out tasks in a way that we would consider ‘smart’” (p. 85) and “Machine Learning is a current application of AI based around the idea that we should really just be able to give machines access to data and let them learn for themselves” (p. 85). Accordingly, the AI in this project, specifically its Machine Learning engine, is a self-learning model that can learn and solve problems, especially in environments where algorithms or rules need to evolve to solve dynamic problems. Machine Learning achieves this by learning from and classifying past network activities in order to predict future attacks and recognize attacks that are actually transpiring (Banoth, et al., 2017). As mentioned, patterns that describe normal and abnormal network activities are traditionally defined manually by security professionals based on their expert knowledge, whereas Machine Learning can be trained to identify such patterns automatically. AI improves its understanding of cybersecurity threats by consuming a large number of data artifacts (Mittal, et al., 2019). Therefore, to build the AI Machine Learning engine, data need to be fed to the Machine Learning model hosted on Azure Artificial Intelligence Web Service, where the model can process and analyze the data and match events to rules in order to build its intrusion detection and prediction algorithms.

The Machine Learning model for this project rests on two main principles: (a) a signature-based detection approach, which identifies malicious activities by predefined patterns of abnormal network and/or system behaviors; and (b) a system-anomaly-detection approach, which evaluates deviations from normal network and/or system behaviors. Machine Learning needs a large knowledge base, which stores expert knowledge, and an inference engine, which reasons about the predefined knowledge and finds answers to given problems. Depending on the form of reasoning, Machine Learning applies to different classes of problems (Hossein, et al., 2020). A case-based reasoning approach solves problems by recalling previous similar cases, assuming that the solution of a past case can be adapted and applied to a new problem case. Newly proposed solutions are then evaluated and, if necessary, revised, leading to continual improvements in accuracy and in the ability to learn new problems over time. Rule-based reasoning, in contrast, solves problems using rules defined by experts. Each rule consists of two parts: a condition and an action. Problems are analyzed stepwise: first the condition is evaluated, and then the action to take next is determined. It is important to note that, so far, expert systems only assist decision makers (Al Qahtani, 2020). Ultimately, Machine Learning can define both kinds of patterns, based mainly on accumulated experience plus prior knowledge of cyber threats.
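To make the two principles concrete, the sketch below pairs rule-based reasoning, where each rule is a condition that is evaluated first and an action that is chosen next, with a simple anomaly check that flags values far from a baseline of normal behavior. The rules, the requests-per-minute baseline, the z-score method, and the threshold of 3 are illustrative assumptions, not the configuration of the Azure engine used in the study.

```python
import statistics

# (a) Rule-based reasoning: each rule is a (condition, action) pair, evaluated in order.
# These rules are hypothetical examples.
RESPONSE_RULES = [
    (lambda e: e["failed_logins"] >= 10, "block source IP"),
    (lambda e: e["dst_port"] in {23, 3389} and e["external"], "raise alert"),
]


def apply_rules(event):
    """Evaluate each condition first, then collect the action of every rule that fires."""
    return [action for condition, action in RESPONSE_RULES if condition(event)]


# (b) Anomaly detection: learn a baseline from normal traffic, then flag deviations.
def fit_baseline(normal_values):
    """Return (mean, sample standard deviation) of values observed during normal operation."""
    return statistics.mean(normal_values), statistics.stdev(normal_values)


def is_anomalous(value, baseline, threshold=3.0):
    """Flag a value whose z-score against the baseline exceeds the threshold."""
    mean, stdev = baseline
    return abs(value - mean) / stdev > threshold


event = {"failed_logins": 12, "dst_port": 3389, "external": True}
print(apply_rules(event))  # ['block source IP', 'raise alert']

# Hypothetical requests per minute seen during normal operation.
baseline = fit_baseline([98, 102, 100, 97, 103, 101, 99, 100])
print(is_anomalous(101, baseline), is_anomalous(160, baseline))  # False True
```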

The cybersecurity analyst who volunteered for the comparison of AI’s intrusion event classification ability with that of a human is an analyst with 5 years of experience. He holds a Bachelor’s degree in Computer Science from a university in California, with expertise in software development and networking. He also holds two cybersecurity certifications: Computing Technology Industry Association (CompTIA) Security+ and Certified Information Systems Security Professional (CISSP). Previously, the analyst worked for two U.S.-based Fortune 500 companies as part of their Security Operations Center teams, with a primary focus on detecting a wide range of malware attacks, denial of service, phishing emails, web application attacks, man-in-the-middle attacks, vulnerability discovery, and masquerade or deception insider threats on a daily basis.

An ARIMA model was used in this study as the benchmark against which AI’s ability to forecast future trends was compared. ARIMA (autoregressive integrated moving average) is a statistical model that uses historical time series data to forecast future outcomes. The model explains a given time series by its own past values, that is, its own lags and the lagged forecast errors, so that the resulting equation can be used to forecast future values (Prakash, et al., 2020). It is a popular and widely used statistical method for time series prediction. For example, an ARIMA model might predict a stock’s future prices from its past performance or forecast a company’s earnings from past periods; such models are widely used in technical analysis to forecast future security prices. ARIMA is based on the statistical concept of serial correlation, in which past data points influence future data points (Zavadskaya, 2017).

ARIMA forecasting is achieved by plugging in time series data for the variable of interest. Here, the time series data were collected in Python statistical and coding software that had an ARIMA computation model preconfigured with integration from pandas, the open-source Python data analysis library. The software then identifies the appropriate number of lags and amount of differencing to apply to the data and outputs a computed data table with multiple linear regression values.
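Full ARIMA implementations exist in Python (statsmodels, for instance, provides `statsmodels.tsa.arima.model.ARIMA`). To show the mechanics described above without that dependency, the sketch below fits a simplified ARIMA(p, 1, 0) with NumPy: it differences the series once, regresses each difference on an intercept and its own p lags by least squares, and adds the forecast differences back onto the last observed level. It omits the moving-average (lagged forecast error) terms and automatic order selection, so it illustrates the idea under those simplifying assumptions rather than reproducing the model used in the study; the monthly counts are hypothetical.

```python
import numpy as np


def fit_arima_p10(y, p=1):
    """Fit a simplified ARIMA(p, 1, 0) by least squares.

    The series is differenced once (d = 1), then each differenced value is
    regressed on an intercept and its own p previous differenced values.
    Returns [intercept, phi_1, ..., phi_p]. No moving-average terms are fitted.
    """
    if p < 1:
        raise ValueError("p must be at least 1")
    d = np.diff(np.asarray(y, dtype=float))
    if len(d) <= p + 1:
        raise ValueError("series too short for the requested lag order")
    # Column k holds lag k+1 of the differenced series, aligned with the targets d[p:].
    lags = [d[p - k - 1:len(d) - k - 1] for k in range(p)]
    X = np.column_stack([np.ones(len(d) - p)] + lags)
    coef, *_ = np.linalg.lstsq(X, d[p:], rcond=None)
    return coef


def forecast_arima_p10(y, coef, steps):
    """Forecast `steps` future levels by iterating the AR equation on the
    differences and adding each forecast difference back to the last level."""
    y = np.asarray(y, dtype=float)
    p = len(coef) - 1
    history = list(np.diff(y))
    level = y[-1]
    forecasts = []
    for _ in range(steps):
        recent = history[::-1][:p]          # d[t-1], d[t-2], ..., d[t-p]
        next_diff = coef[0] + float(np.dot(coef[1:], recent))
        history.append(next_diff)
        level += next_diff                  # undo the differencing
        forecasts.append(level)
    return np.array(forecasts)


# Hypothetical monthly intrusion counts, for illustration only.
counts = [120, 132, 128, 141, 150, 147, 158, 166, 171, 169, 180, 192]
coef = fit_arima_p10(counts, p=1)
print(forecast_arima_p10(counts, coef, steps=3))
```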

Limitations of the Research Design

Data constitute the essential foundation of cybersecurity network research. Because of the lack of disaggregated and up-to-date data, it was necessary to collect fresh network data from a honeypot, parse the data, and structure them in a format from which the Machine Learning engine could learn in order to build its threat detection algorithm. A well-known limitation of building a Machine Learning engine on a threat identification approach that depends on static detection rules is the need for frequent and continuous updates (e.g., daily updates of malware definitions). In this case, the research could only retrieve the Cisco Talos Threat Intelligence Rules available at the time it was conducted to map the threat categories of the 80% of the data used to train the Machine Learning engine. Furthermore, the study design is limited to low-end technology such as Corelight, Elastic, Python, STATA statistical software, and the Azure Artificial Intelligence Web Service; thus, it was not possible to implement a best-of-breed AI system. The results of this research could have differed if more sophisticated technology had been used.

Data Validity Test

A supervised learning algorithm analyzes labeled training data to learn a mapping that it can apply to new entries. Unsupervised learning, by contrast, infers hidden structures from unlabeled data (Soni & Bhushan, 2019). This study leveraged semi-supervised learning, which combines supervised learning with unsupervised learning: it uses a small amount of labeled data together with a large amount of unlabeled data for pattern recognition. Semi-supervised learning can reduce labeling effort while still achieving high accuracy. However, as suggested by Selden (2016), the quality of each classifier must be measured with common performance assessment metrics, namely Precision, Recall, and F-score, computed as follows:

Formula 1. Precision, Recall, and F1-Score Calculation Model

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Recall} = \frac{TP}{TP + FN}$$

$$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

TP denotes true positives, FP false positives, and FN false negatives. For consistency, the research considers a true positive to be a correct detection of a malicious sample, a false positive to be a benign sample wrongly flagged as malicious, and a false negative to be a malicious sample that goes undetected. Precision indicates how likely a detection made by a given approach is to be correct. Recall measures the detection rate. The F-score combines Precision and Recall into a single value that captures the overall effectiveness of a classifier. Finally, to reduce the possibility of biased results, each evaluation metric is computed after performing 10-fold cross validation. Higher precision and recall scores are better, but the two often work against each other and can only be balanced; therefore, the F-score is defined as the harmonic mean of precision and recall. Intuitively, a good F-score requires both good precision and good recall. Thus, in general, the higher the F-score, the better the model performs. Selden (2016) indicates that, according to the International Statistical Institute, an F1 score is considered perfect when it is 1, while the model is a total failure when it is 0. Table 2 shows a satisfactory average F-score of 0.89 across the 10 classifiers (rounded to the nearest hundredth), indicating that the semi-supervised learning algorithm balances precision and recall well, which is a positive result.
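The following sketch computes Formula 1 from raw counts and generates the index splits for k-fold cross validation (k = 10, as in the study). The counts in the first example are hypothetical; the F-scores in `REPORTED_F` are copied from Table 2. Contiguous folds are an assumption made for brevity; records are usually shuffled before folding.

```python
def f1_from(precision, recall):
    """F1: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp, fp, fn):
    """Formula 1 computed from raw counts of TP, FP, and FN."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, f1_from(precision, recall)


def k_fold_indices(n, k=10):
    """Yield (train, test) index lists for k contiguous folds covering range(n).

    Each index appears in exactly one test fold.
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


# Hypothetical counts for one classifier.
print(precision_recall_f1(tp=90, fp=10, fn=20))  # precision 0.9, recall ~0.818, F1 ~0.857

# F-scores reported in Table 2, in table order.
REPORTED_F = [0.83, 0.93, 0.89, 0.93, 0.88, 0.89, 0.83, 0.89, 0.86, 0.92]
print(sum(REPORTED_F) / len(REPORTED_F))  # about 0.885
```

Averaging the ten reported F-scores gives 0.885, which rounds to the 0.89 cited above, and `f1_from(0.87, 0.80)` for Benign Traffic gives about 0.834, consistent with the 0.83 in Table 2.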


Table 2

Results of Precision, Recall, and F1-Score for Classifiers

Classifier F-Score Precision Recall

Benign Traffic 0.83 0.87 0.80

Cross-Site Scripting 0.93 0.89 0.97

SQL Injection 0.89 0.92 0.87

Email Spam 0.93 0.93 0.94

Password Brute-Force 0.88 0.91 0.85

Port Scanning 0.89 0.88 0.90

Registry Takeover 0.83 0.86 0.81

Denial-of-Service (DoS) 0.89 0.85 0.94

Shellcode Execution 0.86 0.85 0.88

Malware Exploit 0.92 0.90 0.96

In data analysis, time series data are special because of their random nature. Stationarity is a desired property in time series analysis because it strongly influences how the data are perceived and predicted when processed through a statistical model. Two main factors cause time series data to become non-stationary: (a) a trend, which is a persistent long-term increase or decrease in the level of the series; and (b) seasonality, which is a recurring pattern at a fixed and known frequency, such as a time of the year, week, or day (Yavanoglu & Aydos, 2017). Thus, before any predictive models are built, it is necessary to determine whether the collected data are stationary. The Augmented Dickey-Fuller (ADF) test is the most widely used technique for checking whether a series is stationary. In addition, the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test can confirm the validity of the results. The KPSS test complements ADF: whereas the ADF null hypothesis is that a unit root is present (the series is non-stationary), the KPSS null hypothesis is that the observable time series is stationary around a level or a deterministic trend. Both tests were run in Python with the statsmodels library, whose corresponding functions are adfuller and kpss. Table 3 depicts the ADF and KPSS results. To interpret them, Ogunc and Hill (2008) suggest that the ADF test statistic should be negative: the more negative it is, the stronger the rejection of the hypothesis that a unit root is present. For KPSS, a p-value above the 5% significance level means that the null hypothesis of stationarity cannot be rejected. Both tests fall within the tolerated ranges, and both support the stationarity of the data.

Table 3

Results of P-Values from Stationarity Tests

Test Type p values

Augmented Dickey-Fuller (ADF) test -0.08

Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test 0.10
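In statsmodels, these tests are available as `adfuller` and `kpss` in `statsmodels.tsa.stattools`; each returns the test statistic together with a p-value and critical values. As a dependency-light sketch of what those functions compute, the code below derives the ADF t-statistic (with a constant and one fixed augmentation lag, rather than statsmodels’ default automatic lag selection) and the KPSS level-stationarity statistic with a Bartlett-kernel long-run variance. It returns statistics only, not p-values; the comparisons use the commonly tabulated 5% critical values of about -2.86 for ADF with a constant and 0.463 for KPSS. The simulated series are illustrative, not the study’s data.

```python
import numpy as np


def adf_statistic(y, lags=1):
    """Augmented Dickey-Fuller t-statistic (regression with a constant, no trend).

    Regresses dy_t on [1, y_{t-1}, dy_{t-1}, ..., dy_{t-lags}] and returns the
    t-ratio of the y_{t-1} coefficient. Values below about -2.86 (the usual 5%
    critical value for this case) reject the unit-root null hypothesis.
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    n = len(dy) - lags
    columns = [np.ones(n), y[lags:-1]]            # constant and lagged level y_{t-1}
    for k in range(1, lags + 1):
        columns.append(dy[lags - k:len(dy) - k])  # lagged differences dy_{t-k}
    X = np.column_stack(columns)
    target = dy[lags:]
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se


def kpss_statistic(y, nlags=None):
    """KPSS statistic for level stationarity with a Bartlett-kernel long-run variance.

    The null hypothesis is stationarity; values above about 0.463 (the usual 5%
    critical value) reject it.
    """
    y = np.asarray(y, dtype=float)
    T = len(y)
    if nlags is None:
        nlags = int(np.ceil(12 * (T / 100) ** 0.25))  # common rule-of-thumb bandwidth
    e = y - y.mean()                  # residuals from a regression on a constant
    s = np.cumsum(e)                  # partial sums S_t
    long_run_var = e @ e / T
    for j in range(1, nlags + 1):
        weight = 1 - j / (nlags + 1)  # Bartlett weight
        long_run_var += 2 * weight * (e[j:] @ e[:-j]) / T
    return (s @ s) / (T ** 2 * long_run_var)


rng = np.random.default_rng(0)
noise = rng.normal(size=500)           # stationary white noise
walk = np.cumsum(noise)                # random walk: has a unit root
trend = np.arange(500, dtype=float)    # deterministic linear trend
print(adf_statistic(noise), adf_statistic(walk))
print(kpss_statistic(noise), kpss_statistic(trend))
```

The white noise should give a strongly negative ADF statistic (rejecting a unit root) while the random walk should not, and the linear trend should give a KPSS statistic well above 0.463 (rejecting stationarity).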

Expected Findings

Based on the data collected and the rigorous tests performed to ensure that the time series data were stationary, it was assumed that these stationary data could be used in time-series forecasting models by both AI and ARIMA. It was expected that 80% of the data could be mapped to the threat-intelligence source (Cisco Talos), which could identify the relevant threat category and inject it into the data as a label; the labeled data could then be fed to the Machine Learning engine to train it and build its threat detection algorithm. Since the remaining 20% of the data are unlabeled and unknown to the Machine Learning engine, these data can be used to test the threat classification ability of AI compared with that of a human cybersecurity analyst.

Ethical Issues

The protection of human subjects through the application of appropriate ethical

principles is imperative in all research. This concern does not arise here, as the study was conducted not on human beings but on machine data. The research used only machine-produced data; it did not involve the study of human beings, nor any practice that could harm human beings, animals, plants, or the natural environment.

Conflict of Interest Assessment

Awareness of potential conflicts of interest is very important in research to

maintain the integrity of an unbiased professional view in a research publication. There

were no conflicts of interest regarding the publication of this paper, nor any affiliations

with or involvement in any organization or entity with any financial interest (such as

honoraria; educational grants; membership, stock ownership, or other equity interests;

and expert testimony or patent-licensing arrangements), or non-financial interest (such as

personal or professional relationships, affiliations, marriages or beliefs) in all materials or

technologies discussed in this manuscript.

Chapter Three Summary

Chapter Three described how the data were collected and analyzed. The research project uses a honeypot hosted on Microsoft Azure Cloud that is set up with a weak IP address and resolves to a simulated domain similar to “Google.com.” The site lures botnets and hackers because the workstations connected within the honeypot network actively visit malicious websites, download malware content, and click on malicious phishing emails. The site was live in production for 274 calendar days (9 months). Logs and netflow data from the devices within the honeypot are captured for data analysis. In their raw form, the logs and netflows are unstructured, with various critical values that need to be parsed by a solution called Corelight. Once the data are parsed and reformatted into structured data, with the data points aligned to their corresponding columns, they are fed to Elastic, where 80% of the data are mapped to a threat-intelligence source (Cisco Talos). Cisco Talos interprets the data to identify and label them with their respective threat categories. These data can then be consumed by the Machine Learning engine in Azure Artificial Intelligence Web Service to analyze them and build out the threat detection algorithms. The Machine Learning engine’s ability to classify cybersecurity intrusions was then compared with that of a human cybersecurity analyst, using the remaining 20% of the data, which are unlabeled and unknown to the Machine Learning engine. Afterwards, the Machine Learning engine faces another test designed to compare its threat-prediction ability with that of a popular ARIMA statistical model.


CHAPTER FOUR DATA ANALYSIS AND RESULTS

Introduction to Data Analysis and Results

This research paper has two goals: first, to determine the accuracy of Machine Learning in classifying cybersecurity intrusions compared with a cybersecurity analyst; and second, to compare its ability to predict future attacks from a time-series dataset against a popular ARIMA statistical model. This chapter is split into two sections, one for each task, that explain the data analysis and methods to make the context easier to follow.

AI vs. Human Analysis in Classification of Threat Events

As described in earlier chapters, this project’s first task is to compare Machine Learning’s ability to classify cybersecurity intrusions with that of a human cybersecurity analyst. As mentioned, a total of 110,516 traffic logs and netflow records were captured from the honeypot in the data-collection phase. Of these, 80% are used to train and build the threat detection algorithm of the Machine Learning engine, while the remaining 20% are unknown to the engine and are used for testing. Hence, the remaining 20% of the data (equivalent to 22,286 intrusion events) were used to test the Cybersecurity Analyst’s (denoted as CAnalyst) intrusion classification skills against the Machine Learning engine. Figure 10 displays the classification results of AI and CAnalyst for 200 intrusion events selected at random; yellow shows intrusion events detected by AI, and blue shows intrusion events detected by CAnalyst. The x-axis is the time it takes for each entity to detect the threat, and the y-axis is the threat severity level. Figure 10 was graphed in STATA 14 to visualize how long it takes the CAnalyst to detect the threat events compared with AI. Because graphing all 22,286 intrusion event results would saturate the graph and make it unreadable, 200 events were chosen at random. Because the y-axis of the regression graph cannot accommodate many string variables, the 10 intrusion categories were converted into numeric severity levels ranked from 1 to 10.
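A minimal sketch of the two preparation steps behind Figure 10: drawing 200 of the test events at random and converting each category string into a numeric severity level from 1 to 10. The text does not state the actual ranking, so the order in `SEVERITY` is a hypothetical placeholder, as are the event records, the detection times, and the seed.

```python
import random

# Hypothetical ranking from least (1) to most (10) severe; the study's actual
# ranking is not stated in the text.
SEVERITY = {name: level for level, name in enumerate([
    "Benign Traffic", "Email Spam", "Port Scanning", "Password Brute-Force",
    "Cross-Site Scripting", "SQL Injection", "Denial-of-Service (DoS)",
    "Malware Exploit", "Shellcode Execution", "Registry Takeover",
], start=1)}


def sample_for_plot(events, k=200, seed=7):
    """Randomly choose k events and return (detection time, severity level) pairs."""
    rng = random.Random(seed)
    chosen = rng.sample(events, k)
    return [(event["detect_time"], SEVERITY[event["category"]]) for event in chosen]


# Toy stand-in for the 22,286 test events, with made-up detection times.
categories = list(SEVERITY)
events = [{"category": categories[i % 10], "detect_time": (i % 60) / 10}
          for i in range(22_286)]
points = sample_for_plot(events)
print(len(points), points[:3])
```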

Figure 10

AI vs. Cybersecurity Analyst Intrusion Detection Regression Graph

Note: Yellow colors show intrusion events detected by AI while blue colors show intrusion events detected by CAnalyst. The x-axis shows the time it takes for the two entities to detect the threat while the y-axis displays the threat severity level. On the y-axis, the 10 categories of intrusion types were converted into numeric severity levels ranked from 1 to 10.

To obtain unbiased results, the variables in Formula 2 were estimated with Ordinary Least Squares, regressing them against their respective intrusion detection times. A higher coefficient corresponds to the time AI and CAnalyst took to detect the intrusion; thus, the coefficient for that intrusion category ranks higher.


Formula 2. Ordinary Least Squares Regression for Severity Score

$$\text{Severity Score} = \alpha\,\frac{\Delta\,\text{Intrusion}_{i}}{\text{Time}_{AI}} + \beta\,\frac{\Delta\,\text{Intrusion}_{i}}{\text{Time}_{CAnalyst}}$$

Formula 2 shows the Ordinary Least Squares formula used to calculate the regression model, where the Severity Score equals alpha times the change in the intrusion variable (Intrusion_i) over the amount of time for AI to classify the threat, plus beta times the change in the intrusion variable over the amount of time for CAnalyst to classify the threat. Table 4 displays the results from STATA 14.

Table 4

Ordinary Least Squares Regression AI vs. Cybersecurity Analyst Results

Variables (N = 10) AI b* (Robust SE) CAnalyst b* (Robust SE)

Time 0.312 (0.430) 0.486 (0.569)

BenignTraffic 0.001 (0.070) 0.002 (0.069)

CrossSiteScripting 0.037 (0.0167) 0.029 (0.0946)

SQLInjection 0.043 (0.0375) 0.036 (0.122)

EmailSpam 0.086 (122.6) 0.091 (0.0765)

PWBrute-Force 0.056 (0.0701) 0.047 (0.069)

PortScanning 0.073 (0.0167) 0.069 (0.0946)

RegistryTakeover 0.103 (0.0375) 0.087 (0.122)

Denial-of-Service 0.097 (122.6) 0.086 (0.0765)

ShellcodeExecution 0.052 (0.0946) 0.045 (0.0788)

MalwareExploit 0.122 (0.205) 0.116 (0.0913)

Constant 7.937 (5,274) 6.562 (3,731)

p-value .01*** .01***

Observations 2,228 2,228

F-statistic 11.81 12.46

R-squared .61 .47

Note: Robust standard errors in parentheses *** p < .01, ** p < .05, * p < .10
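Table 4 reports OLS coefficients with robust standard errors from STATA 14; Stata’s `robust` option corresponds to the HC1 sandwich estimator. The NumPy sketch below shows that computation on synthetic, heteroskedastic data. The variable names and data are hypothetical, and the sketch is not a re-estimation of Table 4.

```python
import numpy as np


def ols_robust(X, y):
    """OLS coefficients with HC1 (heteroskedasticity-robust) standard errors.

    X should already include a column of ones if an intercept is wanted.
    HC1 scales the White sandwich estimator by n / (n - k).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ (X.T @ y)
    resid = y - X @ beta
    meat = X.T @ (X * resid[:, None] ** 2)          # sum of e_i^2 * x_i x_i'
    cov = xtx_inv @ meat @ xtx_inv * (n / (n - k))  # HC1 small-sample scaling
    return beta, np.sqrt(np.diag(cov))


# Synthetic data (hypothetical): an outcome regressed on detection time, with an
# error spread that grows with time, so robust errors are appropriate.
rng = np.random.default_rng(1)
n = 2000
time = rng.uniform(1, 60, size=n)
severity = 2.0 + 0.3 * time + rng.normal(scale=0.1 * time)
X = np.column_stack([np.ones(n), time])
beta, robust_se = ols_robust(X, severity)
print(beta, robust_se)
```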


The values in Table 5 are the data output extracted from Azure Artificial Intelligence Web Service after the AI classified the intrusion events into their categories. The times the CAnalyst took to analyze and classify the intrusion events were recorded manually. The average times AI and CAnalyst took to classify the intrusion events were both captured and rounded to the nearest hundredth. Once visualized, the data show that AI detects and classifies intrusion events into their categories much faster than the CAnalyst does.

Table 5

Number of Intrusion Events Detected and Average Time of AI vs. Cybersecurity Analyst

Category AI Detected Intrusions (N) AI Average Time for Detection Analyst Detected Intrusions (N) Analyst Average Time for Detection

Benign Traffic 858 0.05 824 1.80

Cross-Site Scripting 176 7.20 165 10.50

SQL Injection 152 5.10 140 15.30

Email Phishing 231 3.80 230 10.10

Brute-Force Attack 110 8.60 110 45.20

Port Scanning 257 10.20 240 30.40

Registry Takeover 74 27.40 65 56.30

Denial-of-Service (DoS) 153 5.70 153 37.50

Shellcode Execution 42 18.30 42 32.70

Malware Exploit 175 10.90 167 15.20
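The speed comparison can be quantified directly from Table 5. The sketch below computes a count-weighted average detection time for each entity and the per-category ratio of analyst time to AI time, in whatever time unit Table 5 uses (the unit is not stated). Only the table’s values are used; the helper name is illustrative.

```python
# Table 5 values: category -> (AI count, AI avg time, analyst count, analyst avg time).
TABLE_5 = {
    "Benign Traffic": (858, 0.05, 824, 1.80),
    "Cross-Site Scripting": (176, 7.20, 165, 10.50),
    "SQL Injection": (152, 5.10, 140, 15.30),
    "Email Phishing": (231, 3.80, 230, 10.10),
    "Brute-Force Attack": (110, 8.60, 110, 45.20),
    "Port Scanning": (257, 10.20, 240, 30.40),
    "Registry Takeover": (74, 27.40, 65, 56.30),
    "Denial-of-Service (DoS)": (153, 5.70, 153, 37.50),
    "Shellcode Execution": (42, 18.30, 42, 32.70),
    "Malware Exploit": (175, 10.90, 167, 15.20),
}


def weighted_mean_time(rows, count_col, time_col):
    """Average detection time across categories, weighted by detected-event counts."""
    total = sum(r[count_col] for r in rows)
    return sum(r[count_col] * r[time_col] for r in rows) / total


rows = list(TABLE_5.values())
ai_mean = weighted_mean_time(rows, 0, 1)
analyst_mean = weighted_mean_time(rows, 2, 3)
# How many times longer the analyst took than AI, per category.
speedup = {name: r[3] / r[1] for name, r in TABLE_5.items()}
print(round(ai_mean, 2), round(analyst_mean, 2))  # 5.43 15.57
```

With these values the weighted averages are about 5.43 for AI (over 2,228 events) and 15.57 for the analyst (over 2,136 events), and the per-category ratio ranges from about 1.4 (Malware Exploit) to 36 (Benign Traffic).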

Detailed Analysis (AI vs. Human Analysis in Classification of Threat Events)

Although many cybersecurity analysts possess great depth of cybersecurity knowledge, the analyst in this case, with 5 years of threat detection and response experience in a Tier 2 Security Operations Center, could at times be slower in threat detection than AI. AI can parse, analyze, and correlate thousands of log events faster and more efficiently. Thus, when it comes to reducing errors in operational tasks and finding anomalies in threat classification, the result supports the hypothesis that AI is ahead of human ability and competence. AI is instrumental in establishing baselines and can detect anomalies and outlier events more quickly, and across a wider range, than humans can (Virmani, et al., 2020). As a cybersecurity solution, AI can help protect organizations from Internet threats, identify types of malware, ensure practical security standards, and help create better prevention and recovery strategies, as it can correlate relationships between threats such as malicious network traffic, suspicious IP addresses, or files in seconds or minutes (Apruzzese, et al., 2018).

AI vs. ARIMA Statistical Computation in Prediction of Threat Events

The second goal of this paper is to compare AI’s threat forecasting ability with that of a popular statistical model, ARIMA, based on a time-series dataset. To visualize the result, Figure 11 displays 10% of the 22,286 intrusion events selected for testing AI’s forecasting ability against the ARIMA model, which appear as 2,228 dots on the chart. The red dots are the intrusion events predicted by AI, and the blue dots are those predicted by the ARIMA model. The predictive values from AI and ARIMA were entered into STATA 14 and run with the xtreg and reg regression commands using the cluster option. The regressions were performed twice, producing the two regression lines shown in Figure 11. When results display concave and convex humps, one can ask which threat events may be the underlying causes. Sokol and Gajdos (2018) propose that it does not make sense for one particular threat category to have a significant impact on the trend of a single month; therefore, the 10 categories underwent a Spearman’s Rank Correlation test, and their interrelationships helped consolidate the data into three categories of threat agents: Network Detected, Endpoint Detected, and Email Detected.

Figure 11

AI vs. ARIMA Intrusion Prediction Regression Graph

Note: Figure 11 displays 10% of the 22,286 intrusion events selected for testing AI’s forecasting ability against the ARIMA model. The red dots are the intrusion events predicted by AI while the blue dots show the intrusion events predicted by the ARIMA model.

A Spearman’s Rank Correlation test was performed to examine the correlations among intrusion types, which were then consolidated into three categories: Endpoint Detection (all intrusion threats detected at the endpoint level), Email Detection (all intrusion threats detected as spam emails), and Network Detection (intrusion threats detected at the network layer). It is important to check for correlations between variables before running the models in order to identify important explanatory variables and check for possible multicollinearity. Based on the results, the data fall within acceptable ranges of their confidence intervals.


Formula 3. Spearman’s Rank Correlation Estimation Model

$$\rho = 1 - \alpha\,\frac{\overline{\text{Intrusion}}_{1}}{\overline{\text{Time}}_{AI}} + \beta\,\frac{\overline{\text{Intrusion}}_{2}}{\overline{\text{Time}}_{ARIMA}}$$

Formula 3 above is the Spearman’s Rank Correlation Estimation formula used to calculate the correlation ratios of the variables within the model. In the model, ρ is 1 minus alpha times the average of the first intrusion variable over the mean time in which the AI model is able to forecast the threat events, plus beta times the average of the second intrusion variable computed in the same way for the ARIMA model. The values were then input into STATA 14. Table 6 shows the ρ ratios for the correlations between the two variables.

Table 6

Spearman’s Rank Correlation Estimation Results

Threat Type BT CSS SI EP BFA PS RT DoS SE ME

Benign Traffic .08 .02 .01 -.07 .01 .02 .01 .02 -.01 .03

Cross-Site Scripting -.01 .03 .08 -.02 .02 .01 .03 .01 .02 .02

SQL Injection .04 .07 .02 .02 .03 -.05 .02 .03 .04 .01

Email Phishing .02 .01 -.02 .01 .02 -.03 .01 .02 -.03 .08

Brute-Force Attack .03 -.04 .01 .03 .01 .01 -.02 .07 .01 .03

Port Scanning .01 .02 .02 -.01 .07 .02 .04 .01 .04 .03

Registry Takeover -.02 .02 .01 .04 .03 .08 .01 .02 .02 .01

Denial-of-Service .03 .01 -.03 .02 .04 .03 .02 .02 .07 .02

Shellcode Execution .01 -.02 .01 .01 .01 -.01 .09 .01 .01 .02

Malware Exploit -.01 .03 .02 .08 -.06 .01 .04 .03 .04 .01

Note: The rows display the names of the threat types; those names are abbreviated in the columns.
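For comparison with Formula 3, the standard Spearman’s rank correlation between two variables is the Pearson correlation of their ranks (with tied values given their average rank), which reduces to 1 - 6Σd²/(n(n² - 1)) when there are no ties. The stdlib sketch below implements both forms on hypothetical inputs; it illustrates the textbook statistic, not the STATA 14 computation behind Table 6.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # positions i..j are tied; ranks are 1-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def spearman_rho_no_ties(x, y):
    """Shortcut 1 - 6*sum(d^2) / (n(n^2 - 1)); valid only when neither input has ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(average_ranks(x), average_ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


# Hypothetical monthly counts for two threat types.
month_counts_a = [1, 2, 3, 4, 5]
month_counts_b = [5, 6, 7, 8, 7]
print(spearman_rho(month_counts_a, month_counts_b))  # about 0.82
```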


The Translog Estimation Model below is a generalization of the renowned Cobb-Douglas production function, which models the relationship between production output and production inputs. The model measures the ratios of inputs to one another for efficient production and estimates technological change in production methods. The Cobb-Douglas production model has a substantial limitation: it imposes a fixed elasticity of substitution (equal to one) between inputs. The Translog Estimation model was chosen instead because it permits greater flexibility and more realistic estimates.
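To illustrate the relationship described above, the sketch below fits a two-input translog function, ln y = b0 + b1 ln x1 + b2 ln x2 + ½b11(ln x1)² + ½b22(ln x2)² + b12 ln x1 ln x2, by least squares. Data generated from a Cobb-Douglas function y = A·x1^a·x2^b are recovered with all second-order coefficients equal to zero, showing that Cobb-Douglas is the translog with those terms restricted. The inputs and parameter values are hypothetical; this is the generic translog form, not the study’s Trend to Month model.

```python
import numpy as np


def translog_design(x1, x2):
    """Columns: 1, ln x1, ln x2, 0.5 (ln x1)^2, 0.5 (ln x2)^2, ln x1 * ln x2."""
    l1, l2 = np.log(x1), np.log(x2)
    return np.column_stack([np.ones_like(l1), l1, l2,
                            0.5 * l1 ** 2, 0.5 * l2 ** 2, l1 * l2])


def fit_translog(x1, x2, y):
    """Least-squares translog coefficients [b0, b1, b2, b11, b22, b12] for ln y."""
    coef, *_ = np.linalg.lstsq(translog_design(x1, x2), np.log(y), rcond=None)
    return coef


rng = np.random.default_rng(3)
x1 = rng.uniform(1, 10, size=100)
x2 = rng.uniform(1, 10, size=100)
y = 2.0 * x1 ** 0.6 * x2 ** 0.3          # Cobb-Douglas with A = 2, a = 0.6, b = 0.3
coef = fit_translog(x1, x2, y)
print(np.round(coef, 6))  # first three are about ln 2, 0.6, 0.3; the rest are about 0
```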

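Formula 4. Trend to Month Translog Estimation Model

$$\Delta \text{TTM} = \text{EndpointDetect} + \text{EmailDetect} + \text{NetworkDetect} + \alpha\,\frac{\overline{\text{Time}}_{AI}}{N} + \beta\,\frac{\overline{\text{Time}}_{ARIMA}}{N}$$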

In the Translog Estimation Model above, the delta in TTM stands for the change in Trend to Month, which is affected by the EndpointDetect (Endpoint Detection), EmailDetect (Email Detection), and NetworkDetect (Network Detection) values. The average time for AI to forecast the threat events over its total number of scores, plus beta times the average time for the ARIMA model to forecast the threat events over its total number of scores, are added to the formula as fixed variables to produce a stable R². The results calculated in STATA 14 are shown below. Table 7 shows that the most prominent threat agents are Network Detected, where the malicious intent comes from external entities and arrives largely through the network first.


Table 7

Trend to Month Translog Estimation Results

Variable Coefficient SE t P>|t| [95% Conf. Interval]

TrendtoMonth 100.023 66.09165 25.45 0.004 48.19462 122.3276

ARIMATimeAvg. 85.9153 61.49118 24.77 0.006 45.16691 117.6643

AI TimeAvg. 87.9483 40.79165 9.01 0.001 36.65492 77.69821

EndpointDetect 51.2390 32.80269 8.14 0.003 27.26822 69.12977

EmailDetect 43.8801 19.07419 1.68 0.018 27.04683 59.57094

NetworkDetect 77.9483 40.79165 9.01 0.001 36.65492 77.69821

_cons 27116.82 13921.92 4.52 0.607 18804.27 36516.69

R2 0.5932

sigma_u .09724095

sigma_e .2734401

Detailed Analysis (AI vs. ARIMA Statistical Computation to Predict Threat Events)

When visualized, the AI's predictions of future intrusion events are, perhaps surprisingly, not far from those of the ARIMA statistical model. This test indicates that the AI's forecasting ability essentially replicates what ARIMA produces from historical data. However, the results have limitations, as they rely solely on a predictive model of past data with an average trend computation. There are nonetheless some large spikes and dips in November and December, since the results mainly reflect the average monthly trend of the traffic entering the honeypot in the historical data. A possible explanation for the spikes in those two months, based on Cyberlytic's (2018) survey, is that they fall in the holiday season, when cyber criminals are most active. During the holiday season, Cyberlytic (2018) believes, cyber criminals prey on unsuspecting users with holiday-themed phishing email advertisements and attempt to attack web applications that experience high holiday shopping traffic. Attackers are also aware that the holiday season is when cybersecurity analysts usually take time off to spend with family. Based on this depiction, companies may need to explore contingency plans to ensure that dedicated staff retain visibility into the network and endpoints during this time of year, when some members of the cybersecurity team are on holiday. When the results show various concave and convex points, one can ask which threat agents are the underlying causes. According to Sokol and Gajdos (2018), it is unreasonable in the realm of cybersecurity to expect a single threat agent to cause a significant change in the threat trend for a particular month. For instance, the entire December spike is not the outcome of SQL injection alone. Therefore, the 10 individual threat types must undergo a Spearman's Rank Correlation test in order to group them into three categories: Network Detected, Endpoint Detected, or Email Detected threat agents. Based on the results in Table 7, the most prominent threat agents are Network Detected, meaning the malicious intent originates from external entities and largely enters through the network first.
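Spearman's rank correlation, used above to group the threat types, measures how consistently two series move in the same rank order. A minimal sketch, assuming monthly event counts for two threat types (the names and numbers are hypothetical), computes it as the Pearson correlation of average ranks, which handles ties correctly. The rule the study used to assign each threat type to the Network, Endpoint, or Email group is not reproduced here.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Hypothetical monthly event counts for two threat types.
sql_injection = [12, 15, 11, 18, 22, 30]
port_scan = [40, 42, 39, 55, 60, 81]
print(round(spearman_rho(sql_injection, port_scan), 3))
```

Because only ranks enter the calculation, a single extreme month (such as a December spike) cannot dominate the coefficient the way it can with Pearson correlation on raw counts.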

Chapter Four Summary

In summary, Chapter Four has taken the reader through two sections, “AI vs. Human Analysis in Classifying Cybersecurity Intrusion Events” and “AI vs. ARIMA Statistic Computation in Predicting Future Cybersecurity Intrusion Events.” In the first section, data visualizations were provided for 200 randomly chosen results of the AI and the cybersecurity analyst classifying intrusion events. The graph becomes saturated and fuzzy if all 22,286 intrusion event results are plotted; therefore, 200 results were graphed to give the reader a high-level visualization. In addition, the y-axis of the graph cannot accept string variables; therefore, the 10 categories of intrusion types were converted to numeric severity levels ranked from 1 to 10. An Ordinary Least Squares (OLS) model was run between the data and their respective intrusion detection times. The higher the coefficient relating an intrusion category to the time the AI and the CAnalyst take to detect the intrusion, the higher that category is ranked. At the end of the section, Table 5 displays an overall data output extracted from Azure Artificial Intelligence Web Service after the AI classified the intrusion events by category. The time the CAnalyst took to analyze and classify the intrusion events was recorded manually. In closing, the results clearly indicate that AI classifies network events into their intrusion categories much faster than a human cybersecurity analyst. In the second section, AI's threat forecasting ability was compared to an ARIMA statistical model. The predictive values from the AI and ARIMA were entered into STATA 14, and a trend graph was displayed. When the results showed various concave and convex protuberances, the study asked which threat agents were the underlying causes of these occurrences. Based on Sokol and Gajdos (2018), it seemed unlikely that one standalone threat agent could have a significant impact on the threat trend for the month; hence, the December spike is not solely the outcome of SQL injection. Therefore, the 10 individual threat types underwent a Spearman's Rank Correlation test in order to group them into three categories: Network Detected, Endpoint Detected, or Email Detected threat agents. Based on the results, the most prominent threat agents were Network Detected threats. In line with Sokol and Gajdos (2018), this study holds that most threats start out at the network layer and then make their way into an organization's internal network to cause further damage to the endpoints, databases, and systems that host the mission-critical applications that run the business.
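The two preprocessing steps described in the summary, drawing a random subset of 200 events for plotting and converting string intrusion categories into numeric severity levels so they can be placed on a y-axis, can be sketched as follows. The category names, their 1 to 10 ordering, and the generated events are hypothetical placeholders, since the study's actual mapping is not reproduced in this section.

```python
import random

# Hypothetical severity mapping: the study's ten categories and their
# 1-10 ordering are not reproduced here, so these names are placeholders.
SEVERITY = {f"category_{i}": i for i in range(1, 11)}


def encode_severity(labels):
    """Convert string intrusion categories to numeric severity levels (1-10)."""
    return [SEVERITY[label] for label in labels]


# Placeholder event log with the same size as the study's dataset.
rng = random.Random(7)
events = [rng.choice(list(SEVERITY)) for _ in range(22_286)]

# Draw 200 events without replacement to keep the plot readable.
sample = rng.sample(events, k=200)
y_values = encode_severity(sample)
print(len(y_values), min(y_values), max(y_values))
```

Sampling without replacement keeps each plotted point a distinct event, and seeding the generator makes the same 200 points reappear if the figure is regenerated.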


CHAPTER FIVE. CONCLUSIONS AND DISCUSSION

Introduction to Conclusions and Discussion

AI is considered one of the most promising developments of the information age. With the availability of high computing power and abundant data, AI has been on the rise over the past decade. It has been entering every business and industry, and cybersecurity is no exception (Hossein, et al., 2020). New algorithms, data volume, and technological enhancements have let AI grow concurrently with the emerging global security industry. Compared to conventional cybersecurity solutions, AI is more flexible, adaptable, and robust, thus helping to improve security performance and better protect systems from an increasing number of sophisticated cyber threats (Selden, 2016). Currently, AI techniques have been widely adopted as powerful detection, prediction, and response tools in the realm of cybersecurity (Collins, et al., 2019). Data within a network environment are enormous, whether firewall logs or network packets, user activities or application logs, and it is difficult for human analysts to triage and analyze them quickly enough for early detection; this is where AI can come into play with its Machine Learning algorithms (Rajbanshi, et al., 2017). A machine learning engine provides rapid and trustworthy analysis that cybersecurity analysts can use to make informed decisions. This study's examination of AI's ability to classify intrusion events compared to a human cybersecurity analyst showed that AI is much faster than humans. Additionally, the paper tests the ability of AI to forecast cybersecurity intrusion events with time-series datasets in comparison to a popular time-series prediction model, the ARIMA statistical model. However, this test indicates that AI's forecasting ability only replicates what is closest to ARIMA based on historical data. The results also have limitations, as they are based on a predictive model of past data with an average trend computation. Thus, AI's predictive model does not account for exogenous threat risks arising from social unrest, political changes, wars, natural disasters, or newly discovered product vulnerabilities, all of which can contribute to substantial periodic changes in threat vectors and future threat trends.

Summary of the Results

This paper explores the applicability of AI to cybersecurity. AI's ability to classify intrusion events was compared to that of a human cybersecurity analyst, which showed that AI is much faster than humans. Additionally, the paper tests the ability of AI to forecast cybersecurity intrusion events with time-series datasets compared to a popular time-series prediction model, the ARIMA model. However, this test indicates that AI's forecasting ability mimics what the ARIMA model produces from past data values. As shown by the results in the first section of Chapter Four, cybersecurity analysts are slower than AI in threat detection. Threats with severity scores above five take the cybersecurity analysts more than 20 minutes to detect and classify, while AI can identify the threat within minutes. The results support the theory that AI can parse, analyze, and correlate log events to identify threats faster and more efficiently. Thus, when it comes to reducing the operational time needed to find anomalies, AI may prove to be ahead of human ability and competence. In the second section, when AI is compared to an ARIMA statistical model in predicting intrusion events, the two models are not far from each other, although there are some dips and spikes within certain months. However, the results are based on historical data of the traffic and threats entering the honeypot at earlier points in time, and the forecast is an average trend of the previous month's data. It is understandable that AI's prediction ability has underlying limitations, as it is based only on past data with an average trend computation. Thus, AI's predictive model did not take into consideration externalities that can affect the threat trend, including but not limited to social unrest, policy changes, global news, or natural disasters.

Discussion of the Results

This type of study is exceptionally intricate and required several technology components to achieve its results. Those components have been explained in previous chapters; the three that contributed most to the study's results were the Machine Learning engine, the cybersecurity analyst, and the ARIMA model. The study specifically focuses on Machine Learning to extract insights from security data, because the research design aims to build its own data-driven intelligent security solution. Therefore, a dedicated honeypot environment was hosted on Azure Cloud Services to collect the necessary data, such as firewall logs and NetFlow records, for building the Machine Learning engine. The data were then processed with Corelight and Elastic technology to structure them into a format ingestible by the Azure Artificial Intelligence Web Service. The Machine Learning engine then builds its intelligence from the 80% of the data that were mapped using Cisco Talos Threat Intelligence Rules, while the remaining 20% were withheld from the AI to be tested later. AI improves its knowledge of cybersecurity threats by consuming a large number of data artifacts. To build the AI Machine Learning engine, data are fed to the Machine Learning model hosted on Azure Artificial Intelligence Web Service, where it can process, analyze, and match events to rules in order to build its intrusion detection and prediction algorithms. AI's ability to classify intrusion events was first compared to that of a human cybersecurity analyst, which showed that AI is much faster than humans. However, the cybersecurity analyst used in the study had five years of experience as an analyst. The results could have differed with a more experienced or a novice analyst, who might have analyzed and classified intrusion events considerably faster or slower than AI.
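The 80/20 arrangement described above, in which the engine learns from the 80% of events already labeled with Cisco Talos rules and is later evaluated on the 20% it has not seen, corresponds to a standard holdout split. The sketch below shuffles with a fixed seed and partitions by index; the record format and the function name `holdout_split` are hypothetical, and the splitting mechanism inside Azure Artificial Intelligence Web Service is not reproduced.

```python
import random


def holdout_split(records, train_fraction=0.8, seed=2021):
    """Shuffle records reproducibly and split them into train and test lists."""
    shuffled = list(records)            # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical labeled events: (event_id, rule-mapped category).
events = [(i, f"category_{i % 10 + 1}") for i in range(1000)]
train, test = holdout_split(events)
print(len(train), len(test))  # 800 200
```

Shuffling before splitting avoids a test set made up only of the most recent events; for time-ordered forecasting tasks, a chronological split would be the more appropriate choice.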

Next, the study asked about AI's ability to forecast cybersecurity intrusion events when compared against a well-known time-series prediction model, the ARIMA statistical model. The ARIMA model, whose ability to forecast future trends was compared with that of the AI, is a statistical analysis model that uses time series data to forecast future outcomes based on a historical time series. The model explains a given time series based on its own past values, that is, its own lags and the lagged forecast errors, so that the equation can be used to forecast future values (Zavadskaya, 2017).
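To illustrate the "own lags" idea in the ARIMA description above, the sketch below fits the simplest non-trivial case, an ARIMA(1,1,0): the series is differenced once (the "I" part), and each difference is regressed on the previous difference (the "AR" part) by least squares. It has no moving-average term, so it omits the lagged forecast errors, and it is not the estimator used in the study; practical work would typically rely on a maximum-likelihood implementation such as the `ARIMA` class in statsmodels. The monthly counts are hypothetical.

```python
import numpy as np


def fit_arima_110(series):
    """Fit ARIMA(1,1,0) by least squares: d_t = c + phi * d_{t-1} + e_t."""
    y = np.asarray(series, dtype=float)
    d = np.diff(y)                                     # first difference (d = 1)
    X = np.column_stack([np.ones(len(d) - 1), d[:-1]])
    (c, phi), *_ = np.linalg.lstsq(X, d[1:], rcond=None)
    return c, phi


def forecast_arima_110(series, c, phi, steps):
    """Iterate the differenced AR(1) forward and undo the differencing."""
    y = [float(v) for v in series]
    last_diff = y[-1] - y[-2]
    for _ in range(steps):
        last_diff = c + phi * last_diff
        y.append(y[-1] + last_diff)
    return y[-steps:]


# Hypothetical monthly intrusion counts.
counts = [120, 132, 128, 141, 150, 147, 158, 166, 171, 169, 182, 190]
c, phi = fit_arima_110(counts)
print(np.round(forecast_arima_110(counts, c, phi, steps=3), 1))
```

Every forecast here is a function of past values only, which is the property the chapter identifies as the shared limitation of both the AI and the ARIMA forecasts.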

Replacing the ARIMA model with a Bayesian Structural Time Series (BSTS) model could have produced a different outcome. BSTS is also designed to work with time series data; however, it takes a different approach from ARIMA models to time series forecasting, nowcasting, and causal impact inference. More importantly, it deals with uncertainty differently. Rather than relying on differencing, lags, and moving averages, the model quantifies the posterior uncertainty of the individual components, controls the variance of the components, and incorporates exogenous variables and multiple seasonal components more easily (Ogunc & Hill, 2018).

Discussion of the Results in Relation to the Literature

Prior academic work by various researchers on AI's relationship to cybersecurity is restricted by certain boundaries. Some studies are limited by the lack of publicly available data on AI in cybersecurity, while others have a narrow scope due to the technological constraints of their time. In Ghosh, et al. (1998), the authors apply the support vector machine model to detect anomalous Windows registry accesses using the KDD99 dataset; however, detecting anomalous Windows registry access is now an obsolete approach. Anomalous behaviors on users' computers nowadays comprise a wide range of activities, from BIOS modification and malicious process execution to account privilege escalation and kernel corruption. In other research, Kozik, et al. (2014) used the CSIC 2010 HTTP Dataset to assess the classification of Internet traffic. Their study specifically focuses on HTTP traffic and detects abnormalities in the well-known port numbers of IP flows, yet it does not characterize the traffic itself. The weakness of this study is that, at the time, the majority of Internet traffic used HTTP on logical port 80 and only a few applications used other ports, such as port 23 for Telnet, port 21 for FTP, and port 139 for NetBIOS. Over time, however, network protocols have expanded drastically, and contemporary applications leverage a multitude of ports across the full range of 65,535 TCP/IP port numbers. Bhuyan, et al. (2014) take a different approach by creating unbiased, real-life network intrusion datasets to compensate for the lack of available datasets. While developing a detection system, the authors create an entirely new dataset, TUIDS Distributed Denial of Service (DDoS), and test against the older CAIDA DDoS dataset. Although the Bhuyan et al. study is very thorough, experimental results show that the proposed model yields 65% detection accuracy on the normalized CAIDA dataset, and the paper emphasizes DDoS attacks almost exclusively, leaving a large number of attack vectors in cyberspace unaddressed. By contrast, this research addresses intrusion and misuse detection problems across a wide range of intrusion vectors at the network, endpoint, and email layers. The study also concentrates on Machine Learning (a branch of AI) to extract insights from security data, with the aim of building its own data-driven intelligent security solution. Moreover, the paper has two distinct objectives: testing AI's threat classification ability against a human cybersecurity analyst, and testing AI's ability to predict future threat events against the well-known ARIMA time-series forecasting model, in order to understand the effectiveness of AI in cybersecurity.

Limitations

This dissertation has certain limitations, because studying AI in cybersecurity is complicated by a lack of data. Data about AI in cybersecurity are extremely limited; even where some legacy data can be found, existing public datasets have problems such as unevenly distributed data and content that is outdated in both its technology and its data collection methods. In addition, most of the data have already been aggregated. The dataset used to build a Machine Learning engine is critical, because sufficient and appropriate data are needed to train and test the system. To collect raw security data, it was necessary to create a dedicated honeypot environment hosted on Azure Cloud Services. Obtaining such a dataset is difficult and time-consuming; it took nine months to collect enough data. Furthermore, the study's limited budget meant that it was not possible to implement a best-of-breed AI system. For these reasons, this study is limited to investigating the performance of Azure Artificial Intelligence Web Service. It is important to note that the results of this research may not represent AI models in general, as AI types differ from one another. The research is also limited by the linear methods, such as the ARIMA statistical model, used for comparison with Machine Learning. Thus, the comparison of threat classification and prediction covers only the Machine Learning branch of AI and excludes other subsets of AI, such as natural language processing, vision, robotics, pattern recognition, or convolutional neural networks. The study design is also constrained by its use of low-end and open-source technology such as Corelight, Elastic, Python, STATA, and Azure Artificial Intelligence Web Service. Thus, the results of this research could vary considerably if more sophisticated technology were introduced.

Implication of the Results for Practice

AI, and in particular Machine Learning, has made huge strides, impacting all aspects of industry and society. This development has been fueled by decades of exponential improvement in hardware computing power, combined with progress in algorithms and, perhaps most importantly, a huge increase in the volume of data available for AI to ingest for training and testing (Xin, et al., 2018). AI is now ready to improve the efficiency of our workplaces and can augment the work humans do as it is gradually integrated into the fabric of business and applicable fields of science. However, not all sectors have advanced equally through AI (Tyugu, 2011). Therefore, this paper explores the effectiveness of AI in cybersecurity. The paper showcases AI's ability to classify intrusion events compared to a human cybersecurity analyst, and AI's ability to forecast future cybersecurity events compared to a time-series prediction model, the ARIMA model. The study is conducted within the domain of cybersecurity data science, collecting raw security data from a honeypot. It uses cloud services such as Azure Artificial Intelligence Web Service to build a Machine Learning engine with semi-supervised machine learning techniques, in which the data are analyzed and referenced against threat intelligence sources to train the Machine Learning algorithm. For practitioners, this study opens a spectrum of ideas on how cybersecurity data science and relevant learning methods can be used to design data-driven, intelligent decision-making cybersecurity systems and services for organizations from a machine learning perspective. This study on AI, Machine Learning, and cybersecurity data science opens a promising path and can serve as a reference guide for both academia and industry leaders in later research and applications in cybersecurity. Due to the fast-growing nature of AI and its promising benefits, certain ethical issues and adverse effects of AI still need to be addressed, and it is necessary to resolve these risks and concerns as early as possible. Given these concerns, and given that sustainable solutions are not yet fully in sight, a socially responsible use of AI within cybersecurity is highly recommended. Defining the boundaries of ethical access to data is a complex problem that affects various stakeholders: citizens, the state, corporations, public institutions, and others. This study presents a lightweight AI engine capable of classifying and forecasting intrusion events. Although the AI engine built in this study is very basic, government officials and world leaders must ensure that organizations keep proprietary AI systems running on supercomputers from falling into the hands of malicious entities who would exploit AI's abilities for harmful ends.

Recommendations for Further Research

In this paper, the primary focus of AI's applicability to cybersecurity is its ability to classify and predict attacks. However, according to Collins, et al. (2019), a prominent component of AI is its autonomous ability to respond to threat events, and this paper's conclusion section builds a theoretical hypothesis around that ability which poses questions for further research. In the future, other researchers may focus on AI's effectiveness in responding to cybersecurity threats on behalf of human cybersecurity analysts. Furthermore, the data collected by the honeypot in this research are of interest because they are raw cybersecurity data. Much of this paper focuses on moving from raw security data to data-driven decision making for intelligent security solutions, yet the work could also be related to big data analytics in terms of data processing and decision making. Big data deals with data sets that are too large or complex to be handled by traditional data-processing methods. Overall, this paper aims not only to discuss cybersecurity data science and relevant methods but also their applicability to data-driven intelligent decision making in cybersecurity systems and services from a machine learning perspective. The predictive model of the Machine Learning engine did not produce impressive outcomes when compared to the ARIMA statistical model, since AI's forecasting ability only replicated what is closest to ARIMA based on historical data by averaging the trends. Future research may therefore explore multivariate time series models with exogenous variables, in which the predictive models can take into consideration multiple externalities that may affect the threat trend, including but not limited to social unrest, political changes, natural disasters, and pandemics. Future work could also pursue further empirical evaluation of the suggested data-driven model and a comparative analysis of other subsets of AI, such as natural language processing, vision, robotics, pattern recognition, and convolutional neural networks, to explore how these can be applied to cybersecurity.
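One simple way to let a forecast account for an externality, as suggested above, is to add an exogenous regressor to an autoregressive model (an ARX model). The sketch below assumes, for illustration only, that a holiday-season indicator for November and December is the externality of interest, echoing the spikes discussed in Chapter Four, and fits y_t = c + phi * y_{t-1} + gamma * h_t by least squares. The series and the coefficient values used to generate it are synthetic placeholders, not study results.

```python
import numpy as np


def fit_arx(y, exog):
    """Least-squares fit of y_t = c + phi * y_{t-1} + gamma * x_t + e_t."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(exog, dtype=float)
    X = np.column_stack([np.ones(len(y) - 1), y[:-1], x[1:]])
    coef, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    return coef  # [c, phi, gamma]


# Hypothetical four years of monthly data; holiday = 1 in November and December
# (month index 10 and 11 when January is 0).
months = np.arange(48)
holiday = ((months % 12) >= 10).astype(float)

rng = np.random.default_rng(1)
y = [100.0]
for t in range(1, 48):
    y.append(20.0 + 0.8 * y[-1] + 35.0 * holiday[t] + rng.normal(scale=2.0))

c, phi, gamma = fit_arx(y, holiday)
print(f"c={c:.1f}, phi={phi:.2f}, holiday effect={gamma:.1f}")
```

The estimated gamma isolates the holiday contribution from the series' own momentum, which a purely lag-based forecast would otherwise blend into an average trend. Other externalities (policy changes, disclosed vulnerabilities) would enter the same way as additional columns.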

Conclusion

Despite AI's promising future, it can also introduce potential global risks to human civilization, and it raises ethical issues such as the missing moral code in AI's autonomous decision-making and concerns about the lack of data privacy (Patel, 2017). AI is a kind of intelligent system capable of making decisions on its own. This system represents the direction of development of computer functions related to human intelligence, such as reasoning, training, and problem solving. In other words, AI is the transfer of human capabilities of mental activity to the plane of computer and information technologies, but without inherent human vices (Zhang, et al., 2018). For this reason, if AI is given the authority to act on its own without human intervention, it could wreak havoc on enterprises whenever the actions it takes are erroneous. For example, if AI were to respond to a threat event incorrectly and the cybersecurity department relied on that response, true positive threat events that can cause major damage to the company could be ignored. Secondly, AI raises major privacy concerns because it can ingest large amounts of data within milliseconds. With this speed and volume of data ingestion, AI may capture employees' usernames, personally identifiable information, salaries, and so on across the network; once absorbed, it would become hard to separate the wheat from the chaff in those data in the long run (Williams & McGregor, 2020).

Many recent marketing articles about AI in cybersecurity products discuss the applicability of AI in cybersecurity; some even emphasize that AI has the ability to replace human analysts. Although AI has advanced considerably in recent years, the truth is that we may not be at that stage yet. AI can certainly be a complementary product for human cybersecurity analysts, but at the current stage it cannot be a substitute. As mentioned at the beginning of this paper, this study suggests that AI can automatically classify certain threat events to help cybersecurity analysts prioritize where to focus their attention, and can also predict threat trends to provide insight into what may come. Thus, AI should act only as a source of intelligence for human decision-making rather than taking autonomous actions on its own. At this technological stage, strong interdependence between AI systems and human factors is necessary to advance cybersecurity maturity. Moreover, a holistic view of the cybersecurity landscape within the enterprise IT environment is needed, as cybersecurity is not only a technological paradigm but also an art of dealing with security risks through human logic and experience. Technical solutions and relevant processes must be integrated to achieve optimal security performance; however, in the end, it is still the human factor that matters, not just the tools themselves. Therefore, a combined effort of humans and AI is likely to be more effective at fighting off cyber criminals.


APPENDIX A. INSTITUTIONAL REVIEW BOARD (IRB)

Institutional Review Board | Academic Affairs Phone. 786.417.9300 | Email. [email protected]

Memorandum Date: 1.22.21

To: Dr. Knowles and Mr. Pham From: Tony Andenoro, Chair, Institutional Review Board

Subject: IRB Application 1.2021.3 – Artificial Intelligence in Cybersecurity - Concentration on the Effectiveness of Machine Learning

St. Thomas University Institutional Review Board (IRB) is pleased to inform you that your research protocol submitted for review on 1.20.21 has been approved after a formal review for exempt status aligning with the Common Rule, Code of Federal Regulations – Human Subjects Research provision. Please note that approval for this study will lapse on 1.22.22 and any changes to the provided protocol will require notification and may require additional approval. More specifically, any changes to any portion of the research project, including but not limited to instrumentation protocol(s) or informed consent must be reviewed and approved by the IRB prior to implementation. In addition, if there are any unanticipated adverse reactions or unanticipated events associated with the conduct of this research, you should immediately suspend the project and contact the IRB Chair for consultation.

Should you have any questions feel free to contact me at 786.417.9300 between 8AM and 5PM EST Monday through Friday or via e-mail at [email protected]. Thank you and have a great day. Sincerely,

Anthony C. Andenoro, Ph.D. Chair | Institutional Review Board Executive Director | Institute for Ethical Leadership Office of the Provost | St. Thomas University

16401 NW 37 Avenue • Miami Gardens, FL 33054 • 305.625.6000 stu.edu


REFERENCES

Al Qahtani, H., Sarker, I. H., Kalim, A., & Hossain, S. (2020). Cyber intrusion detection

using machine learning classification techniques. In N. Chaubey, S. Parikh, & K.

Amin (Eds.), Computing Science, Communication, and Security.

Communications in Computer and Information Science, 1235, 121-131. Springer.

https://doi.org/10.1007/978-981-15-6648-6_10

Apruzzese, G., Colajanni, M., Ferretti, L., Guido, A., & Marchetti, M. (2018). On the

effectiveness of machine and deep learning for cyber security. 2018 10th

International Conference on Cyber Conflict (CyCon), 2018, 371-390. Tallinn,

Estonia, May 29-June 1, 2018. https://doi.org/10.23919/CYCON.2018.8405026

Banoth, L., Teja, M. S., Saicharan, M., & Chandra, N. J. (2016). A survey of data mining

and machine learning methods for cyber security intrusion detection.

International Journal of Research, 4, 406-412. https://doi.org/10.23883/ijrter.2017.3117.9nwqv

Benavente-Peces, C., & Bartolini, D. (2019). Insights in machine learning for cyber

security assessment. In C. Benavente-Peces, S. Slama, & B. Zafar (Eds.), Proceedings of the 1st International Conference on Smart Innovation, Ergonomics and Applied Human Factors (SEAHF), Madrid, January 2019. Smart Innovation, Systems and Technologies, 150. Springer. https://doi.org/10.1007/978-3-030-22964-1_33

Becue, A., Praça, I., & Gama, J. (2021). Artificial intelligence, cyber-threats and industry

4.0: Challenges and opportunities. Artificial Intelligence Review, 54, 3849-3886.

https://doi.org/10.1007/s10462-020-09942-2


Bhatele, K. R., Shrivastava, H., & Kumari, N. (2019). The role of artificial intelligence in

cyber security. In S. Geetha, & A. V. Phamila (Eds.), Countering Cyber Attacks

and Preserving the Integrity and Availability of Critical Systems (pp.170-192).

IGI Global. https://doi.org/10.4018/978-1-5225-8241-0.ch009

Bhuyan, M. H., Kashyap, H. J., Bhattacharyya, D. K., & Kalita, J. K. (2014). Detecting

distributed DoS attacks: Methods, tools and future directions. The Computer

Journal, 57(4), 537-556. https://doi.org/10.1093/comjnl/bxt031

Buczak, A. L., & Guven, E. (2016). A survey of data mining and machine learning

methods for cyber security intrusion detection. IEEE Communications Surveys &

Tutorials, 18, 1153-1176. https://doi.org/10.1109/COMST.2015.2494502

Butler, M., & Kazakov, D. (2010). The effects of variable stationarity in a financial time series on artificial neural networks. 2011 IEEE Symposium on Computational Intelligence for Financial Engineering and Economics (CIFEr), 2011, 1-8. https://doi.org/10.1109/CIFER.2011.5953557

Calderon, R., & Floridi, L. (2019). The benefits of artificial intelligence in cybersecurity.

Nature machine intelligence, Business Computer Science Journal, 1065(3), 1-4.

Cioffi, R., Travaglioni, M., Piscitelli, G., Petrillo, A., & Felice, M. (2019). Artificial

intelligence and machine learning applications in smart production: Progress,

trends, and directions. Sustainability, 12(2), 492. https://doi.org/10.3390/su12020492

Collins, C., Dennehy, D., Kieran C., & Mikalef, P. (2019). Artificial intelligence in

information systems research. International Journal of Information Management,

60(2), 56-70. https://doi.org/10.1016/j.ijinfomgt.2021.102383.


Conner-Simons, A. (2016, April 18). System predicts 85 percent of cyber-attacks using

input from human experts. https://phys.org/news/2016-04-percent-cyber-attacks-human-experts.html

Cyberlytic. (2018). AI for web security technical data sheet. https://www.cyberlytic.com/uploads/resources/Technical-Data-Sheet-Final.pdf

Cylance. (2020). Continuous threat prevention powered by artificial intelligence.

https://www.cylance.com/content/dam/cylance-web/en-us/resources/knowledge-center/resource-library/datasheets/CylancePROTECT.pdf

Darktrace. (2018). Detects and classifies cyber-threats across your enterprise.

https://www.darktrace.com/en/products

Dasgupta, D., Akhtar, Z., & Sen, S. (2020). Machine learning in cybersecurity: A

comprehensive survey. The Journal of Defense Modeling and Simulation, 45, 90-109.

https://doi.org/10.1177/1548512920951275

Davenport, T., & Kalakota, R. (2019). The potential for artificial intelligence in

healthcare. Future Healthcare Journal, 6(2), 94-98.

https://doi.org/10.7861/futurehosp.6-2-94

Demertzis, K., & Iliadis, L. S. (2015). A bio-inspired hybrid artificial intelligence

framework for cyber security. In N. J. Daras, & M. T. Rassias (Eds.),

Computation, Cryptography, and Network Security (pp. 161-193). Springer.

Devakunchari, R., & Sourabh, M. (2019). A study of cyber security using machine

learning techniques. International Journal of Innovative Technology and

Exploring Engineering, 8(7C2), 178-255. https://www.ijitee.org/


Fazil, M., Rohila, A., & Ghaparb, W. (2019). The era of artificial intelligence in

Malaysian higher education: Impact and challenges in tangible mixed-reality

learning system towards self-exploration education. Procedia Computer Science,

163(2), 2-10. https://doi.org/10.1016/j.procs.2019.12.079

FinancesOnline. (2018, July 5). IBM MaaS360 review.

https://reviews.financesonline.com/p/ibm-maas360

Frank, J. (1994). Artificial intelligence and intrusion detection: Current and future

directions. Proceedings of the 17th National Computer Security Conference,

Baltimore, October 1994.

https://www.cerias.purdue.edu/apps/reports_and_papers/view/894

Goyal, Y., & Sharma, A. (2019). A semantic machine learning approach for cyber

security monitoring. In 2019 3rd International Conference on Computing

Methodologies and Communication (pp. 439-442).

https://doi.org/10.1109/ICCMC.2019.8819796

Hossein, M. R., Karimipour, H., Rahimnejad, A., Dehghantanha, A., & Srivastava, G.

(2020). Anomaly detection in cyber-physical systems using machine learning. In

K. R. Choo, & A. Dehghantanha (Eds.), Handbook of Big Data Privacy (pp.

219-236). Springer Nature. https://doi.org/10.1007/978-3-030-38557-6_10

Hussain, F., Hussain, R., Hassan, S. A., & Hossain, E. (2020). Machine learning in IoT

security: Current solutions and future challenges. IEEE Communications Surveys

& Tutorials, 22(3), 1686-1721. https://doi.org/10.1109/COMST.2020.2986444


Kabbas, A., & Munshi, A. (2020). Artificial intelligence applications in cybersecurity.

International Journal of Computer Science and Network Security, 20(2), 1-22.

http://paper.ijcsns.org/07_book/202002/20200216.pdf

Khisamova, Z. I., Begishev, I. R., & Sidorenko, E. L. (2019). Artificial intelligence and

problems of ensuring cyber security. International Journal of Cyber Criminology,

13(2), 564–577. https://doi.org/10.5281/zenodo.3709267

Kozik, R., Choraś, M., Renk, R., & Holubowicz, W. (2014). A proposal of algorithm for

web applications cyber attack detection security. In K. Saeed, & V. Snášel (Eds.),

Computer Information Systems and Industrial Management. CISIM 2014.

Lecture Notes in Computer Science, 8838. Springer.

https://doi.org/10.1007/978-3-662-45237-0_61

Li, J. (2019). Cyber security meets artificial intelligence: A survey. Frontiers of

Information Technology & Electronic Engineering Journal, 19, 1462-1474.

https://doi.org/10.1631/FITEE.1800573

Min, H. (2015). Artificial intelligence in design and quality assurance management.

International Journal of Logistics Management, 1855(4), 12-17.

https://www.emeraldgrouppublishing.com/journal/ijlm

Mittal, S., Joshi, A., & Finin, T. (2019). Cyber-All-Intel: An AI for security related threat

intelligence. arXiv preprint arXiv:1905.02895. https://arxiv.org/abs/1905.02895


National Initiative for Cybersecurity Careers and Studies [NICCS]. (2018). Glossary.

https://niccs.cisa.gov/about-niccs/cybersecurity-glossary

Ogunc, A., & Hill, C. (2008). Using Excel: Companion to Principles of Econometrics (3rd

ed.).

https://econweb.tamu.edu/hwang/CLASS/Ecmt463/Lecture%20Notes/Excel/Excel_Lessons.pdf

Palmer, T. (2017). Vectra cognito—Automating security operations with AI. ESG lab

review. https://info.vectra.ai/hs-fs/hub/388196/file-1918923738.pdf

Pantano, E., & Pizzi, G. (2020). Forecasting artificial intelligence on online customer

assistance: Evidence from chatbot patents analysis. Journal of Retailing and

Consumer Services, 55, 10-39. https://doi.org/10.1016/j.jretconser.2020.102096

Parrend, P., Navarro, J., Guigou, F., Deruyver, A., & Collet, P. (2018). Foundations and

applications of artificial intelligence for zero-day and multi-step attack detection.

EURASIP Journal on Information Security, 4(2018), 1-85.

https://doi.org/10.1186/s13635-018-0074-y

Patel, M. (2017). QRadar UBA app adds machine learning and peer group analyses to

detect anomalies in users’ activities.

https://securityintelligence.com/qradar-uba-app-adds-machine-learning-and-peer-group-analyses-to-detect-anomalies-in-users-activities

Pham, B. T., Le, L. M., Le, T., Bui, K. T., Le, V. M., Ly, H.-B., & Prakash, I. (2020).

Development of advanced artificial intelligence models for daily rainfall

prediction. Atmospheric Research, 237, 104845.

https://doi.org/10.1016/j.atmosres.2020.104845


Rajbanshi, A., Bhimrajka, S., & Raina, C. K. (2017). Artificial intelligence in

cybersecurity. International Journal for Research in Applied Science and

Engineering Technology, 2(3), 132-137.

https://ijsrcseit.com/paper/CSEIT1722265.pdf

Sarker, I. H., Abushark, Y. B., Alsolami, F., & Khan, A. I. (2020). IntruDTree: A

machine learning based cyber security intrusion detection model. Symmetry,

12(5), 754. https://doi.org/10.3390/sym12050754

Sarker, I. H., Kayes, A. S., Al Qahtani, H., & Watters, P. A. (2020). Cybersecurity data

science: An overview from machine learning perspective. Journal of Big Data,

7, Article 41. https://doi.org/10.1186/s40537-020-00318-5

Schuurmans, D. (1995). Convex training algorithms: Explaining machine learning. IEEE

Transactions on Pattern Analysis and Machine Intelligence Journal, 86, 218-323.

http://www.computer.org

Selden, H. (2016). Deep instinct: A new way to prevent malware, with deep learning.

Tom’s Hardware. https://www.tomshardware.com/news/deep-instinct-deep-

learning-malware-detection,31079.html

Sokol, P., & Gajdos, A. (2018). Prediction of attacks against honeynet based on time

series modeling. Advances in Intelligent Systems and Computing, 662, 360-371.

https://doi.org/10.1007/978-3-319-67621-0_33

Soni, S., & Bhushan, B. (2019). Use of machine learning algorithms for designing

efficient cyber security solutions. In 2019 2nd International Conference on

Intelligent Computing, Instrumentation and Control Technologies (pp. 1496-1501).

Kannur, Kerala, India, July 2019. https://doi.org/10.1109/ICICICT46008.2019.8993253

SparkCognition. (2018). A cognitive approach to system protection.

https://www.sparkcognition.com/deep-armor-cognitive-anti-malware

Stonefly. (2018). Amazon Macie: Artificial intelligence for efficient data security.

https://stonefly.com/blog/amazon-macie-artificialintelligence-efficient-data-security

Truve, S. (2017). Machine learning in cyber security: Age of the centaurs.

http://www.brookcourtsolutions.com/wp-content/uploads/2017/07/Machine-Learning-in-Cyber-Security-White-Paper-Brookcourt.pdf

Tyugu, E. (2011). Artificial intelligence in cyber defense. In C. Czosseck, E. Tyugu, & T.

Wingfield (Eds.), 3rd International Conference on Cyber Conflict, 3, 1-11.

Tallinn, Estonia, June 2011. CCD COE Publications.

Virmani, C., Choudhary, T., Pillai, A., & Rani, M. (2020). Applications of machine

learning in cyber security. In G. Padmavathi, & D. Shanmugapriya (Eds.),

Handbook of research on machine and deep learning applications for cyber

security (pp. 83-103). IGI Global.

Williams, J., & McGregor, S. (2020). What can artificial intelligence do for security

analysis? IBM QRadar Advisor with Watson. https://www.ibm.com/us-

en/marketplace/cognitive-security-analytics

Xin, Y., Kong, L., Liu, Z., Chen, Y., Li, Y., Zhu, H., Gao, M., Hou, H., & Wang, C.

(2018). Machine learning and deep learning methods for cybersecurity. IEEE

Access, 6, 35365-35381. https://doi.org/10.1109/ACCESS.2018.2836950


Yavanoglu, O., & Aydos, M. (2017). A review on cybersecurity datasets for machine

learning algorithms. In 2017 IEEE International Conference on Big Data

(pp. 2186-2193). Boston, MA, December 2017. https://doi.org/10.1109/BigData.2017.8258167

Zavadskaya, A. (2017). Artificial intelligence in finance: Forecasting stock market

returns using artificial neural networks. The Alan Turing Institute Journal,

N510129, 1-177. https://www.turing.ac.uk/research/research-programmes/data-centric-engineering/journal

Zhang, Z., Yu, Y., Zhang, H., Newberry, E., Mastorakis, S., Li, Y., Afanasyev, A., &

Zhang, L. (2018). An overview of security support in named data networking.

NDN, Technical Report NDN-0057. http://named-data.net/techreports.htm

ProQuest Number: 28962006

Distributed by ProQuest LLC (2022). Copyright of the Dissertation is held by the Author unless otherwise noted.

<FEFF04120438043a043e0440043804410442043e043204430439044204350020044604560020043f043004400430043c043504420440043800200434043b044f0020044104420432043e04400435043d043d044f00200434043e043a0443043c0435043d044204560432002000410064006f006200650020005000440046002c0020044f043a04560020043d04300439043a04400430044904350020043f045604340445043e0434044f0442044c00200434043b044f0020043204380441043e043a043e044f043a04560441043d043e0433043e0020043f0435044004350434043404400443043a043e0432043e0433043e0020043404400443043a0443002e00200020042104420432043e04400435043d045600200434043e043a0443043c0435043d0442043800200050004400460020043c043e0436043d04300020043204560434043a0440043804420438002004430020004100630072006f006200610074002004420430002000410064006f00620065002000520065006100640065007200200035002e0030002004300431043e0020043f04560437043d04560448043e04570020043204350440044104560457002e> /ENU (Use these settings to create Adobe PDF documents best suited for high-quality prepress printing. Created PDF documents can be opened with Acrobat and Adobe Reader 5.0 and later.) 
>> /Namespace [ (Adobe) (Common) (1.0) ] /OtherNamespaces [ << /AsReaderSpreads false /CropImagesToFrames true /ErrorControl /WarnAndContinue /FlattenerIgnoreSpreadOverrides false /IncludeGuidesGrids false /IncludeNonPrinting false /IncludeSlug false /Namespace [ (Adobe) (InDesign) (4.0) ] /OmitPlacedBitmaps false /OmitPlacedEPS false /OmitPlacedPDF false /SimulateOverprint /Legacy >> << /AddBleedMarks false /AddColorBars false /AddCropMarks false /AddPageInfo false /AddRegMarks false /ConvertColors /ConvertToCMYK /DestinationProfileName () /DestinationProfileSelector /DocumentCMYK /Downsample16BitImages true /FlattenerPreset << /PresetSelector /MediumResolution >> /FormElements false /GenerateStructure false /IncludeBookmarks false /IncludeHyperlinks false /IncludeInteractive false /IncludeLayers false /IncludeProfiles false /MultimediaHandling /UseObjectSettings /Namespace [ (Adobe) (CreativeSuite) (2.0) ] /PDFXOutputIntentProfileSelector /DocumentCMYK /PreserveEditing true /UntaggedCMYKHandling /LeaveUntagged /UntaggedRGBHandling /UseDocumentProfile /UseDocumentBleed false >> ] >> setdistillerparams << /HWResolution [2400 2400] /PageSize [612.000 792.000] >> setpagedevice