Term paper report

Varshini

WIP Report 2 for Term Paper

Paper Title: A Comparison of Different Classification Algorithms for DDoS Attack Detection

Abstract: A Distributed Denial of Service (DDoS) attack aims to exhaust the target network with malicious traffic, which threatens the availability of the service. Many detection systems, in particular Intrusion Detection Systems (IDS), have been proposed over the last two decades as the internet evolved, yet users and organizations still find DDoS attacks challenging to deal with. An IDS is the first line of defense for protecting critical networks against ever-evolving intrusive activity, but it must be kept up to date to detect anomalous behavior so that the integrity, confidentiality and availability of the service are preserved. The accuracy and precision of new detection methods, techniques and algorithms rely heavily on the existence of well-designed datasets. Hence, this paper uses a recent dataset provided by the Canadian Institute for Cybersecurity and distributed on the website of the University of New Brunswick. Experiments will be carried out using major supervised classification algorithms to separate attack flows from legitimate flows. I try to cover different DDoS attacks at different layers and to identify which machine learning algorithm is the best fit for detecting each type of attack.

Term Paper Objective:

Most of the research papers published in conferences and journals have used old datasets (KDD Cup ’99, DARPA) for their analysis, which I feel is less impactful because, as time passes, cybercrimes and attacks intrude on the target environment in increasingly artful ways. Analysis on a recent dataset that contains a variety of novel attack signatures is therefore more useful where security is concerned, so I will use the CICDDoS2019 dataset. The core objective of this paper is to analyze this recent DDoS dataset and to compare the classification algorithms used in the analysis, with the goal of reducing false positives while keeping accuracy high. This would help a security administrator get notified in real time, which in turn improves the organization’s service availability (production system uptime) as well as its reputation.
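To make the two target quantities concrete, here is a minimal sketch of how accuracy and the false positive rate could be computed from a detector's predictions. The function name `detection_metrics` and the toy labels are assumptions for illustration only; they are not results from the dataset.

```python
def detection_metrics(y_true, y_pred, attack_label=1):
    """Return accuracy, precision, recall and false positive rate for a binary detector."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == attack_label:
            if truth == attack_label:
                tp += 1  # attack correctly flagged
            else:
                fp += 1  # benign flow wrongly flagged (false alarm)
        else:
            if truth == attack_label:
                fn += 1  # attack missed
            else:
                tn += 1  # benign flow correctly passed
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }


# Toy labels (hypothetical): 1 = attack flow, 0 = benign flow.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 0, 0, 1]
metrics = detection_metrics(y_true, y_pred)
print(metrics)
```

A detector can score high accuracy and still raise many false alarms when benign flows are rare, which is why the false positive rate is reported separately.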

Plan for Methodology:

To proceed with my research work, I have identified a few points on which I need to focus and put in more effort:

1) The dataset I am going to use has only attack data, split across different CSV files by DDoS type. I need to combine all the CSV data with some benign data (I still need to find a suitable dataset) so that I have a proper dataset on which to test the classification algorithms. I might sample the data, because otherwise it would be too large to analyze on my laptop.

2) Once the dataset is ready, I need to decide which machine learning algorithms are a good fit for the given variety of data. I might try 5-6 ML algorithms and present the 4 with the best accuracy in my term paper. I have gone through a few research papers in which the analysis was done using data mining techniques, which will help me.

3) After the dataset is ready and the ML algorithms are decided, I will do the analysis in Python using the machine learning library scikit-learn, and I will tune the algorithm parameters to try to get maximum accuracy.

4) After obtaining the statistical results, I need to plot them using the matplotlib library so that I can use those graphs and diagrams in my term paper to explain the results.

5) Finally, I need to document everything in LaTeX for the term paper; I will either install LaTeX or use an online LaTeX template for writing research papers.
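Steps 1 to 3 above can be sketched roughly as follows. This is only an illustration under stated assumptions: the flows are synthetic (generated with `make_classification`) and stand in for the merged CICDDoS2019 attack CSVs and a benign capture, the column names `feature_0` ... `feature_9` and `label` are hypothetical, and the four classifiers are examples of candidates rather than a final selection.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for flow features (assumption). With real data, the
# attack and benign frames would be loaded from CSV files instead.
X, y = make_classification(n_samples=4000, n_features=10, n_informative=6, random_state=0)
columns = [f"feature_{i}" for i in range(X.shape[1])]
flows = pd.DataFrame(X, columns=columns)
flows["label"] = y  # 1 = attack, 0 = benign
attack = flows[flows["label"] == 1]
benign = flows[flows["label"] == 0]

# Step 1: merge attack and benign flows, then sample to keep the data laptop-sized.
combined = pd.concat([attack, benign], ignore_index=True)
sampled = combined.sample(frac=0.5, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    sampled[columns], sampled["label"],
    test_size=0.3, stratify=sampled["label"], random_state=0,
)

# Step 2: candidate classifiers. k-NN and SVM are distance-based, so they are scaled first.
models = {
    "Naive Bayes": GaussianNB(),
    "k-NN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Step 3: fit each model and record accuracy and false positive rate on held-out flows.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "false_positive_rate": fp / (fp + tn),
    }

for name, scores in sorted(results.items(), key=lambda item: item[1]["accuracy"], reverse=True):
    print(f"{name}: accuracy={scores['accuracy']:.3f}, FPR={scores['false_positive_rate']:.3f}")
```

The `results` dictionary has one entry per classifier, so it could be passed directly to a matplotlib bar chart for step 4. Parameter tuning (for example `n_neighbors` or the SVM kernel) is left at simple defaults here.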

Plan of Work:

I plan to follow the schedule outlined below to ensure that I stay on task and on point with my research and analysis in order to meet the required deadlines. I have five weeks before my presentation on 21 April 2020. Here is my rough plan:

Week 1 – Find some benign data that can be merged with the attack flows. Final dataset preparation and data preprocessing.

Week 2 – Read a few more research papers to figure out which classification algorithms are good for DDoS attack detection, which ML algorithms are overlooked, and which ones are used the most. On the basis of that, choose 5-6 classification algorithms for the analysis.

Week 3 – Set up the Anaconda distribution for Python and start exploring the dataset to figure out how to analyze it accurately.

Week 4 – Continue the data mining process, collect and record the results, and create graphs and diagrams for the presentation as well as the term paper.

Week 5 – Analyze results, draw conclusions and explore options for future research. Finalize the paper and submit it for grading.

References:

[1] A Survey of Network-based Intrusion Detection (Link: https://arxiv.org/pdf/1903.02460.pdf)

Explanation:

In this paper, the authors survey different datasets for network-based intrusion detection and describe the underlying packet-based and flow-based network data in detail. This helped me understand that variation in traffic packets also needs to be considered during the analysis.

[2] Low-Rate DDoS Attack Detection Based on Factorization Machine in Software Defined Network (IEEEXplore Link: https://proxy.ulib.csuohio.edu:2443/document/8962081)

Explanation:

Most existing detection methods are effective for high-rate DDoS attacks on the control layer, while low-rate DDoS attacks against the SDN data layer are largely unexplored, and detection accuracy against this kind of attack is low. The authors propose a solution based on a Factorization Machine. This paper is not of much help to me.

[3] Detection of Denial-of-Service Attacks Based on Computer Vision Techniques (IEEEXplore Link: https://proxy.ulib.csuohio.edu:2443/document/6967763)

Explanation:

Here the proposed system converts traffic records into images and treats DoS attack detection as a computer vision problem. The authors developed a detection system based on a dissimilarity measure, the Earth Mover’s Distance. This helped me learn that ML algorithms that use a distance measure can be used for my analysis as well.
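As a rough illustration of the distance idea, here is a minimal sketch of the Earth Mover's Distance in its simplest setting: two one-dimensional samples of equal size, where it reduces to the mean absolute difference between the sorted values. This is a simplification and not the image-based method of the paper; the function name `emd_1d` and the packets-per-second figures are hypothetical.

```python
def emd_1d(sample_a, sample_b):
    """Earth Mover's Distance between two equal-sized 1-D samples.

    With equal sample sizes and equal weights, the optimal transport plan
    pairs the i-th smallest value of one sample with the i-th smallest of
    the other, so the distance is the mean absolute gap between sorted values.
    """
    if not sample_a or len(sample_a) != len(sample_b):
        raise ValueError("samples must be non-empty and of equal size")
    pairs = zip(sorted(sample_a), sorted(sample_b))
    return sum(abs(a - b) for a, b in pairs) / len(sample_a)


# Hypothetical packets-per-second readings over five time windows.
baseline = [120, 130, 125, 118, 122]   # normal traffic profile
similar = [121, 128, 126, 119, 123]    # another normal period
flood = [900, 950, 870, 910, 940]      # traffic during a flood

print(emd_1d(baseline, similar))  # small distance: looks like the baseline
print(emd_1d(baseline, flood))    # large distance: far from the baseline
```

A detector built on this idea would flag a window when its distance to the normal profile exceeds a chosen threshold.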

[4] Efficient Distributed Denial-of-Service Attack Defense in SDN-Based Cloud (IEEEXplore Link: https://proxy.ulib.csuohio.edu:2443/document/8630919)

Explanation:

This paper helped me understand what a software-defined cloud network is, what the common threats are and why we need to worry about them. The authors initially use the SVM algorithm and also implement the eHIPF scheme to build a hybrid machine learning model, which I do not fully understand. But SVM can be used for my analysis.

[5] Semi-supervised machine learning approach for DDoS detection (ACM Link: https://dl.acm.org/doi/10.1007/s10489-018-1141-2)

Explanation:

In this paper, the authors use an online sequential semi-supervised ML approach for DDoS detection based on network entropy estimation, co-clustering and information gain, which achieves accuracy of up to 98% and reduces false negatives. This is very helpful to me.
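To illustrate two of the quantities named above, here is a minimal sketch of Shannon entropy and information gain for a categorical feature. It shows only the generic textbook definitions, not the paper's online semi-supervised method, and the toy `protocol`, `flag` and `labels` values are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting on a categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the labels within each feature value.
    conditional = sum(len(group) / total * entropy(group) for group in groups.values())
    return entropy(labels) - conditional


# Toy flows (hypothetical): protocol separates the classes perfectly, flag not at all.
protocol = ["udp", "udp", "udp", "udp", "tcp", "tcp", "tcp", "tcp"]
flag = ["a", "b", "a", "b", "a", "b", "a", "b"]
labels = ["attack"] * 4 + ["benign"] * 4

print(information_gain(protocol, labels))
print(information_gain(flag, labels))
```

Features with higher information gain tell the classifier more about whether a flow is an attack, so the measure can be used to rank or filter features before training.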

[6] Similarity-Based Instance Transfer Learning for Botnet Detection (Link: https://infonomics-society.org/wp-content/uploads/ijicr/published-papers/volume-9-2018/Similarity-Based-Instance-Transfer-Learning-for-Botnet-Detection.pdf)

Explanation:

One way to carry out a DDoS attack is via a botnet, so if we can detect the botnet we can prevent the attack. From their analysis, the authors conclude that predictive performance can be improved by using transfer learning across datasets containing network traffic from different botnets.

[7] DDoS Evaluation Dataset (CICDDoS2019) (Link: https://www.unb.ca/cic/datasets/ddos-2019.html)

Explanation:

This page explains the dataset: how it was generated, at what time each DDoS attack was carried out, and how the authors performed dimensionality reduction using the CICFlowMeter-V3 tool. It also explains how one can use the data and shows the architecture of the infrastructure setup for better understanding.

[8] Detecting Distributed Denial of Service Attacks Using Data Mining Techniques (Link: https://thesai.org/Downloads/Volume7No1/Paper_59-Detecting_Distributed_Denial_of_Service_Attacks_Using_Data_Mining_Techniques.pdf)

Explanation:

This research paper is extremely helpful to me because the authors use Multilayer Perceptron (MLP), Naive Bayes and Random Forest for classification on a completely new dataset that they generated in a controlled environment.

[9] Preventing DDoS attack using Data mining Algorithms (Link: http://www.ijsrp.org/research-paper-1016/ijsrp-p5857.pdf)

Explanation:

This paper is a comprehensive survey of preventing DDoS attacks using data mining techniques. The authors use the k-Nearest Neighbors algorithm (KNN), support vector machines (SVM), Random Forest and Naïve Bayes for their analysis. This also helped me, as my term paper is closely related to it.

[10] A detailed analysis of CICIDS2017 dataset for designing Intrusion Detection Systems (Link: https://www.researchgate.net/publication/329045441_A_detailed_analysis_of_CICIDS2017_dataset_for_designing_Intrusion_Detection_Systems)

Explanation:

This paper explores the detailed characteristics of the CICIDS2017 dataset and outlines the issues inherent to it. It is not directly related to my work, but it gave me some insight into what the CICDDoS2019 data would look like and what precautions I need to take before starting the analysis.