
A Triple Layer Intrusion Detection System for SCADA Security of Electric Utility: Integrating Machine Learning and a What-if Module

Rishabh Samdarshi, Nidul Sinha, Senior Member

Department of Electrical Engineering

National Institute of Technology

Silchar, India

rishi.sam2008@gmail.com, nidulsinha@hotmail.com

Paritosh Tripathi

PricewaterhouseCoopers

Bangalore, India

paritosh.tripathi@in.pwc.com

Abstract— The world is talking about connecting everything to the internet. Electric grids are no exception; they are in fact one of the first application areas of the idea. This connectivity has raised concerns about the underlying supervisory control and data acquisition (SCADA) systems, whose structure is also adapting to upcoming needs. The security concerns also include vulnerabilities that are caused not by internet connectivity but by a disgruntled employee or by social engineering. This paper builds on the intrusion detection systems (IDSs) developed worldwide and proposes an integrated triple layer module. The module includes a novel “What-if” layer that predicts the malicious intent of a command signal and is also expected to predict impending failures of the system. The other two layers are a communication protection layer and a data authentication layer for field devices. Finally, we explain why the IDSs being proposed worldwide would benefit from working in tandem with one another and with the proposed “What-if” module.

Keywords— SCADA, SCADA Security, Intrusion Detection

I. INTRODUCTION

The security of Supervisory Control and Data Acquisition (SCADA) systems has of late been a matter of concern in the relevant engineering circles. With the Internet of Things (IoT) on the verge of becoming an integral part of society, SCADA networks do not remain untouched. Once they are part of the internet, their vulnerabilities increase manifold. Alongside, there remains the threat to SCADA systems that is caused not by internet connectivity but by social engineering or a man-in-the-middle. Since SCADA is part of the most critical infrastructure of any country, an attack on it or an intrusion into it is a direct threat to national security and the economy. According to the ICS-CERT (Industrial Control Systems Cyber Emergency Response Team) report published in 2015, the energy sector was the most “attacked” sector, with 32% of the attacks targeting it [1].

In an electrical grid, the SCADA system can be divided into three parts on the basis of its operational features. The “Field” generally consists of the sensors and actuators, which report the status of the machinery concerned and control it according to the commands received from the upper layers. The “Remote Station” consists of the Remote Terminal Units (RTUs) and Intelligent Electronic Devices (IEDs); it collects raw data from both the sensors and the control centre and processes it. Finally, the “Control Centre” consists of the Master Terminal Unit (MTU) and the SCADA server, which receive the data from the previous layers and either run the control algorithms or receive instructions from a human expert. Data and control signals pass through all these layers, so information always flows in both directions between the “Field” and “Control Centre” layers.

SCADA systems have become smarter and now use standard TCP/IP based internet protocols, owing to higher interconnectivity and the associated need to parse more data. Hence, their vulnerabilities are quite similar to those faced by computers connected to the internet. Brigham and Rowe [2] mention a multitude of methods that have been used across the world to attack SCADA systems. These include not only attacks carried out by hackers or through social engineering, but also tampering by a disgruntled employee. Zhu et al. [3] classify the problems faced by SCADA systems in terms of faults, anomalies, misuse, noise, detection, false positives and false negatives. Researchers across the world have tried to develop intrusion detection systems (IDSs). Yasakethu and Jiang [4] propose and compare a variety of machine learning algorithms for developing IDSs. Yang et al. [5] propose a rule-based intrusion detection system for IEC 60870-5-104. Amin et al. [6] propose an observer-based intrusion detection mechanism for water SCADA systems using hydrodynamic models. Pan et al. [7] propose a data-mining based IDS.

An analysis of the work going on across the world brings us to the following three interfaces that are prone to vulnerabilities:

1. The communication network, which basically is the most exposed entity and is penetrable from remote locations.

2. The instruments used for sensing/actuating, transmitting/receiving and control. Though these are generally located within the secured and restricted regions of the utility, social engineering or a disgruntled employee responsible for their operation can still be a means of tampering with the system. There is also a possibility of receiving already tampered instruments from the instrument manufacturers.

3. The Human Machine Interfaces (HMIs), which are used as the control centres, may transmit malignant controls that push the system towards instability. This may happen because of an attack on the previous layers, leading to faulty information being transmitted to the control centre, or because of an intentional command obtained through social engineering.

The present work proposes a triple layer IDS module to check all these vulnerabilities.

IEEE INDICON 2015 1570185491

II. TRIPLE LAYER INTRUSION DETECTION SYSTEM

Fig. 1. Information flow in SCADA and the positioning of the proposed IDS layers

The vulnerabilities mentioned in the previous section form the foundation of the present work. The reasoning behind the proposed IDS is that no single layer can ensure complete security of a SCADA system. Fig. 1 shows the information flow in a typical SCADA system. The remote control centre gets information from the field through a communication network which, depending upon certain constraints, follows different protocols. The first layer detects intrusions in this part. Once the commands are transferred to the next layer, their authenticity must be checked to detect whether they are malignant and could push the system towards instability. This is where the second layer comes into play. The commands eventually reach the RTUs and IEDs, which transmit them to the actuators. The IEDs, along with the sensors, send the parametric values back to the control centre. Within the network, the values coming from different locations in the field (and hence from different IEDs and RTUs) must have an underlying correlation. If an intruder has attacked one of the instruments, or if the sequence of events matches a known attack signature, this needs to be detected and reported as soon as possible. The third layer takes these aspects into account. The following sections describe our approach to building these layers.

A. LAYER I (Communication Network Protection)

The Knowledge Discovery and Data Mining (KDD) challenge of 1999 was the first to propose a dataset for detecting anomalies (due to intrusion) in TCP/IP parameters; it was produced by DARPA and built by Stolfo et al. [8]. This dataset has been analysed and researched for more than a decade now. Tavallaee et al. [9] give an exhaustive description of the dataset and of the offshoot datasets produced after it. The KDD training dataset consists of approximately 4,900,000 single connection vectors, each of which contains 41 features and is labelled as one of 23 scenarios. Tavallaee et al. [9] aggregate the attacks into Denial-of-Service (DoS), User-to-Root (U2R), Remote-to-Local (R2L) and Probing attacks. At the pre-processing stage, the 23 labels were reduced to the aforementioned categories, and the machine learning algorithms were tested on them.

WEKA, an open source software package for machine learning, was used to check the applicability of machine learning algorithms, through its Explorer module. The experiments were initially carried out on PCs, but the dataset was too large for them and the runs took too long. This prompted the use of the NIT Silchar High Performance Computing (HPC) centre. The HPC facility had 1.088 TB (1088 GB) of total memory (64 GB per node) and 34 Intel® Xeon® Ivy Bridge (E5-2650 v2) processors. WEKA 3.6.11 was installed on the HPC facility, and MobaXterm v7.4 was used for remote access.

The experiment was carried out on 10% of the dataset (approximately 490,000 instances). The tree-based J48 classifier was used, and the data was split into test and training sets in a 1:2 ratio.

Time taken to build model: 54 seconds
=== Evaluation on test split ===
=== Summary ===
Correctly Classified Instances      167899     99.9595 %
Incorrectly Classified Instances        68      0.0405 %
Kappa statistic                          0.9988
Total Number of Instances           167967

Fig. 2. Results with J48 on 10% KDD dataset using WEKA 3.6.11
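The label aggregation performed at the pre-processing stage can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes the commonly used grouping of the 22 attack labels found in the KDD Cup 99 training data (plus "normal") into the four categories named above, and the function and dictionary names are hypothetical.

```python
# Assumed grouping of the KDD Cup 99 training labels into four attack
# categories; "normal" is kept as its own class (22 attacks + normal = 23).
ATTACK_CATEGORY = {
    "DoS": ["back", "land", "neptune", "pod", "smurf", "teardrop"],
    "Probe": ["ipsweep", "nmap", "portsweep", "satan"],
    "R2L": ["ftp_write", "guess_passwd", "imap", "multihop",
            "phf", "spy", "warezclient", "warezmaster"],
    "U2R": ["buffer_overflow", "loadmodule", "perl", "rootkit"],
}

# Invert the table so that each raw label maps to its category.
LABEL_TO_CATEGORY = {
    label: category
    for category, labels in ATTACK_CATEGORY.items()
    for label in labels
}
LABEL_TO_CATEGORY["normal"] = "normal"


def aggregate_label(raw_label: str) -> str:
    """Map a raw KDD label such as 'smurf.' to its aggregated category.

    Raw labels in the KDD files end with a dot, which is stripped here.
    An unknown label raises KeyError rather than being silently guessed.
    """
    return LABEL_TO_CATEGORY[raw_label.strip().rstrip(".")]


if __name__ == "__main__":
    for raw in ["smurf.", "nmap.", "guess_passwd.", "rootkit.", "normal."]:
        print(raw, "->", aggregate_label(raw))
```

Applying such a mapping to the label column before training leaves the 41 features untouched and only reduces the number of classes the classifier has to separate.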

The results are presented in Fig. 2 in the WEKA output format, and they remained almost constant through several randomizations. With everything else unchanged, the experiment was repeated using the Random Forest algorithm. The results were even better than those of J48, and the U2R attacks in particular were classified better. As expected, the time taken to build the model was longer than for J48.

Time taken to build model: 447.54 seconds
=== Evaluation on test split ===
=== Summary ===
Correctly Classified Instances      167933     99.9798 %
Incorrectly Classified Instances        34      0.0202 %
Kappa statistic                          0.9994
Total Number of Instances           167967

Fig. 3. Results with Random Forest on 10% KDD dataset using WEKA 3.6.11
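As a quick consistency check, the percentages in the two WEKA summaries can be recomputed from the raw instance counts. The sketch below assumes, as the remark on build time suggests, that the 54-second summary is the J48 run and the 447.54-second summary is the Random Forest run; the helper name is hypothetical.

```python
def summary(correct: int, incorrect: int) -> dict:
    """Recompute the WEKA summary percentages from instance counts."""
    total = correct + incorrect
    return {
        "total": total,
        "pct_correct": round(100.0 * correct / total, 4),
        "pct_incorrect": round(100.0 * incorrect / total, 4),
    }


# Counts copied from the two WEKA summaries (assignment to the
# classifiers is an assumption, see the text above).
j48 = summary(correct=167899, incorrect=68)
random_forest = summary(correct=167933, incorrect=34)

if __name__ == "__main__":
    print("J48:          ", j48)
    print("Random Forest:", random_forest)
    # Ratio of misclassified instances between the two runs.
    print("Error ratio J48 / RF:", 68 / 34)
```

On the same 167,967 test instances, the second run misclassifies half as many instances as the first (34 against 68), which is the improvement the text refers to.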

B. LAYER II (Command and state authentication)

The authenticity of commands sent from the control centre is of prime importance. An intruder might tamper with the data during the exchanges between the RTUs/IEDs and the MTU/control centre. On the basis of this tampered data, the decision-making algorithms might push the system towards instability. For example, if an intruder changes the frequency data of a generator from 50.02 Hz (actual) to 49.97 Hz (tampered), the decision-making algorithms would act on the 49.97 Hz value and command the AGC (Automatic Generation Control) to increase the steam or water flow accordingly. This would further increase the speed of a generator that was already close to its maximum limit, and would eventually damage the system.

The What-if module answers the question “What would happen if the system were pushed to the state commanded by the control signals from the MTU?” It reports to the control station whenever a control command is suspicious. Because the module's model mirrors the physical grid, the eventual state of the system can be tested on the model. Fig. 4 shows the information flow in the What-if module.

The What-if module would be an exact copy of the physical electric grid concerned, implemented on a supercomputer or a Real Time Digital Simulator (RTDS). The module is expected to receive the data from the field, feed it into its model, and then pre-process the commands received from the control centre. This not only helps in checking the authenticity of commands but also warns against natural disturbances that may have a cascading effect leading to a complete grid failure.

Fig. 4. The What-if module

This module is subject to a few constraints, chiefly the need for very prompt, high-speed decision making owing to the critical nature of the infrastructure. Advances in high performance computing, together with the reduced complexity of state estimation algorithms in power system models brought about by Phasor Measurement Units (PMUs), support the practical application of this module.
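The idea behind the What-if check can be illustrated with a deliberately simplified sketch. It is not the proposed module, which would run a full grid model on an RTDS or supercomputer: here the "model" is a single hypothetical linear relation between a commanded change in generation and the resulting frequency, and the bias constant, the frequency limits and all names are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GridState:
    frequency_hz: float          # authenticated measurement from the field


@dataclass(frozen=True)
class Command:
    delta_power_mw: float        # generation change requested by the MTU


def simulate(state: GridState, command: Command,
             bias_mw_per_hz: float = 500.0) -> GridState:
    """Toy stand-in for the grid model: frequency moves linearly with
    the commanded power change (hypothetical bias of 500 MW/Hz)."""
    return GridState(state.frequency_hz + command.delta_power_mw / bias_mw_per_hz)


def what_if(state: GridState, command: Command,
            f_min: float = 49.90, f_max: float = 50.05):
    """Predict the state the command would lead to and report whether
    it stays inside the (hypothetical) safe frequency band."""
    predicted = simulate(state, command)
    safe = f_min <= predicted.frequency_hz <= f_max
    return safe, predicted


if __name__ == "__main__":
    # True field frequency is 50.02 Hz, as in the tampering example.
    actual = GridState(frequency_hz=50.02)
    # A command to raise generation, issued because the control centre
    # was shown a tampered reading of 49.97 Hz.
    suspicious = Command(delta_power_mw=25.0)
    print(what_if(actual, suspicious))
    print(what_if(actual, Command(delta_power_mw=-10.0)))
```

The essential design point is that the prediction starts from the field data fed into the module's own model, not from the value displayed at the control centre, so a command that looks reasonable against tampered data is still evaluated against the actual state.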

C. LAYER III (Data authentication at field)

This layer authenticates the data coming from the sensors through the IEDs before it enters the control centre. One major drawback of LAYER II working independently is that it cannot distinguish an attack from a natural disturbance. The third layer helps remove this limitation: it uses machine learning algorithms to classify the data into disturbances, attacks and normal operation.

The creation of such a dataset was beyond the scope of this work. Fortunately, a relevant dataset, very recently produced through simulation by Oak Ridge National Laboratory and Mississippi State University (ORNL-MSU) in collaboration [10], gave us an opportunity to proceed. The dataset used was one of the fifteen datasets provided, each having two labels: “Natural”, which aggregates disturbance and normal events, and “Attack”. The test was done using 10-fold cross-validation.

The work was again carried out using WEKA, but this time the Experimenter module of the software was used to compare the initial accuracies. The algorithms considered were the probability distribution based Naïve Bayes, the decision tree based J48, Random Forest, and adaptive boosting (AdaBoost) combined with JRipper. As proposed by Hink et al. [10], JRipper with adaptive boosting was found to give the best results. The continuous pruning and the remodelling with adaptation proved to work well for this dataset in comparison to the other algorithms. Fig. 5 shows the results of the WEKA Experimenter. The dataset was further tested in the WEKA Explorer for a detailed analysis, again with 10-fold cross-validation; the results are shown in Fig. 6.

Tester:     weka.experiment.PairedCorrectedTTester
Analysing:  Percent_correct
Datasets:   1
Resultsets: 4
Confidence: 0.05 (two tailed)
Dataset                  (1) bayes.Na | (2) trees  (3) trees  (4) meta.
----------------------------------------------------------------------
data10(Revised)  (100)      33.88     |  70.90 v    75.68 v    94.01 v
----------------------------------------------------------------------
Key:
(1) bayes.NaiveBayes '' 5995231201785697655
(2) trees.J48 '-C 0.25 -M 2' -217733168393644444
(3) trees.RandomForest '-I 100 -K 0 -S 1' -2260823972777004705
(4) meta.AdaBoostM1 '-P 100 -S 1 -I 10 -W rules.JRip -- -F 3 -N 2.0 -O 2 -S 1' -7378107808933117974

Fig. 5. Results on the 10th power system ORNL-MSU binary dataset using the Experimenter of WEKA 3.6.11

=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances        5239     94.0743 %
Incorrectly Classified Instances       330      5.9257 %
Kappa statistic                          0.8555
Total Number of Instances             5569
=== Confusion Matrix ===
    a     b   <-- classified as
 1438   210 | a = Natural
  120  3801 | b = Attack

Fig. 6. Results with AdaBoost+JRipper

The conventional classifier algorithms performed well. However, Pan et al. [7] have recently proposed common path mining, a method that classifies the observed events on the basis of system states occurring in a certain temporal order.

This motivated us to use the classical data mining approach of frequent item-set mining. We tried extracting rules from the given training sets using the well-known Apriori and FP-Growth algorithms. One challenge here was converting the numeric attributes into nominal ones. The previously proposed methods, including the one by Pan et al. [7], rely on an expert-based segregation into highs and lows, especially in the case of power systems. This might not be as fruitful, as it yields merely a binary attribute. It works fine with algorithms like FP-Growth, which require the attributes to be binary. However, a finer segregation may give better results, especially with Apriori and clustering based approaches. All the numeric attributes of the dataset, including the voltages, phase angles, frequencies, appearance impedances for relays and appearance impedance angles for relays, were divided into four categories on the basis of quartiles: the top 25% belonged to the first quartile, the next 25% to the second, and so on. The FP-Growth algorithm yielded the results shown in Fig. 7. The “Marker”, i.e. the column of labels, was made a mandatory part of the rules. The best 5 rules, along with their confidence levels, are displayed in the WEKA format.

FPGrowth found 5 rules (displaying top 5)
Using only transactions that contain: first-last
Showing only rules that contain: marker
1. [R1:S=0_binarized=1, marker=Attack]: 3752 ==> [R4:S=0_binarized=1]: 3709 <conf:(0.99)>
2. [marker=Attack]: 3866 ==> [R4:S=0_binarized=1]: 3811 <conf:(0.99)>
3. [R4:S=0_binarized=1, marker=Attack]: 3811 ==> [R1:S=0_binarized=1]: 3709 <conf:(0.97)>
4. [marker=Attack]: 3866 ==> [R1:S=0_binarized=1]: 3752 <conf:(0.97)>

Fig. 7. Results with FP Growth on ORNL-MSU with nominalized attributes

Fig. 8 shows the results with the Apriori algorithm. The confidence levels are indicated along with the states that occurred together most often. The option of making the “Marker” compulsory was not available for this algorithm in the software.

Though a qualitative study of these rules is not yet available, we propose at this point that if a rule found through frequent item-set mining is violated by one of its constituent items, some abnormal activity may be occurring. Secondly, if a set of items occurs that forms a frequent item-set together with the marker “Attack”, the scenario can be declared an impending attack. The confidence levels (i.e. the number of items matching the frequent item-set with the “Attack” label) may be optimized accordingly.
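The quartile-based conversion of numeric attributes and the two proposed checks on mined rules can be sketched as below. This is an illustrative sketch only: the readings are hypothetical, the function names are not from WEKA, and `statistics.quantiles` with its default method is an assumed stand-in for whatever cut points the original pre-processing used. The item names in the example are taken from the rules printed in Fig. 7 and Fig. 8.

```python
import statistics
from bisect import bisect_right


def quartile_labels(values):
    """Label each value Q1 (top 25%) down to Q4 (bottom 25%),
    following the convention that the top 25% is the first quartile."""
    cuts = statistics.quantiles(values, n=4)      # three cut points
    return [f"Q{4 - bisect_right(cuts, v)}" for v in values]


def violates_rule(transaction, antecedent, consequent):
    """A mined rule is 'defaulted' when its antecedent items are all
    present but at least one consequent item is missing."""
    return antecedent <= transaction and not consequent <= transaction


def attack_match(transaction, attack_itemset, marker="marker=Attack"):
    """Fraction of the non-marker items of an 'Attack' item-set that
    are present in the observed transaction."""
    items = attack_itemset - {marker}
    return len(items & transaction) / len(items)


if __name__ == "__main__":
    readings = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]   # hypothetical
    print(quartile_labels(readings))

    # Rule of the form shown in Fig. 8.
    rule_if = {"control_panel_log2=0"}
    rule_then = {"control_panel_log1=0"}
    observed = {"control_panel_log2=0", "control_panel_log1=1"}
    print("rule violated:", violates_rule(observed, rule_if, rule_then))

    # Item-set of the form shown in Fig. 7.
    attack_set = {"R1:S=0_binarized=1", "R4:S=0_binarized=1", "marker=Attack"}
    print("attack match:", attack_match({"R1:S=0_binarized=1"}, attack_set))
```

A threshold on the value returned by `attack_match` would play the role of the tunable level mentioned in the text; choosing it is left open here, as it is in the paper.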

Best rules found:
1. control_panel_log2=0 4966 ==> control_panel_log1=0 4966 conf:(1)
2. control_panel_log1=0 4966 ==> control_panel_log2=0 4966 conf:(1)
3. control_panel_log3=0 4966 ==> control_panel_log1=0 4966 conf:(1)
4. control_panel_log1=0 4966 ==> control_panel_log3=0 4966 conf:(1)
5. control_panel_log4=0 4966 ==> control_panel_log1=0 4966 conf:(1)

Fig. 8. Results for Apriori on ORNL-MSU Nominalized dataset with WEKA 3.6.11
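The confidence values printed with the association rules can be reproduced from the two counts attached to each rule. The sketch below assumes that the first count is the number of transactions containing the antecedent and the second the number containing both antecedent and consequent, a reading that is consistent with the printed confidences; the function name is hypothetical.

```python
def confidence(antecedent_count: int, rule_count: int) -> float:
    """confidence(A ==> B) = support(A and B) / support(A)."""
    return rule_count / antecedent_count


# (antecedent count, rule count) pairs copied from the FP-Growth rules.
fp_growth_rules = [(3752, 3709), (3866, 3811), (3811, 3709), (3866, 3752)]
# Every Apriori rule listed in Fig. 8 has the same pair of counts.
apriori_rule = (4966, 4966)

if __name__ == "__main__":
    for number, (a, ab) in enumerate(fp_growth_rules, start=1):
        print(f"FP-Growth rule {number}: conf = {confidence(a, ab):.2f}")
    print(f"Apriori rules: conf = {confidence(*apriori_rule):.2f}")
```

The Apriori rules reach a confidence of exactly 1 because the control panel log attributes are zero together in all 4,966 supporting transactions, which also explains why these rules say little about the “Attack” marker.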

III. THE EFFECTIVENESS OF TRIPLE LAYER IDS

Work across the world has targeted IDSs for the communication layer. There has also been a recent trend of research on IDSs that work with power system field data, as mentioned in the previous sections. However, we propose for the first time a triple layer module, with each layer working in tandem with the others.

The functioning of a triple layer module shall solve the following issues:

• In case an intruder is able to breach the first layer of security (i.e. the communication network), there is still a chance of detecting the intrusion in the next layers. For example, if the intruder tampers with the control signal, or even takes physical or remote command of the control centre, any malignant command produced by this intrusion can still be caught in the second layer (the proposed What-if module).

• In case the intruder tampers with the data coming from the field (predominantly from sensors), the authenticity of this data can be checked by the third layer. This assures the proper functioning of the system as well as that of the What-if module in the second layer, whose functioning depends on this data.

• There is always scope for a false alarm being raised by the second layer if it receives a set of data from the field that merely reflects a natural disturbance, though slightly beyond the limits for a few moments. Electrical grids are generally tolerant of such small disturbances. In such a case, the third layer is able to differentiate a natural event from an attack.
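One way the three layers could be combined into a single verdict is sketched below. The paper does not specify a fusion rule, so the ordering of the checks, the verdict names and the function name are all assumptions made for illustration; the inputs stand for the outputs of Layer I (network intrusion flag), Layer II (What-if module flags the command as unsafe) and Layer III (class assigned to the field data).

```python
from enum import Enum


class FieldClass(Enum):
    NORMAL = "Normal"
    NATURAL = "Natural"      # natural disturbance
    ATTACK = "Attack"


class Verdict(Enum):
    OK = "no action"
    NETWORK_INTRUSION = "Layer I: intrusion in communication network"
    FIELD_DATA_ATTACK = "Layer III: field data not authentic"
    MALIGNANT_COMMAND = "Layer II: command would destabilise the system"
    NATURAL_DISTURBANCE = "Layer II alarm attributed to natural disturbance"


def triple_layer_verdict(network_intrusion: bool,
                         command_unsafe: bool,
                         field_class: FieldClass) -> Verdict:
    """Hypothetical fusion of the three layer outputs."""
    if network_intrusion:
        return Verdict.NETWORK_INTRUSION
    # Tampered field data also invalidates the What-if result, which
    # depends on that data, so it is checked before Layer II.
    if field_class is FieldClass.ATTACK:
        return Verdict.FIELD_DATA_ATTACK
    if command_unsafe:
        # Layer III separates a natural event from an attack.
        if field_class is FieldClass.NATURAL:
            return Verdict.NATURAL_DISTURBANCE
        return Verdict.MALIGNANT_COMMAND
    return Verdict.OK


if __name__ == "__main__":
    print(triple_layer_verdict(False, True, FieldClass.NORMAL))
    print(triple_layer_verdict(False, True, FieldClass.NATURAL))
    print(triple_layer_verdict(False, False, FieldClass.ATTACK))
```

In this sketch a Layer II alarm that coincides with a natural disturbance is downgraded rather than discarded, so that an operator can still review it.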

Acknowledgment

We would like to thank Mr. Saurabh Garg and Mr. Prateek Singh of C-DAC Mumbai and Mr. Souvik Paul of High Performance Computing Centre at NIT Silchar.

References

[1] US Department of Homeland Security, ICS-CERT Monitor, National Cybersecurity and Communications Integration Center, September 2014 - February 2015.

[2] Bill Miller Brigham and Dale C. Rowe, “A Survey of SCADA and Critical Infrastructure Incidents”, SIGITE’12, ACM, Calgary, Alberta, Canada, October 11-13, 2012.

[3] B. Zhu, A. Joseph, and S. Sastry, “A taxonomy of cyber attacks on SCADA systems”, Proceedings of the 2011 IEEE International Conference on Internet of Things (iThings’11), pp. 380-388, 2011.

[4] S.L.P. Yasakethu and J. Jiang, “Intrusion Detection via Machine Learning for SCADA System Protection”, Proceedings of the 1st International Symposium for ICS & SCADA Cyber Security Research, BCS Learning and Development Ltd., 2013.

[5] Y. Yang et al., “Intrusion Detection System for IEC 60870-5-104 based SCADA networks”, IEEE Power and Energy Society General Meeting (PES), 21-25 July 2013.

[6] S. Amin, X. Litrico, S. S. Sastry and A. M. Bayen, “Cyber security of water SCADA systems-Part II: Attack detection using enhanced hydrodynamic models”, IEEE Trans. Control Syst. Technol., vol. 21, no. 5, pp. 1679-1693, 2013.

[7] Shengyi Pan, T. Morris, U. Adhikari, “Classification of Disturbances and Cyber-attacks in Power Systems Using Heterogeneous Time-synchronized Data”, IEEE Trans. on Industrial Informatics, vol. 11, no. 3, pp. 650-662, June 2015.

[8] S. J. Stolfo, W. Fan, W. Lee, A. Prodromidis, and P. K. Chan, “Cost-based modeling for fraud and intrusion detection: Results from the JAM project”, DISCEX, vol. 02, p. 1130, 2000.

[9] Mahbod Tavallaee, Ebrahim Bagheri, Wei Lu, and Ali A. Ghorbani, “A Detailed Analysis of the KDD CUP 99 Data Set”, Proceedings of IEEE CISDA-2009.

[10] R.C.B. Hink, J.M. Beaver, M.A. Buckner, T. Morris, U. Adhikari, Shengyi Pan, “Machine learning for power system disturbance and cyber-attack discrimination”, 7th International Symposium on Resilient Control Systems (ISRCS), Denver, Colorado, USA, 21-23 August 2014.
