Need two Big Data Query & Analysis using Spark SQL, three queries Advanced Analytics using PySpark

Sample.html

Home >Education homework help >Need two Big Data Query & Analysis using Spark SQL, three queries Advanced Analytics using PySpark

Big Data Analytics [CN7031] CRWK 2020-21¶

Group ID: [115]¶

Student 1: Pramod Kumar Gouda u2002425
Student 2: Virender Yadav u2002208
Student 3: Nishitha Angali u2001782
Student 4: Meetkumar Rasikbhai Patel u2001677
Student 5: Maulik Bhikhabhai Padhiyar u2002324

If you want to add comments on your group work, please write it here for us:

Initiate and Configure Spark¶

In [1]:

!sudo apt-get update
!apt-get install openjdk-8-jdk-headless -qq > /dev/null
!wget -q https://downloads.apache.org/spark/spark-3.0.1/spark-3.0.1-bin-hadoop3.2.tgz
!tar xf spark-3.0.1-bin-hadoop3.2.tgz
!pip install -q findspark

Get:1 http://security.ubuntu.com/ubuntu bionic-security InRelease [88.7 kB]
Get:2 https://cloud.r-project.org/bin/linux/ubuntu bionic-cran40/ InRelease [3,626 B]
Ign:3 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease
Hit:4 http://archive.ubuntu.com/ubuntu bionic InRelease
Get:5 http://ppa.launchpad.net/c2d4u.team/c2d4u4.0+/ubuntu bionic InRelease [15.9 kB]
Ign:6 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  InRelease
Hit:7 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  Release
Hit:8 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  Release
Get:9 http://archive.ubuntu.com/ubuntu bionic-updates InRelease [88.7 kB]
Get:10 https://cloud.r-project.org/bin/linux/ubuntu bionic-cran40/ Packages [40.7 kB]
Get:11 http://security.ubuntu.com/ubuntu bionic-security/universe amd64 Packages [1,372 kB]
Get:12 http://ppa.launchpad.net/graphics-drivers/ppa/ubuntu bionic InRelease [21.3 kB]
Get:13 http://archive.ubuntu.com/ubuntu bionic-backports InRelease [74.6 kB]
Get:14 http://security.ubuntu.com/ubuntu bionic-security/restricted amd64 Packages [237 kB]
Get:15 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages [1,814 kB]
Get:16 http://security.ubuntu.com/ubuntu bionic-security/multiverse amd64 Packages [15.3 kB]
Get:19 http://ppa.launchpad.net/c2d4u.team/c2d4u4.0+/ubuntu bionic/main Sources [1,699 kB]
Get:20 http://archive.ubuntu.com/ubuntu bionic-updates/universe amd64 Packages [2,136 kB]
Get:21 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages [2,243 kB]
Get:22 http://archive.ubuntu.com/ubuntu bionic-updates/restricted amd64 Packages [266 kB]
Get:23 http://archive.ubuntu.com/ubuntu bionic-updates/multiverse amd64 Packages [53.8 kB]
Get:24 http://ppa.launchpad.net/c2d4u.team/c2d4u4.0+/ubuntu bionic/main amd64 Packages [870 kB]
Get:25 http://ppa.launchpad.net/graphics-drivers/ppa/ubuntu bionic/main amd64 Packages [46.5 kB]
Fetched 11.1 MB in 3s (3,933 kB/s)
Reading package lists... Done

In [2]:

# Using operating system dependent functionality to read or write a file 
import os
os.environ["JAVA_HOME"]="/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["SPARK_HOME"]="/content/spark-3.0.1-bin-hadoop3.2"

import findspark
findspark.init()

In [3]:

# linking with SparkSession
from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local[*]").appName('Group115').getOrCreate()

# Note: If you want to work with RDD, you should use: "from pyspark import SparkContext, SparkConf"

Load Data¶

In [4]:

# Load Data
df1 = spark.read.load("IDS2018/02-14-2018.csv", format="csv", inferSchema=True, header=True)
df2 = spark.read.load("IDS2018/02-15-2018.csv", format="csv", inferSchema=True, header=True)
df3 = spark.read.load("IDS2018/02-16-2018.csv", format="csv", inferSchema=True, header=True)
df4 = spark.read.load("IDS2018/02-21-2018.csv", format="csv", inferSchema=True, header=True)
df5 = spark.read.load("IDS2018/02-22-2018.csv", format="csv", inferSchema=True, header=True)
df6 = spark.read.load("IDS2018/02-23-2018.csv", format="csv", inferSchema=True, header=True)
df7 = spark.read.load("IDS2018/02-28-2018.csv", format="csv", inferSchema=True, header=True)
df8 = spark.read.load("IDS2018/03-01-2018.csv", format="csv", inferSchema=True, header=True)
df9 = spark.read.load("IDS2018/03-02-2018.csv", format="csv", inferSchema=True, header=True)

In [5]:

from functools import reduce
from pyspark.sql import DataFrame

# Create a list of dataframes
dfs = [df1, df2, df3, df4, df5, df6, df7, df2, df9]

# Create a merged dataframe
IDS_df = reduce(DataFrame.unionAll, dfs)

In [ ]:

# Print DF to make sure it is working
IDS_df.show()

+--------+--------+-------------------+-------------+------------+------------+---------------+---------------+---------------+---------------+----------------+---------------+---------------+---------------+----------------+---------------+---------------+-------------+----------------+----------------+------------+------------+------------+----------------+----------------+-----------+-----------+-----------+----------------+----------------+-----------+-----------+-------------+-------------+-------------+-------------+--------------+--------------+-------------+------------+-----------+-----------+--------------+--------------+----------------+------------+------------+------------+------------+------------+------------+--------------+------------+-------------+--------------+----------------+----------------+--------------+--------------+----------------+--------------+--------------+----------------+----------------+----------------+----------------+----------------+-----------------+-----------------+-----------------+----------------+-----------+----------+----------+----------+------------+--------------+-----------+-----------+------+
|Dst Port|Protocol|          Timestamp|Flow Duration|Tot Fwd Pkts|Tot Bwd Pkts|TotLen Fwd Pkts|TotLen Bwd Pkts|Fwd Pkt Len Max|Fwd Pkt Len Min|Fwd Pkt Len Mean|Fwd Pkt Len Std|Bwd Pkt Len Max|Bwd Pkt Len Min|Bwd Pkt Len Mean|Bwd Pkt Len Std|    Flow Byts/s|  Flow Pkts/s|   Flow IAT Mean|    Flow IAT Std|Flow IAT Max|Flow IAT Min| Fwd IAT Tot|    Fwd IAT Mean|     Fwd IAT Std|Fwd IAT Max|Fwd IAT Min|Bwd IAT Tot|    Bwd IAT Mean|     Bwd IAT Std|Bwd IAT Max|Bwd IAT Min|Fwd PSH Flags|Bwd PSH Flags|Fwd URG Flags|Bwd URG Flags|Fwd Header Len|Bwd Header Len|   Fwd Pkts/s|  Bwd Pkts/s|Pkt Len Min|Pkt Len Max|  Pkt Len Mean|   Pkt Len Std|     Pkt Len Var|FIN Flag Cnt|SYN Flag Cnt|RST Flag Cnt|PSH Flag Cnt|ACK Flag Cnt|URG Flag Cnt|CWE Flag Count|ECE Flag Cnt|Down/Up Ratio|  Pkt Size Avg|Fwd Seg Size Avg|Bwd Seg Size Avg|Fwd Byts/b Avg|Fwd Pkts/b Avg|Fwd Blk Rate Avg|Bwd Byts/b Avg|Bwd Pkts/b Avg|Bwd Blk Rate Avg|Subflow Fwd Pkts|Subflow Fwd Byts|Subflow Bwd Pkts|Subflow Bwd Byts|Init Fwd Win Byts|Init Bwd Win Byts|Fwd Act Data Pkts|Fwd Seg Size Min|Active Mean|Active Std|Active Max|Active Min|   Idle Mean|      Idle Std|   Idle Max|   Idle Min| Label|
+--------+--------+-------------------+-------------+------------+------------+---------------+---------------+---------------+---------------+----------------+---------------+---------------+---------------+----------------+---------------+---------------+-------------+----------------+----------------+------------+------------+------------+----------------+----------------+-----------+-----------+-----------+----------------+----------------+-----------+-----------+-------------+-------------+-------------+-------------+--------------+--------------+-------------+------------+-----------+-----------+--------------+--------------+----------------+------------+------------+------------+------------+------------+------------+--------------+------------+-------------+--------------+----------------+----------------+--------------+--------------+----------------+--------------+--------------+----------------+----------------+----------------+----------------+----------------+-----------------+-----------------+-----------------+----------------+-----------+----------+----------+----------+------------+--------------+-----------+-----------+------+
|       0|       0|14/02/2018 08:31:01|    112641719|           3|           0|              0|            0.0|              0|              0|             0.0|            0.0|              0|              0|             0.0|            0.0|            0.0| 0.0266331163|    5.63208595E7|  139.3000358938| 5.6320958E7| 5.6320761E7|1.12641719E8|    5.63208595E7|  139.3000358938|5.6320958E7|5.6320761E7|        0.0|             0.0|             0.0|        0.0|        0.0|            0|            0|            0|            0|             0|             0| 0.0266331163|         0.0|          0|          0|           0.0|           0.0|             0.0|           0|           0|           0|           0|           0|           0|             0|           0|            0|           0.0|             0.0|             0.0|             0|             0|               0|             0|             0|               0|               3|               0|               0|               0|               -1|               -1|                0|               0|        0.0|       0.0|       0.0|       0.0|5.63208595E7|139.3000358938|5.6320958E7|5.6320761E7|Benign|
|       0|       0|14/02/2018 08:33:50|    112641466|           3|           0|              0|            0.0|              0|              0|             0.0|            0.0|              0|              0|             0.0|            0.0|            0.0| 0.0266331761|     5.6320733E7|  114.5512985522| 5.6320814E7| 5.6320652E7|1.12641466E8|     5.6320733E7|  114.5512985522|5.6320814E7|5.6320652E7|        0.0|             0.0|             0.0|        0.0|        0.0|            0|            0|            0|            0|             0|             0| 0.0266331761|         0.0|          0|          0|           0.0|           0.0|             0.0|           0|           0|           0|           0|           0|           0|             0|           0|            0|           0.0|             0.0|             0.0|             0|             0|               0|             0|             0|               0|               3|               0|               0|               0|               -1|               -1|                0|               0|        0.0|       0.0|       0.0|       0.0| 5.6320733E7|114.5512985522|5.6320814E7|5.6320652E7|Benign|
|       0|       0|14/02/2018 08:36:39|    112638623|           3|           0|              0|            0.0|              0|              0|             0.0|            0.0|              0|              0|             0.0|            0.0|            0.0| 0.0266338483|    5.63193115E7|  301.9345955667| 5.6319525E7| 5.6319098E7|1.12638623E8|    5.63193115E7|  301.9345955667|5.6319525E7|5.6319098E7|        0.0|             0.0|             0.0|        0.0|        0.0|            0|            0|            0|            0|             0|             0| 0.0266338483|         0.0|          0|          0|           0.0|           0.0|             0.0|           0|           0|           0|           0|           0|           0|             0|           0|            0|           0.0|             0.0|             0.0|             0|             0|               0|             0|             0|               0|               3|               0|               0|               0|               -1|               -1|                0|               0|        0.0|       0.0|       0.0|       0.0|5.63193115E7|301.9345955667|5.6319525E7|5.6319098E7|Benign|
|      22|       6|14/02/2018 08:40:13|      6453966|          15|          10|           1239|         2273.0|            744|              0|            82.6| 196.7412368715|            976|              0|           227.3| 371.6778922072| 544.1615279659| 3.8735871865|       268915.25|247443.778966007|    673900.0|        22.0|   6453966.0|460997.571428571|123109.423587757|   673900.0|   229740.0|  5637902.0|626433.555555556| 455082.21422401|  1167293.0|      554.0|            0|            0|            0|            0|           488|           328| 2.3241523119|1.5494348746|          0|        976|135.0769230769|277.8347599674|77192.1538461539|           0|           0|           0|           1|           0|           0|             0|           0|            0|        140.48|            82.6|           227.3|             0|             0|               0|             0|             0|               0|              15|            1239|              10|            2273|            65535|              233|                6|              32|        0.0|       0.0|       0.0|       0.0|         0.0|           0.0|        0.0|        0.0|Benign|
|      22|       6|14/02/2018 08:40:23|      8804066|          14|          11|           1143|         2209.0|            744|              0|   81.6428571429| 203.7455453568|            976|              0|  200.8181818182| 362.2498635422| 380.7331748762|  2.839597068|366836.083333333|511356.609732762|   1928102.0|        21.0|   8804066.0|677235.846153846|532416.970958985|  1928102.0|   246924.0|  7715481.0|        771548.1|755543.082716951|  2174893.0|       90.0|            0|            0|            0|            0|           456|           360| 1.5901743581|1.2494227099|          0|        976|128.9230769231|279.7630315931|78267.3538461539|           0|           0|           0|           1|           0|           0|             0|           0|            0|        134.08|   81.6428571429|  200.8181818182|             0|             0|               0|             0|             0|               0|              14|            1143|              11|            2209|             5808|              233|                6|              32|        0.0|       0.0|       0.0|       0.0|         0.0|           0.0|        0.0|        0.0|Benign|
|      22|       6|14/02/2018 08:40:31|      6989341|          16|          12|           1239|         2273.0|            744|              0|         77.4375| 190.8311535538|            976|              0|  189.4166666667| 347.6425694023| 502.4794183028| 4.0061001459|258864.481481481| 291724.14791076|    951098.0|        20.0|   6989341.0|465956.066666667|244363.896416351|   951098.0|   265831.0|  5980598.0|543690.727272727|460713.519752371|  1254338.0|       78.0|            0|            0|            0|            0|           332|           252| 2.2892000834|1.7169000625|          0|        976|121.1034482759|265.7086676402|70601.0960591133|           0|           0|           0|           1|           0|           0|             0|           0|            0|125.4285714286|         77.4375|  189.4166666667|             0|             0|               0|             0|             0|               0|              16|            1239|              12|            2273|             5808|              234|                7|              20|        0.0|       0.0|       0.0|       0.0|         0.0|           0.0|        0.0|        0.0|Benign|
|       0|       0|14/02/2018 08:39:28|    112640480|           3|           0|              0|            0.0|              0|              0|             0.0|            0.0|              0|              0|             0.0|            0.0|            0.0| 0.0266334092|      5.632024E7|  203.6467529817| 5.6320384E7| 5.6320096E7| 1.1264048E8|      5.632024E7|  203.6467529817|5.6320384E7|5.6320096E7|        0.0|             0.0|             0.0|        0.0|        0.0|            0|            0|            0|            0|             0|             0| 0.0266334092|         0.0|          0|          0|           0.0|           0.0|             0.0|           0|           0|           0|           0|           0|           0|             0|           0|            0|           0.0|             0.0|             0.0|             0|             0|               0|             0|             0|               0|               3|               0|               0|               0|               -1|               -1|                0|               0|        0.0|       0.0|       0.0|       0.0|  5.632024E7|203.6467529817|5.6320384E7|5.6320096E7|Benign|
|       0|       0|14/02/2018 08:42:17|    112641244|           3|           0|              0|            0.0|              0|              0|             0.0|            0.0|              0|              0|             0.0|            0.0|            0.0| 0.0266332286|     5.6320622E7|   62.2253967444| 5.6320666E7| 5.6320578E7|1.12641244E8|     5.6320622E7|   62.2253967444|5.6320666E7|5.6320578E7|        0.0|             0.0|             0.0|        0.0|        0.0|            0|            0|            0|            0|             0|             0| 0.0266332286|         0.0|          0|          0|           0.0|           0.0|             0.0|           0|           0|           0|           0|           0|           0|             0|           0|            0|           0.0|             0.0|             0.0|             0|             0|               0|             0|             0|               0|               3|               0|               0|               0|               -1|               -1|                0|               0|        0.0|       0.0|       0.0|       0.0| 5.6320622E7| 62.2253967444|5.6320666E7|5.6320578E7|Benign|
|      80|       6|14/02/2018 08:47:14|       476513|           5|           3|            211|          463.0|            211|              0|            42.2|  94.3620686505|            463|              0|  154.3333333333| 267.3131746348|1414.4419984345|16.7886290615|68073.2857142857|115865.792656438|    237711.0|        24.0|    476513.0|       119128.25|137379.963358017|   238470.0|      108.0|   238634.0|        119317.0|167621.076693833|   237843.0|      791.0|            0|            0|            0|            0|           168|           104|10.4928931635|6.2957358981|          0|        463| 74.8888888889|161.4058893322|26051.8611111111|           0|           0|           0|           1|           0|           0|             0|           0|            0|         84.25|            42.2|  154.3333333333|             0|             0|               0|             0|             0|               0|               5|             211|               3|             463|            14600|              219|                1|              32|        0.0|       0.0|       0.0|       0.0|         0.0|           0.0|        0.0|        0.0|Benign|
|      80|       6|14/02/2018 08:47:15|       475048|           5|           3|            220|          472.0|            220|              0|            44.0|    98.38699101|            472|              0|  157.3333333333| 272.5093270575|1456.6949024099|16.8404034961|         67864.0|115746.933154476|    237494.0|        15.0|    475048.0|        118762.0|137096.759626185|   237853.0|       15.0|   237516.0|        118758.0|167472.584269784|   237179.0|      337.0|            0|            0|            0|            0|           168|           104| 10.525252185| 6.315151311|          0|        472| 76.8888888889|165.0669897681|27247.1111111111|           0|           0|           0|           1|           0|           0|             0|           0|            0|          86.5|            44.0|  157.3333333333|             0|             0|               0|             0|             0|               0|               5|             220|               3|             472|            14600|              219|                1|              32|        0.0|       0.0|       0.0|       0.0|         0.0|           0.0|        0.0|        0.0|Benign|
|      80|       6|14/02/2018 08:47:15|       474926|           5|           3|            220|          472.0|            220|              0|            44.0|    98.38699101|            472|              0|  157.3333333333| 272.5093270575|1457.0691012916|16.8447294947|67846.5714285714|115645.740842248|    237162.0|        15.0|    474926.0|        118731.5|136923.365842601|   237497.0|       15.0|   237732.0|        118866.0|167663.503100705|   237422.0|      310.0|            0|            0|            0|            0|           168|           104|10.5279559342|6.3167735605|          0|        472| 76.8888888889|165.0669897681|27247.1111111111|           0|           0|           0|           1|           0|           0|             0|           0|            0|          86.5|            44.0|  157.3333333333|             0|             0|               0|             0|             0|               0|               5|             220|               3|             472|            14480|              219|                1|              32|        0.0|       0.0|       0.0|       0.0|         0.0|           0.0|        0.0|        0.0|Benign|
|      80|       6|14/02/2018 08:47:16|       477471|           5|           3|            209|          461.0|            209|              0|            41.8|  93.4676414595|            461|              0|  153.6666666667| 266.1584740964|1403.2265833946|16.7549442793|68210.1428571429|116178.228792989|    238389.0|        17.0|    477471.0|       119367.75|137516.508224467|   238515.0|      149.0|   238887.0|        119443.5|168454.755588852|   238559.0|      328.0|            0|            0|            0|            0|           168|           104|10.4718401746|6.2831041048|          0|        461| 74.4444444444|160.5942955954|25790.5277777778|           0|           0|           0|           1|           0|           0|             0|           0|            0|         83.75|            41.8|  153.6666666667|             0|             0|               0|             0|             0|               0|               5|             209|               3|             461|            14480|              219|                1|              32|        0.0|       0.0|       0.0|       0.0|         0.0|           0.0|        0.0|        0.0|Benign|
|      80|       6|14/02/2018 08:47:16|       512758|           5|           3|            211|          463.0|            211|              0|            42.2|  94.3620686505|            463|              0|  154.3333333333| 267.3131746348|1314.4602327024|15.6019018718|73251.1428571429|124959.473740394|    256188.0|        10.0|    512758.0|        128189.5|148006.106082373|   256523.0|       10.0|   256563.0|        128281.5|180935.190276795|   256222.0|      341.0|            0|            0|            0|            0|           168|           104| 9.7511886699|5.8507132019|          0|        463| 74.8888888889|161.4058893322|26051.8611111111|           0|           0|           0|           1|           0|           0|             0|           0|            0|         84.25|            42.2|  154.3333333333|             0|             0|               0|             0|             0|               0|               5|             211|               3|             463|            14480|              219|                1|              32|        0.0|       0.0|       0.0|       0.0|         0.0|           0.0|        0.0|        0.0|Benign|
|      80|       6|14/02/2018 08:47:17|       476711|           5|           3|            206|          458.0|            206|              0|            41.2|   92.126000673|            458|              0|  152.6666666667| 264.4264232888|1392.8774456641|16.7816559719|68101.5714285714|115977.215079597|    238034.0|        16.0|    476711.0|       119177.75| 137207.36560009|   238274.0|       78.0|   238033.0|        119016.5|168007.864103143|   237816.0|      217.0|            0|            0|            0|            0|           168|           104|10.4885349824|6.2931209894|          0|        458| 73.7777777778|159.3783060659|25401.4444444444|           0|           0|           0|           1|           0|           0|             0|           0|            0|          83.0|            41.2|  152.6666666667|             0|             0|               0|             0|             0|               0|               5|             206|               3|             458|            14480|              219|                1|              32|        0.0|       0.0|       0.0|       0.0|         0.0|           0.0|        0.0|        0.0|Benign|
|      80|       6|14/02/2018 08:47:17|       476616|           5|           3|            211|          463.0|            211|              0|            42.2|  94.3620686505|            463|              0|  154.3333333333| 267.3131746348|1414.1363277775|16.7850009232|         68088.0|115553.073301117|    237285.0|         8.0|    476616.0|        119154.0|136576.972942001|   237558.0|        8.0|   237660.0|        118830.0|167563.093937776|   237315.0|      345.0|            0|            0|            0|            0|           168|           104| 10.490625577|6.2943753462|          0|        463| 74.8888888889|161.4058893322|26051.8611111111|           0|           0|           0|           1|           0|           0|             0|           0|            0|         84.25|            42.2|  154.3333333333|             0|             0|               0|             0|             0|               0|               5|             211|               3|             463|            14480|              219|                1|              32|        0.0|       0.0|       0.0|       0.0|         0.0|           0.0|        0.0|        0.0|Benign|
|      80|       6|14/02/2018 08:47:18|       477161|           5|           3|            211|          463.0|            211|              0|            42.2|  94.3620686505|            463|              0|  154.3333333333| 267.3131746348|1412.5211406632|16.7658295628|68165.8571428572|116324.530541969|    238504.0|        12.0|    477161.0|       119290.25|137729.560342905|   238719.0|       12.0|   238618.0|        119309.0|168449.805841384|   238421.0|      197.0|            0|            0|            0|            0|           168|           104|10.4786434767| 6.287186086|          0|        463| 74.8888888889|161.4058893322|26051.8611111111|           0|           0|           0|           1|           0|           0|             0|           0|            0|         84.25|            42.2|  154.3333333333|             0|             0|               0|             0|             0|               0|               5|             211|               3|             463|            14480|              219|                1|              32|        0.0|       0.0|       0.0|       0.0|         0.0|           0.0|        0.0|        0.0|Benign|
|      80|       6|14/02/2018 08:47:18|       474670|           5|           3|            214|          466.0|            214|              0|            42.8|   95.703709437|            466|              0|  155.3333333333| 269.0452254424|1432.5742094508|16.8538142288|         67810.0|115371.262364883|    236717.0|        15.0|    474670.0|        118667.5|136423.221362787|   236894.0|       55.0|   236992.0|        118496.0| 167297.22178805|   236793.0|      199.0|            0|            0|            0|            0|           168|           104| 10.533633893|6.3201803358|          0|        466| 75.5555555556|162.6246530443|26446.7777777778|           0|           0|           0|           1|           0|           0|             0|           0|            0|          85.0|            42.8|  155.3333333333|             0|             0|               0|             0|             0|               0|               5|             214|               3|             466|            14480|              219|                1|              32|        0.0|       0.0|       0.0|       0.0|         0.0|           0.0|        0.0|        0.0|Benign|
|      80|       6|14/02/2018 08:47:19|       476608|           5|           3|            209|          461.0|            209|              0|            41.8|  93.4676414595|            461|              0|  153.6666666667| 266.1584740964|1405.7674231234|16.7852826642|68086.8571428572|116165.357657993|    238154.0|        21.0|    476608.0|        119152.0|137536.399945614|   238349.0|       28.0|   238442.0|        119221.0|168305.556058022|   238231.0|      211.0|            0|            0|            0|            0|           168|           104|10.4908016651|6.2944809991|          0|        461| 74.4444444444|160.5942955954|25790.5277777778|           0|           0|           0|           1|           0|           0|             0|           0|            0|         83.75|            41.8|  153.6666666667|             0|             0|               0|             0|             0|               0|               5|             209|               3|             461|            14480|              219|                1|              32|        0.0|       0.0|       0.0|       0.0|         0.0|           0.0|        0.0|        0.0|Benign|
|      80|       6|14/02/2018 08:47:19|       479249|           5|           3|            215|          467.0|            215|              0|            43.0|  96.1509230325|            467|              0|  155.6666666667| 269.6225757116|1423.0598290242|16.6927839182|68464.1428571429|116522.185511928|    239061.0|        18.0|    479249.0|       119812.25|137797.592674606|   239270.0|       21.0|   239238.0|        119619.0|168903.768394906|   239052.0|      186.0|            0|            0|            0|            0|           168|           104|10.4329899489|6.2597939693|          0|        467| 75.7777777778|163.0312683029|26579.1944444444|           0|           0|           0|           1|           0|           0|             0|           0|            0|         85.25|            43.0|  155.6666666667|             0|             0|               0|             0|             0|               0|               5|             215|               3|             467|            14480|              219|                1|              32|        0.0|       0.0|       0.0|       0.0|         0.0|           0.0|        0.0|        0.0|Benign|
|      80|       6|14/02/2018 08:47:20|       475967|           5|           3|            215|          467.0|            215|              0|            43.0|  96.1509230325|            467|              0|  155.6666666667| 269.6225757116|1432.8724470394|16.8078879418|67995.2857142857|115929.081086548|    237703.0|        16.0|    475967.0|       118991.75|137195.538016305|   237904.0|       21.0|   237915.0|        118957.5|167975.337191208|   237734.0|      181.0|            0|            0|            0|            0|           168|           104|10.5049299636|6.3029579782|          0|        467| 75.7777777778|163.0312683029|26579.1944444444|           0|           0|           0|           1|           0|           0|             0|           0|            0|         85.25|            43.0|  155.6666666667|             0|             0|               0|             0|             0|               0|               5|             215|               3|             467|            14480|              219|                1|              32|        0.0|       0.0|       0.0|       0.0|         0.0|           0.0|        0.0|        0.0|Benign|
+--------+--------+-------------------+-------------+------------+------------+---------------+---------------+---------------+---------------+----------------+---------------+---------------+---------------+----------------+---------------+---------------+-------------+----------------+----------------+------------+------------+------------+----------------+----------------+-----------+-----------+-----------+----------------+----------------+-----------+-----------+-------------+-------------+-------------+-------------+--------------+--------------+-------------+------------+-----------+-----------+--------------+--------------+----------------+------------+------------+------------+------------+------------+------------+--------------+------------+-------------+--------------+----------------+----------------+--------------+--------------+----------------+--------------+--------------+----------------+----------------+----------------+----------------+----------------+-----------------+-----------------+-----------------+----------------+-----------+----------+----------+----------+------------+--------------+-----------+-----------+------+
only showing top 20 rows

In [ ]:

# The total number of attacks per label
IDS_df.select('Label').groupBy('Label').count().orderBy('count', ascending=False).show()

+--------------------+-------+
|               Label|  count|
+--------------------+-------+
|              Benign|6870186|
|    DDOS attack-HOIC| 686012|
|    DoS attacks-Hulk| 461912|
|                 Bot| 286191|
|      FTP-BruteForce| 193360|
|      SSH-Bruteforce| 187589|
|DoS attacks-SlowH...| 139890|
|DoS attacks-Golde...|  83016|
|       Infilteration|  68871|
|DoS attacks-Slowl...|  21980|
|DDOS attack-LOIC-UDP|   1730|
|    Brute Force -Web|    611|
|    Brute Force -XSS|    230|
|       SQL Injection|     87|
|               Label|     34|
|                   0|      1|
+--------------------+-------+

In [120]:

IDS_df2 = IDS_df.withColumnRenamed("Tot Fwd Pkts","tot_fw_pk").withColumnRenamed("Idle Max","idl_max") \
.withColumnRenamed("dst port","dst_port").withColumnRenamed("Idle Min","idl_min") \
.withColumnRenamed("TotLen Fwd Pkts","tot_l_fw_pkt").withColumnRenamed("Flow Duration","fl_dur") \
.withColumnRenamed("Flow Byts/s","fl_byt_s").withColumnRenamed("Fwd PSH Flags","fw_psh_flag") \
.withColumnRenamed("Active Max","atv_max").withColumnRenamed("Active Min","atv_min") \
.withColumnRenamed("Pkt Size Avg","pkt_size_avg").withColumnRenamed("Fwd Seg Size Avg","fw_seg_avg") \
.withColumnRenamed("Bwd Seg Size Avg","bw_seg_avg")

Task 1: Spark SQL [30 marks]¶

In [72]:

IDS_df2.createOrReplaceTempView("IDS")

In [10]:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

In [11]:

# Pramod Kumar Gouda u2002425
# Query 1 [Briefly explain]: Returns a set of objects with duplicate elements eliminated and it used for collection 


sqlDF=spark.sql("SELECT Protocol, collect_set(tot_fw_pk) as totalfwdpkts FROM IDS WHERE Protocol IS NOT NULL GROUP BY Protocol  ")
sqlDF.show()

+--------+--------------------+
|Protocol|        totalfwdpkts|
+--------+--------------------+
|       6|[2207, 356, 1982,...|
|      17|[97718, 95024, 15...|
|       0|[110, 52, 387, 13...|
+--------+--------------------+

In [12]:

# Pramod Kumar Gouda u2002425
# Query 2 [Briefly explain]: Selecting the number of protocol based on there type

sqlDF=spark.sql("SELECT Protocol, count(*) FROM IDS GROUP BY Protocol")
sqlDF.show()

+--------+--------+
|Protocol|count(1)|
+--------+--------+
|    null|      34|
|       6| 6976276|
|      17| 1919793|
|       0|  105597|
+--------+--------+

In [13]:

pandas_df=sqlDF.toPandas()
pandas_df.sort_values(by='count(1)',ascending=False).plot(x='Protocol',y='count(1)',kind='bar')

Out[13]:

<matplotlib.axes._subplots.AxesSubplot at 0x7fbb82d5d2e8>

In [56]:

# Virender Yadav u2002208
# Query 1 [Briefly explain]: Number of forwarded packets by there count

sqlDF=spark.sql("SELECT tot_fw_pk, count(*) FROM IDS GROUP BY tot_fw_pk HAVING COUNT(tot_fw_pk) > 50 " )
sqlDF.show()

+---------+--------+
|tot_fw_pk|count(1)|
+---------+--------+
|      148|      65|
|       31|    2440|
|       85|     154|
|      137|      82|
|       65|     664|
|       53|     928|
|      133|      84|
|       78|     268|
|      155|      69|
|      108|     203|
|       34|    1945|
|      126|     101|
|      115|     190|
|      101|     167|
|       81|     202|
|       28|    4640|
|       76|     215|
|       26|    6940|
|       27|    5227|
|       44|    1018|
+---------+--------+
only showing top 20 rows

In [52]:

# Virender Yadav u2002208
# Query 2 [Briefly explain]: finding the average flow duration of packets

sqlDF=spark.sql("SELECT avg(fl_dur) from IDS")
sqlDF.show()

+--------------------+
|         avg(fl_dur)|
+--------------------+
|1.0503308551024554E7|
+--------------------+

In [23]:

# Nishitha Angali u2001782
# Query 1 [Briefly explain]: To find the count of maximum time a flow was idle before becomming active
# and minimum time a flow was idle before becomming active with conditions given as 0 in both cases

sqlDF=spark.sql("SELECT count(idl_max),count(idl_min) FROM IDS where idl_max = 0  AND idl_min = 0")
sqlDF.show()

+--------------+--------------+
|count(idl_max)|count(idl_min)|
+--------------+--------------+
|       7925325|       7925325|
+--------------+--------------+

In [26]:

# Nishitha Angali u2001782
# Query 2 [Briefly explain]: Counting the number of idel max group whose count is greater than 20

sqlDF=spark.sql("SELECT idl_max,count(*) FROM IDS GROUP BY idl_max HAVING COUNT(idl_max) > 20 ")
sqlDF.show()

+-----------+--------+
|    idl_max|count(1)|
+-----------+--------+
|5.6320958E7|      21|
| 1.001003E7|      29|
|1.0004066E7|      60|
|1.0014804E7|      23|
|1.0014585E7|      41|
|1.0014137E7|      23|
|5.6319468E7|      24|
|1.0001233E7|      24|
| 1.000114E7|      26|
|1.0010131E7|      31|
|5.6318028E7|      22|
|1.0014601E7|      38|
| 1.001432E7|      24|
|1.0003658E7|      35|
|     6.06E7|      36|
|     1.66E7|      35|
| 1.000975E7|      26|
|1.0001486E7|      29|
| 1.001427E7|      21|
|1.0013879E7|      29|
+-----------+--------+
only showing top 20 rows

In [27]:

# Meetkumar Rasikbhai Patel u2001677
# Query 1 [Briefly explain]: Counting the number of destination ports by there type

sqlDF=spark.sql("SELECT dst_port, count(*) from IDS  GROUP BY dst_port ")
sqlDF.show()

+--------+--------+
|dst_port|count(1)|
+--------+--------+
|   38422|      29|
|   40386|      35|
|   35982|      39|
|    3997|       5|
|    1829|       4|
|   51415|     204|
|   26706|       4|
|   15846|       3|
|   51607|     184|
|   49308|      56|
|   50348|     216|
|   49855|     222|
|   50353|     206|
|   51393|     206|
|   51123|     210|
|   51595|     188|
|   63964|      26|
|   64519|      29|
|   57020|      69|
|   50223|     223|
+--------+--------+
only showing top 20 rows

In [53]:

# Meetkumar Rasikbhai Patel u2001677
# Query 2 [Briefly explain]: summing up the total length of forwarded packets

sqlDF=spark.sql("SELECT sum(tot_l_fw_pkt) from IDS")
sqlDF.show()

+-----------------+
|sum(tot_l_fw_pkt)|
+-----------------+
|      10422158811|
+-----------------+

In [73]:

# Maulik Bhikhabhai Padhiyar u2002324
# Query 1 [Briefly explain]: selecting the different types of PSH flag

sqlDF=spark.sql("SELECT DISTINCT fw_psh_flag from IDS ")
sqlDF.show()

+-----------+
|fw_psh_flag|
+-----------+
|       null|
|          1|
|          0|
+-----------+

In [68]:

# Maulik Bhikhabhai Padhiyar u2002324
# Query 2 [Briefly explain]: average byte rate which is transfered per second

sqlDF=spark.sql("SELECT count(fl_byt_s) from IDS where fl_byt_s > 0")
sqlDF.show()

+---------------+
|count(fl_byt_s)|
+---------------+
|        5781032|
+---------------+

Task 2 - Part1: PySpark [45 marks]¶

In [74]:

IDS_df2 = IDS_df2.na.drop()

In [121]:

# Pramod Kumar Gouda u2002425
# Analytical method 1: We are converting required columns from string to float to find skewness,is a measure of the 
# asymmetry of the data around sample mean


from pyspark.sql.functions import col

selected_features = ['tot_fw_pk','idl_max','atv_max','atv_min','idl_min','tot_l_fw_pkt','pkt_size_avg','fw_seg_avg','bw_seg_avg']
IDS_selected_features_df = IDS_df2.select(*(col(c).cast("float").alias(c) for c in selected_features))
IDS_selected_features_df.show()

+---------+-----------+-------+-------+-----------+------------+------------+----------+----------+
|tot_fw_pk|    idl_max|atv_max|atv_min|    idl_min|tot_l_fw_pkt|pkt_size_avg|fw_seg_avg|bw_seg_avg|
+---------+-----------+-------+-------+-----------+------------+------------+----------+----------+
|      3.0| 5.632096E7|    0.0|    0.0| 5.632076E7|         0.0|         0.0|       0.0|       0.0|
|      3.0|5.6320816E7|    0.0|    0.0|5.6320652E7|         0.0|         0.0|       0.0|       0.0|
|      3.0|5.6319524E7|    0.0|    0.0|5.6319096E7|         0.0|         0.0|       0.0|       0.0|
|     15.0|        0.0|    0.0|    0.0|        0.0|      1239.0|      140.48|      82.6|     227.3|
|     14.0|        0.0|    0.0|    0.0|        0.0|      1143.0|      134.08|  81.64286| 200.81818|
|     16.0|        0.0|    0.0|    0.0|        0.0|      1239.0|   125.42857|   77.4375| 189.41667|
|      3.0|5.6320384E7|    0.0|    0.0|5.6320096E7|         0.0|         0.0|       0.0|       0.0|
|      3.0|5.6320664E7|    0.0|    0.0|5.6320576E7|         0.0|         0.0|       0.0|       0.0|
|      5.0|        0.0|    0.0|    0.0|        0.0|       211.0|       84.25|      42.2| 154.33333|
|      5.0|        0.0|    0.0|    0.0|        0.0|       220.0|        86.5|      44.0| 157.33333|
|      5.0|        0.0|    0.0|    0.0|        0.0|       220.0|        86.5|      44.0| 157.33333|
|      5.0|        0.0|    0.0|    0.0|        0.0|       209.0|       83.75|      41.8| 153.66667|
|      5.0|        0.0|    0.0|    0.0|        0.0|       211.0|       84.25|      42.2| 154.33333|
|      5.0|        0.0|    0.0|    0.0|        0.0|       206.0|        83.0|      41.2| 152.66667|
|      5.0|        0.0|    0.0|    0.0|        0.0|       211.0|       84.25|      42.2| 154.33333|
|      5.0|        0.0|    0.0|    0.0|        0.0|       211.0|       84.25|      42.2| 154.33333|
|      5.0|        0.0|    0.0|    0.0|        0.0|       214.0|        85.0|      42.8| 155.33333|
|      5.0|        0.0|    0.0|    0.0|        0.0|       209.0|       83.75|      41.8| 153.66667|
|      5.0|        0.0|    0.0|    0.0|        0.0|       215.0|       85.25|      43.0| 155.66667|
|      5.0|        0.0|    0.0|    0.0|        0.0|       215.0|       85.25|      43.0| 155.66667|
+---------+-----------+-------+-------+-----------+------------+------------+----------+----------+
only showing top 20 rows

In [81]:

from pyspark.sql import functions as f

IDS_selected_features_df.select(f.skewness(IDS_selected_features_df['tot_fw_pk'])).show()

+-------------------+
|skewness(tot_fw_pk)|
+-------------------+
|  77.74947867361696|
+-------------------+

In [83]:

# Pramod Kumar Gouda u2002425
# Analytical method 2: I am finding the correlation between two columns where if one column increases its corelated 
# with other column and if another column value decreases

IDS_selected_features_df.stat.corr("atv_max","atv_min")

Out[83]:

0.7414425772390605

In [84]:

# Pramod Kumar Gouda u2002425
# Analytical method 3: kernel density estimate on a pyspark dataframe column and use it for 
# creating a new column with the estimates

from pyspark.mllib.stat import KernelDensity
dat_rdd = IDS_selected_features_df.select("tot_fw_pk").rdd
dat_rdd_data = dat_rdd.map(lambda x: x[0])

kd = KernelDensity()
kd.setSample(dat_rdd_data)
kd.estimate([13.0,14.0])

Out[84]:

array([0.0097561 , 0.01106945])

In [86]:

# Virender Yadav u2002208
# Analytical method 1: finding the kurtosis and is about tails of the distribution,measures of outliners

IDS_selected_features_df.select(f.kurtosis(IDS_selected_features_df['atv_min'])).show()

+------------------+
| kurtosis(atv_min)|
+------------------+
|4022.4866270803145|
+------------------+

In [91]:

# Virender Yadav u2002208
# Analytical method 2: find the percentile of a 80th %

IDS_df2.groupby('label').agg(f.expr('percentile(atv_max, array(0.80))')[0].alias('%80')).show()

+--------------------+---------+
|               label|      %80|
+--------------------+---------+
|      SSH-Bruteforce|      0.0|
|               Label|     null|
|       Infilteration|      0.0|
|                   0|      0.0|
|       SQL Injection|      0.0|
|DoS attacks-Slowl...|7678920.2|
|              Benign|      0.0|
|DoS attacks-SlowH...|      0.0|
|                 Bot|      0.0|
|DoS attacks-Golde...|    441.0|
|    Brute Force -XSS|      0.0|
|      FTP-BruteForce|      0.0|
|DDOS attack-LOIC-UDP|      0.0|
|    DoS attacks-Hulk|      0.0|
|    Brute Force -Web|3999879.0|
|    DDOS attack-HOIC|      0.0|
+--------------------+---------+

In [95]:

# Virender Yadav u2002208
# Analytical method 3: calculating standard deviation its value is how far from the normal ,squareroot of variance


IDS_df2.agg(f.stddev("atv_max")).show()

+--------------------+
|stddev_samp(atv_max)|
+--------------------+
|  1749057.5452774959|
+--------------------+

In [85]:

# Nishitha Angali u2001782
# Analytical method 1: To find skewness,is a measure of the asymmetry of the data around sample mean

IDS_selected_features_df.select(f.skewness(IDS_selected_features_df['idl_max'])).show()

+-----------------+
|skewness(idl_max)|
+-----------------+
|992.0339103023476|
+-----------------+

In [97]:

# Nishitha Angali u2001782
# Analytical method 2: finding the correlation between two columns where if one column increases its corelated 
# with other column and if another column value decreases

IDS_selected_features_df.stat.corr("idl_max","idl_min")

Out[97]:

0.33075364446687444

In [101]:

# Nishitha Angali u2001782
# Analytical method 3: find the percentile of a 75th %

IDS_df2.groupby('protocol').agg(f.expr('percentile(idl_max, array(0.75))')[0].alias('%75')).show()

+--------+-------+
|protocol|    %75|
+--------+-------+
|    null|   null|
|       6|    0.0|
|      17|    0.0|
|       0|5.632E7|
+--------+-------+

In [116]:

# Meetkumar Rasikbhai Patel u2001677
# Analytical method 1: finding the kurtosis and is about tails of the distribution,measures of outliners

IDS_selected_features_df.select(f.kurtosis(IDS_selected_features_df['tot_l_fw_pkt'])).show()

+----------------------+
|kurtosis(tot_l_fw_pkt)|
+----------------------+
|     1529504.304325319|
+----------------------+

In [113]:

# Meetkumar Rasikbhai Patel u2001677
# Analytical method 2: calucating the average value for the column

IDS_df2.select(f.mean("tot_l_fw_pkt")).show()

+------------------+
| avg(tot_l_fw_pkt)|
+------------------+
|1157.8033234070226|
+------------------+

In [122]:

# Meetkumar Rasikbhai Patel u2001677
# Analytical method 3: finding the correlation between two columns where if one column increases its corelated 
# with other column and if another column value decreases


IDS_selected_features_df.stat.corr("fw_seg_avg","bw_seg_avg")

Out[122]:

0.37286275767327126

In [119]:

# Maulik Bhikhabhai Padhiyar u2002324
# Analytical method 1: To find skewness,is a measure of the asymmetry of the data around sample mean

IDS_selected_features_df.select(f.skewness(IDS_selected_features_df['pkt_size_avg'])).show()

+----------------------+
|skewness(pkt_size_avg)|
+----------------------+
|    3.4566237116943674|
+----------------------+

In [123]:

# Maulik Bhikhabhai Padhiyar u2002324
# Analytical method 2:

IDS_df2.agg(f.stddev("idl_min")).show()

+--------------------+
|stddev_samp(idl_min)|
+--------------------+
| 8.435726565284234E7|
+--------------------+

In [124]:

# Maulik Bhikhabhai Padhiyar u2002324
# Analytical method 3: Obtaining the maximum value in column

IDS_df2.agg(f.max("idl_max")).show()

+------------+
|max(idl_max)|
+------------+
|  9.79781E11|
+------------+

Task 2 - Part2: PySpark [15 marks]¶

In [ ]:

# pramod Kumar Gouda u2002425
# Machine Learning Technique:
# What to achieve:

from pyspark.mllib.feature import Word2Vec

wv_rdd = IDS_df2.rdd
inp = wv_rdd.map(lambda row: row.split(" "))

word2vec = Word2Vec()
model = word2vec.fit(inp)

synonyms = model.findSynonyms('Benign', 5)

for word, cosine_distance in synonyms:
    print("{}: {}".format(word, cosine_distance))

In [ ]:

# Virender Yadav u2002208
# Machine Learning Technique:
# What to achieve:

from pyspark.mllib.clustering import LDA, LDAModel
from pyspark.mllib.linalg import Vectors

# Load and parse the data
wv_rdd = IDS_df2.rdd
parsedData = wv_rdd.map(lambda line: Vectors.dense([float(x) for x in line.strip().split(' ')]))
# Index documents with unique IDs
corpus = parsedData.zipWithIndex().map(lambda x: [x[1], x[0]]).cache()

# Cluster the documents into three topics using LDA
ldaModel = LDA.train(corpus, k=3)

# Output topics. Each is a distribution over words (matching word count vectors)
print("Learned topics (as distributions over vocab of " + str(ldaModel.vocabSize())
      + " words):")
topics = ldaModel.topicsMatrix()
for topic in range(3):
    print("Topic " + str(topic) + ":")
    for word in range(0, ldaModel.vocabSize()):
        print(" " + str(topics[word][topic]))

In [ ]:

# Nishitha Angali u2001782
# Machine Learning Technique:
# What to achieve:

from numpy import array
from math import sqrt

from pyspark.mllib.clustering import KMeans, KMeansModel

# Load and parse the data
wv_rdd = IDS_df2.rdd
parsedData = wv_rdd.map(lambda line: array([float(x) for x in line.split(',')]))

# Build the model (cluster the data)
clusters = KMeans.train(parsedData, 2, maxIterations=10, initializationMode="random")

# Evaluate clustering by computing Within Set Sum of Squared Errors
def error(point):
    center = clusters.centers[clusters.predict(point)]
    return sqrt(sum([x**2 for x in (point - center)]))

WSSSE = parsedData.map(lambda point: error(point)).reduce(lambda x, y: x + y)
print("Within Set Sum of Squared Error = " + str(WSSSE))

In [ ]:

# Meetkumar Rasikbhai Patel u2001677
# Machine Learning Technique:
# What to achieve:

from pyspark.mllib.feature import ElementwiseProduct
from pyspark.mllib.linalg import Vectors

data = sc.textFile("data/mllib/kmeans_data.txt")
parsedData = data.map(lambda x: [float(t) for t in x.split(" ")])

# Create weight vector.
transformingVector = Vectors.dense([0.0, 1.0, 2.0])
transformer = ElementwiseProduct(transformingVector)

# Batch transform
transformedData = transformer.transform(parsedData)
# Single-row transform
transformedData2 = transformer.transform(parsedData.first())

In [ ]:

# Maulik Bhikhabhai Padhiyar u2002324
# Machine Learning Technique:
# What to achieve:

from pyspark.mllib.feature import Normalizer
from pyspark.mllib.util import MLUtils

data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")
labels = data.map(lambda x: x.label)
features = data.map(lambda x: x.features)

normalizer1 = Normalizer()
normalizer2 = Normalizer(p=float("inf"))

# Each sample in data1 will be normalized using $L^2$ norm.
data1 = labels.zip(normalizer1.transform(features))

# Each sample in data2 will be normalized using $L^\infty$ norm.
data2 = labels.zip(normalizer2.transform(features))

Convert ipynb to HTML for Turnitin submission [10 marks]¶

In [125]:

# install nbconvert
!pip install nbconvert

Requirement already satisfied: nbconvert in /usr/local/lib/python3.6/dist-packages (5.6.1)
Requirement already satisfied: bleach in /usr/local/lib/python3.6/dist-packages (from nbconvert) (3.2.1)
Requirement already satisfied: jinja2>=2.4 in /usr/local/lib/python3.6/dist-packages (from nbconvert) (2.11.2)
Requirement already satisfied: defusedxml in /usr/local/lib/python3.6/dist-packages (from nbconvert) (0.6.0)
Requirement already satisfied: traitlets>=4.2 in /usr/local/lib/python3.6/dist-packages (from nbconvert) (4.3.3)
Requirement already satisfied: nbformat>=4.4 in /usr/local/lib/python3.6/dist-packages (from nbconvert) (5.0.8)
Requirement already satisfied: mistune<2,>=0.8.1 in /usr/local/lib/python3.6/dist-packages (from nbconvert) (0.8.4)
Requirement already satisfied: pygments in /usr/local/lib/python3.6/dist-packages (from nbconvert) (2.6.1)
Requirement already satisfied: entrypoints>=0.2.2 in /usr/local/lib/python3.6/dist-packages (from nbconvert) (0.3)
Requirement already satisfied: jupyter-core in /usr/local/lib/python3.6/dist-packages (from nbconvert) (4.7.0)
Requirement already satisfied: pandocfilters>=1.4.1 in /usr/local/lib/python3.6/dist-packages (from nbconvert) (1.4.3)
Requirement already satisfied: testpath in /usr/local/lib/python3.6/dist-packages (from nbconvert) (0.4.4)
Requirement already satisfied: webencodings in /usr/local/lib/python3.6/dist-packages (from bleach->nbconvert) (0.5.1)
Requirement already satisfied: six>=1.9.0 in /usr/local/lib/python3.6/dist-packages (from bleach->nbconvert) (1.15.0)
Requirement already satisfied: packaging in /usr/local/lib/python3.6/dist-packages (from bleach->nbconvert) (20.7)
Requirement already satisfied: MarkupSafe>=0.23 in /usr/local/lib/python3.6/dist-packages (from jinja2>=2.4->nbconvert) (1.1.1)
Requirement already satisfied: decorator in /usr/local/lib/python3.6/dist-packages (from traitlets>=4.2->nbconvert) (4.4.2)
Requirement already satisfied: ipython-genutils in /usr/local/lib/python3.6/dist-packages (from traitlets>=4.2->nbconvert) (0.2.0)
Requirement already satisfied: jsonschema!=2.5.0,>=2.4 in /usr/local/lib/python3.6/dist-packages (from nbformat>=4.4->nbconvert) (2.6.0)
Requirement already satisfied: pyparsing>=2.0.2 in /usr/local/lib/python3.6/dist-packages (from packaging->bleach->nbconvert) (2.4.7)

In [ ]:

# convert ipynb to html
# file name: Group115_CN7031_CN7031.ipynb

!jupyter nbconvert --to html Group115_CN7031.ipynb