Need two Big Data Query & Analysis using Spark SQL, three queries Advanced Analytics using PySpark
Big Data Analytics [CN7031] CRWK 2020-21¶
Group ID: [115]¶
- Student 1: Pramod Kumar Gouda u2002425
- Student 2: Virender Yadav u2002208
- Student 3: Nishitha Angali u2001782
- Student 4: Meetkumar Rasikbhai Patel u2001677
- Student 5: Maulik Bhikhabhai Padhiyar u2002324
If you want to add comments on your group work, please write it here for us:
Initiate and Configure Spark¶
In [1]:!sudo apt-get update !apt-get install openjdk-8-jdk-headless -qq > /dev/null !wget -q https://downloads.apache.org/spark/spark-3.0.1/spark-3.0.1-bin-hadoop3.2.tgz !tar xf spark-3.0.1-bin-hadoop3.2.tgz !pip install -q findspark
Get:1 http://security.ubuntu.com/ubuntu bionic-security InRelease [88.7 kB] Get:2 https://cloud.r-project.org/bin/linux/ubuntu bionic-cran40/ InRelease [3,626 B] Ign:3 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease Hit:4 http://archive.ubuntu.com/ubuntu bionic InRelease Get:5 http://ppa.launchpad.net/c2d4u.team/c2d4u4.0+/ubuntu bionic InRelease [15.9 kB] Ign:6 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 InRelease Hit:7 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 Release Hit:8 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 Release Get:9 http://archive.ubuntu.com/ubuntu bionic-updates InRelease [88.7 kB] Get:10 https://cloud.r-project.org/bin/linux/ubuntu bionic-cran40/ Packages [40.7 kB] Get:11 http://security.ubuntu.com/ubuntu bionic-security/universe amd64 Packages [1,372 kB] Get:12 http://ppa.launchpad.net/graphics-drivers/ppa/ubuntu bionic InRelease [21.3 kB] Get:13 http://archive.ubuntu.com/ubuntu bionic-backports InRelease [74.6 kB] Get:14 http://security.ubuntu.com/ubuntu bionic-security/restricted amd64 Packages [237 kB] Get:15 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages [1,814 kB] Get:16 http://security.ubuntu.com/ubuntu bionic-security/multiverse amd64 Packages [15.3 kB] Get:19 http://ppa.launchpad.net/c2d4u.team/c2d4u4.0+/ubuntu bionic/main Sources [1,699 kB] Get:20 http://archive.ubuntu.com/ubuntu bionic-updates/universe amd64 Packages [2,136 kB] Get:21 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages [2,243 kB] Get:22 http://archive.ubuntu.com/ubuntu bionic-updates/restricted amd64 Packages [266 kB] Get:23 http://archive.ubuntu.com/ubuntu bionic-updates/multiverse amd64 Packages [53.8 kB] Get:24 http://ppa.launchpad.net/c2d4u.team/c2d4u4.0+/ubuntu bionic/main amd64 Packages [870 kB] Get:25 http://ppa.launchpad.net/graphics-drivers/ppa/ubuntu bionic/main amd64 Packages [46.5 kB] Fetched 11.1 MB in 3s (3,933 kB/s) Reading package lists... DoneIn [2]:
# Using operating system dependent functionality to read or write a file import os os.environ["JAVA_HOME"]="/usr/lib/jvm/java-8-openjdk-amd64" os.environ["SPARK_HOME"]="/content/spark-3.0.1-bin-hadoop3.2" import findspark findspark.init()In [3]:
# linking with SparkSession
from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local[*]").appName('Group115').getOrCreate()
# Note: If you want to work with RDD, you should use: "from pyspark import SparkContext, SparkConf"
Load Data¶
In [4]:# Load Data
df1 = spark.read.load("IDS2018/02-14-2018.csv", format="csv", inferSchema=True, header=True)
df2 = spark.read.load("IDS2018/02-15-2018.csv", format="csv", inferSchema=True, header=True)
df3 = spark.read.load("IDS2018/02-16-2018.csv", format="csv", inferSchema=True, header=True)
df4 = spark.read.load("IDS2018/02-21-2018.csv", format="csv", inferSchema=True, header=True)
df5 = spark.read.load("IDS2018/02-22-2018.csv", format="csv", inferSchema=True, header=True)
df6 = spark.read.load("IDS2018/02-23-2018.csv", format="csv", inferSchema=True, header=True)
df7 = spark.read.load("IDS2018/02-28-2018.csv", format="csv", inferSchema=True, header=True)
df8 = spark.read.load("IDS2018/03-01-2018.csv", format="csv", inferSchema=True, header=True)
df9 = spark.read.load("IDS2018/03-02-2018.csv", format="csv", inferSchema=True, header=True)
In [5]:
from functools import reduce from pyspark.sql import DataFrame # Create a list of dataframes dfs = [df1, df2, df3, df4, df5, df6, df7, df2, df9] # Create a merged dataframe IDS_df = reduce(DataFrame.unionAll, dfs)In [ ]:
# Print DF to make sure it is working IDS_df.show()
+--------+--------+-------------------+-------------+------------+------------+---------------+---------------+---------------+---------------+----------------+---------------+---------------+---------------+----------------+---------------+---------------+-------------+----------------+----------------+------------+------------+------------+----------------+----------------+-----------+-----------+-----------+----------------+----------------+-----------+-----------+-------------+-------------+-------------+-------------+--------------+--------------+-------------+------------+-----------+-----------+--------------+--------------+----------------+------------+------------+------------+------------+------------+------------+--------------+------------+-------------+--------------+----------------+----------------+--------------+--------------+----------------+--------------+--------------+----------------+----------------+----------------+----------------+----------------+-----------------+-----------------+-----------------+----------------+-----------+----------+----------+----------+------------+--------------+-----------+-----------+------+ |Dst Port|Protocol| Timestamp|Flow Duration|Tot Fwd Pkts|Tot Bwd Pkts|TotLen Fwd Pkts|TotLen Bwd Pkts|Fwd Pkt Len Max|Fwd Pkt Len Min|Fwd Pkt Len Mean|Fwd Pkt Len Std|Bwd Pkt Len Max|Bwd Pkt Len Min|Bwd Pkt Len Mean|Bwd Pkt Len Std| Flow Byts/s| Flow Pkts/s| Flow IAT Mean| Flow IAT Std|Flow IAT Max|Flow IAT Min| Fwd IAT Tot| Fwd IAT Mean| Fwd IAT Std|Fwd IAT Max|Fwd IAT Min|Bwd IAT Tot| Bwd IAT Mean| Bwd IAT Std|Bwd IAT Max|Bwd IAT Min|Fwd PSH Flags|Bwd PSH Flags|Fwd URG Flags|Bwd URG Flags|Fwd Header Len|Bwd Header Len| Fwd Pkts/s| Bwd Pkts/s|Pkt Len Min|Pkt Len Max| Pkt Len Mean| Pkt Len Std| Pkt Len Var|FIN Flag Cnt|SYN Flag Cnt|RST Flag Cnt|PSH Flag Cnt|ACK Flag Cnt|URG Flag Cnt|CWE Flag Count|ECE Flag Cnt|Down/Up Ratio| Pkt Size Avg|Fwd Seg Size Avg|Bwd Seg Size Avg|Fwd Byts/b Avg|Fwd Pkts/b Avg|Fwd Blk Rate Avg|Bwd Byts/b Avg|Bwd Pkts/b Avg|Bwd Blk Rate Avg|Subflow Fwd Pkts|Subflow Fwd Byts|Subflow Bwd Pkts|Subflow Bwd Byts|Init Fwd Win Byts|Init Bwd Win Byts|Fwd Act Data Pkts|Fwd Seg Size Min|Active Mean|Active Std|Active Max|Active Min| Idle Mean| Idle Std| Idle Max| Idle Min| Label| +--------+--------+-------------------+-------------+------------+------------+---------------+---------------+---------------+---------------+----------------+---------------+---------------+---------------+----------------+---------------+---------------+-------------+----------------+----------------+------------+------------+------------+----------------+----------------+-----------+-----------+-----------+----------------+----------------+-----------+-----------+-------------+-------------+-------------+-------------+--------------+--------------+-------------+------------+-----------+-----------+--------------+--------------+----------------+------------+------------+------------+------------+------------+------------+--------------+------------+-------------+--------------+----------------+----------------+--------------+--------------+----------------+--------------+--------------+----------------+----------------+----------------+----------------+----------------+-----------------+-----------------+-----------------+----------------+-----------+----------+----------+----------+------------+--------------+-----------+-----------+------+ | 0| 0|14/02/2018 08:31:01| 112641719| 3| 0| 0| 0.0| 0| 0| 0.0| 0.0| 0| 0| 0.0| 0.0| 0.0| 0.0266331163| 5.63208595E7| 139.3000358938| 5.6320958E7| 5.6320761E7|1.12641719E8| 5.63208595E7| 139.3000358938|5.6320958E7|5.6320761E7| 0.0| 0.0| 0.0| 0.0| 0.0| 0| 0| 0| 0| 0| 0| 0.0266331163| 0.0| 0| 0| 0.0| 0.0| 0.0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0.0| 0.0| 0.0| 0| 0| 0| 0| 0| 0| 3| 0| 0| 0| -1| -1| 0| 0| 0.0| 0.0| 0.0| 0.0|5.63208595E7|139.3000358938|5.6320958E7|5.6320761E7|Benign| | 0| 0|14/02/2018 08:33:50| 112641466| 3| 0| 0| 0.0| 0| 0| 0.0| 0.0| 0| 0| 0.0| 0.0| 0.0| 0.0266331761| 5.6320733E7| 114.5512985522| 5.6320814E7| 5.6320652E7|1.12641466E8| 5.6320733E7| 114.5512985522|5.6320814E7|5.6320652E7| 0.0| 0.0| 0.0| 0.0| 0.0| 0| 0| 0| 0| 0| 0| 0.0266331761| 0.0| 0| 0| 0.0| 0.0| 0.0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0.0| 0.0| 0.0| 0| 0| 0| 0| 0| 0| 3| 0| 0| 0| -1| -1| 0| 0| 0.0| 0.0| 0.0| 0.0| 5.6320733E7|114.5512985522|5.6320814E7|5.6320652E7|Benign| | 0| 0|14/02/2018 08:36:39| 112638623| 3| 0| 0| 0.0| 0| 0| 0.0| 0.0| 0| 0| 0.0| 0.0| 0.0| 0.0266338483| 5.63193115E7| 301.9345955667| 5.6319525E7| 5.6319098E7|1.12638623E8| 5.63193115E7| 301.9345955667|5.6319525E7|5.6319098E7| 0.0| 0.0| 0.0| 0.0| 0.0| 0| 0| 0| 0| 0| 0| 0.0266338483| 0.0| 0| 0| 0.0| 0.0| 0.0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0.0| 0.0| 0.0| 0| 0| 0| 0| 0| 0| 3| 0| 0| 0| -1| -1| 0| 0| 0.0| 0.0| 0.0| 0.0|5.63193115E7|301.9345955667|5.6319525E7|5.6319098E7|Benign| | 22| 6|14/02/2018 08:40:13| 6453966| 15| 10| 1239| 2273.0| 744| 0| 82.6| 196.7412368715| 976| 0| 227.3| 371.6778922072| 544.1615279659| 3.8735871865| 268915.25|247443.778966007| 673900.0| 22.0| 6453966.0|460997.571428571|123109.423587757| 673900.0| 229740.0| 5637902.0|626433.555555556| 455082.21422401| 1167293.0| 554.0| 0| 0| 0| 0| 488| 328| 2.3241523119|1.5494348746| 0| 976|135.0769230769|277.8347599674|77192.1538461539| 0| 0| 0| 1| 0| 0| 0| 0| 0| 140.48| 82.6| 227.3| 0| 0| 0| 0| 0| 0| 15| 1239| 10| 2273| 65535| 233| 6| 32| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0|Benign| | 22| 6|14/02/2018 08:40:23| 8804066| 14| 11| 1143| 2209.0| 744| 0| 81.6428571429| 203.7455453568| 976| 0| 200.8181818182| 362.2498635422| 380.7331748762| 2.839597068|366836.083333333|511356.609732762| 1928102.0| 21.0| 8804066.0|677235.846153846|532416.970958985| 1928102.0| 246924.0| 7715481.0| 771548.1|755543.082716951| 2174893.0| 90.0| 0| 0| 0| 0| 456| 360| 1.5901743581|1.2494227099| 0| 976|128.9230769231|279.7630315931|78267.3538461539| 0| 0| 0| 1| 0| 0| 0| 0| 0| 134.08| 81.6428571429| 200.8181818182| 0| 0| 0| 0| 0| 0| 14| 1143| 11| 2209| 5808| 233| 6| 32| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0|Benign| | 22| 6|14/02/2018 08:40:31| 6989341| 16| 12| 1239| 2273.0| 744| 0| 77.4375| 190.8311535538| 976| 0| 189.4166666667| 347.6425694023| 502.4794183028| 4.0061001459|258864.481481481| 291724.14791076| 951098.0| 20.0| 6989341.0|465956.066666667|244363.896416351| 951098.0| 265831.0| 5980598.0|543690.727272727|460713.519752371| 1254338.0| 78.0| 0| 0| 0| 0| 332| 252| 2.2892000834|1.7169000625| 0| 976|121.1034482759|265.7086676402|70601.0960591133| 0| 0| 0| 1| 0| 0| 0| 0| 0|125.4285714286| 77.4375| 189.4166666667| 0| 0| 0| 0| 0| 0| 16| 1239| 12| 2273| 5808| 234| 7| 20| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0|Benign| | 0| 0|14/02/2018 08:39:28| 112640480| 3| 0| 0| 0.0| 0| 0| 0.0| 0.0| 0| 0| 0.0| 0.0| 0.0| 0.0266334092| 5.632024E7| 203.6467529817| 5.6320384E7| 5.6320096E7| 1.1264048E8| 5.632024E7| 203.6467529817|5.6320384E7|5.6320096E7| 0.0| 0.0| 0.0| 0.0| 0.0| 0| 0| 0| 0| 0| 0| 0.0266334092| 0.0| 0| 0| 0.0| 0.0| 0.0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0.0| 0.0| 0.0| 0| 0| 0| 0| 0| 0| 3| 0| 0| 0| -1| -1| 0| 0| 0.0| 0.0| 0.0| 0.0| 5.632024E7|203.6467529817|5.6320384E7|5.6320096E7|Benign| | 0| 0|14/02/2018 08:42:17| 112641244| 3| 0| 0| 0.0| 0| 0| 0.0| 0.0| 0| 0| 0.0| 0.0| 0.0| 0.0266332286| 5.6320622E7| 62.2253967444| 5.6320666E7| 5.6320578E7|1.12641244E8| 5.6320622E7| 62.2253967444|5.6320666E7|5.6320578E7| 0.0| 0.0| 0.0| 0.0| 0.0| 0| 0| 0| 0| 0| 0| 0.0266332286| 0.0| 0| 0| 0.0| 0.0| 0.0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0.0| 0.0| 0.0| 0| 0| 0| 0| 0| 0| 3| 0| 0| 0| -1| -1| 0| 0| 0.0| 0.0| 0.0| 0.0| 5.6320622E7| 62.2253967444|5.6320666E7|5.6320578E7|Benign| | 80| 6|14/02/2018 08:47:14| 476513| 5| 3| 211| 463.0| 211| 0| 42.2| 94.3620686505| 463| 0| 154.3333333333| 267.3131746348|1414.4419984345|16.7886290615|68073.2857142857|115865.792656438| 237711.0| 24.0| 476513.0| 119128.25|137379.963358017| 238470.0| 108.0| 238634.0| 119317.0|167621.076693833| 237843.0| 791.0| 0| 0| 0| 0| 168| 104|10.4928931635|6.2957358981| 0| 463| 74.8888888889|161.4058893322|26051.8611111111| 0| 0| 0| 1| 0| 0| 0| 0| 0| 84.25| 42.2| 154.3333333333| 0| 0| 0| 0| 0| 0| 5| 211| 3| 463| 14600| 219| 1| 32| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0|Benign| | 80| 6|14/02/2018 08:47:15| 475048| 5| 3| 220| 472.0| 220| 0| 44.0| 98.38699101| 472| 0| 157.3333333333| 272.5093270575|1456.6949024099|16.8404034961| 67864.0|115746.933154476| 237494.0| 15.0| 475048.0| 118762.0|137096.759626185| 237853.0| 15.0| 237516.0| 118758.0|167472.584269784| 237179.0| 337.0| 0| 0| 0| 0| 168| 104| 10.525252185| 6.315151311| 0| 472| 76.8888888889|165.0669897681|27247.1111111111| 0| 0| 0| 1| 0| 0| 0| 0| 0| 86.5| 44.0| 157.3333333333| 0| 0| 0| 0| 0| 0| 5| 220| 3| 472| 14600| 219| 1| 32| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0|Benign| | 80| 6|14/02/2018 08:47:15| 474926| 5| 3| 220| 472.0| 220| 0| 44.0| 98.38699101| 472| 0| 157.3333333333| 272.5093270575|1457.0691012916|16.8447294947|67846.5714285714|115645.740842248| 237162.0| 15.0| 474926.0| 118731.5|136923.365842601| 237497.0| 15.0| 237732.0| 118866.0|167663.503100705| 237422.0| 310.0| 0| 0| 0| 0| 168| 104|10.5279559342|6.3167735605| 0| 472| 76.8888888889|165.0669897681|27247.1111111111| 0| 0| 0| 1| 0| 0| 0| 0| 0| 86.5| 44.0| 157.3333333333| 0| 0| 0| 0| 0| 0| 5| 220| 3| 472| 14480| 219| 1| 32| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0|Benign| | 80| 6|14/02/2018 08:47:16| 477471| 5| 3| 209| 461.0| 209| 0| 41.8| 93.4676414595| 461| 0| 153.6666666667| 266.1584740964|1403.2265833946|16.7549442793|68210.1428571429|116178.228792989| 238389.0| 17.0| 477471.0| 119367.75|137516.508224467| 238515.0| 149.0| 238887.0| 119443.5|168454.755588852| 238559.0| 328.0| 0| 0| 0| 0| 168| 104|10.4718401746|6.2831041048| 0| 461| 74.4444444444|160.5942955954|25790.5277777778| 0| 0| 0| 1| 0| 0| 0| 0| 0| 83.75| 41.8| 153.6666666667| 0| 0| 0| 0| 0| 0| 5| 209| 3| 461| 14480| 219| 1| 32| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0|Benign| | 80| 6|14/02/2018 08:47:16| 512758| 5| 3| 211| 463.0| 211| 0| 42.2| 94.3620686505| 463| 0| 154.3333333333| 267.3131746348|1314.4602327024|15.6019018718|73251.1428571429|124959.473740394| 256188.0| 10.0| 512758.0| 128189.5|148006.106082373| 256523.0| 10.0| 256563.0| 128281.5|180935.190276795| 256222.0| 341.0| 0| 0| 0| 0| 168| 104| 9.7511886699|5.8507132019| 0| 463| 74.8888888889|161.4058893322|26051.8611111111| 0| 0| 0| 1| 0| 0| 0| 0| 0| 84.25| 42.2| 154.3333333333| 0| 0| 0| 0| 0| 0| 5| 211| 3| 463| 14480| 219| 1| 32| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0|Benign| | 80| 6|14/02/2018 08:47:17| 476711| 5| 3| 206| 458.0| 206| 0| 41.2| 92.126000673| 458| 0| 152.6666666667| 264.4264232888|1392.8774456641|16.7816559719|68101.5714285714|115977.215079597| 238034.0| 16.0| 476711.0| 119177.75| 137207.36560009| 238274.0| 78.0| 238033.0| 119016.5|168007.864103143| 237816.0| 217.0| 0| 0| 0| 0| 168| 104|10.4885349824|6.2931209894| 0| 458| 73.7777777778|159.3783060659|25401.4444444444| 0| 0| 0| 1| 0| 0| 0| 0| 0| 83.0| 41.2| 152.6666666667| 0| 0| 0| 0| 0| 0| 5| 206| 3| 458| 14480| 219| 1| 32| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0|Benign| | 80| 6|14/02/2018 08:47:17| 476616| 5| 3| 211| 463.0| 211| 0| 42.2| 94.3620686505| 463| 0| 154.3333333333| 267.3131746348|1414.1363277775|16.7850009232| 68088.0|115553.073301117| 237285.0| 8.0| 476616.0| 119154.0|136576.972942001| 237558.0| 8.0| 237660.0| 118830.0|167563.093937776| 237315.0| 345.0| 0| 0| 0| 0| 168| 104| 10.490625577|6.2943753462| 0| 463| 74.8888888889|161.4058893322|26051.8611111111| 0| 0| 0| 1| 0| 0| 0| 0| 0| 84.25| 42.2| 154.3333333333| 0| 0| 0| 0| 0| 0| 5| 211| 3| 463| 14480| 219| 1| 32| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0|Benign| | 80| 6|14/02/2018 08:47:18| 477161| 5| 3| 211| 463.0| 211| 0| 42.2| 94.3620686505| 463| 0| 154.3333333333| 267.3131746348|1412.5211406632|16.7658295628|68165.8571428572|116324.530541969| 238504.0| 12.0| 477161.0| 119290.25|137729.560342905| 238719.0| 12.0| 238618.0| 119309.0|168449.805841384| 238421.0| 197.0| 0| 0| 0| 0| 168| 104|10.4786434767| 6.287186086| 0| 463| 74.8888888889|161.4058893322|26051.8611111111| 0| 0| 0| 1| 0| 0| 0| 0| 0| 84.25| 42.2| 154.3333333333| 0| 0| 0| 0| 0| 0| 5| 211| 3| 463| 14480| 219| 1| 32| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0|Benign| | 80| 6|14/02/2018 08:47:18| 474670| 5| 3| 214| 466.0| 214| 0| 42.8| 95.703709437| 466| 0| 155.3333333333| 269.0452254424|1432.5742094508|16.8538142288| 67810.0|115371.262364883| 236717.0| 15.0| 474670.0| 118667.5|136423.221362787| 236894.0| 55.0| 236992.0| 118496.0| 167297.22178805| 236793.0| 199.0| 0| 0| 0| 0| 168| 104| 10.533633893|6.3201803358| 0| 466| 75.5555555556|162.6246530443|26446.7777777778| 0| 0| 0| 1| 0| 0| 0| 0| 0| 85.0| 42.8| 155.3333333333| 0| 0| 0| 0| 0| 0| 5| 214| 3| 466| 14480| 219| 1| 32| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0|Benign| | 80| 6|14/02/2018 08:47:19| 476608| 5| 3| 209| 461.0| 209| 0| 41.8| 93.4676414595| 461| 0| 153.6666666667| 266.1584740964|1405.7674231234|16.7852826642|68086.8571428572|116165.357657993| 238154.0| 21.0| 476608.0| 119152.0|137536.399945614| 238349.0| 28.0| 238442.0| 119221.0|168305.556058022| 238231.0| 211.0| 0| 0| 0| 0| 168| 104|10.4908016651|6.2944809991| 0| 461| 74.4444444444|160.5942955954|25790.5277777778| 0| 0| 0| 1| 0| 0| 0| 0| 0| 83.75| 41.8| 153.6666666667| 0| 0| 0| 0| 0| 0| 5| 209| 3| 461| 14480| 219| 1| 32| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0|Benign| | 80| 6|14/02/2018 08:47:19| 479249| 5| 3| 215| 467.0| 215| 0| 43.0| 96.1509230325| 467| 0| 155.6666666667| 269.6225757116|1423.0598290242|16.6927839182|68464.1428571429|116522.185511928| 239061.0| 18.0| 479249.0| 119812.25|137797.592674606| 239270.0| 21.0| 239238.0| 119619.0|168903.768394906| 239052.0| 186.0| 0| 0| 0| 0| 168| 104|10.4329899489|6.2597939693| 0| 467| 75.7777777778|163.0312683029|26579.1944444444| 0| 0| 0| 1| 0| 0| 0| 0| 0| 85.25| 43.0| 155.6666666667| 0| 0| 0| 0| 0| 0| 5| 215| 3| 467| 14480| 219| 1| 32| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0|Benign| | 80| 6|14/02/2018 08:47:20| 475967| 5| 3| 215| 467.0| 215| 0| 43.0| 96.1509230325| 467| 0| 155.6666666667| 269.6225757116|1432.8724470394|16.8078879418|67995.2857142857|115929.081086548| 237703.0| 16.0| 475967.0| 118991.75|137195.538016305| 237904.0| 21.0| 237915.0| 118957.5|167975.337191208| 237734.0| 181.0| 0| 0| 0| 0| 168| 104|10.5049299636|6.3029579782| 0| 467| 75.7777777778|163.0312683029|26579.1944444444| 0| 0| 0| 1| 0| 0| 0| 0| 0| 85.25| 43.0| 155.6666666667| 0| 0| 0| 0| 0| 0| 5| 215| 3| 467| 14480| 219| 1| 32| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0| 0.0|Benign| +--------+--------+-------------------+-------------+------------+------------+---------------+---------------+---------------+---------------+----------------+---------------+---------------+---------------+----------------+---------------+---------------+-------------+----------------+----------------+------------+------------+------------+----------------+----------------+-----------+-----------+-----------+----------------+----------------+-----------+-----------+-------------+-------------+-------------+-------------+--------------+--------------+-------------+------------+-----------+-----------+--------------+--------------+----------------+------------+------------+------------+------------+------------+------------+--------------+------------+-------------+--------------+----------------+----------------+--------------+--------------+----------------+--------------+--------------+----------------+----------------+----------------+----------------+----------------+-----------------+-----------------+-----------------+----------------+-----------+----------+----------+----------+------------+--------------+-----------+-----------+------+ only showing top 20 rowsIn [ ]:
# The total number of attacks per label
IDS_df.select('Label').groupBy('Label').count().orderBy('count', ascending=False).show()
+--------------------+-------+ | Label| count| +--------------------+-------+ | Benign|6870186| | DDOS attack-HOIC| 686012| | DoS attacks-Hulk| 461912| | Bot| 286191| | FTP-BruteForce| 193360| | SSH-Bruteforce| 187589| |DoS attacks-SlowH...| 139890| |DoS attacks-Golde...| 83016| | Infilteration| 68871| |DoS attacks-Slowl...| 21980| |DDOS attack-LOIC-UDP| 1730| | Brute Force -Web| 611| | Brute Force -XSS| 230| | SQL Injection| 87| | Label| 34| | 0| 1| +--------------------+-------+In [120]:
IDS_df2 = IDS_df.withColumnRenamed("Tot Fwd Pkts","tot_fw_pk").withColumnRenamed("Idle Max","idl_max") \
.withColumnRenamed("dst port","dst_port").withColumnRenamed("Idle Min","idl_min") \
.withColumnRenamed("TotLen Fwd Pkts","tot_l_fw_pkt").withColumnRenamed("Flow Duration","fl_dur") \
.withColumnRenamed("Flow Byts/s","fl_byt_s").withColumnRenamed("Fwd PSH Flags","fw_psh_flag") \
.withColumnRenamed("Active Max","atv_max").withColumnRenamed("Active Min","atv_min") \
.withColumnRenamed("Pkt Size Avg","pkt_size_avg").withColumnRenamed("Fwd Seg Size Avg","fw_seg_avg") \
.withColumnRenamed("Bwd Seg Size Avg","bw_seg_avg")
Task 1: Spark SQL [30 marks]¶
In [72]:IDS_df2.createOrReplaceTempView("IDS")
In [10]:
import numpy as np import matplotlib.pyplot as plt import pandas as pdIn [11]:
# Pramod Kumar Gouda u2002425
# Query 1 [Briefly explain]: Returns a set of objects with duplicate elements eliminated and it used for collection
sqlDF=spark.sql("SELECT Protocol, collect_set(tot_fw_pk) as totalfwdpkts FROM IDS WHERE Protocol IS NOT NULL GROUP BY Protocol ")
sqlDF.show()
+--------+--------------------+ |Protocol| totalfwdpkts| +--------+--------------------+ | 6|[2207, 356, 1982,...| | 17|[97718, 95024, 15...| | 0|[110, 52, 387, 13...| +--------+--------------------+In [12]:
# Pramod Kumar Gouda u2002425
# Query 2 [Briefly explain]: Selecting the number of protocol based on there type
sqlDF=spark.sql("SELECT Protocol, count(*) FROM IDS GROUP BY Protocol")
sqlDF.show()
+--------+--------+ |Protocol|count(1)| +--------+--------+ | null| 34| | 6| 6976276| | 17| 1919793| | 0| 105597| +--------+--------+In [13]:
pandas_df=sqlDF.toPandas() pandas_df.sort_values(by='count(1)',ascending=False).plot(x='Protocol',y='count(1)',kind='bar')Out[13]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fbb82d5d2e8>
# Virender Yadav u2002208
# Query 1 [Briefly explain]: Number of forwarded packets by there count
sqlDF=spark.sql("SELECT tot_fw_pk, count(*) FROM IDS GROUP BY tot_fw_pk HAVING COUNT(tot_fw_pk) > 50 " )
sqlDF.show()
+---------+--------+ |tot_fw_pk|count(1)| +---------+--------+ | 148| 65| | 31| 2440| | 85| 154| | 137| 82| | 65| 664| | 53| 928| | 133| 84| | 78| 268| | 155| 69| | 108| 203| | 34| 1945| | 126| 101| | 115| 190| | 101| 167| | 81| 202| | 28| 4640| | 76| 215| | 26| 6940| | 27| 5227| | 44| 1018| +---------+--------+ only showing top 20 rowsIn [52]:
# Virender Yadav u2002208
# Query 2 [Briefly explain]: finding the average flow duration of packets
sqlDF=spark.sql("SELECT avg(fl_dur) from IDS")
sqlDF.show()
+--------------------+ | avg(fl_dur)| +--------------------+ |1.0503308551024554E7| +--------------------+In [23]:
# Nishitha Angali u2001782
# Query 1 [Briefly explain]: To find the count of maximum time a flow was idle before becomming active
# and minimum time a flow was idle before becomming active with conditions given as 0 in both cases
sqlDF=spark.sql("SELECT count(idl_max),count(idl_min) FROM IDS where idl_max = 0 AND idl_min = 0")
sqlDF.show()
+--------------+--------------+ |count(idl_max)|count(idl_min)| +--------------+--------------+ | 7925325| 7925325| +--------------+--------------+In [26]:
# Nishitha Angali u2001782
# Query 2 [Briefly explain]: Counting the number of idel max group whose count is greater than 20
sqlDF=spark.sql("SELECT idl_max,count(*) FROM IDS GROUP BY idl_max HAVING COUNT(idl_max) > 20 ")
sqlDF.show()
+-----------+--------+ | idl_max|count(1)| +-----------+--------+ |5.6320958E7| 21| | 1.001003E7| 29| |1.0004066E7| 60| |1.0014804E7| 23| |1.0014585E7| 41| |1.0014137E7| 23| |5.6319468E7| 24| |1.0001233E7| 24| | 1.000114E7| 26| |1.0010131E7| 31| |5.6318028E7| 22| |1.0014601E7| 38| | 1.001432E7| 24| |1.0003658E7| 35| | 6.06E7| 36| | 1.66E7| 35| | 1.000975E7| 26| |1.0001486E7| 29| | 1.001427E7| 21| |1.0013879E7| 29| +-----------+--------+ only showing top 20 rowsIn [27]:
# Meetkumar Rasikbhai Patel u2001677
# Query 1 [Briefly explain]: Counting the number of destination ports by there type
sqlDF=spark.sql("SELECT dst_port, count(*) from IDS GROUP BY dst_port ")
sqlDF.show()
+--------+--------+ |dst_port|count(1)| +--------+--------+ | 38422| 29| | 40386| 35| | 35982| 39| | 3997| 5| | 1829| 4| | 51415| 204| | 26706| 4| | 15846| 3| | 51607| 184| | 49308| 56| | 50348| 216| | 49855| 222| | 50353| 206| | 51393| 206| | 51123| 210| | 51595| 188| | 63964| 26| | 64519| 29| | 57020| 69| | 50223| 223| +--------+--------+ only showing top 20 rowsIn [53]:
# Meetkumar Rasikbhai Patel u2001677
# Query 2 [Briefly explain]: summing up the total length of forwarded packets
sqlDF=spark.sql("SELECT sum(tot_l_fw_pkt) from IDS")
sqlDF.show()
+-----------------+ |sum(tot_l_fw_pkt)| +-----------------+ | 10422158811| +-----------------+In [73]:
# Maulik Bhikhabhai Padhiyar u2002324
# Query 1 [Briefly explain]: selecting the different types of PSH flag
sqlDF=spark.sql("SELECT DISTINCT fw_psh_flag from IDS ")
sqlDF.show()
+-----------+ |fw_psh_flag| +-----------+ | null| | 1| | 0| +-----------+In [68]:
# Maulik Bhikhabhai Padhiyar u2002324
# Query 2 [Briefly explain]: average byte rate which is transfered per second
sqlDF=spark.sql("SELECT count(fl_byt_s) from IDS where fl_byt_s > 0")
sqlDF.show()
+---------------+ |count(fl_byt_s)| +---------------+ | 5781032| +---------------+
Task 2 - Part1: PySpark [45 marks]¶
In [74]:IDS_df2 = IDS_df2.na.drop()In [121]:
# Pramod Kumar Gouda u2002425
# Analytical method 1: We are converting required columns from string to float to find skewness,is a measure of the
# asymmetry of the data around sample mean
from pyspark.sql.functions import col
selected_features = ['tot_fw_pk','idl_max','atv_max','atv_min','idl_min','tot_l_fw_pkt','pkt_size_avg','fw_seg_avg','bw_seg_avg']
IDS_selected_features_df = IDS_df2.select(*(col(c).cast("float").alias(c) for c in selected_features))
IDS_selected_features_df.show()
+---------+-----------+-------+-------+-----------+------------+------------+----------+----------+ |tot_fw_pk| idl_max|atv_max|atv_min| idl_min|tot_l_fw_pkt|pkt_size_avg|fw_seg_avg|bw_seg_avg| +---------+-----------+-------+-------+-----------+------------+------------+----------+----------+ | 3.0| 5.632096E7| 0.0| 0.0| 5.632076E7| 0.0| 0.0| 0.0| 0.0| | 3.0|5.6320816E7| 0.0| 0.0|5.6320652E7| 0.0| 0.0| 0.0| 0.0| | 3.0|5.6319524E7| 0.0| 0.0|5.6319096E7| 0.0| 0.0| 0.0| 0.0| | 15.0| 0.0| 0.0| 0.0| 0.0| 1239.0| 140.48| 82.6| 227.3| | 14.0| 0.0| 0.0| 0.0| 0.0| 1143.0| 134.08| 81.64286| 200.81818| | 16.0| 0.0| 0.0| 0.0| 0.0| 1239.0| 125.42857| 77.4375| 189.41667| | 3.0|5.6320384E7| 0.0| 0.0|5.6320096E7| 0.0| 0.0| 0.0| 0.0| | 3.0|5.6320664E7| 0.0| 0.0|5.6320576E7| 0.0| 0.0| 0.0| 0.0| | 5.0| 0.0| 0.0| 0.0| 0.0| 211.0| 84.25| 42.2| 154.33333| | 5.0| 0.0| 0.0| 0.0| 0.0| 220.0| 86.5| 44.0| 157.33333| | 5.0| 0.0| 0.0| 0.0| 0.0| 220.0| 86.5| 44.0| 157.33333| | 5.0| 0.0| 0.0| 0.0| 0.0| 209.0| 83.75| 41.8| 153.66667| | 5.0| 0.0| 0.0| 0.0| 0.0| 211.0| 84.25| 42.2| 154.33333| | 5.0| 0.0| 0.0| 0.0| 0.0| 206.0| 83.0| 41.2| 152.66667| | 5.0| 0.0| 0.0| 0.0| 0.0| 211.0| 84.25| 42.2| 154.33333| | 5.0| 0.0| 0.0| 0.0| 0.0| 211.0| 84.25| 42.2| 154.33333| | 5.0| 0.0| 0.0| 0.0| 0.0| 214.0| 85.0| 42.8| 155.33333| | 5.0| 0.0| 0.0| 0.0| 0.0| 209.0| 83.75| 41.8| 153.66667| | 5.0| 0.0| 0.0| 0.0| 0.0| 215.0| 85.25| 43.0| 155.66667| | 5.0| 0.0| 0.0| 0.0| 0.0| 215.0| 85.25| 43.0| 155.66667| +---------+-----------+-------+-------+-----------+------------+------------+----------+----------+ only showing top 20 rowsIn [81]:
from pyspark.sql import functions as f IDS_selected_features_df.select(f.skewness(IDS_selected_features_df['tot_fw_pk'])).show()
+-------------------+ |skewness(tot_fw_pk)| +-------------------+ | 77.74947867361696| +-------------------+In [83]:
# Pramod Kumar Gouda u2002425
# Analytical method 2: I am finding the correlation between two columns where if one column increases its corelated
# with other column and if another column value decreases
IDS_selected_features_df.stat.corr("atv_max","atv_min")
Out[83]:
0.7414425772390605In [84]:
# Pramod Kumar Gouda u2002425
# Analytical method 3: kernel density estimate on a pyspark dataframe column and use it for
# creating a new column with the estimates
from pyspark.mllib.stat import KernelDensity
dat_rdd = IDS_selected_features_df.select("tot_fw_pk").rdd
dat_rdd_data = dat_rdd.map(lambda x: x[0])
kd = KernelDensity()
kd.setSample(dat_rdd_data)
kd.estimate([13.0,14.0])
Out[84]:
array([0.0097561 , 0.01106945])In [86]:
# Virender Yadav u2002208 # Analytical method 1: finding the kurtosis and is about tails of the distribution,measures of outliners IDS_selected_features_df.select(f.kurtosis(IDS_selected_features_df['atv_min'])).show()
+------------------+ | kurtosis(atv_min)| +------------------+ |4022.4866270803145| +------------------+In [91]:
# Virender Yadav u2002208
# Analytical method 2: find the percentile of a 80th %
IDS_df2.groupby('label').agg(f.expr('percentile(atv_max, array(0.80))')[0].alias('%80')).show()
+--------------------+---------+ | label| %80| +--------------------+---------+ | SSH-Bruteforce| 0.0| | Label| null| | Infilteration| 0.0| | 0| 0.0| | SQL Injection| 0.0| |DoS attacks-Slowl...|7678920.2| | Benign| 0.0| |DoS attacks-SlowH...| 0.0| | Bot| 0.0| |DoS attacks-Golde...| 441.0| | Brute Force -XSS| 0.0| | FTP-BruteForce| 0.0| |DDOS attack-LOIC-UDP| 0.0| | DoS attacks-Hulk| 0.0| | Brute Force -Web|3999879.0| | DDOS attack-HOIC| 0.0| +--------------------+---------+In [95]:
# Virender Yadav u2002208
# Analytical method 3: calculating standard deviation its value is how far from the normal ,squareroot of variance
IDS_df2.agg(f.stddev("atv_max")).show()
+--------------------+ |stddev_samp(atv_max)| +--------------------+ | 1749057.5452774959| +--------------------+In [85]:
# Nishitha Angali u2001782 # Analytical method 1: To find skewness,is a measure of the asymmetry of the data around sample mean IDS_selected_features_df.select(f.skewness(IDS_selected_features_df['idl_max'])).show()
+-----------------+ |skewness(idl_max)| +-----------------+ |992.0339103023476| +-----------------+In [97]:
# Nishitha Angali u2001782
# Analytical method 2: finding the correlation between two columns where if one column increases its corelated
# with other column and if another column value decreases
IDS_selected_features_df.stat.corr("idl_max","idl_min")
Out[97]:
0.33075364446687444In [101]:
# Nishitha Angali u2001782
# Analytical method 3: find the percentile of a 75th %
IDS_df2.groupby('protocol').agg(f.expr('percentile(idl_max, array(0.75))')[0].alias('%75')).show()
+--------+-------+ |protocol| %75| +--------+-------+ | null| null| | 6| 0.0| | 17| 0.0| | 0|5.632E7| +--------+-------+In [116]:
# Meetkumar Rasikbhai Patel u2001677 # Analytical method 1: finding the kurtosis and is about tails of the distribution,measures of outliners IDS_selected_features_df.select(f.kurtosis(IDS_selected_features_df['tot_l_fw_pkt'])).show()
+----------------------+ |kurtosis(tot_l_fw_pkt)| +----------------------+ | 1529504.304325319| +----------------------+In [113]:
# Meetkumar Rasikbhai Patel u2001677
# Analytical method 2: calucating the average value for the column
IDS_df2.select(f.mean("tot_l_fw_pkt")).show()
+------------------+ | avg(tot_l_fw_pkt)| +------------------+ |1157.8033234070226| +------------------+In [122]:
# Meetkumar Rasikbhai Patel u2001677
# Analytical method 3: finding the correlation between two columns where if one column increases its corelated
# with other column and if another column value decreases
IDS_selected_features_df.stat.corr("fw_seg_avg","bw_seg_avg")
Out[122]:
0.37286275767327126In [119]:
# Maulik Bhikhabhai Padhiyar u2002324 # Analytical method 1: To find skewness,is a measure of the asymmetry of the data around sample mean IDS_selected_features_df.select(f.skewness(IDS_selected_features_df['pkt_size_avg'])).show()
+----------------------+ |skewness(pkt_size_avg)| +----------------------+ | 3.4566237116943674| +----------------------+In [123]:
# Maulik Bhikhabhai Padhiyar u2002324
# Analytical method 2:
IDS_df2.agg(f.stddev("idl_min")).show()
+--------------------+ |stddev_samp(idl_min)| +--------------------+ | 8.435726565284234E7| +--------------------+In [124]:
# Maulik Bhikhabhai Padhiyar u2002324
# Analytical method 3: Obtaining the maximum value in column
IDS_df2.agg(f.max("idl_max")).show()
+------------+ |max(idl_max)| +------------+ | 9.79781E11| +------------+
Task 2 - Part2: PySpark [15 marks]¶
In [ ]:# pramod Kumar Gouda u2002425
# Machine Learning Technique:
# What to achieve:
from pyspark.mllib.feature import Word2Vec
wv_rdd = IDS_df2.rdd
inp = wv_rdd.map(lambda row: row.split(" "))
word2vec = Word2Vec()
model = word2vec.fit(inp)
synonyms = model.findSynonyms('Benign', 5)
for word, cosine_distance in synonyms:
print("{}: {}".format(word, cosine_distance))
In [ ]:
# Virender Yadav u2002208
# Machine Learning Technique:
# What to achieve:
from pyspark.mllib.clustering import LDA, LDAModel
from pyspark.mllib.linalg import Vectors
# Load and parse the data
wv_rdd = IDS_df2.rdd
parsedData = wv_rdd.map(lambda line: Vectors.dense([float(x) for x in line.strip().split(' ')]))
# Index documents with unique IDs
corpus = parsedData.zipWithIndex().map(lambda x: [x[1], x[0]]).cache()
# Cluster the documents into three topics using LDA
ldaModel = LDA.train(corpus, k=3)
# Output topics. Each is a distribution over words (matching word count vectors)
print("Learned topics (as distributions over vocab of " + str(ldaModel.vocabSize())
+ " words):")
topics = ldaModel.topicsMatrix()
for topic in range(3):
print("Topic " + str(topic) + ":")
for word in range(0, ldaModel.vocabSize()):
print(" " + str(topics[word][topic]))
In [ ]:
# Nishitha Angali u2001782
# Machine Learning Technique:
# What to achieve:
from numpy import array
from math import sqrt
from pyspark.mllib.clustering import KMeans, KMeansModel
# Load and parse the data
wv_rdd = IDS_df2.rdd
parsedData = wv_rdd.map(lambda line: array([float(x) for x in line.split(',')]))
# Build the model (cluster the data)
clusters = KMeans.train(parsedData, 2, maxIterations=10, initializationMode="random")
# Evaluate clustering by computing Within Set Sum of Squared Errors
def error(point):
center = clusters.centers[clusters.predict(point)]
return sqrt(sum([x**2 for x in (point - center)]))
WSSSE = parsedData.map(lambda point: error(point)).reduce(lambda x, y: x + y)
print("Within Set Sum of Squared Error = " + str(WSSSE))
In [ ]:
# Meetkumar Rasikbhai Patel u2001677
# Machine Learning Technique:
# What to achieve:
from pyspark.mllib.feature import ElementwiseProduct
from pyspark.mllib.linalg import Vectors
data = sc.textFile("data/mllib/kmeans_data.txt")
parsedData = data.map(lambda x: [float(t) for t in x.split(" ")])
# Create weight vector.
transformingVector = Vectors.dense([0.0, 1.0, 2.0])
transformer = ElementwiseProduct(transformingVector)
# Batch transform
transformedData = transformer.transform(parsedData)
# Single-row transform
transformedData2 = transformer.transform(parsedData.first())
In [ ]:
# Maulik Bhikhabhai Padhiyar u2002324
# Machine Learning Technique:
# What to achieve:
from pyspark.mllib.feature import Normalizer
from pyspark.mllib.util import MLUtils
data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")
labels = data.map(lambda x: x.label)
features = data.map(lambda x: x.features)
normalizer1 = Normalizer()
normalizer2 = Normalizer(p=float("inf"))
# Each sample in data1 will be normalized using $L^2$ norm.
data1 = labels.zip(normalizer1.transform(features))
# Each sample in data2 will be normalized using $L^\infty$ norm.
data2 = labels.zip(normalizer2.transform(features))
Convert ipynb to HTML for Turnitin submission [10 marks]¶
In [125]:# install nbconvert !pip install nbconvert
Requirement already satisfied: nbconvert in /usr/local/lib/python3.6/dist-packages (5.6.1) Requirement already satisfied: bleach in /usr/local/lib/python3.6/dist-packages (from nbconvert) (3.2.1) Requirement already satisfied: jinja2>=2.4 in /usr/local/lib/python3.6/dist-packages (from nbconvert) (2.11.2) Requirement already satisfied: defusedxml in /usr/local/lib/python3.6/dist-packages (from nbconvert) (0.6.0) Requirement already satisfied: traitlets>=4.2 in /usr/local/lib/python3.6/dist-packages (from nbconvert) (4.3.3) Requirement already satisfied: nbformat>=4.4 in /usr/local/lib/python3.6/dist-packages (from nbconvert) (5.0.8) Requirement already satisfied: mistune<2,>=0.8.1 in /usr/local/lib/python3.6/dist-packages (from nbconvert) (0.8.4) Requirement already satisfied: pygments in /usr/local/lib/python3.6/dist-packages (from nbconvert) (2.6.1) Requirement already satisfied: entrypoints>=0.2.2 in /usr/local/lib/python3.6/dist-packages (from nbconvert) (0.3) Requirement already satisfied: jupyter-core in /usr/local/lib/python3.6/dist-packages (from nbconvert) (4.7.0) Requirement already satisfied: pandocfilters>=1.4.1 in /usr/local/lib/python3.6/dist-packages (from nbconvert) (1.4.3) Requirement already satisfied: testpath in /usr/local/lib/python3.6/dist-packages (from nbconvert) (0.4.4) Requirement already satisfied: webencodings in /usr/local/lib/python3.6/dist-packages (from bleach->nbconvert) (0.5.1) Requirement already satisfied: six>=1.9.0 in /usr/local/lib/python3.6/dist-packages (from bleach->nbconvert) (1.15.0) Requirement already satisfied: packaging in /usr/local/lib/python3.6/dist-packages (from bleach->nbconvert) (20.7) Requirement already satisfied: MarkupSafe>=0.23 in /usr/local/lib/python3.6/dist-packages (from jinja2>=2.4->nbconvert) (1.1.1) Requirement already satisfied: decorator in /usr/local/lib/python3.6/dist-packages (from traitlets>=4.2->nbconvert) (4.4.2) Requirement already satisfied: ipython-genutils in /usr/local/lib/python3.6/dist-packages (from traitlets>=4.2->nbconvert) (0.2.0) Requirement already satisfied: jsonschema!=2.5.0,>=2.4 in /usr/local/lib/python3.6/dist-packages (from nbformat>=4.4->nbconvert) (2.6.0) Requirement already satisfied: pyparsing>=2.0.2 in /usr/local/lib/python3.6/dist-packages (from packaging->bleach->nbconvert) (2.4.7)In [ ]:
# convert ipynb to html # file name: Group115_CN7031_CN7031.ipynb !jupyter nbconvert --to html Group115_CN7031.ipynb