Python Pandas

profileHGFGkhhdsf
pandas-Copy-Copy-Copy-Copy.html

Task: Covid-19 Data Analysis

This notebook is used to understand the comprehension of Data Analysis techniques using Pandas library.

Data Source:

https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_daily_reports

File naming convention

MM-DD-YYYY.csv in UTC.

Field description

  • Province_State: China - province name; US/Canada/Australia/ - city name, state/province name; Others - name of the event (e.g., "Diamond Princess" cruise ship); other countries - blank.

  • Country_Region: country/region name conforming to WHO (will be updated).

  • Last_Update: MM/DD/YYYY HH:mm (24 hour format, in UTC).

  • Confirmed: the number of confirmed cases. For Hubei Province: from Feb 13 (GMT +8), we report both clinically diagnosed and lab-confirmed cases. For lab-confirmed cases only (Before Feb 17), please refer to who_covid_19_situation_reports. For Italy, diagnosis standard might be changed since Feb 27 to "slow the growth of new case numbers." (Source)

  • Deaths: the number of deaths.

  • Recovered: the number of recovered cases.

Question 1

Read the dataset
In [1]:
 
Out[1]:
FIPS Admin2 Province_State Country_Region Last_Update Lat Long_ Confirmed Deaths Recovered Active Combined_Key Incident_Rate Case_Fatality_Ratio
0 NaN NaN NaN Afghanistan 2021-01-11 05:21:50 33.939110 67.709953 53489 2277 43948 7264.0 Afghanistan 137.403749 4.256950
1 NaN NaN NaN Albania 2021-01-11 05:21:50 41.153300 20.168300 63595 1241 37648 24706.0 Albania 2209.847800 1.951411
2 NaN NaN NaN Algeria 2021-01-11 05:21:50 28.033900 1.659600 102144 2807 69212 30125.0 Algeria 232.934026 2.748081
3 NaN NaN NaN Andorra 2021-01-11 05:21:50 42.506300 1.521800 8586 85 7724 777.0 Andorra 11112.405358 0.989984
4 NaN NaN NaN Angola 2021-01-11 05:21:50 -11.202700 17.873900 18193 416 13872 3905.0 Angola 55.354627 2.286594
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
3972 NaN NaN NaN Vietnam 2021-01-11 05:21:50 14.058324 108.277199 1514 35 1361 118.0 Vietnam 1.555396 2.311757
3973 NaN NaN NaN West Bank and Gaza 2021-01-11 05:21:50 31.952200 35.233200 147400 1604 131117 14679.0 West Bank and Gaza 2889.393847 1.088195
3974 NaN NaN NaN Yemen 2021-01-11 05:21:50 15.552727 48.516388 2104 611 1407 86.0 Yemen 7.054256 29.039924
3975 NaN NaN NaN Zambia 2021-01-11 05:21:50 -13.133897 27.849332 27728 469 20598 6661.0 Zambia 150.827167 1.691431
3976 NaN NaN NaN Zimbabwe 2021-01-11 05:21:50 -19.015438 29.154857 21477 507 12582 8388.0 Zimbabwe 144.500474 2.360665

3977 rows × 14 columns

Display the top 5 rows in the data
In [2]:
 
Out[2]:
FIPS Admin2 Province_State Country_Region Last_Update Lat Long_ Confirmed Deaths Recovered Active Combined_Key Incident_Rate Case_Fatality_Ratio
0 NaN NaN NaN Afghanistan 2021-01-11 05:21:50 33.93911 67.709953 53489 2277 43948 7264.0 Afghanistan 137.403749 4.256950
1 NaN NaN NaN Albania 2021-01-11 05:21:50 41.15330 20.168300 63595 1241 37648 24706.0 Albania 2209.847800 1.951411
2 NaN NaN NaN Algeria 2021-01-11 05:21:50 28.03390 1.659600 102144 2807 69212 30125.0 Algeria 232.934026 2.748081
3 NaN NaN NaN Andorra 2021-01-11 05:21:50 42.50630 1.521800 8586 85 7724 777.0 Andorra 11112.405358 0.989984
4 NaN NaN NaN Angola 2021-01-11 05:21:50 -11.20270 17.873900 18193 416 13872 3905.0 Angola 55.354627 2.286594
Show the information of the dataset
In [3]:
 
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3977 entries, 0 to 3976
Data columns (total 14 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   FIPS                 3263 non-null   float64
 1   Admin2               3268 non-null   object 
 2   Province_State       3807 non-null   object 
 3   Country_Region       3977 non-null   object 
 4   Last_Update          3977 non-null   object 
 5   Lat                  3891 non-null   float64
 6   Long_                3891 non-null   float64
 7   Confirmed            3977 non-null   int64  
 8   Deaths               3977 non-null   int64  
 9   Recovered            3977 non-null   int64  
 10  Active               3976 non-null   float64
 11  Combined_Key         3977 non-null   object 
 12  Incident_Rate        3891 non-null   float64
 13  Case_Fatality_Ratio  3935 non-null   float64
dtypes: float64(6), int64(3), object(5)
memory usage: 435.1+ KB
Show the sum of missing values of features in the dataset
In [4]:
 
Out[4]:
FIPS                   714
Admin2                 709
Province_State         170
Country_Region           0
Last_Update              0
Lat                     86
Long_                   86
Confirmed                0
Deaths                   0
Recovered                0
Active                   1
Combined_Key             0
Incident_Rate           86
Case_Fatality_Ratio     42
dtype: int64

Question 2

Show the number of Confirmed cases by Country
In [5]:
 
Out[5]:
Country_Region Confirmed
0 Afghanistan 53489
1 Albania 63595
2 Algeria 102144
3 Andorra 8586
4 Angola 18193
... ... ...
186 Vietnam 1514
187 West Bank and Gaza 147400
188 Yemen 2104
189 Zambia 27728
190 Zimbabwe 21477

191 rows × 2 columns

Show the number of Deaths by Country
In [6]:
 
Out[6]:
Country_Region Deaths
0 Afghanistan 2277
1 Albania 1241
2 Algeria 2807
3 Andorra 85
4 Angola 416
... ... ...
186 Vietnam 35
187 West Bank and Gaza 1604
188 Yemen 611
189 Zambia 469
190 Zimbabwe 507

191 rows × 2 columns

Show the number of Recovered cases by Country
In [7]:
 
Out[7]:
Country_Region Recovered
0 Afghanistan 43948
1 Albania 37648
2 Algeria 69212
3 Andorra 7724
4 Angola 13872
... ... ...
186 Vietnam 1361
187 West Bank and Gaza 131117
188 Yemen 1407
189 Zambia 20598
190 Zimbabwe 12582

191 rows × 2 columns

Show the number of Active Cases by Country
In [8]:
 
Out[8]:
Country_Region Active
0 Afghanistan 7264
1 Albania 24706
2 Algeria 30125
3 Andorra 777
4 Angola 3905
... ... ...
186 Vietnam 118
187 West Bank and Gaza 14679
188 Yemen 86
189 Zambia 6661
190 Zimbabwe 8388

191 rows × 2 columns

Show the latest number of Confirmed, Deaths, Recovered and Active cases Country-wise
In [9]:
 
/usr/local/lib/python3.7/site-packages/ipykernel_launcher.py:2: FutureWarning: Indexing with multiple keys (implicitly converted to a tuple of keys) will be deprecated, use a list instead.
  
Out[9]:
Country_Region Confirmed Deaths Recovered Active
0 Afghanistan 53489 2277 43948 7264
1 Albania 63595 1241 37648 24706
2 Algeria 102144 2807 69212 30125
3 Andorra 8586 85 7724 777
4 Angola 18193 416 13872 3905
... ... ... ... ... ...
186 Vietnam 1514 35 1361 118
187 West Bank and Gaza 147400 1604 131117 14679
188 Yemen 2104 611 1407 86
189 Zambia 27728 469 20598 6661
190 Zimbabwe 21477 507 12582 8388

191 rows × 5 columns

Question 3

Show the countries with no recovered cases

In [10]:
 
Out[10]:
Country_Region Recovered
16 Belgium 0
151 Serbia 0
165 Sweden 0
177 US 0
Show the countries with no confirmed cases
In [11]:
 
Out[11]:
Country_Region Confirmed
Show the countries with no deaths
In [12]:
 
Out[12]:
Country_Region Deaths
30 Cambodia 0
50 Dominica 0
75 Holy See 0
95 Laos 0
111 Marshall Islands 0
143 Saint Kitts and Nevis 0
145 Saint Vincent and the Grenadines 0
146 Samoa 0
157 Solomon Islands 0
172 Timor-Leste 0
184 Vanuatu 0
In [13]:
 
/usr/local/lib/python3.7/site-packages/ipykernel_launcher.py:1: FutureWarning: Indexing with multiple keys (implicitly converted to a tuple of keys) will be deprecated, use a list instead.
  """Entry point for launching an IPython kernel.
Out[13]:
Country_Region Confirmed Deaths Recovered
30 Cambodia 391 0 374
50 Dominica 106 0 98
75 Holy See 27 0 15
95 Laos 41 0 40
111 Marshall Islands 4 0 4
143 Saint Kitts and Nevis 34 0 31
145 Saint Vincent and the Grenadines 249 0 100
146 Samoa 2 0 2
157 Solomon Islands 17 0 10
172 Timor-Leste 49 0 41
184 Vanuatu 1 0 1

Question 4

Show the Top 10 countries with Confirmed cases
In [14]:
 
Out[14]:
Last_Update Confirmed Deaths Recovered
Country_Region
France 2021-01-10 05:22:12 2762120 67229 178036
United Kingdom 2021-01-10 05:22:12 2613838 70541 2850
Turkey 2021-01-10 05:22:12 2317118 22631 2190047
India 2021-01-10 05:22:12 1965556 50027 1861400
Argentina 2021-01-10 05:22:12 1714409 44417 1504330
Brazil 2021-01-10 05:22:12 1540513 48298 1350513
Poland 2021-01-10 05:22:12 1376389 31011 1113738
Iran 2021-01-10 05:22:12 1280438 56100 1067466
South Africa 2021-01-10 05:22:12 1214176 32824 956712
US 2021-01-10 05:22:12 907077 12090 0
Show the Top 10 Countries with Active cases
In [15]:
 
Out[15]:
Last_Update Confirmed Deaths Recovered Active
Country_Region
United Kingdom 2021-01-10 05:22:12 2613838 70541 2850 2543297
France 2021-01-10 05:22:12 2762120 67229 178036 2516855
US 2021-01-10 05:22:12 907077 12090 0 894987
Peru 2021-01-10 05:22:12 454697 16923 966447 437774
Spain 2021-01-10 05:22:12 421295 11971 40736 368588
Serbia 2021-01-10 05:22:12 356125 3513 0 352612
Mexico 2021-01-10 05:22:12 364770 20251 1143254 346710
Poland 2021-01-10 05:22:12 1376389 31011 1113738 231640
South Africa 2021-01-10 05:22:12 1214176 32824 956712 224640
Netherlands 2021-01-10 05:22:12 214929 2963 5436 211966

Question 5

Plot Country-wise Total deaths, confirmed, recovered and active casaes where total deaths have exceeded 50,000
In [16]:
import matplotlib.pyplot as plt
In [17]:
 
/usr/local/lib/python3.7/site-packages/ipykernel_launcher.py:3: FutureWarning: Indexing with multiple keys (implicitly converted to a tuple of keys) will be deprecated, use a list instead.
  This is separate from the ipykernel package so we can avoid doing imports until

Question 6

Plot Province/State wise Deaths in USA

In [18]:
import plotly.express as px
In [19]:
covid_data= pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/01-09-2021.csv')
In [20]:
covid_data.columns
Out[20]:
Index(['FIPS', 'Admin2', 'Province_State', 'Country_Region', 'Last_Update',
       'Lat', 'Long_', 'Confirmed', 'Deaths', 'Recovered', 'Active',
       'Combined_Key', 'Incident_Rate', 'Case_Fatality_Ratio'],
      dtype='object')
In [21]:
 

Question 7

Plot Province/State Wise Active Cases in USA

In [22]:
 

Question 8

Plot Province/State Wise Confirmed cases in USA

In [23]:
 
/usr/local/lib/python3.7/site-packages/ipykernel_launcher.py:4: FutureWarning:

Indexing with multiple keys (implicitly converted to a tuple of keys) will be deprecated, use a list instead.

Question 9

Plot Worldwide Confirmed Cases over time

In [24]:
import plotly.express as px
import plotly.io as pio
In [25]:
 
/usr/local/lib/python3.7/site-packages/ipykernel_launcher.py:1: FutureWarning:

Indexing with multiple keys (implicitly converted to a tuple of keys) will be deprecated, use a list instead.

In [ ]: