Python Pandas
Task: Covid-19 Data Analysis¶
This notebook is used to understand the comprehension of Data Analysis techniques using Pandas library.¶
Data Source:¶
File naming convention¶
MM-DD-YYYY.csv in UTC.
Field description¶
-
Province_State: China - province name; US/Canada/Australia/ - city name, state/province name; Others - name of the event (e.g., "Diamond Princess" cruise ship); other countries - blank.
-
Country_Region: country/region name conforming to WHO (will be updated).
-
Last_Update: MM/DD/YYYY HH:mm (24 hour format, in UTC).
-
Confirmed: the number of confirmed cases. For Hubei Province: from Feb 13 (GMT +8), we report both clinically diagnosed and lab-confirmed cases. For lab-confirmed cases only (Before Feb 17), please refer to who_covid_19_situation_reports. For Italy, diagnosis standard might be changed since Feb 27 to "slow the growth of new case numbers." (Source)
-
Deaths: the number of deaths.
-
Recovered: the number of recovered cases.
Question 1¶
Read the dataset¶
In [1]:Out[1]:
| FIPS | Admin2 | Province_State | Country_Region | Last_Update | Lat | Long_ | Confirmed | Deaths | Recovered | Active | Combined_Key | Incident_Rate | Case_Fatality_Ratio | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NaN | NaN | NaN | Afghanistan | 2021-01-11 05:21:50 | 33.939110 | 67.709953 | 53489 | 2277 | 43948 | 7264.0 | Afghanistan | 137.403749 | 4.256950 |
| 1 | NaN | NaN | NaN | Albania | 2021-01-11 05:21:50 | 41.153300 | 20.168300 | 63595 | 1241 | 37648 | 24706.0 | Albania | 2209.847800 | 1.951411 |
| 2 | NaN | NaN | NaN | Algeria | 2021-01-11 05:21:50 | 28.033900 | 1.659600 | 102144 | 2807 | 69212 | 30125.0 | Algeria | 232.934026 | 2.748081 |
| 3 | NaN | NaN | NaN | Andorra | 2021-01-11 05:21:50 | 42.506300 | 1.521800 | 8586 | 85 | 7724 | 777.0 | Andorra | 11112.405358 | 0.989984 |
| 4 | NaN | NaN | NaN | Angola | 2021-01-11 05:21:50 | -11.202700 | 17.873900 | 18193 | 416 | 13872 | 3905.0 | Angola | 55.354627 | 2.286594 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3972 | NaN | NaN | NaN | Vietnam | 2021-01-11 05:21:50 | 14.058324 | 108.277199 | 1514 | 35 | 1361 | 118.0 | Vietnam | 1.555396 | 2.311757 |
| 3973 | NaN | NaN | NaN | West Bank and Gaza | 2021-01-11 05:21:50 | 31.952200 | 35.233200 | 147400 | 1604 | 131117 | 14679.0 | West Bank and Gaza | 2889.393847 | 1.088195 |
| 3974 | NaN | NaN | NaN | Yemen | 2021-01-11 05:21:50 | 15.552727 | 48.516388 | 2104 | 611 | 1407 | 86.0 | Yemen | 7.054256 | 29.039924 |
| 3975 | NaN | NaN | NaN | Zambia | 2021-01-11 05:21:50 | -13.133897 | 27.849332 | 27728 | 469 | 20598 | 6661.0 | Zambia | 150.827167 | 1.691431 |
| 3976 | NaN | NaN | NaN | Zimbabwe | 2021-01-11 05:21:50 | -19.015438 | 29.154857 | 21477 | 507 | 12582 | 8388.0 | Zimbabwe | 144.500474 | 2.360665 |
3977 rows × 14 columns
Display the top 5 rows in the data¶
In [2]:Out[2]:
| FIPS | Admin2 | Province_State | Country_Region | Last_Update | Lat | Long_ | Confirmed | Deaths | Recovered | Active | Combined_Key | Incident_Rate | Case_Fatality_Ratio | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NaN | NaN | NaN | Afghanistan | 2021-01-11 05:21:50 | 33.93911 | 67.709953 | 53489 | 2277 | 43948 | 7264.0 | Afghanistan | 137.403749 | 4.256950 |
| 1 | NaN | NaN | NaN | Albania | 2021-01-11 05:21:50 | 41.15330 | 20.168300 | 63595 | 1241 | 37648 | 24706.0 | Albania | 2209.847800 | 1.951411 |
| 2 | NaN | NaN | NaN | Algeria | 2021-01-11 05:21:50 | 28.03390 | 1.659600 | 102144 | 2807 | 69212 | 30125.0 | Algeria | 232.934026 | 2.748081 |
| 3 | NaN | NaN | NaN | Andorra | 2021-01-11 05:21:50 | 42.50630 | 1.521800 | 8586 | 85 | 7724 | 777.0 | Andorra | 11112.405358 | 0.989984 |
| 4 | NaN | NaN | NaN | Angola | 2021-01-11 05:21:50 | -11.20270 | 17.873900 | 18193 | 416 | 13872 | 3905.0 | Angola | 55.354627 | 2.286594 |
Show the information of the dataset¶
In [3]:<class 'pandas.core.frame.DataFrame'> RangeIndex: 3977 entries, 0 to 3976 Data columns (total 14 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 FIPS 3263 non-null float64 1 Admin2 3268 non-null object 2 Province_State 3807 non-null object 3 Country_Region 3977 non-null object 4 Last_Update 3977 non-null object 5 Lat 3891 non-null float64 6 Long_ 3891 non-null float64 7 Confirmed 3977 non-null int64 8 Deaths 3977 non-null int64 9 Recovered 3977 non-null int64 10 Active 3976 non-null float64 11 Combined_Key 3977 non-null object 12 Incident_Rate 3891 non-null float64 13 Case_Fatality_Ratio 3935 non-null float64 dtypes: float64(6), int64(3), object(5) memory usage: 435.1+ KB
Show the sum of missing values of features in the dataset¶
In [4]:Out[4]:
FIPS 714 Admin2 709 Province_State 170 Country_Region 0 Last_Update 0 Lat 86 Long_ 86 Confirmed 0 Deaths 0 Recovered 0 Active 1 Combined_Key 0 Incident_Rate 86 Case_Fatality_Ratio 42 dtype: int64
Question 2¶
Show the number of Confirmed cases by Country¶
In [5]:Out[5]:
| Country_Region | Confirmed | |
|---|---|---|
| 0 | Afghanistan | 53489 |
| 1 | Albania | 63595 |
| 2 | Algeria | 102144 |
| 3 | Andorra | 8586 |
| 4 | Angola | 18193 |
| ... | ... | ... |
| 186 | Vietnam | 1514 |
| 187 | West Bank and Gaza | 147400 |
| 188 | Yemen | 2104 |
| 189 | Zambia | 27728 |
| 190 | Zimbabwe | 21477 |
191 rows × 2 columns
Show the number of Deaths by Country¶
In [6]:Out[6]:
| Country_Region | Deaths | |
|---|---|---|
| 0 | Afghanistan | 2277 |
| 1 | Albania | 1241 |
| 2 | Algeria | 2807 |
| 3 | Andorra | 85 |
| 4 | Angola | 416 |
| ... | ... | ... |
| 186 | Vietnam | 35 |
| 187 | West Bank and Gaza | 1604 |
| 188 | Yemen | 611 |
| 189 | Zambia | 469 |
| 190 | Zimbabwe | 507 |
191 rows × 2 columns
Show the number of Recovered cases by Country¶
In [7]:Out[7]:
| Country_Region | Recovered | |
|---|---|---|
| 0 | Afghanistan | 43948 |
| 1 | Albania | 37648 |
| 2 | Algeria | 69212 |
| 3 | Andorra | 7724 |
| 4 | Angola | 13872 |
| ... | ... | ... |
| 186 | Vietnam | 1361 |
| 187 | West Bank and Gaza | 131117 |
| 188 | Yemen | 1407 |
| 189 | Zambia | 20598 |
| 190 | Zimbabwe | 12582 |
191 rows × 2 columns
Show the number of Active Cases by Country¶
In [8]:Out[8]:
| Country_Region | Active | |
|---|---|---|
| 0 | Afghanistan | 7264 |
| 1 | Albania | 24706 |
| 2 | Algeria | 30125 |
| 3 | Andorra | 777 |
| 4 | Angola | 3905 |
| ... | ... | ... |
| 186 | Vietnam | 118 |
| 187 | West Bank and Gaza | 14679 |
| 188 | Yemen | 86 |
| 189 | Zambia | 6661 |
| 190 | Zimbabwe | 8388 |
191 rows × 2 columns
Show the latest number of Confirmed, Deaths, Recovered and Active cases Country-wise¶
In [9]:/usr/local/lib/python3.7/site-packages/ipykernel_launcher.py:2: FutureWarning: Indexing with multiple keys (implicitly converted to a tuple of keys) will be deprecated, use a list instead.Out[9]:
| Country_Region | Confirmed | Deaths | Recovered | Active | |
|---|---|---|---|---|---|
| 0 | Afghanistan | 53489 | 2277 | 43948 | 7264 |
| 1 | Albania | 63595 | 1241 | 37648 | 24706 |
| 2 | Algeria | 102144 | 2807 | 69212 | 30125 |
| 3 | Andorra | 8586 | 85 | 7724 | 777 |
| 4 | Angola | 18193 | 416 | 13872 | 3905 |
| ... | ... | ... | ... | ... | ... |
| 186 | Vietnam | 1514 | 35 | 1361 | 118 |
| 187 | West Bank and Gaza | 147400 | 1604 | 131117 | 14679 |
| 188 | Yemen | 2104 | 611 | 1407 | 86 |
| 189 | Zambia | 27728 | 469 | 20598 | 6661 |
| 190 | Zimbabwe | 21477 | 507 | 12582 | 8388 |
191 rows × 5 columns
Question 3¶
Show the countries with no recovered cases¶
In [10]:Out[10]:
| Country_Region | Recovered | |
|---|---|---|
| 16 | Belgium | 0 |
| 151 | Serbia | 0 |
| 165 | Sweden | 0 |
| 177 | US | 0 |
Show the countries with no confirmed cases¶
In [11]:Out[11]:
| Country_Region | Confirmed |
|---|
Show the countries with no deaths¶
In [12]:Out[12]:
| Country_Region | Deaths | |
|---|---|---|
| 30 | Cambodia | 0 |
| 50 | Dominica | 0 |
| 75 | Holy See | 0 |
| 95 | Laos | 0 |
| 111 | Marshall Islands | 0 |
| 143 | Saint Kitts and Nevis | 0 |
| 145 | Saint Vincent and the Grenadines | 0 |
| 146 | Samoa | 0 |
| 157 | Solomon Islands | 0 |
| 172 | Timor-Leste | 0 |
| 184 | Vanuatu | 0 |
/usr/local/lib/python3.7/site-packages/ipykernel_launcher.py:1: FutureWarning: Indexing with multiple keys (implicitly converted to a tuple of keys) will be deprecated, use a list instead. """Entry point for launching an IPython kernel.Out[13]:
| Country_Region | Confirmed | Deaths | Recovered | |
|---|---|---|---|---|
| 30 | Cambodia | 391 | 0 | 374 |
| 50 | Dominica | 106 | 0 | 98 |
| 75 | Holy See | 27 | 0 | 15 |
| 95 | Laos | 41 | 0 | 40 |
| 111 | Marshall Islands | 4 | 0 | 4 |
| 143 | Saint Kitts and Nevis | 34 | 0 | 31 |
| 145 | Saint Vincent and the Grenadines | 249 | 0 | 100 |
| 146 | Samoa | 2 | 0 | 2 |
| 157 | Solomon Islands | 17 | 0 | 10 |
| 172 | Timor-Leste | 49 | 0 | 41 |
| 184 | Vanuatu | 1 | 0 | 1 |
Question 4¶
Show the Top 10 countries with Confirmed cases¶
In [14]:Out[14]:
| Last_Update | Confirmed | Deaths | Recovered | |
|---|---|---|---|---|
| Country_Region | ||||
| France | 2021-01-10 05:22:12 | 2762120 | 67229 | 178036 |
| United Kingdom | 2021-01-10 05:22:12 | 2613838 | 70541 | 2850 |
| Turkey | 2021-01-10 05:22:12 | 2317118 | 22631 | 2190047 |
| India | 2021-01-10 05:22:12 | 1965556 | 50027 | 1861400 |
| Argentina | 2021-01-10 05:22:12 | 1714409 | 44417 | 1504330 |
| Brazil | 2021-01-10 05:22:12 | 1540513 | 48298 | 1350513 |
| Poland | 2021-01-10 05:22:12 | 1376389 | 31011 | 1113738 |
| Iran | 2021-01-10 05:22:12 | 1280438 | 56100 | 1067466 |
| South Africa | 2021-01-10 05:22:12 | 1214176 | 32824 | 956712 |
| US | 2021-01-10 05:22:12 | 907077 | 12090 | 0 |
Show the Top 10 Countries with Active cases¶
In [15]:Out[15]:
| Last_Update | Confirmed | Deaths | Recovered | Active | |
|---|---|---|---|---|---|
| Country_Region | |||||
| United Kingdom | 2021-01-10 05:22:12 | 2613838 | 70541 | 2850 | 2543297 |
| France | 2021-01-10 05:22:12 | 2762120 | 67229 | 178036 | 2516855 |
| US | 2021-01-10 05:22:12 | 907077 | 12090 | 0 | 894987 |
| Peru | 2021-01-10 05:22:12 | 454697 | 16923 | 966447 | 437774 |
| Spain | 2021-01-10 05:22:12 | 421295 | 11971 | 40736 | 368588 |
| Serbia | 2021-01-10 05:22:12 | 356125 | 3513 | 0 | 352612 |
| Mexico | 2021-01-10 05:22:12 | 364770 | 20251 | 1143254 | 346710 |
| Poland | 2021-01-10 05:22:12 | 1376389 | 31011 | 1113738 | 231640 |
| South Africa | 2021-01-10 05:22:12 | 1214176 | 32824 | 956712 | 224640 |
| Netherlands | 2021-01-10 05:22:12 | 214929 | 2963 | 5436 | 211966 |
Question 5¶
Plot Country-wise Total deaths, confirmed, recovered and active casaes where total deaths have exceeded 50,000¶
In [16]:import matplotlib.pyplot as pltIn [17]:
/usr/local/lib/python3.7/site-packages/ipykernel_launcher.py:3: FutureWarning: Indexing with multiple keys (implicitly converted to a tuple of keys) will be deprecated, use a list instead. This is separate from the ipykernel package so we can avoid doing imports until
Question 6¶
Plot Province/State wise Deaths in USA¶
In [18]:import plotly.express as pxIn [19]:
covid_data= pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/01-09-2021.csv')
In [20]:
covid_data.columnsOut[20]:
Index(['FIPS', 'Admin2', 'Province_State', 'Country_Region', 'Last_Update',
'Lat', 'Long_', 'Confirmed', 'Deaths', 'Recovered', 'Active',
'Combined_Key', 'Incident_Rate', 'Case_Fatality_Ratio'],
dtype='object')
In [21]:
Question 7¶
Plot Province/State Wise Active Cases in USA¶
In [22]:Question 8¶
Plot Province/State Wise Confirmed cases in USA¶
In [23]:/usr/local/lib/python3.7/site-packages/ipykernel_launcher.py:4: FutureWarning: Indexing with multiple keys (implicitly converted to a tuple of keys) will be deprecated, use a list instead.
Question 9¶
Plot Worldwide Confirmed Cases over time¶
In [24]:import plotly.express as px import plotly.io as pioIn [25]:
/usr/local/lib/python3.7/site-packages/ipykernel_launcher.py:1: FutureWarning: Indexing with multiple keys (implicitly converted to a tuple of keys) will be deprecated, use a list instead.In [ ]: