Project work operational


Week 1 --------

INTRODUCTION:

Problem Description

Viruses have been a threat to human life for decades. Pandemics like Spanish flu, Ebola and SARS have had a drastic impact on human life. The difficulty with these viruses is that they change their structure frequently, which makes it hard to develop vaccines that help the human immune system fight them. (RADWAN, HAFEEZ, & MAMDOUH, 2013)

After the recent COVID-19 outbreak, it became obvious that there may be more pandemics in the near future, even if the world overcomes this one. (Al-Turaiki, Alshahrani, & Almutairi, 2016)

The problem I chose is to assess the impact of the coronavirus in the United States of America. The main objective is to view the total number of positive and negative cases for the critically affected states, those with a death count > 100, in order to examine the mortality rate of COVID-19.

Approach – The selected dataset covers coronavirus cases across the globe. We filter it so that only data from the United States remains.

Background –

Dataset – The dataset was selected based on the country's current COVID-19 crisis. It is a daily feed from the Johns Hopkins Whiting School of Engineering, refreshed daily at 9 a.m. EST. We filter it so that only data from the United States is kept.

Cleaning up of dataset – Cleaning the dataset to fit the requirement was the heavy lifting: a much-needed step to remove a lot of noise so that the resulting story is much closer to what is expected. All zero and null values are cleansed with a substitute value. The extracted data is copied into a new dataset for further clustering and validation. The refined data is then reviewed in Tableau for cluster analysis and visualized with bar/line graphs.

After reviewing the dataset, some null values were detected under the Hospitalized column, which suggests that further cleansing of the data is needed.
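The null handling described above can be sketched in a few lines of Python. This is only an illustration: the report does not say which substitute value was used, so the sketch assumes empty cells become 0, and the column names and sample rows are made up.

```python
# Hypothetical rows; column names and values are illustrative, not from the feed.
rows = [
    {"state": "AA", "positive": "120", "hospitalized": ""},
    {"state": "BB", "positive": "45", "hospitalized": "7"},
]


def to_int(value, default=0):
    """Convert a raw cell to int, using `default` for empty or missing cells."""
    if value is None or str(value).strip() == "":
        return default
    return int(value)


def clean_rows(rows, numeric_columns, default=0):
    """Return copies of the rows with the given columns converted to integers."""
    cleaned = []
    for row in rows:
        new_row = dict(row)  # copy so the original data stays untouched
        for col in numeric_columns:
            new_row[col] = to_int(row.get(col), default)
        cleaned.append(new_row)
    return cleaned


cleaned = clean_rows(rows, ["positive", "hospitalized"])
```

Working on a copy mirrors the step of moving the extracted data into a new dataset before further analysis.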

Results –

Data Analysis

The analysis we chose to perform covers the states that have a death count > 100.

To do that, we uploaded the filtered data into the selected tool, Tableau.

Many irrelevant columns can be hidden so that only the columns needed for the analysis remain visible.

The picture illustrates the proper filtering of death vs state data for the purpose of data analysis.

Further Analysis

1. Sum of total no. of negative cases vs State –

2. Sum of total no. of positive cases vs State –

3. Total no. of deaths vs State –

Clustering analysis

For the data mining step, we chose to perform cluster analysis on the pre-selected data.

We use a scatter plot to visualize the clusters in the dataset. (tableau, 2003-2020)
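As a rough illustration of what a clustering step does with (positive cases, deaths) pairs, here is a bare-bones k-means sketch in Python. It is not Tableau's implementation: the sample points are invented, the centroids are simply initialized to the first k points, and no scaling of the two variables is applied.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means on 2-D points; centroids start at the first k points."""
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    (
                        sum(m[0] for m in members) / len(members),
                        sum(m[1] for m in members) / len(members),
                    )
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


# Invented (positive cases, deaths) pairs, for illustration only.
points = [(100, 2), (120, 3), (90, 1), (5000, 150), (5200, 160), (4800, 140)]
labels, centroids = kmeans(points, k=2)
```

With real data the two axes differ greatly in magnitude, so standardizing them before clustering would usually be worth considering.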

The graph listed below represents the total number of positive cases (county-wise) vs the total number of deaths (county-wise).

Evaluation

To evaluate the model on the dataset, we chose to apply linear regression.

This is done by adding a trend line to the plot. (Data Crunch, 2020)
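A linear trend line is an ordinary least-squares fit, which can be sketched by hand as below. The function names and the sample numbers are illustrative assumptions; the report's own fit was produced by the visualization tool, not by this code.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up numbers lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
r2 = r_squared(xs, ys, slope, intercept)
```

An R-squared close to 1 means the line explains most of the variation; a low value suggests that positive cases alone do not predict deaths well.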

Discussion

The above graphs clearly imply:

1. Active testing is being performed in every state.

2. More than 75% of the tested cases are negative, which implies that people with other types of flu and similar symptoms have also been tested for coronavirus.

3. The death count in NY, NJ and MI is very high. On further analysis, NY and NJ have a very high case count, so their deaths would be correspondingly high. In MI, however, the positive count is not very high compared with WA and LA, which indicates either:

   a. Active medication is not available in MI, or

   b. Patients are not reaching out to hospitals at an early stage, due to lack of awareness or various other concerns.

Conclusion

The coronavirus has had a drastic impact on people's lives, altering their lifestyle.

The Washington state data suggest that good handling of the situation and proper medication can help people recover.

The New York data demonstrate the importance of social distancing.

The Michigan data show the awareness people need in order to reach out to hospitals when they experience initial symptoms. The longer the delay, the higher the death count.

Week 2 ------------------

The selected dataset is a daily feed from the Johns Hopkins Whiting School of Engineering, refreshed daily at 9 a.m. EST. The version downloaded on 28th March (the most recent feed) contains the coronavirus cases reported in each country as of 28th March.

We filter the data so that only data from the United States remains. The data is further restricted to 27th and 28th March, because there may be no data from the Washington and California regions for cases reported on the 28th.

This refined data set is obtained and now reviewed.

This selected data set has some null values under the Hospitalized column.

The rest of the data is non-null and holds either zero or non-zero values.

Hence, this suggests that the data is cleansed and is a good candidate for the visualization purpose.

The main objective of the data review is to:

View the total number of negative and positive cases for the most critically affected states, those with a death count > 100, in order to analyze whether testing is actively being carried out in those states, how contagious the virus is, and the mortality rate.

The tool we selected is Tableau Public, the free version of Tableau.

After a successful download, the dataset was uploaded in Excel format, although the tool also accepts JSON and CSV formats.

Now filter the data based on:

1. Death count > 100 (as these states are greatly impacted),

2. Date on or after 27th March, as we want to analyze the current data.
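The two filter conditions above can be sketched as a single list comprehension. The field names (`state`, `date`, `death`), the year 2020, and the sample rows are assumptions made for illustration; the actual filtering in this report was done in the tool's own interface.

```python
from datetime import date


def filter_rows(rows, min_deaths=100, start=date(2020, 3, 27)):
    """Keep rows with more than `min_deaths` deaths dated on or after `start`."""
    return [r for r in rows if r["death"] > min_deaths and r["date"] >= start]


# Invented rows with placeholder state codes, for illustration only.
rows = [
    {"state": "AA", "date": date(2020, 3, 28), "death": 250},
    {"state": "BB", "date": date(2020, 3, 28), "death": 40},   # too few deaths
    {"state": "CC", "date": date(2020, 3, 26), "death": 300},  # too early
    {"state": "DD", "date": date(2020, 3, 27), "death": 101},
]
selected = filter_rows(rows)
```

Note that the threshold is strict (`>`), so a state with exactly 100 deaths is excluded.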

Once the data has been uploaded, it appears in the Tableau workspace.

We can now see many columns, such as date modified, FIPS codes, grade, last update, check-in time, commercial state value, etc., which may not be relevant to our objective. Hence, we can hide such columns.

Objective: The selected objective is to visualize the following graphs for the most critically affected states:

1. Sum (negative cases) vs State –

2. Sum (positive cases) vs State –

3. Total number of cases vs State –

These numbers imply that active testing has been going on and that more than 75% of the tested cases are negative, which suggests that people with other types of flu and similar symptoms have also been tested for coronavirus.
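The two ratios discussed in this report, the share of negative tests and the mortality rate among positive cases, can be computed as in the sketch below. The function names and the example counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def negative_share(positive, negative):
    """Fraction of all tests that came back negative."""
    total = positive + negative
    return negative / total if total else 0.0


def case_fatality_ratio(deaths, positive):
    """Deaths divided by confirmed positive cases."""
    return deaths / positive if positive else 0.0


# Made-up counts for a single state.
share = negative_share(positive=25, negative=75)
cfr = case_fatality_ratio(deaths=5, positive=100)
```

Both functions return 0.0 when the denominator is zero, so states with no tests or no positive cases do not raise an error.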