Aanil

Running Head: DATA ANALYSIS 2

Data Analysis

Name

Institution Affiliation

Date

Abstract

Data used in statistical analysis needs to be cleaned so that it yields reliable conclusions. This cleaning is carried out through data preprocessing, which ensures that only appropriate data is used in the analysis.

Original data in statistical analysis

Statistical analysis of data requires interpreting information gathered from the field in order to arrive at the intended outcomes of the research. Original (raw) data is unorganized, which makes it difficult to interpret; for this reason it is not used directly in analytics tasks.

Steps in data preprocessing

The process starts with obtaining a relevant database, which consists of data gathered from many sources and combined into the proper format. Databases differ from one another; for example, the database used in a business differs from one used in healthcare, so the relevant one has to be selected (Sharda et al., 2020).

Next, import the crucial libraries, choosing reliable ones such as the Python library pandas, so that the data can be manipulated and analyzed. pandas imports the data and manages it as datasets, packing it into high-performance, easy-to-use data structures and data analysis tools.
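As a minimal sketch of this step, the snippet below imports pandas and holds a small dataset in a DataFrame, the tabular structure pandas uses to manage data. The column names and values are hypothetical and only illustrate the idea.

```python
# pandas provides the DataFrame structure used to manage datasets.
import pandas as pd

# A tiny illustrative dataset (hypothetical columns and values).
data = pd.DataFrame(
    {
        "country": ["France", "Spain", "Germany"],
        "age": [44, 27, 30],
        "salary": [72000.0, 48000.0, 54000.0],
    }
)

print(data.shape)             # (rows, columns)
print(data.dtypes)            # column types inferred by pandas
print(data["salary"].mean())  # a first simple analysis on one column
```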

Then the datasets gathered for the machine learning project at hand are imported. It is also advisable to set the current directory as the working directory. In Spyder this takes three steps: first, save the Python file in the directory that contains the dataset; second, go to the file explorer in Spyder and select the required directory (Alothman, 2019); finally, press the F5 key to execute the file.
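A minimal sketch of importing a dataset with pandas follows. It assumes the data is stored as CSV; the file name `dataset.csv` and the helper `load_dataset` are hypothetical, and an in-memory string stands in for the file so the example runs anywhere. Because pandas resolves relative paths against the working directory, keeping the script and the dataset in the same directory (as described above) lets a plain file name work.

```python
import io
import os

import pandas as pd


def load_dataset(source):
    """Read a CSV dataset into a DataFrame.

    `source` may be a path relative to the working directory
    (e.g. the hypothetical "dataset.csv") or any file-like object.
    """
    return pd.read_csv(source)


# Relative paths are resolved against this directory.
print("Working directory:", os.getcwd())
# os.chdir("/path/to/project")  # hypothetical path; a script-level alternative to Spyder's file explorer

# In-memory stand-in for a hypothetical dataset.csv (note the blank cells).
csv_text = """country,age,salary,purchased
France,44,72000,No
Spain,27,48000,Yes
Germany,30,,No
Spain,,61000,No
"""
df = load_dataset(io.StringIO(csv_text))
print(df.head())
```

The blank cells are read as missing values (NaN), which the next preprocessing step has to deal with.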

It is also crucial to identify and correctly handle missing values; failing to do so leads to inaccurate and faulty conclusions being drawn from the data. Missing values can be handled by deleting the affected rows or by replacing them with the mean of their column.
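The two strategies mentioned above, deleting rows and substituting means, can be sketched in pandas as follows. The data is hypothetical; which strategy is appropriate depends on how much data is missing and why.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data with gaps marked as NaN.
df = pd.DataFrame(
    {
        "age": [44.0, 27.0, np.nan, 38.0],
        "salary": [72000.0, np.nan, 54000.0, 61000.0],
    }
)

# Identify: count missing values per column.
print(df.isna().sum())

# Strategy 1: delete every row that contains a missing value.
dropped = df.dropna()

# Strategy 2: replace each missing value with the mean of its column.
imputed = df.fillna(df.mean())

print(dropped)
print(imputed)
```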

Categorical data in the dataset should be encoded into numerical form, because machine learning models operate on numbers rather than text labels.
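One common way to encode categories is one-hot encoding, sketched below with `pd.get_dummies`; a binary label can simply be mapped to 0 and 1. The columns and values are hypothetical, and other encodings (such as ordinal encoding) may suit ordered categories better.

```python
import pandas as pd

# Hypothetical categorical data.
df = pd.DataFrame(
    {
        "country": ["France", "Spain", "Germany", "Spain"],
        "purchased": ["No", "Yes", "No", "No"],
    }
)

# One-hot encode the nominal feature: one 0/1 column per category.
encoded = pd.get_dummies(df, columns=["country"], dtype=int)

# Map the binary label to 0/1.
encoded["purchased"] = encoded["purchased"].map({"No": 0, "Yes": 1})

print(encoded)
```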

Every dataset used to build a machine learning model should be split into two parts: a training set, on which the model is trained, and a test set, on which its performance is evaluated.
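A minimal sketch of a random train/test split using only pandas is shown below. The helper name `train_test_split_df` and the 80/20 ratio are assumptions for illustration; scikit-learn's `train_test_split` is a widely used library alternative.

```python
import pandas as pd

# Hypothetical dataset with 10 rows.
df = pd.DataFrame({"x": range(10), "y": [v * 2 for v in range(10)]})


def train_test_split_df(data, test_fraction=0.2, seed=0):
    """Randomly split a DataFrame into training and test sets (hypothetical helper)."""
    test = data.sample(frac=test_fraction, random_state=seed)  # random rows for testing
    train = data.drop(test.index)                              # everything else for training
    return train, test


train, test = train_test_split_df(df)
print(len(train), len(test))
```

Fixing `random_state` makes the split reproducible, so results can be compared across runs.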

Feature scaling is carried out at the end of data preprocessing. It standardizes the independent variables in the dataset so that they lie on a comparable scale.
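Standardization rescales each feature to zero mean and unit standard deviation, z = (x - mean) / std. The sketch below assumes hypothetical data and computes the mean and standard deviation on the training set only, then applies them to the test set, so that no information from the test set leaks into preprocessing. It uses the population standard deviation (`ddof=0`), which is the convention scikit-learn's `StandardScaler` follows.

```python
import pandas as pd

# Hypothetical independent variables after the train/test split.
train = pd.DataFrame({"age": [25.0, 35.0, 45.0], "salary": [40000.0, 60000.0, 80000.0]})
test = pd.DataFrame({"age": [30.0], "salary": [50000.0]})

# Statistics are estimated from the training set only.
mean = train.mean()
std = train.std(ddof=0)  # population standard deviation

# z = (x - mean) / std, applied column by column.
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print(train_scaled)
print(test_scaled)
```

After scaling, age (in years) and salary (in currency units) contribute on the same scale instead of salary dominating because of its larger magnitude.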

Importance of preprocessing of data

Data preprocessing reduces the complexity of data, because real-world data is unclean. Missing attributes and values, as well as outliers, degrade the quality of the data; preprocessing removes or corrects these problems, making it easier to draw the best conclusions from the analysis.
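Outliers can be flagged in many ways; one common rule of thumb, used here as an assumption rather than as the method described in the sources, is Tukey's interquartile-range (IQR) rule. It flags values lying more than 1.5 times the IQR below the first quartile or above the third quartile. The salary figures are hypothetical.

```python
import pandas as pd

# Hypothetical salaries with one extreme value.
salaries = pd.Series([48000, 52000, 50000, 51000, 49000, 250000], name="salary")

# Quartiles and interquartile range.
q1 = salaries.quantile(0.25)
q3 = salaries.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Flag values outside the fences, then keep the rest.
is_outlier = (salaries < lower) | (salaries > upper)
cleaned = salaries[~is_outlier]

print(salaries[is_outlier])
print(cleaned)
```

Whether a flagged value should be removed, corrected, or kept depends on whether it is an error or a genuine but rare observation.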

The process that generates the power of artificial intelligence

This power is generated by simulating human intelligence in machines: computer programs are written to mimic humans and their traits, so the machines exhibit human-like traits when solving problems.

Differences between machine learning and deep learning

Machine learning requires human intervention, since humans need to hand-code the features that are applied, whereas in deep learning the system learns the features without additional human intervention. Machine learning can run on simpler hardware, while deep learning requires powerful hardware.

References

Alothman, B. (2019, June). Raw network traffic data preprocessing and preparation for automatic analysis. In 2019 International Conference on Cyber Security and Protection of Digital Services (Cyber Security) (pp. 1-5). IEEE.

Sharda, R., Delen, D., & Turban, E. (2020). Analytics, data science, & artificial intelligence: Systems for decision support (11th ed.). ISBN: 978-0-13-519201-6.