Discussion

reddygs17
DataVisualizationResearch.docx

Section 1: Discuss Topic Background (Citations are required)

A data set is a collection of data. Our data set, G4ResumeNames, contains the candidate details needed when recruiting a worker. In this paper we focus on several of its fields: industry, whether a call was received, the number of resumes, and experience relative to the open position. In particular, we examine the industry and the number of resumes received by employers. Like a database table, the data set stores each variable in a column, and each row corresponds to one record (Waskom et al., 2020).
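To make the rows-and-columns description concrete, here is a minimal R sketch of a data frame laid out the same way. The column names echo fields mentioned above, but the values are invented toy data, not records from G4ResumeNames.

```r
# Hypothetical toy data: column names echo fields named in the text;
# the values are invented and are NOT taken from G4ResumeNames.
resumes <- data.frame(
  industry   = c("finance", "trade", "trade", "manufacturing"),
  call       = c("no", "yes", "no", "yes"),
  experience = c(6, 2, 10, 4),
  stringsAsFactors = TRUE
)

str(resumes)   # one line per column, i.e. per variable
nrow(resumes)  # number of rows, i.e. candidate records
ncol(resumes)  # number of columns, i.e. variables
resumes[2, ]   # every variable for the second record
```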

While interpreting the data, we identify the patterns and trends in it and build visualizations that show the shape of each distribution, comparing the candidates' experience with the positions in which they are seeking employment. The visualizations in this paper are a bar chart, a pie chart, and a box plot. Using the candidates' information from the data set, we compare the experience of different workers with their positions.
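One simple way to compare experience across positions is to summarise experience within each group. The sketch below groups by a hypothetical industry column in base R; the grouping variable and the values are assumptions for illustration only.

```r
# Hypothetical toy data (invented values, not from G4ResumeNames)
resumes <- data.frame(
  industry   = c("finance", "trade", "trade", "manufacturing", "finance", "trade"),
  experience = c(6, 2, 10, 4, 8, 3)
)

# tapply() splits experience by industry and applies the function to each group
mean_exp   <- tapply(resumes$experience, resumes$industry, mean)
median_exp <- tapply(resumes$experience, resumes$industry, median)

# Collect both statistics into one table, one row per industry
exp_by_industry <- data.frame(
  industry = names(mean_exp),
  mean     = as.vector(mean_exp),
  median   = as.vector(median_exp),
  stringsAsFactors = FALSE
)
print(exp_by_industry)
```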

Dataset: G4ResumeNames

From our analysis of the raw data set file “G4ResumeNames1”, we identified several key factors and built the following data visualizations:

· Bar Plot

· Pie Chart

· Box Plot
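The three chart types listed above can be produced with base R graphics. This is a minimal sketch on invented toy data, not the plots in our figures; it writes to a temporary PDF file so it also runs without a display.

```r
# Hypothetical toy data (invented values, not from G4ResumeNames1)
resumes <- data.frame(
  industry   = c("finance", "trade", "trade", "manufacturing", "finance", "trade"),
  experience = c(6, 2, 10, 4, 8, 3)
)

# Number of resumes per industry, used by the bar chart and the pie chart
industry_counts <- table(resumes$industry)

out_file <- tempfile(fileext = ".pdf")
pdf(out_file)  # draw into a file instead of a screen device
barplot(industry_counts, main = "Resumes per industry", ylab = "Count")
pie(industry_counts, main = "Share of resumes by industry")
boxplot(experience ~ industry, data = resumes,
        main = "Experience by industry", ylab = "Years of experience")
invisible(dev.off())  # close the device so the file is written
```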

We installed the following packages for this data analysis:

1. install.packages("dplyr")

This package provides a set of tools for efficiently manipulating datasets in R.

We use dplyr because it is fast, has a consistent API, and is easy to use.
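As an illustration of the filtering and counting dplyr is typically used for, the sketch below keeps only candidates who received a call and counts them per industry. It is written in base R so that it runs without extra packages; the corresponding dplyr verbs are noted in the comments. The data and column names are hypothetical.

```r
# Hypothetical toy data (invented values, not from G4ResumeNames1)
resumes <- data.frame(
  industry = c("finance", "trade", "trade", "manufacturing", "finance", "trade"),
  call     = c("yes", "no", "yes", "no", "yes", "no"),
  stringsAsFactors = FALSE
)

# Keep only the candidates who received a call
# (dplyr: resumes |> filter(call == "yes"))
called <- subset(resumes, call == "yes")

# Count those candidates per industry
# (dplyr: called |> count(industry))
calls_per_industry <- as.data.frame(table(industry = called$industry),
                                    responseName = "n")
print(calls_per_industry)
```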

2. install.packages("skimr")

The skim() function from skimr complements the base R summary() function. It displays most of the numeric statistics that summary() reports, and it also shows missing values, more quantile information, and an inline histogram for each variable.
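For readers without skimr installed, the following base R sketch computes per-column statistics similar to the numeric part of skim()'s output: missing count, mean, standard deviation, and quantiles. It is an approximation, not skimr itself, and it omits the inline histograms. The toy columns and values are invented.

```r
# Hypothetical toy data with a few missing values (invented, not from G4ResumeNames1)
resumes <- data.frame(
  experience = c(6, 2, NA, 4, 8, 3),
  jobs       = c(3, 1, 4, 2, NA, 2)
)

# Statistics for one numeric column, ignoring NAs except when counting them
profile_column <- function(x) {
  q <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE, names = FALSE)
  c(n_missing = sum(is.na(x)),
    mean      = mean(x, na.rm = TRUE),
    sd        = sd(x, na.rm = TRUE),
    p0 = q[1], p25 = q[2], p50 = q[3], p75 = q[4], p100 = q[5])
}

# One row per column, one column per statistic
numeric_profile <- t(sapply(resumes, profile_column))
print(round(numeric_profile, 2))
```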

3. Install the devtools package:

install.packages("devtools")

library(devtools)

Commands used to build the data report from the given data set:

Tool: RStudio

The following commands were used for the data analysis:

1. vis_dat:

The vis_dat() function of the visdat package is a good way to visualize the data type of each column and any missing data within a data frame.
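vis_dat() draws a grid with one column per variable, coloured by data type, with missing cells highlighted. The same underlying information can be inspected in base R, as in this sketch on an invented toy table; with visdat installed, the plot itself comes from visdat::vis_dat(G4ResumeNames1).

```r
# Hypothetical toy data with two missing cells (invented values)
resumes <- data.frame(
  ethnicity  = c("A", "B", NA, "A"),
  experience = c(6, NA, 10, 4),
  call       = c(TRUE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Class of each column: the information vis_dat() encodes as colour
column_types <- vapply(resumes, function(x) class(x)[1], character(1))
print(column_types)

# Logical grid, TRUE where a cell is NA: the cells vis_dat() marks as missing
missing_grid <- is.na(resumes)
print(missing_grid)
```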

2. skim:

> library(skimr)

> skim(G4ResumeNames1)


3. head:

We then run the head() function, which shows the first 6 rows by default. Here we override the default to preview the first 20 rows.

> head(G4ResumeNames1, 20)

4. G4ResumeNames1 - prints all the values in the table.

G4ResumeNames1$ethnicity - displays a single column (here, ethnicity) as a vector.

G4ResumeNames1[1, 2] - selects the single value in row 1, column 2.

dim(G4ResumeNames1) - displays the dimensions of the table (number of rows and number of columns).

5. Creating a DataExplorer report (the output is generated as a separate HTML file):

library(DataExplorer)

DataExplorer::create_report(G4ResumeNames1)
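The inspection commands listed above (head(), column access with $, cell selection with [row, column], and dim()) can be tried on a small hypothetical table before running them on G4ResumeNames1. The values below are invented.

```r
# Hypothetical toy table with 8 records (invented values)
resumes <- data.frame(
  ethnicity  = c("A", "B", "A", "B", "A", "B", "A", "B"),
  experience = c(6, 2, 10, 4, 8, 3, 5, 7),
  stringsAsFactors = FALSE
)

head(resumes)       # first 6 rows (the default)
head(resumes, 3)    # override the default and show the first 3 rows
resumes$ethnicity   # a single column, returned as a vector
resumes[1, 2]       # the value in row 1, column 2 (experience of record 1)
dim(resumes)        # number of rows and number of columns
```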

Missing Data Profile:

Our analysis of the raw data in RStudio found no null or missing values. As Figure 2 shows, the raw data set contains no missing data.

Figure 2: Missing Data Profile
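The missing-data check can also be reproduced with base R, as in the sketch below. The toy table deliberately contains one NA and one blank string so that each check has something to find; the values are invented. Note that is.na() only counts R's NA, so blank strings or literal "NULL" text entries need a separate check such as the one shown.

```r
# Hypothetical toy data: one NA and one blank string, invented for illustration
resumes <- data.frame(
  industry   = c("finance", "", "trade", "manufacturing"),
  experience = c(6, NA, 10, 4),
  stringsAsFactors = FALSE
)

missing_per_column <- colSums(is.na(resumes))      # NA count in each column
total_missing      <- sum(missing_per_column)      # NA count in the whole table
all_rows_complete  <- all(complete.cases(resumes)) # TRUE only if no row has an NA

# is.na() does not catch empty strings or the literal text "NULL"
blank_or_null <- sum(resumes == "" | resumes == "NULL", na.rm = TRUE)

print(missing_per_column)
```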

Determining Minimum & Maximum Values:

Every column in the data set has a minimum and a maximum value. We examined four columns from the data set: “Ethnicity”, “City”, “Experience”, and “Industry”.

Using RStudio, we determined the minimum and maximum values of these columns.

The results are shown in Figure 3.


Figure 3: Minimum and maximum values in the given raw data
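In R, range() returns the minimum and maximum of a numeric column in one call. For categorical columns such as ethnicity, city, and industry, a numeric minimum and maximum is not really defined (range() on text only gives the alphabetically first and last values), so listing the distinct values may be more informative. A sketch on invented toy data:

```r
# Hypothetical toy data for the four columns named in the text (invented values)
resumes <- data.frame(
  ethnicity  = c("A", "B", "A", "B"),
  city       = c("X", "Y", "Y", "X"),
  experience = c(6, 2, 10, 4),
  industry   = c("finance", "trade", "trade", "manufacturing"),
  stringsAsFactors = FALSE
)

# Numeric column: range() returns c(minimum, maximum)
experience_range <- range(resumes$experience)
print(experience_range)

# Categorical columns: list the distinct values instead
category_values <- lapply(resumes[c("ethnicity", "city", "industry")],
                          function(x) sort(unique(x)))
print(category_values)
```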

Data Summary:

We then run the summary() function, which shows every column, its data type, and a few other attributes that are especially useful for numeric columns. For every numeric column, it also shows the minimum, first quartile, median, mean, third quartile, and maximum values.

This view helps the reader understand the values in the raw data and its metadata.


Figure 4(a): Data Summary


Figure 4(b): Data Summary
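The statistics described above are what base R's summary() returns for a numeric column; for a factor column it returns the count of each level instead. A minimal sketch on invented toy data:

```r
# Hypothetical toy data (invented values)
resumes <- data.frame(
  experience = c(6, 2, 10, 4, 8),
  industry   = factor(c("finance", "trade", "trade", "manufacturing", "trade"))
)

summary(resumes)  # quartiles and mean for experience, level counts for industry

exp_summary <- summary(resumes$experience)
print(exp_summary)  # Min., 1st Qu., Median, Mean, 3rd Qu., Max.
```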

Data Profiling Report:

Creating a DataExplorer report (the output is generated as a separate HTML file):

> library(DataExplorer)

> DataExplorer::create_report(G4ResumeNames1)

These commands load the package and generate a full data profile of our data frame.

The output is an HTML file containing basic statistics, the data structure, missing data, distribution plots, a correlation matrix, and a principal component analysis of the data frame.


Figure 5(a): Data Profiling Report

Figure 5(b)
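Two of the report's sections, the correlation matrix and the principal component analysis, have direct base R counterparts in cor() and prcomp(). The sketch below applies them to a few invented numeric columns (the column names are hypothetical); it only approximates those two sections and is not DataExplorer's own implementation.

```r
# Hypothetical numeric columns with invented values
resumes <- data.frame(
  experience = c(6, 2, 10, 4, 8, 3),
  jobs       = c(3, 1, 5, 2, 4, 2),
  honors     = c(0, 0, 1, 0, 1, 0)
)

# Pairwise Pearson correlations between the numeric columns
cor_matrix <- cor(resumes)
print(round(cor_matrix, 2))

# PCA on standardised columns (scale. = TRUE puts variables on a common scale)
pca <- prcomp(resumes, scale. = TRUE)
summary(pca)  # standard deviation and proportion of variance per component
```

Because the columns are standardised, the component variances (pca$sdev^2) add up to the number of columns, which is a quick sanity check on the result.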

References

Waskom, M., Botvinnik, O., Gelbart, M., Ostblom, J., Hobson, P., Lukauskas, S., ... & Brunner, T. (2020). Seaborn: statistical data visualization. Astrophysics Source Code Library, ascl-2012. https://ui.adsabs.harvard.edu/abs/2020ascl.soft12015W/abstract