

PREFERENCES FOR CAR CHOICE IN THE UNITED STATES


Table of Contents

Introduction

Background

Data Analysis

Data Visualization

Conclusion

References

Introduction

Statistics is most commonly applied to describe a data set (descriptive statistics), to fit regression models, and to test hypotheses (inferential statistics). Its two main branches are descriptive and inferential statistics. People without formal training in statistics are generally more familiar with descriptive statistics, such as averages and percentages, than with inferential statistics. In this paper, the data are analyzed using descriptive statistics, so we focus on the descriptive branch.

Descriptive Statistics Definition

Descriptive statistics are a type of statistical analysis that describes data in a meaningful way. They quantitatively summarize the essential features of the data, giving summaries of the sample and of the observations made. These summaries can be either graphical or numerical.

Background

This study focuses on analyzing and visualizing a data set on preferences for car choice in the United States. The data set contains 4,654 observations and 71 columns. Several types of graphs help describe statistical data, including histograms, bar graphs, box-and-whisker plots, line graphs, scatter plots, ogives, and pie charts. The measurements generally used with descriptive statistics are:

Measures of central tendency describe the center of a frequency distribution. The main measures of central tendency are the mean, median, and mode (Nick, 2020).

Measures of spread describe how the scores are spread across the entire distribution. They include the standard deviation, variance, quartiles, range, and absolute difference.
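To make these measures concrete, here is a minimal Python sketch using the standard-library statistics module (the paper's own analysis was done in R). The helper name describe and the sample values are hypothetical and chosen only for illustration; they are not drawn from the car data set.

```python
import statistics


def describe(values):
    """Return common central-tendency and spread measures for a list of numbers."""
    # Quartile cut points; statistics.quantiles uses the 'exclusive' method by default
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),          # most frequent value
        "variance": statistics.variance(values),  # sample variance (n - 1 denominator)
        "stdev": statistics.stdev(values),        # sample standard deviation
        "q1": q1,
        "q3": q3,
        "range": max(values) - min(values),
    }


# Hypothetical example values, not taken from the car data set
sample = [2, 4, 4, 4, 5, 5, 7, 9]
for name, value in describe(sample).items():
    print(f"{name}: {value:.3f}")
```

Note that statistics.variance and statistics.stdev use the sample (n - 1) formula, and quartiles depend on the interpolation method, so other software (including R) may report slightly different quartile values for the same data.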

Data Analysis

Data analysis is one of the essential concepts of statistics. It is the process of observing, analyzing, and modeling data. Its purpose is to obtain useful information from the data and draw conclusions that support decision-making. Data analysis can be performed with several techniques and approaches, applying analytical and logical reasoning to examine each component of the data provided. Data from various sources are collected, reviewed, and then interpreted to reach decisions or conclusions. Data mining, text analytics, business intelligence, and data visualization are among the most commonly used techniques.

Data analysis aims to collect raw data and convert it into information useful for decision-making. The stages of data analysis are as follows:

i) To make sense of each data collection;

ii) To look for patterns and relationships both within a collection and across groups;

iii) To make general discoveries about the phenomena being researched.

Before further analysis, I compactly display the structure of the given data set.

The list below describes the contents of the data:

Descriptive summary of the data set, produced with the R function summary():

Figure 1.1: Car Frame

Figure 1.2: Price Range

Figure 1.3: Pollution and Speed

Figure 1.4: Pollution and Size

Figure 1.5: Descriptive table for summary.data.frame(Car)

Table 1.1: Summary table for price and range

Table 1.2: Summary table for acceleration and speed

From the descriptive summary table, the mean price (price of the vehicle divided by the logarithm of income) is 4.296 for price1, 4.173 for price3, and 4.150 for price5. I excluded price2, price4, and price6 because they have the same mean and median as price1, price3, and price5, respectively. The range variables give the hundreds of miles a vehicle can travel between refueling/recharging. The mean range is 160.49 for range1, 240.38 for range3, and 312.03 for range5.
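Table 1.1 was produced with R's summary(). As a rough Python analogue, assuming each column is held as a plain list, the sketch below computes the four rows shown in the table (Minimum, Mean, Median, Maximum) and the check used to drop price2, price4, and price6 (identical mean and median to the paired column). The helper names and the toy values are hypothetical; only the column names come from the table.

```python
import statistics


def summarize(columns):
    """Compute the Minimum/Mean/Median/Maximum rows reported in Table 1.1."""
    return {
        name: {
            "Minimum": min(values),
            "Mean": statistics.mean(values),
            "Median": statistics.median(values),
            "Maximum": max(values),
        }
        for name, values in columns.items()
    }


def same_center(table, a, b):
    """True when columns a and b share both mean and median (the exclusion rule)."""
    return (table[a]["Mean"], table[a]["Median"]) == (table[b]["Mean"], table[b]["Median"])


# Hypothetical toy columns; the real data set has 4,654 values per column
columns = {
    "price1": [0.6, 3.9, 4.1, 4.5, 8.0],
    "price2": [0.6, 3.9, 4.1, 4.5, 8.0],
    "range1": [50, 125, 125, 200, 300],
}
table = summarize(columns)
for name, row in table.items():
    print(name, row)
print("price2 duplicates price1:", same_center(table, "price1", "price2"))
```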

Data Visualization

Data visualization is the representation of data in a chart, diagram, or other visual format. It communicates relationships in the data through images. We need data visualization because a visual summary makes it easier to identify patterns and trends than scanning thousands of rows in a spreadsheet; that is how the human brain works. Since the purpose of data analysis is to gain insights, data become far more valuable when they are visualized. Even if a data analyst can extract insights without visualization, it is harder to communicate their significance without it. Charts and graphs make communicating findings easier, even when the patterns can be identified without them (Sheskin, 2017).

This matters because it allows patterns and trends to be observed more easily. With the rise of big data, we must be able to interpret increasingly large batches of data. AI makes it easier to conduct analyses such as predictive analytics, whose results can then be presented as helpful visualizations.

Visualizing and Analyzing Categorical Variables

Figure 2.1: Choice of a vehicle among six propositions

From the pie chart, we can create a table for better understanding.

Choice 5 has the highest percentage (32%), followed by choice 3 (29%), while choice 2 has the lowest (6%).

Table 1.3: Choice of a vehicle among six propositions
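The percentages in Table 1.3 follow from the counts by dividing each count by the 4,654 observations and rounding to a whole number. A minimal Python sketch of that arithmetic, using the counts reported in the table (the helper name percent_table is hypothetical):

```python
def percent_table(counts):
    """Map each category to (count, percent of total rounded to a whole number)."""
    total = sum(counts.values())
    return {k: (v, round(100 * v / total)) for k, v in counts.items()}


# Number of respondents choosing each proposition, as reported in Table 1.3
choice_counts = {"choice1": 887, "choice2": 269, "choice3": 1345,
                 "choice4": 349, "choice5": 1499, "choice6": 305}
choice_table = percent_table(choice_counts)
for choice, (count, pct) in choice_table.items():
    print(f"{choice}: {count} ({pct}%)")
print("Total:", sum(choice_counts.values()))
print("Most chosen:", max(choice_counts, key=choice_counts.get))
print("Least chosen:", min(choice_counts, key=choice_counts.get))
```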

The variables college (college education), hsg2 (size of household greater than 2), and coml5 (commute lower than 5 miles a day) are binary.

Here, 0 represents No and 1 represents Yes.

Figure 2.2: College Figure 2.3: Households

Figure 2.4: Commute lower than 5 miles a day

The table below summarizes the three charts:

Table 1.3: Summary of college, hsg2, and coml5
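Because each of these variables is coded 0 (No) or 1 (Yes), its mean equals the share of Yes answers. The Python sketch below illustrates this with the counts reported in the summary table; the helper yes_share is a hypothetical name, and rebuilding the 0/1 lists from counts is done only for illustration.

```python
import statistics

# Counts of 0 (No) and 1 (Yes) for each binary variable, from the summary table
binary_counts = {
    "college": {0: 1079, 1: 3575},
    "hsg2": {0: 3621, 1: 1033},
    "coml5": {0: 2989, 1: 1665},
}


def yes_share(counts):
    """The mean of a 0/1 variable equals the share of respondents coded 1 (Yes)."""
    values = [0] * counts[0] + [1] * counts[1]
    return statistics.mean(values)


for name, counts in binary_counts.items():
    share = yes_share(counts)
    print(f"{name}: Yes {share:.0%}, No {1 - share:.0%}")
```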

Type variables (type1 to type6)

Body type: regular car (regcar), sport utility vehicle (sportuv), sports car (sportcar), station wagon (stwagon), truck, or van, for each proposition z from 1 to 6.

Figure 2.5: Type 1 Figure 2.6: Type 2

Figure 2.7: Type 3 Figure 2.8: Type 4

Figure 2.9: Type 5 Figure 3.0: Type 6

The type variables are summarized in the table below.

Table 1.4: Summary of the type variables

The regular car is the most common body type across the propositions, followed by the truck.
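The Total column of Table 1.4 is the row sum across the six propositions, and each proposition column should add up to the 4,654 observations. A short Python sketch that recomputes both from the reported counts and ranks the body types:

```python
# How often each body type appears in propositions 1-6, from Table 1.4
type_counts = {
    "van":      [410, 928, 410, 1862, 410, 972],
    "regcar":   [3138, 769, 3138, 362, 3138, 385],
    "truck":    [487, 1851, 487, 1175, 487, 1141],
    "sportuv":  [283, 35, 283, 57, 283, 107],
    "stwagon":  [137, 991, 137, 1124, 137, 1920],
    "sportcar": [199, 80, 199, 74, 199, 129],
}

# Row totals, then body types ranked from most to least frequent
totals = {body: sum(counts) for body, counts in type_counts.items()}
ranking = sorted(totals, key=totals.get, reverse=True)
for body in ranking:
    print(f"{body}: {totals[body]}")

# Each proposition column should sum to the 4,654 observations
column_sums = [sum(col) for col in zip(*type_counts.values())]
print("Column totals:", column_sums)
```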

Figure 3.1: Type Fuel 1 Figure 3.2: Type Fuel 2

Figure 3.3: Type Fuel 3 Figure 3.4: Type Fuel 4

Figure 3.5: Type Fuel 5 Figure 3.6: Type Fuel 6

The fuel variables are summarized in the table below, compiled from the charts.

Table 1.5: Summary of the fuel variables

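CNG is the most common fuel across the propositions, while methanol is the least common, appearing 6,952 times against 6,958 for gasoline.

The acceleration variables (acc) give the tens of seconds required to reach 30 mph from a stop, and the speed variables give the highest attainable speed in hundreds of mph.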
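Summing each row of Table 1.5 gives the number of propositions offering each fuel. The totals are close together, so it is worth computing the ranking directly. This Python sketch uses the reported counts and assumes that a "-" in the table means the fuel was not offered in that proposition (entered as 0).

```python
# Propositions offering each fuel, from Table 1.5; a "-" in the table is
# assumed to mean "not offered" and is entered here as 0
fuel_counts = {
    "cng":      [1178, 1178, 2330, 2330, 0, 0],
    "methanol": [3476, 3476, 0, 0, 0, 0],
    "electric": [0, 0, 2324, 2324, 1175, 1175],
    "gasoline": [0, 0, 0, 0, 3479, 3479],
}

# Row totals, printed from most to least frequent
fuel_totals = {fuel: sum(counts) for fuel, counts in fuel_counts.items()}
for fuel in sorted(fuel_totals, key=fuel_totals.get, reverse=True):
    print(f"{fuel}: {fuel_totals[fuel]}")

# Sanity check: every proposition column should sum to 4,654
print("Column totals:", [sum(col) for col in zip(*fuel_counts.values())])
```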

Figure 3.7: Car Data Figure 3.8: Car speed

Figure 3.9: Car vs speed

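From the summary table, the mean of the pollution variables rises from 0.0853 for pollution1 and pollution2 to 0.4137 for pollution3 and pollution4 and 0.5941 for pollution5 and pollution6.

Table 1.6: Summary of the pollution variables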
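Table 1.6 reports a mode alongside the median because the two can differ: for pollution5 and pollution6 the median is 0.6 but the mode is 0.25. The Python sketch below shows how that happens when one low value is the most frequent; the ratings are hypothetical toy values, not the real data.

```python
import statistics

# Hypothetical pollution ratings for one proposition (not the real data)
pollution = [0.25, 0.25, 0.25, 0.6, 0.75, 0.8, 1.0]

print("Mean:", round(statistics.mean(pollution), 4))
print("Median:", statistics.median(pollution))  # middle value of the sorted list
print("Mode:", statistics.mode(pollution))      # most frequent value
print("Minimum:", min(pollution), "Maximum:", max(pollution))
```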

Sizes: 0 for a mini, 1 for a subcompact, 2 for a compact, and 3 for a mid-size or large vehicle.

Figure 4.0: Car size

A bar chart shows the relationships between discrete categories: one axis represents the groups being compared, and the other indicates a calculated value. The chart above shows that, for the size variable, the most frequent configuration is a mid-size or large vehicle, while the least frequent is the mini.

Space: Fraction of luggage space in a comparable new gas vehicle.

Table 1.7: Luggage space

Costs: cost per mile of travel (tens of cents): home recharging for an electric vehicle, station refueling otherwise

Stations: A fraction of stations that can refuel/recharge the vehicle

Table 1.8: Station refuel or recharge

A scatter plot, or scatter graph, is a visual representation of two variables in a data set, here cost and speed. The plot uses Cartesian coordinates, with the independent variable x (speed) on the horizontal axis and the dependent variable y (cost) on the vertical axis. The scatter plot shows a weak positive relationship between cost and speed. The correlation coefficient (r) measures the linear relationship between two variables and ranges from -1 to 1. The correlation coefficient between cost and speed is r = 0.145011, which indicates a weak positive relationship.
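The paper does not state which correlation coefficient was used; assuming it is Pearson's r, it is the covariance of x and y divided by the product of their standard deviations. A minimal Python implementation, with hypothetical speed and cost values standing in for the real columns (the function name pearson_r is also hypothetical):

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical speed (x) and cost (y) pairs, for illustration only
speed = [85, 95, 95, 110, 140]
cost = [4, 6, 2, 8, 5]
print(round(pearson_r(speed, cost), 4))
```

Running the same function on the full speed and cost columns would be one way to check the reported value of 0.145011, if that value is indeed a Pearson coefficient.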

Conclusion

Based on the analysis, the mean price (price of the vehicle divided by the logarithm of income) is 4.296 for price1, 4.173 for price3, and 4.150 for price5. Price2, price4, and price6 were excluded because they have the same mean and median as price1, price3, and price5, respectively.

The most frequent choice is choice 5 and the least frequent is choice 2. Of the respondents, 23% are not college-educated while 77% are college-educated; 22% have a household larger than two people while 78% have a household of two or fewer; and 36% commute less than 5 miles a day while 64% commute 5 miles or more. The most common body type across the propositions is the regular car; the most common fuel is CNG and the least common is methanol. The correlation coefficient (r) between cost and speed is 0.145011, which indicates a weak positive relationship between cost and speed.

References

Reid, H. (2013, August). Introduction to statistics. SAGE Publications.

Jackson, S. L. (2017). Statistics plain and simple. Boston, MA: Cengage Learning.

Alan, J. (2018). Ohio touts successes against human trafficking. Ohio: The Columbus Dispatch.

Erik, M. (2017). Regression analysis. Market Research, 12(7), 31.

Fishe, R. (2016). The social relationship between the teenager's psychological changes and physiological changes. Journal of Medical Statistics, 11(2), 32.

Sheskin, D. J. (2017). Handbook of parametric and nonparametric statistical procedures. New York: CRC Press.

Jackson, S. (2017). Statistics plain and simple. Cengage Learning. Retrieved from phoenix.vitalsource.com/#/books/9781337681728/cfi/6/8!/4/4@0:5.88

          price1    price3    price5    range1    range3    range5
Minimum   0.598726  0.598726  0.635196  50        75        250
Mean      4.296262  4.173241  4.149952  160.4856  240.3792  312.0327
Median    4.138684  4.039575  4.039575  125       250       300
Maximum   17.37056  17.37056  17.37056  300       400       400

          acc1      acc3      acc5      speed1    speed3    speed5
Minimum   2.5       2.5       2.5       55        85        85
Mean      4.17254   4.272991  4.054469  84.66695  107.3055  107.3421
Median    4         4         4         85        95        95
Maximum   6         6         6         140       140       140

choice    Count  Percent
choice1   887    19%
choice2   269    6%
choice3   1345   29%
choice4   349    7%
choice5   1499   32%
choice6   305    7%

value  college          hsg2             coml5
       Count  Percent   Count  Percent   Count  Percent
0      1079   23%       3621   78%       2989   64%
1      3575   77%       1033   22%       1665   36%

          type1  type2  type3  type4  type5  type6  Total
van       410    928    410    1862   410    972    4992
regcar    3138   769    3138   362    3138   385    10930
truck     487    1851   487    1175   487    1141   5628
sportuv   283    35     283    57     283    107    1048
stwagon   137    991    137    1124   137    1920   4446
sportcar  199    80     199    74     199    129    880
Total     4654   4654   4654   4654   4654   4654   27924

          fuel1  fuel2  fuel3  fuel4  fuel5  fuel6
cng       1178   1178   2330   2330   -      -
methanol  3476   3476   -      -      -      -
electric  -      -      2324   2324   1175   1175
gasoline  -      -      -      -      3479   3479

          pollution1  pollution2  pollution3  pollution4  pollution5  pollution6
Mean      0.0853      0.0853      0.4137      0.4137      0.5941      0.5941
Median    0           0           0.4         0.4         0.6         0.6
Mode      0           0           0.4         0.4         0.25        0.25
Minimum   0           0           0.1         0.1         0.25        0.25
Maximum   0.6         0.6         0.75        0.75        1           1

          space1    space2    space3    space4    space5  space6
Mean      0.850774  0.850774  0.925677  0.925677  1       1
Median    1         1         1         1         1       1
Minimum   0.7       0.7       0.7       0.7       1       1
Maximum   1         1         1         1         1       1

          station1  station2  station3  station4  station5  station6
Mean      0.089514  0.089514  0.382768  0.382768  0.823915  0.823915
Median    0         0         0.3       0.3       1         1
Minimum   0         0         0.1       0.1       0.1       0.1
Maximum   0.7       0.7       0.7       0.7       1         1