data analysis and viz

ITS530-Week2WorkingwithData.pdf

Analyzing and Visualizing Data

Chapter 4 Working With Data

Data Assets and Tabulation Types

• Two main categories
  o Data that exist in tables (datasets)
  o Data that exist as isolated values

• Data types (also called levels of data or scales of measurement) shape:
  o The type of exploratory data analysis you can undertake
  o The editorial thinking you establish
  o The specific chart types you might use
  o Color choices and layout decisions around composition

Data Assets and Tabulation Types cont.

• Textual (Qualitative)
  o Unstructured streams of words
  o Descriptive details of a weather forecast for a given city
  o The full title of an academic research project
  o The description of a product on Amazon

Data Assets and Tabulation Types cont.

• Ordinal (Qualitative)
  o Ordinal data is still categorical and qualitative in nature
  o Its categories have a characteristic order
  o The response to a survey question based on a scale of 1 (unhappy) to 5 (very happy)
  o The general weather forecast, expressed as Very Hot, Hot, Mild, Cold, Freezing
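A minimal Python sketch, assuming the weather-forecast categories above are encoded with the standard-library `IntEnum` (the class name `Forecast` and the sample week are hypothetical). The integers record order only, so sorting and taking the median are meaningful, while averaging the codes would not be.

```python
from enum import IntEnum
from statistics import median_low


# Hypothetical encoding of the ordinal forecast categories.
# The integers capture order only; the gaps between them carry no meaning.
class Forecast(IntEnum):
    FREEZING = 1
    COLD = 2
    MILD = 3
    HOT = 4
    VERY_HOT = 5


# Hypothetical forecasts for five days.
week = [Forecast.MILD, Forecast.HOT, Forecast.COLD, Forecast.VERY_HOT, Forecast.MILD]

# Order is meaningful, so sorting and comparisons are valid.
ranked = sorted(week)
print([f.name for f in ranked])  # ['COLD', 'MILD', 'MILD', 'HOT', 'VERY_HOT']

# median_low always returns an actual category instead of averaging two of them.
typical = Forecast(median_low(week))
print(typical.name)  # MILD
```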

Data Assets and Tabulation Types cont.

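• Interval (Quantitative)
  o Interval data is the less common form of quantitative data
  o Quantitative and numeric measurement; differences between values are meaningful
  o Example: temperature in °C or °F, where zero is an arbitrary point rather than an absence of temperature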
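Because zero on an interval scale is arbitrary, ratios between values depend on the unit, while differences do not. The sketch below uses two hypothetical temperatures and the standard Celsius-to-Fahrenheit formula to show this.

```python
def c_to_f(celsius: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


# Hypothetical temperatures on two days, in degrees Celsius.
warm_c, cool_c = 20.0, 10.0
warm_f, cool_f = c_to_f(warm_c), c_to_f(cool_c)  # 68.0 and 50.0

# Ratios are not meaningful: the answer depends on where zero happens to sit.
print(warm_c / cool_c)  # 2.0
print(warm_f / cool_f)  # 1.36

# Differences are meaningful: a 10 degree C gap is always an 18 degree F gap.
print(warm_c - cool_c)  # 10.0
print(warm_f - cool_f)  # 18.0
```

So 20 °C is not "twice as hot" as 10 °C, even though the Celsius numbers suggest it.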

Data Assets and Tabulation Types cont.

• Ratio (Quantitative)
  o The most common form of quantitative data
  o Age of a survey participant in years
  o Forecasted amount of rainfall in millimetres
  o Unlike interval data, ratio data has a meaningful zero (0 mm means no rainfall), so statements such as "twice as much" are valid
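For contrast with the interval example, a ratio variable keeps the same ratio under a change of unit because its zero is a true zero. This sketch uses hypothetical rainfall forecasts and the fixed conversion of 25.4 mm per inch.

```python
MM_PER_INCH = 25.4  # the inch is defined as exactly 25.4 mm


def mm_to_inches(mm: float) -> float:
    """Convert a rainfall depth from millimetres to inches."""
    return mm / MM_PER_INCH


# Hypothetical forecast rainfall for two cities, in millimetres.
city_a_mm, city_b_mm = 10.0, 5.0

# 0 mm and 0 in both mean "no rain", so the ratio survives the unit change.
ratio_mm = city_a_mm / city_b_mm
ratio_in = mm_to_inches(city_a_mm) / mm_to_inches(city_b_mm)
print(ratio_mm, ratio_in)  # both 2.0
```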

Data Assets and Tabulation Types cont.

• Temporal Data
  o Time-based data
  o Textual: ‘Four o’clock in the afternoon on Monday, 12 March 2016’
  o Ordinal: ‘PM’, ‘Afternoon’, ‘March’, ‘Q1’
  o Interval: ‘12’, ‘12/03/2016’, ‘2016’
  o Ratio: ‘16:00’
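The same moment can be re-expressed at the different levels listed above. A minimal sketch with Python's `datetime`, assuming the textual date is first reduced to a parseable form (the weekday is left out, since `strptime` would not check it against the date) and that month and AM/PM names come from the default English (C) locale. The level assigned to each derived value follows the slide.

```python
from datetime import datetime

# Parseable version of the textual timestamp (the format string is an assumption).
text = "12 March 2016 16:00"
moment = datetime.strptime(text, "%d %B %Y %H:%M")

# Ordinal representations
am_pm = moment.strftime("%p")                  # 'PM'
month_name = moment.strftime("%B")             # 'March'
quarter = f"Q{(moment.month - 1) // 3 + 1}"    # 'Q1'

# Interval representations
day = moment.day                               # 12
date_str = moment.strftime("%d/%m/%Y")         # '12/03/2016'
year = moment.year                             # 2016

# Ratio representation, as classified on the slide
clock = moment.strftime("%H:%M")               # '16:00'

print(am_pm, month_name, quarter, day, date_str, year, clock)
```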

Data Assets and Tabulation Types cont.

• Discrete
  o No ‘in-between’ state
  o Days of the week
  o Heads or tails for a coin toss
  o 1, 2, 3, 4, 5, 6, etc.

• Continuous
  o Can take ‘in-between’ values
  o Height and weight
  o Temperature
  o Time
  o 1.1, 1.2, 1.3, 1.4, 1.5, etc.

Data Acquisition

• What data do you need and why?
• From where, how, and by whom will the data be acquired?
• When can you obtain it?

Data Acquisition cont.

• Curated by You
  o Primary data collection
  o Manual collection and data foraging
  o Extraction from PDF files
  o Web scraping (also known as web harvesting)
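Web scraping usually means downloading a page and then parsing values out of its HTML. A minimal sketch of the parsing step only, using the standard-library `html.parser`; the `TableCellParser` class and the page fragment are hypothetical, and real pages tend to need more defensive handling than this.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collects the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:  # tolerate a cell with no enclosing <tr>
                self.rows.append([])
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._in_cell:
            self.rows[-1][-1] = self.rows[-1][-1].strip()  # trim padding inside the cell
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:  # ignore whitespace between tags
            self.rows[-1][-1] += data


# Hypothetical page fragment; a real scraper would download the page first.
page = """
<table>
  <tr><th>City</th><th>Rainfall (mm)</th></tr>
  <tr><td> Leeds </td><td>4.2</td></tr>
  <tr><td>York</td><td>3.1</td></tr>
</table>
"""

parser = TableCellParser()
parser.feed(page)
print(parser.rows)  # [['City', 'Rainfall (mm)'], ['Leeds', '4.2'], ['York', '3.1']]
```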

Data Acquisition cont.

• Curated by Others
  o Issued to you
  o Download from the Web
  o System report or export
  o Third-party services
  o API
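Data curated by others commonly arrives as a CSV export or as JSON from an API. A short sketch with the standard `csv` and `json` modules; the file contents and the API body are hypothetical, and `io.StringIO` stands in for a real file or HTTP response.

```python
import csv
import io
import json

# Hypothetical CSV export issued by another system.
export = io.StringIO(
    "id,city,rainfall_mm\n"
    "1,Leeds,4.2\n"
    "2,York,3.1\n"
)
records = list(csv.DictReader(export))
# The csv module returns every field as text; numbers must be converted later.
print(records[0])  # {'id': '1', 'city': 'Leeds', 'rainfall_mm': '4.2'}

# Hypothetical JSON body, as a third-party API might return it.
api_body = '[{"city": "Leeds", "rainfall_mm": 4.2}, {"city": "York", "rainfall_mm": 3.1}]'
api_records = json.loads(api_body)
# JSON preserves numeric types, so rainfall arrives as a float here.
print(type(api_records[1]["rainfall_mm"]).__name__)  # float
```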

Data Examination

• Data Properties
  o Data types
  o Size
  o Condition
    ▪ Missing values
    ▪ Erroneous values
    ▪ Inconsistencies
    ▪ Duplicate records
    ▪ Out of date
    ▪ Uncommon system characters or line breaks
    ▪ Leading or trailing spaces
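Several of the condition problems above can be flagged with a few lines of code. A minimal sketch over hypothetical records; the `condition_report` helper is illustrative, and treating negative rainfall as erroneous is an assumption chosen for this example.

```python
from collections import Counter

# Hypothetical raw records showing some of the problems listed above.
rows = [
    {"id": "1", "city": "Leeds", "rainfall_mm": "4.2"},
    {"id": "2", "city": " York ", "rainfall_mm": ""},
    {"id": "3", "city": "Hull", "rainfall_mm": "-7"},
    {"id": "1", "city": "Leeds", "rainfall_mm": "4.2"},
]


def condition_report(records):
    """Flag missing values, padded text, suspect negatives and duplicate records."""
    missing = [r["id"] for r in records if any(v.strip() == "" for v in r.values())]
    padded = [r["id"] for r in records if any(v != v.strip() for v in r.values())]
    # Assumption for this example: negative rainfall is an erroneous value.
    negative = [
        r["id"] for r in records
        if r["rainfall_mm"].strip() and float(r["rainfall_mm"]) < 0
    ]
    # Identical records are counted via a hashable, order-independent key.
    counts = Counter(tuple(sorted(r.items())) for r in records)
    duplicates = [dict(key) for key, n in counts.items() if n > 1]
    return {"missing": missing, "padded": padded,
            "negative": negative, "duplicates": duplicates}


report = condition_report(rows)
print(report)
```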

Data Examination cont.

• How to Approach This?
  o Inspect and scan
  o Data operations
  o Statistical methods
    ▪ Frequency counts
    ▪ Frequency distribution
    ▪ Measures of central tendency
    ▪ Measures of spread: maximum, minimum and range; percentiles; standard deviation
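Each statistical method listed above has a direct counterpart in Python's standard `statistics` and `collections` modules. The sketch below applies them to hypothetical participant ages (ratio data); the 10-year bins are an arbitrary choice for the frequency distribution.

```python
import statistics
from collections import Counter

# Hypothetical ages (in years) of ten survey participants.
ages = [23, 35, 31, 46, 35, 52, 29, 35, 41, 38]

# Frequency counts and a frequency distribution in 10-year bins
counts = Counter(ages)
by_decade = Counter(age // 10 * 10 for age in ages)

# Measures of central tendency
mean = statistics.mean(ages)
median = statistics.median(ages)
mode = statistics.mode(ages)

# Measures of spread
low, high = min(ages), max(ages)
age_range = high - low
quartiles = statistics.quantiles(ages, n=4)  # 25th, 50th and 75th percentiles
std_dev = statistics.stdev(ages)             # sample standard deviation

print(counts[35], dict(sorted(by_decade.items())))  # 3 {20: 2, 30: 5, 40: 2, 50: 1}
print(mean, median, mode)                           # 36.5 35.0 35
print(low, high, age_range, quartiles)              # 23 52 29 [30.5, 35.0, 42.25]
print(round(std_dev, 2))
```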

Influence on Process

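• Moving forward, what you learn from examining the data influences:
  o The ‘tone’ set out in your purpose map
  o The editorial angles you pursue
  o Scale: the physical properties of the data (such as its size) influence the scale of the visualization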

Data Transformation

• Potential Activities
  o Transform to clean
  o Transform to convert
  o Transform to create
  o Transform to consolidate
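One possible reading of the four activities, sketched in Python: trimming text (clean), turning text into numbers (convert), deriving a new field (create) and joining in a second source (consolidate). The rows, the region lookup and the `transform` helper are all hypothetical.

```python
# Hypothetical raw rows: padded text and numbers stored as text.
raw = [
    {"city": " Leeds ", "rainfall_mm": "4.2", "days": "3"},
    {"city": "York", "rainfall_mm": "3.1", "days": "2"},
]
# Hypothetical lookup table from a second source.
regions = {"Leeds": "West Yorkshire", "York": "North Yorkshire"}


def transform(rows, lookup):
    out = []
    for row in rows:
        city = row["city"].strip()              # clean: trim stray spaces
        rainfall = float(row["rainfall_mm"])    # convert: text to number
        days = int(row["days"])
        per_day = round(rainfall / days, 2)     # create: a derived value
        region = lookup.get(city)               # consolidate: join another dataset
        out.append({"city": city, "rainfall_mm": rainfall,
                    "mm_per_day": per_day, "region": region})
    return out


result = transform(raw, regions)
print(result[0])
```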

Data Exploration

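• Exploratory Data Analysis
  o Instinct of the analyst
  o Reasoning
    ▪ Deductive: start from a question or hypothesis and test it against the data
    ▪ Inductive: explore the data first and let patterns suggest the questions
  o Chart types
  o Research
  o Statistical methods
  o ‘Nothings’: finding no notable pattern is still a finding
  o EDA is not always needed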
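One way to illustrate the two kinds of reasoning, assuming Pearson correlation as the statistical method (a technique chosen here for illustration, not prescribed by the slides). The deductive step tests a single pre-chosen relationship; the inductive step scans every pair of variables and ranks them. The data and the `pearson` helper are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical daily observations.
data = {
    "temperature_c": [12, 15, 11, 18, 20, 14],
    "rainfall_mm": [5.0, 2.5, 6.0, 1.0, 0.5, 3.0],
    "visitors": [120, 180, 100, 260, 300, 150],
}

# Deductive: test one relationship decided in advance.
r_temp_visitors = pearson(data["temperature_c"], data["visitors"])

# Inductive: scan every pair and let the strongest relationships surface.
ranked = sorted(
    ((a, b, pearson(data[a], data[b])) for a, b in combinations(data, 2)),
    key=lambda t: abs(t[2]),
    reverse=True,
)

print(round(r_temp_visitors, 3))
for a, b, r in ranked:
    print(f"{a} vs {b}: {r:+.3f}")
```

A weak correlation for every pair would be a ‘nothing’ in the sense above, which is still worth knowing.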