Analyzing and Visualizing Data
Chapter 4 Working With Data
Data Assets and Tabulation Types
• Two main categories
  o Data that exist in tables: datasets
  o Data that exist as isolated values
• Data Types
  o Levels of data or scales of measurement
  o Type of exploratory data analysis you can undertake
  o Editorial thinking you establish
  o Specific chart types you might use
  o Color choices and layout decisions around composition
Data Assets and Tabulation Types cont.
• Textual (Qualitative)
  o Unstructured streams of words
  o Descriptive details of a weather forecast for a given city
  o The full title of an academic research project
  o The description of a product on Amazon
Data Assets and Tabulation Types cont.
• Ordinal (Qualitative)
  o Ordinal data is still categorical and qualitative in nature
  o Has characteristics of order
  o The response to a survey question: based on a scale of 1 (unhappy) to 5 (very happy)
  o The general weather forecast: expressed as Very Hot, Hot, Mild, Cold, Freezing
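A minimal sketch, in Python, of why the order in ordinal data matters. The scale reuses the weather categories above; the helper name `rank` and the sample forecasts are hypothetical.

```python
# Ordinal categories carry an order, so sort by position on the scale,
# not alphabetically.
WEATHER_SCALE = ["Freezing", "Cold", "Mild", "Hot", "Very Hot"]  # coldest to hottest

def rank(label):
    """Return the position of a label on the ordinal scale (0 = coldest)."""
    return WEATHER_SCALE.index(label)

forecasts = ["Hot", "Freezing", "Mild", "Very Hot", "Cold"]  # hypothetical sample

alphabetical = sorted(forecasts)        # ignores the meaning of the categories
by_order = sorted(forecasts, key=rank)  # respects the ordinal scale

print(alphabetical)
print(by_order)
```

The alphabetical sort places Cold before Freezing, which is why chart axes for ordinal data usually need an explicit category order.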
Data Assets and Tabulation Types cont.
• Interval (Quantitative)
  o Interval data is the less common form of quantitative data
  o Quantitative and numeric measurement
  o Measure for temperature
Data Assets and Tabulation Types cont.
• Ratio (Quantitative)
  o Most common quantitative variable
  o Age of a survey participant in years
  o Forecasted amount of rainfall in millimetres
  o Unlike interval data, zero means something for ratio variables
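A short Python sketch of the interval versus ratio distinction. It assumes Celsius as the interval example and Kelvin and rainfall as ratio examples; the sample readings are hypothetical.

```python
def celsius_to_kelvin(celsius):
    """Convert Celsius (interval scale) to Kelvin (ratio scale, true zero)."""
    return celsius + 273.15

cool_c, warm_c = 10.0, 20.0  # hypothetical temperature readings in Celsius

# Differences are meaningful on an interval scale.
difference = warm_c - cool_c

# Ratios are not: 20 C is not "twice as hot" as 10 C, because 0 C is arbitrary.
naive_ratio = warm_c / cool_c
true_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

# Rainfall is ratio data: 0 mm means no rain, so 10 mm really is double 5 mm.
rain_ratio = 10.0 / 5.0

print(difference, naive_ratio, round(true_ratio, 4), rain_ratio)
```

On the Kelvin scale the warmer reading is only about 3.5% higher, which shows how misleading a ratio of interval values can be.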
Data Assets and Tabulation Types cont.
• Temporal Data
  o Time-based data
  o Textual: ‘Four o’clock in the afternoon on Monday, 12 March 2016’
  o Ordinal: ‘PM’, ‘Afternoon’, ‘March’, ‘Q1’
  o Interval: ‘12’, ‘12/03/2016’, ‘2016’
  o Ratio: ‘16:00’
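A minimal Python sketch showing how a single timestamp can be broken into components like those listed above. The variable names are hypothetical, and the AM/PM label is computed by hand to avoid locale-dependent output.

```python
from datetime import datetime

# The moment used in the slide example: 16:00 on 12 March 2016.
moment = datetime(2016, 3, 12, 16, 0)

# Ordinal-style components: ordered categories.
am_pm = "PM" if moment.hour >= 12 else "AM"
quarter = f"Q{(moment.month - 1) // 3 + 1}"

# Interval-style components: positions on a calendar scale.
day_of_month = moment.day
date_label = moment.strftime("%d/%m/%Y")  # day-first format, as in the slide
year = moment.year

# Clock-time component, labelled as ratio data in the slide.
clock = moment.strftime("%H:%M")

print(am_pm, quarter, day_of_month, date_label, year, clock)
```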
Data Assets and Tabulation Types cont.
• Discrete
  o No ‘in-between’ state
  o Days of the week
  o Heads or tails for a coin toss
  o 1, 2, 3, 4, 5, 6, etc.
• Continuous
  o Has in-between state
  o Height and weight
  o Temperature
  o Time
  o 1.1, 1.2, 1.3, 1.4, 1.5, etc.
Data Acquisition
• What data do you need and why?
• From where, how, and by whom will the data be acquired?
• When can you obtain it?
Data Acquisition cont.
• Curated by You
  o Primary data collection
  o Manual collection and data foraging
  o Extracted from PDF files
  o Web scraping (also known as web harvesting)
Data Acquisition cont.
• Curated by Others
  o Issued to you
  o Download from the Web
  o System report or export
  o Third-party services
  o API
Data Examination
• Data Properties
  o Data types
  o Size
  o Condition
    ▪ Missing values
    ▪ Erroneous values
    ▪ Inconsistencies
    ▪ Duplicate records
    ▪ Out of date
    ▪ Uncommon system characters or line breaks
    ▪ Leading or trailing spaces
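A few of the condition checks above can be sketched in plain Python. The records, field names, and the rule that rainfall cannot be negative are hypothetical assumptions for illustration only.

```python
from collections import Counter

records = [  # hypothetical sample, all values stored as text
    {"city": "Leeds", "rain_mm": "4.2"},
    {"city": " Leeds", "rain_mm": "4.2"},  # leading space
    {"city": "York", "rain_mm": ""},       # missing value
    {"city": "York", "rain_mm": "-3"},     # erroneous: negative rainfall
    {"city": "Leeds", "rain_mm": "4.2"},   # duplicate of the first record
]

# Row indexes with a missing rainfall value.
missing = [i for i, r in enumerate(records) if r["rain_mm"].strip() == ""]

# Row indexes where any field has leading or trailing spaces.
padded = [i for i, r in enumerate(records)
          if any(v != v.strip() for v in r.values())]

# Row indexes with an impossible (negative) rainfall value.
negative = [i for i, r in enumerate(records)
            if r["rain_mm"].strip() and float(r["rain_mm"]) < 0]

# Records that appear more than once, compared field by field.
counts = Counter(tuple(sorted(r.items())) for r in records)
duplicates = [dict(key) for key, n in counts.items() if n > 1]

print(missing, padded, negative, duplicates)
```

Note that the padded " Leeds" row is not detected as a duplicate until it has been cleaned, which is one reason examination usually comes before transformation.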
Data Examination cont.
• How to Approach This?
  o Inspect and scan
  o Data operations
  o Statistical methods
  o Frequency counts
  o Frequency distribution
  o Measurements of central tendency
  o Measurements of spread
  o Maximum, minimum and range
  o Percentiles
  o Standard deviation
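The statistical methods listed above can be sketched with Python's standard library. The `ages` sample is hypothetical; `statistics.stdev` computes the sample standard deviation, and `statistics.quantiles` is used here with its default method.

```python
import statistics
from collections import Counter

ages = [23, 25, 25, 31, 38, 42, 42, 42, 57]  # hypothetical survey ages

# Frequency counts: how often each value occurs.
frequency = Counter(ages)

# Measurements of central tendency.
mean = statistics.mean(ages)
median = statistics.median(ages)
mode = statistics.mode(ages)

# Maximum, minimum and range.
minimum, maximum = min(ages), max(ages)
value_range = maximum - minimum

# Percentiles: the three cut points that split the data into quartiles.
quartiles = statistics.quantiles(ages, n=4)

# Measurement of spread: sample standard deviation.
std_dev = statistics.stdev(ages)

print(frequency.most_common(1), round(mean, 2), median, mode)
print(minimum, maximum, value_range, quartiles, round(std_dev, 2))
```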
Influence on Process
• Moving forward
  o Purpose map ‘tone’
  o Editorial angles
  o Physical properties influence scale
Data Transformation
• Potential Activities
  o Transform to clean
  o Transform to convert
  o Transform to create
  o Transform to consolidate
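One possible reading of the four transformation activities, sketched in Python. The raw rows, the `regions` lookup table, and the derived `rain_in` field are hypothetical, chosen only to show one step of each kind.

```python
raw = [  # hypothetical raw rows, all values stored as text
    {"city": " leeds ", "rain_mm": "4.2"},
    {"city": "YORK", "rain_mm": "10.0"},
]

# Hypothetical second source used for consolidation.
regions = {"Leeds": "West Yorkshire", "York": "North Yorkshire"}

transformed = []
for row in raw:
    city = row["city"].strip().title()  # clean: fix spacing and casing
    rain_mm = float(row["rain_mm"])     # convert: text to number
    rain_in = round(rain_mm / 25.4, 2)  # create: derive a new field (inches)
    region = regions.get(city)          # consolidate: bring in another source
    transformed.append(
        {"city": city, "rain_mm": rain_mm, "rain_in": rain_in, "region": region}
    )

for row in transformed:
    print(row)
```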
Data Exploration
• Exploratory Data Analysis
  o Instinct of the analyst
  o Reasoning
    ▪ Deductive
    ▪ Inductive
  o Chart types
  o Research
  o Statistical methods
  o Nothings
  o Not always needed