Analyzing and Visualizing Data

Documenting Research Guide Last Revised: 8/30/2020 1

Documenting Research Guide

Contents

Outline Structure and Content

Outline Example coinciding with Unit 3

Writing Tips

Example Research Paper coinciding with Unit 3, annotated

Example Research Paper coinciding with Unit 3

Outline Structure and Content

The outline is an organization document to provide structure for the research paper. Use the outline to document research.

Section 1, Level 1 Section Heading: This heading is the title of the paper.

• Background, topic, introduction

• Describe the broader context in which the problem exists, the topic

• Lead the reader to the problem statement

• Do not explicitly state the problem, research questions, or methodology

This section introduces the research topic and provides a high-level summary of what the reader can expect

to find in the rest of the paper.

Section 1, Level 2 Section Heading: Statement of the Problem

• This section may come straight from an assignment's instructions

• Provide the ideal, current, and intent of the problem for research

Section 2, Level 1 Section Heading: Research Methodology

• Begins with an introduction to all the content in the research methodology section

Section 2, Level 2 Section Heading: Research Questions

• This may come straight from the assignment's instructions

• Ensure that developed questions conform to the standards defined in the first lecture

Section 2, Level 2 Section Heading: Sample Data

• Review the sample data – variable names do not identify what the content represents – so do not use

variable names!

• Explain and describe what each of the variables represents, connecting the sample to the

background, problem, and question so the reader can understand what the data represents and why it

is suitable data to answer the research questions

Section 2, Level 2 Section Heading: Analysis Method and Limitations

• A plan, defining what type of analysis will address each research question.

• The plan will include statistical assumptions, limitations to the analysis method, and mitigating steps

taken for the limitations.

• This section is not a programming plan! This section does not include the programming procedure or

steps. Define this section before conducting any programming or analysis.

This section finishes with a summary of the content of the section

Develop everything above this statement before analysis. Work on everything below after the analysis.

EXCEPTION: Develop the reference section before and after analysis.

Some of the elements above this statement could change after analysis.

Section 3, Level 1 Section Heading: Results and Discussion

• Begins with an introduction to all the content in the results and discussion section

• The objective of the research is to answer the research questions, which is the purpose of this section.

o If there is more than one research question, address them individually

o Pay attention to the generalizability!

• Provide interpretations of all results!

• Inclusion of figures or tables must conform to APA 7 standards

• Include all findings, even if they do not support the desired outcome

o If the analysis method for the findings has statistical assumptions, address the statistical

assumptions before presenting the findings

• There is no programming code in this section.

• Finishes with a summary of the content in this section

Section 3, Level 2 Section Heading: Recommendations for Future Research

• After the analysis, how can it be improved?

• Different analysis methods?

• Different sample data?

• Different data structures?

The recommendations must come from the research; do not recommend different data collection methods – that

is not part of the research! (This course only uses secondary data, data consolidated by someone else.)

Section 4, Level 1 Section Heading: Conclusion

• The conclusion is a summary of everything in the entire paper

• Do not introduce new ideas in the conclusion

• Highlight key points of the research or findings

Section 5, Level 1 Section Heading: References

• Reference section and references per APA 7

• Some of the standards for this section per APA 7

o References always begin on a new page

▪ use insert new page to ensure this section starts at the top of a separate page from the rest

of the document

o References are in alphabetical order

o Annotated with a hanging indent

▪ The reference begins flush with the left one-inch margin

▪ Wrapped text is indented one-half inch

Outline Example: based on the analysis in Unit 3

The 2016 Presidential Campaign Polling

• The 2016 election was tumultuous

o Distinct perception Trump would not win

o Bias may have played a part

o Polling samples

o shy voters

• The research includes analysis of the polls' results and how the

results relate to the outcome of the election.

Statement of the Problem

Neutral polling, collected from a sample genuinely representative of

the voters, will provide an accurate prediction of the winner of an

election. Polling seemed to indicate that Clinton was going to win,

but the electoral vote significantly favored the Trump campaign.

Exploration of the polling results throughout the campaign and a

particularly close look at the ratings at the end of the campaign may

provide insight into the source of the significantly different outcome

than the media portrayed with the election of President Trump.

Research Methodology

Research Question

Considering the 2016 presidential campaign, using the polling data

consolidated by Silver et al. (2016) and the election results

consolidated by Ballotpedia (n.d.), what relationships exist between

the polling and the 2016 election results that indicate that President

Trump would win the election?

Sample Data

Note: Keep in mind that if the data used in an assignment has

variables not used in the analysis, those variables are not part of the

sample! Take note of this in the data. There are several fields not

discussed here, because the fields were not part of the analysis

• The secondary sample data from Silver et al. (2016) includes

polling data that represents

o Location: fifty states, national polls, and Washington

DC

o Dates: November 2015 to November 2016, the ending

date for each poll

o Size: the sample size of each poll

The title is capitalized in title case. This is the first section heading and the title of the paper in the final document.

For most of the course this is provided. In the outline and research paper, the entire statement is provided.

Cite the source(s) of the sample data.

Provide a summary of the document in the introduction.

While the outline has sentence fragments and bullets throughout, the research paper will not. The organizational statements in the outline are written as well-developed paragraphs in the research paper.

In APA 7, a level 1 section heading is in bold, centered between the one-inch left and right margins.

In APA 7, a level 2 section heading is in bold, flush to the one-inch left margin.

All research questions belong in the outline.

Explain the sample in words.

Explain how the data is represented, such as parts per million or percentage of votes.

o Vote: the percentage of votes for President Trump

and for Clinton in each poll in the data

• The secondary sample data used from Ballotpedia (n.d.)

represents:

o fifty states and Washington DC

o electoral votes available in each state

o 2016 election vote percentage of each state for

President Trump and Clinton

Analysis Method and Limitations

• What relationships exist between the pre-election polling attributes, the

2016 election, and each state's allocated electoral votes that indicate that

President Trump would win the election?

• assessed via visual analysis

o not parametric, therefore no statistical assumptions

o limitations of visual analysis

▪ high dimensionality is challenging to assess

▪ possibility of inadequate assessment leading to

incorrect conclusions

▪ the more comparisons, the higher likelihood of false

discoveries (Zhao et al., 2017)

o mitigation for inadequate assessment

▪ explore interesting findings via multiple facets, to

ensure adequate assessment

o mitigation for false discoveries

▪ Attempt to view any key finding from multiple

perspectives, to validate the finding

Develop everything above this statement for the outline, along with the

reference section.

Develop everything below after the analysis, along with the reference

section. There may be updates to the other sections.

Type of analysis for each research question; list each question!

Declare how this method can address each of the research questions.

Declare any statistical assumptions for this method of analysis with a credible reference.

Provide limitations to the method of analysis and methods to mitigate a limitation if it impacts the validity or reliability of the research.

In other words, if the limitation can lead to incorrect conclusions, how will correct conclusions be determined?

Declare the headings for the remaining fields

The design for analysis

Results and Discussion

Recommendations for Future Research

Conclusion

References

Include the reference(s) of the data, in APA 7.

Include a citation for every reference

Include a reference for every citation

The reference section begins on a separate page.

Writing Tips

• When writing a paper or developing a presentation, always include a summary of the document within the introduction and the conclusion.

• Focus the writing on the purpose: solve the problem, answer the question, or prove the expected outcome. In this course, the assignments will all have research questions. Focus on the questions.

• Write concisely. This is not a persuasive paper. Writing superfluously devalues your work.

• When you finish writing:

o Read the document aloud.

▪ This is the single most effective method to identify elements of the document that require editing.

▪ Think about the problem, research questions, or the expected outcome:

• Did you focus on it throughout the document?

• Did you provide answers to the research question(s)?

o If you are not particularly confident in your writing:

▪ Take time to identify the topic sentence in every paragraph, in every section, and within the introduction and conclusion.

▪ There should be transition sentences between the ideas in the document. Does the writing jump from one idea to the next?

▪ The writing center is an excellent resource, as well.

▪ Use the outline to organize your graduate-level writing.

• Do not concern yourself with your SafeAssign score.

o Ensure that quoted words, paraphrasing, and direct references to external sources have citations

and references to the original source of the information. Still not sure? Email me.

o Think about it! What do you think the average SafeAssign percentage is for the outline?

▪ A significant portion of the outline will come from the assignment instructions.

▪ SafeAssign's matching criteria typically assign scores of 60-80% to submissions that are correctly written.

• Cite every reference. Include all references in the reference section.

• All writing assignments are evaluated by APA 7 criteria.

o Student papers do not include an abstract.

o Vertical spacing is uniform between lines of text

▪ Microsoft Word automatically adds paragraph padding – remove it or use the template.

o The text alignment throughout the document is left-aligned, not justified.

o Do not solely rely on citation and reference generators. These tools are fallible.

Example Research Paper: with notations

The 2016 Presidential Campaign Polling

Dr. Kathy A. McClure

University of the Cumberlands

ITS-530: Data Analysis and Visualization

Dr. Kathy A. McClure

July 23, 2020

One of two places in the document correctly documented with non-uniform vertical spacing.

The top name is the author. When you see my name again, it is for the professor of the course.

The only element in the header is the page number in the same font as the document, starting at 1. (As this is part of an example document, the numbering is different.)

There is no footer in a student research paper, per APA 7. This footer is for document control.

The 2016 Presidential Campaign Polling

The 2016 presidential campaign was tumultuous. It had seemed impossible that President

Trump would win the election. Silver et al. (2016)

indicated that there was a 71.4% chance that Clinton

would win the election. During the campaign, the media

led voters, including elected members of the Republican

Party, to believe that President Trump would not win the

election (Hohman, 2016). Regardless of the media,

Hohman (2016) retroactively identified that there were

many voters who were not pro-Clinton leading up to the

election. Stevenson (2016) interviewed American

University professor Dr. Allan Lichtman, who overtly

stated that President Trump would win the election based

on historical voting in this country. Dr. Lichtman

specified two exceptions to this claim: candidate Johnson

must receive at least five percent of the vote and

President Trump's unpredictable behavior. Goldmacher and

Schreckinger (2016) stated that President Trump winning

the election was the "…biggest upset in U.S. history"

(title). Many believed Clinton would win.

Problem Statement

Polling samples that represent the population will

provide an accurate prediction of the election winner.

Note that the outline was not followed explicitly for the topic/introduction.

Don’t forget to cite and reference sources of information.

Use evidence to support any assertions that are not common knowledge.

Example: “Sampling bias was an issue in all polls.” That statement implies this is a fact when it is not, and it would be impossible to prove this statement!

You must have a citation and reference for assertions.

From the outline:

• The 2016 election was tumultuous

• Distinct perception Trump would not win

• Bias may have played a part

• Polling samples

• shy voters

• The research includes analysis of the polls' results and how the results

relate to the outcome of the election

Why did this quote end with the word “title” in parentheses? It is cited correctly. The statement began with the source authors and date. A quote requires three parts in the citation: author, date, and page number. The reference is a website, so there are no page or paragraph numbers. It must identify where the quote was found, in this case, the title.

The problem statement is verbatim from the outline, unless it was insufficient.

Polling results appeared to indicate that Clinton was going to win, but the election resulted in

President Trump being sworn in as the 45th president. Exploration of the polling and election results

may provide insight as to why the election winner was unexpected.

Method

Research Question

Considering the 2016 presidential campaign,

using the polling data consolidated by Silver et al. (2016)

and the election results consolidated by Ballotpedia (n.d.), what relationships exist between the

polling and the 2016 election results that indicate that President Trump would win the election?

Sample

This research employed two secondary data sources

for the analysis. Consolidated polling data collected by

Silver et al. (2016) is the first data source. Each observed

poll includes the percentage of votes by location, ending

date, and sample size for Clinton and President Trump.

Ballotpedia (n.d.) election data is also necessary for this

analysis and includes the percentage of votes by location

for Clinton and President Trump. Available electoral

votes for each location are another attribute in the election

data. Locations between the two secondary data sources

differed.

The polls' locations include the entire nation, each

state, and Washington, DC, and specific districts within Nebraska and Maine. The district polls

The research question(s) are verbatim from

the outline unless the question was

insufficient.

From the outline:

The secondary sample data from

Silver et al. (2016) includes polling

data that represents

• fifty states, national polls, and Washington DC

• November 2015 to November 2016, the ending date for each poll

• the sample size of each poll

• provides a raw percentage of votes for each poll for President Trump

and Clinton

The secondary sample data used from

Ballotpedia (n.d.) represents:

• fifty states and Washington DC

• electoral votes available in each state

• 2016 election vote percentage of each state for President Trump and

Clinton

within Nebraska and Maine were representative of the method of electoral vote distribution.

Splitting the electoral vote is possible in Nebraska and Maine (Coleman, 2020). In the other 48

states and Washington, DC, using winner-take-all, the

popular vote winner for the state receives all the electoral

votes. The election data simplified the locations: each

state and Washington, DC.

Analysis Method and Limitations

The method of analysis must be capable of meeting the objective of this research.

Any statistical assumptions must be identified, if they exist, and any limitations

must be identified along with mitigation, where possible. Visual analysis is suitable for

extracting relationships that may exist in the data. This

method is also appropriate for confirming the information

derived from the analysis. There are no formal statistical

assumptions. There are three limitations identified for visual

analysis.

High dimensionality, inadequate assessment, and false discoveries are risks associated

with visual analysis. The scope of this research does not include numerous variables, mitigating

the threats associated with high dimensionality. The potential for inadequate assessment and

false discoveries requires mitigation. Visualizations of data provide a perspective of the

information without context. To mitigate these risks, it is compulsory to assess all key findings

from multiple perspectives. This process ensured that there was an adequate assessment of

From the outline:

Analysis Method and Limitations

• assessed via visual analysis

• not parametric, therefore no statistical assumptions

• limitations of visual analysis

• high dimensionality is challenging to assess

• possibility of inadequate assessment leading to incorrect conclusions

• the more comparisons, the higher likelihood of false discoveries (Zhao et

al., 2017)

• mitigation for inadequate assessment

• explore interesting findings via multiple facets, to ensure adequate

assessment

• mitigation for false discoveries

• Attempt to view any key finding from multiple perspectives, to validate the

finding

the perceived information. Focusing on the research question and using two sources of secondary

data, the analysis generated results.
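Note on false discoveries: the outline cites Zhao et al. (2017) for the idea that more comparisons raise the likelihood of false discoveries. The arithmetic behind that idea can be sketched as follows, assuming each comparison behaves like an independent test with the same false-positive rate. That assumption, the function name, and the value of alpha are illustrative and do not come from this research. Code like this supports your own understanding of the limitation; it does not belong in the research paper.

```python
def chance_of_false_discovery(comparisons: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive across `comparisons`
    independent comparisons, each with false-positive rate `alpha`.

    Illustrative assumption: comparisons are independent and share one alpha.
    """
    if comparisons < 0:
        raise ValueError("comparisons must be non-negative")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    # 1 minus the probability that every comparison avoids a false positive.
    return 1.0 - (1.0 - alpha) ** comparisons


if __name__ == "__main__":
    for count in (1, 5, 20):
        print(count, round(chance_of_false_discovery(count), 3))
```

The value grows with every added comparison, which is why the mitigation in the outline asks for each key finding to be confirmed from more than one perspective.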

Results

Consolidation of the visual analysis highlighted key findings through four visualizations

of data. Manipulating the data with various summarization techniques generated meaningful

graphics. The sample included nearly a year's worth of polling data, but limiting the data to polls

closest to the election generated the key findings in this research. The term polling vote

represents polls ending in November 2016, consolidated by state and candidate, using the median

value. Geospatial visualization indicates that in 45 of the 50 states the winning candidate in the

polling vote and the election were the same (see Figure 1). In five states, Clinton led in the

polling vote, but President Trump won in the

election. For simplification, the term flipped states

refers to the five states identified in Figure 1.

Due to the non-uniformity of the data, the measure of centrality in this analysis is the

median. Summarizing data can cause misrepresentation of the data. Comparing the polling vote

identified 12 states with five percent or less difference between candidates. Visualizing the 12

states identified how well the median represents the data (see Figure 2). The evidence

suggests that the median does not misrepresent the results. The 12 states include the five flipped

states identified in Figure 1. The close margins in the polling data of the flipped states

necessitated a deeper investigation into individual polls. Before documenting the remaining

results of this analysis, the visualization of the difference between candidates requires further

explanation.
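Note on the summarization: the polling vote described above (the median of polls ending in November 2016, by state and candidate) and the flipped-state comparison could be computed along these lines. This is a minimal sketch, not the code used for the example paper. The state names, poll values, election values, and function names are all hypothetical placeholders, and the real data from Silver et al. (2016) and Ballotpedia (n.d.) is structured differently. As stated earlier in this guide, code like this never appears in the paper itself.

```python
from collections import defaultdict
from statistics import median

# Hypothetical polls: (state, poll end date, Clinton %, Trump %).
polls = [
    ("State A", "2016-11-06", 46.0, 45.0),
    ("State A", "2016-11-04", 47.0, 46.0),
    ("State A", "2016-10-20", 48.0, 44.0),  # ends before November, so excluded
    ("State B", "2016-11-05", 40.0, 49.0),
    ("State B", "2016-11-02", 41.0, 50.0),
]
# Hypothetical election results: state -> (Clinton %, Trump %).
election = {"State A": (47.0, 49.0), "State B": (43.0, 52.0)}


def polling_vote(polls, month="2016-11"):
    """Median percentage per state and candidate for polls ending in `month`."""
    grouped = defaultdict(lambda: ([], []))
    for state, end_date, clinton, trump in polls:
        if end_date.startswith(month):
            grouped[state][0].append(clinton)
            grouped[state][1].append(trump)
    return {state: (median(c), median(t)) for state, (c, t) in grouped.items()}


def leader(clinton, trump):
    """Name of the candidate with the higher percentage, or None for a tie."""
    if clinton == trump:
        return None
    return "Clinton" if clinton > trump else "Trump"


votes = polling_vote(polls)
# States where the polling leader and the election winner differ.
flipped = sorted(s for s, (c, t) in votes.items() if leader(c, t) != leader(*election[s]))
# States where the polling vote differs by five percent or less.
close = sorted(s for s, (c, t) in votes.items() if abs(c - t) <= 5.0)

if __name__ == "__main__":
    print(votes, flipped, close)
```

The median is used here for the same reason given in the paper: the polls are not uniform, so a few unusual polls should not dominate the summary.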

Repeating the same information is ill-advised.

Don’t repeat the information in the caption of a

figure or table.

The candidates were compared by subtracting the polling votes for each state (see Figure

3 and Figure 4). The values’ direction is indicative of the winning candidate. Leads held by

Clinton are to the left of zero. Where President Trump led, the value is annotated to the right of

zero. The value is indicative of how much lead one candidate has over the other. For example, if

President Trump earned 40% of the vote and Clinton earned 41% of the vote, Clinton led that

vote by one percent. This Clinton lead would be visualized by placing the marker to the left of

zero on the axis marker representing a value of one percent.
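Note on the difference calculation: the sign convention described above (Clinton leads to the left of zero, President Trump leads to the right) can be sketched as a subtraction. The function names are hypothetical, and subtracting Clinton's percentage from President Trump's is an assumption about the order of subtraction that is consistent with the description, not a statement of how the figures were built. This is working code for your analysis only; it stays out of the paper.

```python
def polling_margin(trump_pct: float, clinton_pct: float) -> float:
    """Signed difference in percentage points.

    Negative values are Clinton leads (left of zero);
    positive values are Trump leads (right of zero).
    """
    return trump_pct - clinton_pct


def lead(margin: float):
    """Return (leading candidate, size of lead), or (None, 0.0) for a tie."""
    if margin == 0:
        return (None, 0.0)
    return ("Trump" if margin > 0 else "Clinton", abs(margin))


if __name__ == "__main__":
    # The example from the text: Trump 40%, Clinton 41%.
    margin = polling_margin(40.0, 41.0)
    print(margin, lead(margin))
```

With the values from the example in the text, the margin is negative, so the marker sits to the left of zero at one percent.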

APA use of figures & tables is specific. Each figure or table must include enough information to be self-explanatory.

Do not explain the figure in the document. **You must refer to each figure or table in the document, though!**

Results require EVIDENCE. In visual analysis, the evidence is visual!

After identifying that the flipped states’ polling vote by candidate differed by five percent or

less, each poll within the flipped states ending in November 2016 was analyzed (see Figure 3). The

majority of the individual polls also varied by less than five percent

between the candidates. Clinton held the lead in nearly all polls in

these states. In Florida, there were no polls that exceeded the five

percent margin between candidates. Trump did not lead in any polls

in Wisconsin from this data.

The polling vote and election vote were compared by

candidate for all election locations in the data. While five states

flipped, there were other states with close margins. Additionally, the

comparison of the polling vote and election vote visualizes the relationship between the

candidates’ polling vote and the election vote (see Figure 4). The 12 states shown in Figure 2 are

annotated with green text in Figure 4.

Discussion

Q1. Considering the 2016 presidential campaign,

using the polling data consolidated by Silver et al. (2016)

and the election results consolidated by Ballotpedia

(n.d.), what relationships exist between the polling and

the 2016 election results that indicate that President

Trump would win the election?

The close margins in multiple states in the polling

data suggest that there were no guarantees in this election.

What is the difference between an assessment and an assertion?

“I am short” is an assessment.

“I am 5'6"” is an assertion.

Which one requires evidence? Every assertion that is not common knowledge.

What evidence? Evidence is derived from the analysis or a cited reference.

How did this example begin? The research question!

Okay, how did this example begin after the research question? The close margins in multiple states… That is the topic sentence for the section. What can you expect to find in this section?

Did you notice that this section isn’t all that long? There are not a lot of findings to discuss with regard to the research question.

Florida voting was amongst the closest margins in both the polls and in the election (see Figure

4). As a state, Florida has 29 winner-take-all electoral votes and the polling margins were small

enough to state that any uncertainty would indicate that the polling results were not able to

identify a winner. While 29 votes would not have changed the outcome, this state was not the

only state with close margins. Amongst the polling votes, 12 states, representing over 100

electoral votes, had margins of less than five percent between President Trump and Clinton. It is

reasonable to assume that polls are not perfect. The possibility of underrepresenting a segment of

the population is too great. Through visual analyses, this evidence suggests that

either candidate could have won the election due to uncertainty.

Recommendations for Future Research

Two identified opportunities may provide more insight into why President Trump won

the election, despite the low likelihood identified by analysts such as Silver et al. (2016). The

polling vote for President Trump is underrepresented in many of the states where he held the lead

(see Figure 4). Conversely, in states that Clinton led, the polling vote represents the election

reasonably well. Kurtzleben (2016) did some analysis in this area and inferred that rural voting

was pro-Trump. Analysis conducted by Lee (2017) investigated the impact of rural and urban

voters in the 2016 election. Lee's analysis of voting data in Minnesota and Wisconsin suggests

that urban area voters were strong supporters of Clinton, and rural voters were strong supporters

of President Trump. The dispersion of rural and urban voters may not be recoverable for this

polling data. Uncovering the source of the underrepresented President Trump vote could indicate

a systemic issue in polling conducted in the 2016 presidential election. With additional data, the

first recommendation for future research is to identify poll and election votes that were allocated

to either rural or urban votes. The confidence interval is a statistical measure of uncertainty.

Reassessing this data, implementing poll confidence intervals into an analysis method capable of

prediction is the second recommendation for future research. Either of these research

opportunities could add more insight into the disparity between polling and the 2016 presidential

election.
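Note on confidence intervals: the second recommendation relies on the confidence interval as a measure of uncertainty. A minimal sketch of one common way to compute it for a single poll follows, using the normal approximation for a proportion from a simple random sample. That choice of method, the z value of 1.96 (roughly a 95% interval), the function names, and the poll values in the usage lines are assumptions for illustration. They are not results from this research, and published polls often use weighting that this sketch ignores.

```python
import math


def poll_confidence_interval(pct: float, sample_size: int, z: float = 1.96):
    """Approximate confidence interval, in percent, for one candidate's share.

    Assumes a simple random sample and the normal approximation:
    p +/- z * sqrt(p * (1 - p) / n).
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0.0 <= pct <= 100.0:
        raise ValueError("pct must be between 0 and 100")
    p = pct / 100.0
    half_width = z * math.sqrt(p * (1.0 - p) / sample_size)
    return (100.0 * (p - half_width), 100.0 * (p + half_width))


def intervals_overlap(a, b) -> bool:
    """True when two (low, high) intervals share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


if __name__ == "__main__":
    # Hypothetical poll: 46% and 45% from a sample of 1,000 respondents.
    first = poll_confidence_interval(46.0, 1000)
    second = poll_confidence_interval(45.0, 1000)
    print(first, second, intervals_overlap(first, second))
```

When the two intervals overlap, the poll alone cannot separate the candidates, which is the kind of uncertainty the recommendation proposes to carry into a predictive method.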

Conclusion

Assessing relationships in the polling data and election data for the 2016 presidential

election indicates that, due to uncertainty, the winner of the

election could not be reasonably determined. Uncertainty in the

polling data and close margins between candidates suggest neither

candidate held the lead. Electoral votes allocated to states with

close margins, along with the even split of states between Clinton and President Trump, suggest

that there is insufficient evidence to determine the likelihood of an election winner. Perhaps

analysts should have followed the method used by Dr. Lichtman when he stated that President

Trump would win (Stevenson, 2016). After all, Dr. Lichtman was correct.

This section should summarize the

entire document.

This section may highlight key

findings, as well.

No new information!

References

Ballotpedia. (n.d.). 2016 election results [dataset]. Retrieved July 18, 2020, from https://docs.google.com/spreadsheets/d/1zxyOQDjNOJS_UkzerorUCf2OAdcMcIQEwRciKuYBIZ4/pubhtml?widget=true&headers=false#gid=658726802

Coleman, J. M. (2020, January 9). The electoral college: Maine and Nebraska's crucial battleground votes. Sabato's Crystal Ball. http://centerforpolitics.org/crystalball/articles/the-electoral-college-maine-and-nebraskas-crucial-battleground-votes/

Goldmacher, S., & Schreckinger, B. (2016, November 16). Trump pulls off biggest upset in U.S. history. Politico. https://www.politico.com/story/2016/11/election-results-2016-clinton-trump-231070

Hohman, J. (2016, November 9). The daily 202: Why Trump won -- and why the media missed it. The Washington Post. https://www.washingtonpost.com/news/powerpost/paloma/daily-202/2016/11/09/daily-202-why-trump-won-and-why-the-media-missed-it/5822ea17e9b69b6085905dee/

Kurtzleben, D. (2016, November 14). Rural voters played a big part in helping Trump defeat Clinton. NPR. https://www.npr.org/2016/11/14/501737150/rural-voters-played-a-big-part-in-helping-trump-defeat-clinton

Lee, M. (2017, January 5). Mapping Wisconsin presidential election results [web log]. Retrieved August 21, 2020, from https://www.mikelee.co/posts/2016-12-26-wisconsin-presidential-election-results/

Silver, N., Kanjana, J., & Mehta, D. (2016, November 8). Who will win the presidency? FiveThirtyEight: 2016 Election Forecast. https://projects.fivethirtyeight.com/2016-election-forecast/

Stevenson, P. W. (2016, November 9). Professor who predicted 30 years of presidential elections correctly called a Trump win in September. The Washington Post. https://www.washingtonpost.com/news/the-fix/wp/2016/10/28/professor-whos-predicted-30-years-of-presidential-elections-correctly-is-doubling-down-on-a-trump-win/

Zhao, Z., De Stefani, L., Zgraggen, E., Binnig, C., Upfal, E., & Kraska, T. (2017). Controlling false discoveries during interactive data exploration. In Proceedings of the 2017 ACM international conference on management of data (pp. 527-540). Association for Computing Machinery. https://doi.org/10.1145/3035918.3064019

Pay attention to the formatting here! APA 7 is not the same as APA 6.

Every reference MUST be in a citation somewhere in the text document.

When do you cite? Paraphrasing, quoting, or direct reference to a source.

When annotating references:

• Every reference has an author – the author may not be a person

• Every reference has a date – more often than not it is only the year

• Every reference has a title – unless the title is the author!

• Every reference includes a source.

o Webpages are sourced from websites.

o Journal articles are sourced from journals.

o PUBLISHED conference papers are from proceedings from a publisher (so both are needed; see the reference to Zhao et al.)

o Conference papers are from conferences, when they are not published.

• Unless there is no electronic version, every reference has a “home” that is included.

o IF a DOI exists it must be the link via the DOI.

o The website is not optional.

Example Research Paper: without notations

The 2016 Presidential Campaign Polling

Dr. Kathy A. McClure

University of the Cumberlands

ITS-530: Data Analysis and Visualization

Dr. Kathy A. McClure

July 23, 2020

The 2016 Presidential Campaign Polling

The 2016 presidential campaign was tumultuous. It had seemed impossible that President

Trump would win the election. Silver et al. (2016) indicated that there was a 71.4% chance that

Clinton would win the election. During the campaign, the media led voters, including elected

members of the Republican Party, to believe that President Trump would not win the election

(Hohman, 2016). Regardless of the media, Hohman (2016) retroactively identified that there

were many voters who were not pro-Clinton leading up to the election. Stevenson (2016)

interviewed American University professor Dr. Allan Lichtman, who overtly stated that

President Trump would win the election based on historical voting in this country. Dr. Lichtman

specified two exceptions to this claim: the requirement that candidate Johnson receive at least five percent of the

vote, and President Trump's unpredictable behavior. Goldmacher and Schreckinger (2016) stated

that President Trump winning the election was the "…biggest upset in U.S. history" (title). Many

believed that Clinton would win.

Problem Statement

Polling samples that represent the population will provide an accurate prediction of the

election winner. Polling results appeared to indicate that Clinton was going to win, but the

election resulted in President Trump being sworn in as the 45th president. Exploration of the polling

and election results may provide insight as to why the election winner was unexpected.

Method

Research Question

Considering the 2016 presidential campaign, using the polling data consolidated by Silver

et al. (2016) and the election results consolidated by Ballotpedia (n.d.), what relationships exist

between the polling and the 2016 election results that indicate that President Trump would win

the election?

Sample

This research employed two secondary data sources for the analysis. Consolidated polling

data collected by Silver et al. (2016) is the first data source. Each observed poll includes the

percentage of votes by location, ending date, and sample size for Clinton and President Trump.

Ballotpedia (n.d.) election data is also necessary for this analysis and includes the percentage of

votes by location for Clinton and President Trump. The number of electoral votes available for each location is

another attribute in the election data. Locations between the two secondary data sources differed.

The polls' locations include the entire nation, each state, Washington, DC, and

specific districts within Nebraska and Maine. The district polls within Nebraska and Maine

reflect how those states distribute electoral votes. Splitting the electoral vote is possible

in Nebraska and Maine (Coleman, 2020). In the other 48 states and Washington, DC, using

winner-take-all, the popular vote winner for the state receives all the electoral votes. The election

data simplified the locations: each state and Washington, DC.
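
Because the two sources use different sets of locations, the locations have to be reconciled before any comparison. The Python sketch below shows one possible way to separate poll locations that also appear in the election data from those that do not. The location labels and the function name are hypothetical; they are not taken from Silver et al. (2016) or Ballotpedia (n.d.).

```python
# Hypothetical location labels, invented for illustration only.
POLL_LOCATIONS = [
    "U.S.", "Maine", "Maine CD-1", "Maine CD-2",
    "Nebraska", "Nebraska CD-2", "Wisconsin", "District of Columbia",
]
ELECTION_LOCATIONS = ["Maine", "Nebraska", "Wisconsin", "District of Columbia"]


def split_locations(poll_locations, election_locations):
    """Split poll locations into those shared with the election data and those found only in the polls."""
    election_set = set(election_locations)
    shared = [loc for loc in poll_locations if loc in election_set]
    poll_only = [loc for loc in poll_locations if loc not in election_set]
    return shared, poll_only


shared, poll_only = split_locations(POLL_LOCATIONS, ELECTION_LOCATIONS)
print("Comparable locations:", shared)
print("Poll-only locations:", poll_only)
```

In this sketch the national and district-level polls end up in the poll-only list, so only state-level and Washington, DC polls would be carried into a comparison with the election data.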

Analysis Method and Limitations

The method of analysis must be capable of meeting the objective of this

research. Identifying any statistical assumptions and any limitations is essential, along with mitigating the

limitations where possible. Visual analysis is suitable for

extracting relationships that may exist in the data. This method is also appropriate for confirming

the information derived from the analysis. There are no formal statistical assumptions. There are

three limitations identified for visual analysis.

High dimensionality, inadequate assessment, and false discoveries are risks associated

with visual analysis. The scope of this research does not include numerous variables, mitigating

the threats associated with high dimensionality. The potential for inadequate assessment and

false discoveries requires mitigation. Visualizations of data provide a perspective of the

information without context. To mitigate these risks, it is compulsory to assess all key findings

from multiple perspectives. This process ensured that there was an adequate assessment of

the perceived information. Focusing on the research question and using two sources of secondary

data, the analysis generated results.

Results

Consolidation of the visual analysis highlighted key findings through four visualizations

of data. Manipulating the data with various summarization techniques generated meaningful

graphics. The sample included nearly a year's worth of polling data, but limiting the data to polls

closest to the election generated the key findings in this research. The term polling vote

represents polls ending in November 2016, consolidated by state and candidate, using the median

value. Geospatial visualization indicates that in 45 of the 50 states the winning candidate in the

polling vote and the election were the same (see Figure 1). In five states, Clinton led in the

polling vote, but President Trump won in the election. For simplification, the term flipped states

refers to the five states identified in Figure 1.
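
The polling vote and the flipped states described above can be sketched in Python. This is a minimal illustration, assuming each poll is a row holding a state, an end date, and a percentage for each candidate. The rows, state names, and function names are invented for the example and are not the data or the procedure used in this analysis.

```python
from collections import defaultdict
from datetime import date
from statistics import median

# Hypothetical poll rows: (state, poll end date, Clinton %, Trump %).
POLLS = [
    ("State A", date(2016, 11, 1), 46.0, 44.0),
    ("State A", date(2016, 11, 5), 47.0, 45.0),
    ("State A", date(2016, 10, 20), 50.0, 40.0),  # ends before November, so it is excluded
    ("State B", date(2016, 11, 3), 41.0, 48.0),
    ("State B", date(2016, 11, 6), 43.0, 49.0),
]

# Hypothetical election winners by state.
ELECTION_WINNER = {"State A": "Trump", "State B": "Trump"}


def polling_vote(polls, year=2016, month=11):
    """Return the median share per state and candidate for polls ending in the given month."""
    by_state = defaultdict(lambda: {"Clinton": [], "Trump": []})
    for state, end_date, clinton, trump in polls:
        if end_date.year == year and end_date.month == month:
            by_state[state]["Clinton"].append(clinton)
            by_state[state]["Trump"].append(trump)
    return {
        state: {cand: median(values) for cand, values in shares.items()}
        for state, shares in by_state.items()
    }


def flipped_states(votes, election_winner):
    """Return states whose polling-vote leader differs from the election winner."""
    flipped = []
    for state, shares in votes.items():
        poll_leader = max(shares, key=shares.get)
        if poll_leader != election_winner[state]:
            flipped.append(state)
    return sorted(flipped)


votes = polling_vote(POLLS)
print(votes)
print(flipped_states(votes, ELECTION_WINNER))
```

With these invented rows, State A is the only flipped state: Clinton leads its polling vote, but the hypothetical election winner is President Trump.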

Due to the non-uniformity of the data, the measure of centrality in this analysis is the

median. Summarizing data can cause misrepresentation of the data. Comparing the polling vote

identified 12 states with five percent or less difference between candidates. Visualizing the 12

states identified how well the median represents the data (see Figure 2). The evidence

suggests that the median does not misrepresent the results. The 12 states include the five flipped

states identified in Figure 1. The close margins in the polling data of the flipped states

necessitated a deeper investigation into individual polls. Before documenting the remaining

results of this analysis, the visualization of the difference between candidates requires further

explanation.
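
A numeric complement to this visual check is to compare each state's median margin with the spread of its individual polls. The Python sketch below assumes per-poll margins are already available and uses the interquartile range as the measure of spread. The interquartile range is a choice made for this illustration, not a step taken in this analysis, and all values are invented.

```python
from statistics import median, quantiles

# Hypothetical per-poll margins (Trump % minus Clinton %), invented for illustration.
MARGINS_BY_STATE = {
    "State A": [-3.0, -2.0, -1.0, 0.0, 1.0],
    "State B": [6.0, 7.0, 8.0, 9.0, 10.0],
    "State C": [-12.0, -1.0, 0.0, 1.0, 14.0],
}

CLOSE_THRESHOLD = 5.0  # five percent difference between candidates, as in the text


def summarize(margins):
    """Return the median margin and the interquartile range of the individual polls."""
    q1, _, q3 = quantiles(margins, n=4)
    return {"median": median(margins), "iqr": q3 - q1}


summary = {state: summarize(m) for state, m in MARGINS_BY_STATE.items()}

# States whose median margin is within the threshold in either direction.
close_states = sorted(
    state for state, stats in summary.items()
    if abs(stats["median"]) <= CLOSE_THRESHOLD
)

for state in close_states:
    print(state, summary[state])
```

In this invented data, State A and State C both count as close, but the polls in State C are far more dispersed, so its median would deserve a closer look before being trusted as a summary.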

The candidates were compared by subtracting the polling votes for each state (see Figure

3 and Figure 4). The direction of the value indicates the leading candidate. Leads held by

Clinton are to the left of zero. Where President Trump led, the value is plotted to the right of

zero. The size of the value indicates how large a lead one candidate has over the other. For example, if

President Trump earned 40% of the vote and Clinton earned 41% of the vote, Clinton led that

vote by one percent. This Clinton lead would be visualized by placing the marker to the left of

zero, at the axis mark representing a value of one percent.
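
The subtraction and its direction can be written out as a short Python sketch. It assumes the difference is computed as President Trump's share minus Clinton's share, which places Clinton leads at negative values (left of zero). The function names are hypothetical.

```python
def signed_margin(trump_pct, clinton_pct):
    """Trump share minus Clinton share: negative means a Clinton lead, positive a Trump lead."""
    return trump_pct - clinton_pct


def describe(margin):
    """Translate a signed margin into the leading candidate and the size of the lead."""
    if margin < 0:
        return f"Clinton leads by {abs(margin):g} point(s)"
    if margin > 0:
        return f"Trump leads by {margin:g} point(s)"
    return "tie"


# The example from the text: President Trump at 40% and Clinton at 41%.
margin = signed_margin(40.0, 41.0)
print(margin)            # -1.0, plotted to the left of zero
print(describe(margin))  # Clinton leads by 1 point(s)
```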

After identifying that the flipped states’ polling vote differed between candidates by five percent or

less, each poll within the flipped states ending in November 2016 was analyzed (see Figure 3). The

majority of the individual polls also varied by less than five percent between the candidates.

Clinton held the lead in nearly all polls in these states. In Florida, there were no polls that

exceeded the five percent margin between candidates. President Trump did not lead in any polls in

Wisconsin in this data.

The polling vote and election vote were compared by candidate for all election locations

in the data. While five states flipped, there were other states with close margins. Additionally,

the comparison of the polling vote and election vote visualizes the relationship between the

candidates’ polling vote and the election vote (see Figure 4). The 12 states shown in Figure 2 are

annotated with green text in Figure 4.

Discussion

Q1. Considering the 2016 presidential campaign, using the polling data consolidated by

Silver et al. (2016) and the election results consolidated by Ballotpedia (n.d.), what relationships

exist between the polling and the 2016 election results that indicate that President Trump would

win the election?

The close margins between candidates in multiple states in the polling data

suggest that there were no guarantees in this election. Florida voting was

amongst the closest margins in both the polls and in the election (see Figure 4). As a state,

Florida has 29 winner-take-all electoral votes, and the polling margins were small enough

that any uncertainty would leave the polling results unable to identify a winner.

While 29 votes would not have changed the outcome, this state was not the only state with close

margins. Amongst the polling votes, 12 states, representing over 100 electoral votes, had margins

of less than five percent between President Trump and Clinton. It is reasonable to assume that

polls are not perfect. The possibility of underrepresenting a segment of the population is too great

to dismiss. Through visual analyses, this evidence suggests that either candidate could have

won the election due to uncertainty.

Recommendations for Future Research

Two identified opportunities may provide more insight into why President Trump won

the election, despite the low likelihood identified by analysts such as Silver et al. (2016). The

polling vote for President Trump is underrepresented in many of the states where he held the lead

(see Figure 4). Conversely, in states that Clinton led, the polling vote represents the election

reasonably well. Kurtzleben (2016) did some analysis in this area and inferred that rural voting

was pro-Trump. Analysis conducted by Lee (2017) investigated the impact of rural and urban

voters in the 2016 election. Lee's analysis of voting data in Minnesota and Wisconsin suggests

that urban area voters were strong supporters of Clinton, and rural voters were strong supporters

of President Trump. The dispersion of rural and urban voters may not be recoverable for this

polling data. Uncovering the source of the underrepresented President Trump vote could indicate

a systemic issue in polling conducted in the 2016 presidential election. With additional data, the

first recommendation for future research is to identify poll and election votes that were attributable

to either rural or urban voters. The confidence interval is a statistical measure of uncertainty.

Reassessing this data by incorporating poll confidence intervals into an analysis method capable of

prediction is the second recommendation for future research. Either of these research

opportunities could add more insight into the disparity between polling and the 2016 presidential

election.
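
As a rough illustration of the second recommendation, the Python sketch below computes an approximate 95% confidence interval for a candidate's share in a single poll from the reported percentage and sample size. It assumes a simple random sample and uses the normal approximation for a proportion, which real polls with weighting may not satisfy. The poll values and function names are hypothetical.

```python
import math

Z_95 = 1.96  # approximate z-value for a 95% confidence level


def proportion_interval(share_pct, sample_size, z=Z_95):
    """Approximate confidence interval, in percent, for a polled share (normal approximation)."""
    p = share_pct / 100.0
    half_width = z * math.sqrt(p * (1.0 - p) / sample_size) * 100.0
    return share_pct - half_width, share_pct + half_width


def intervals_overlap(a, b):
    """Return True when two (low, high) intervals share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical poll: Clinton 47%, President Trump 45%, sample size 1,000.
clinton = proportion_interval(47.0, 1000)
trump = proportion_interval(45.0, 1000)

print(f"Clinton: {clinton[0]:.1f} to {clinton[1]:.1f}")
print(f"Trump:   {trump[0]:.1f} to {trump[1]:.1f}")
print("Intervals overlap:", intervals_overlap(clinton, trump))
```

In this hypothetical poll each interval is roughly three points wide on either side, so a two-point lead falls inside the overlap. Overlap is only a rough indicator and not a formal test of the difference between two shares from the same poll.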

Conclusion

Assessing relationships in the polling data and election data for the 2016 presidential

election indicates that, due to uncertainty, the winner of the election could not be reasonably

determined. Uncertainty in the polling data and close margins between candidates suggest

neither candidate held the lead. Electoral votes allocated to states with close margins, along with

the even split of states between Clinton and President Trump, suggest that there is insufficient

evidence to determine the likelihood of an election winner. Perhaps analysts should have

followed the method used by Dr. Lichtman when he stated that President Trump would win

(Stevenson, 2016). After all, Dr. Lichtman was correct.

References

Ballotpedia. (n.d.). 2016 election results [dataset]. Retrieved July 18, 2020, from

https://docs.google.com/spreadsheets/d/1zxyOQDjNOJS_UkzerorUCf2OAdcMcIQEwRciKuYBIZ4/pubhtml?widget=true&headers=false#gid=658726802

Coleman, J. M. (2020, January 9). The electoral college: Maine and Nebraska's crucial

battleground votes. Sabato's Crystal Ball.

http://centerforpolitics.org/crystalball/articles/the-electoral-college-maine-and-nebraskas-crucial-battleground-votes/

Goldmacher, S., & Schreckinger, B. (2016, November 16). Trump pulls off biggest upset in U.S.

history. Politico. https://www.politico.com/story/2016/11/election-results-2016-clinton-trump-231070

Hohman, J. (2016, November 9). The daily 202: Why Trump won -- and why the media missed it.

The Washington Post. https://www.washingtonpost.com/news/powerpost/paloma/daily-202/2016/11/09/daily-202-why-trump-won-and-why-the-media-missed-it/5822ea17e9b69b6085905dee/

Kurtzleben, D. (2016, November 14). Rural voters played a big part in helping Trump defeat

Clinton. NPR. https://www.npr.org/2016/11/14/501737150/rural-voters-played-a-big-part-in-helping-trump-defeat-clinton

Lee, M. (2017, January 5). Mapping Wisconsin presidential election results [web log]. Retrieved

August 21, 2020, from https://www.mikelee.co/posts/2016-12-26-wisconsin-presidential-election-results/

Silver, N., Kanjana, J., & Mehta, D. (2016, November 8). Who will win the presidency?

FiveThirtyEight: 2016 Election Forecast. https://projects.fivethirtyeight.com/2016-election-forecast/

Stevenson, P. W. (2016, November 9). Professor who predicted 30 years of presidential

elections correctly called a Trump win in September. The Washington Post.

https://www.washingtonpost.com/news/the-fix/wp/2016/10/28/professor-whos-predicted-30-years-of-presidential-elections-correctly-is-doubling-down-on-a-trump-win/

Zhao, Z., De Stefani, L., Zgraggen, E., Binnig, C., Upfal, E., & Kraska, T. (2017). Controlling

false discoveries during interactive data exploration. In Proceedings of the 2017 ACM

international conference on management of data (pp. 527-540). Association for

Computing Machinery. https://doi.org/10.1145/3035918.3064019