RCH 5301, Research Design and Methods 1

Course Learning Outcomes for Unit V
At the end of this unit, you should be able to:

2. Discriminate among the characteristics of qualitative and quantitative research methods.
   2.3 Compare and contrast descriptive statistics and inferential statistics.

Required Unit Resources
Chapter 8: Quantitative Methods (ULO 2.3)
Read the following sections from this chapter:

• Data Analysis
• Preregistering the Study Plan
• Interpreting Results and Writing a Discussion Section
   o Reporting Results
   o Statistical Tests in Survey Research
   o Practical Evidence
   o Context of Previous Studies

Unit Lesson
Lesson: Write an Engaging Title (ULO 2.3)

Introduction
This unit will introduce descriptive statistics and more advanced statistical procedures, such as correlation, regression, the t-test, and ANOVA. New terminology will also be used, such as univariate (one-variable) statistics, bivariate (two-variable) statistics, and multivariate (multiple-variable) statistics.

UNIT V STUDY GUIDE
Quantitative Data Analysis: Descriptive, Univariate, and Bivariate Statistics

Descriptive and Univariate Statistics
It is important to remember that research terminology is often used loosely and interchangeably. Some of the terms and phrases used here may therefore differ slightly from those used in other research-based sources. Regardless of the wording or phrasing, it is usually quite easy to determine the meaning from the purpose and function of the concept being discussed. For example, Bell et al. (2022) refer to descriptive and univariate statistics as an appropriate way to begin to describe a data set. A reader who knew nothing else about descriptive and univariate statistics might assume these statistics are one and the same. As a greater understanding is achieved, it will become clear that descriptive statistics describe data one variable at a time (that is, at the univariate level), but the use of univariate statistics extends beyond descriptive statistics. This lesson will address descriptive statistics and inferential statistical procedures that use both univariate and bivariate statistics.

Most research articles and textbooks simply refer to descriptive statistics as a means to describe data without specifically naming univariate statistics. Descriptive statistics, by definition, are used to describe single variables, so whenever descriptive statistics are mentioned, it is safe to assume that the data are being described at the univariate level. It should also be noted that descriptive statistics, while invaluable for analyzing data, are not inferential statistics and, therefore, cannot be used to generalize results to a population.

As mentioned above, univariate statistics are not limited to descriptive statistics. In fact, univariate statistics are used in statistical procedures such as the one-sample t-test and confidence intervals to make inferences about populations of interest. In each of these procedures, the researcher attempts to make an inference from a univariate statistic such as a sample mean. For example, the one-sample t-test, sometimes called a univariate t-test, is used to determine whether the sample mean for a single variable differs from a hypothesized population mean. Similarly, a confidence interval uses the sample mean for a single variable to create a range of plausible values for an unknown population mean.

While univariate statistics can be immensely helpful depending on the problem being addressed, inferential statistics can be extended to make inferences about relationships between two or more variables. In addition to univariate statistics, Units V and VI will review some of the fundamental and frequently used statistical procedures that employ bivariate and multivariate statistics for making inferences and generalizing results to populations.
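The move from describing a single variable to making an inference about it can be sketched in a few lines of Python. This is a minimal illustration only: the sample values and the hypothesized population mean are invented for the example, and the code stops at the t statistic because the p-value requires the t distribution, which statistical software normally supplies.

```python
import math
import statistics

# Hypothetical sample: weekly study hours reported by 8 students.
sample = [10, 12, 9, 11, 14, 8, 13, 11]
hypothesized_mean = 10  # assumed population value to test against

# Descriptive (univariate) statistics: they only describe the sample.
n = len(sample)
sample_mean = statistics.mean(sample)
sample_median = statistics.median(sample)
sample_sd = statistics.stdev(sample)  # sample standard deviation (n - 1)

# One-sample t statistic: how many standard errors the sample mean
# lies from the hypothesized population mean.
standard_error = sample_sd / math.sqrt(n)
t_statistic = (sample_mean - hypothesized_mean) / standard_error
degrees_of_freedom = n - 1

print(f"mean={sample_mean}, median={sample_median}, sd={sample_sd:.2f}")
print(f"t({degrees_of_freedom}) = {t_statistic:.3f}")
```

The first print line is purely descriptive; the second is the starting point for an inference about the population.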

Parametric and Non-Parametric Statistics
While descriptive statistics are the most basic form of data analysis, they are also the most important precursor to inferential statistics. Describing the data both visually (e.g., histograms) and statistically (e.g., measures of central tendency) can be useful by itself, such as for identifying outliers. But descriptive statistics perform another important and necessary function, which is to determine whether the data meet the required assumptions for the use of inferential statistical procedures. Each procedure has its own required assumptions, but a normal distribution of data is the shared requirement for special procedures called parametric statistical tests (Cooper & Schindler, 2014). (Note: A quick internet search will yield all the assumptions that must be met for whichever procedure the researcher decides to use.)

Inferential statistics include parametric and non-parametric statistical procedures. Researchers prefer parametric procedures for hypothesis testing because, when their assumptions are met, they are more powerful than non-parametric tests and, therefore, are more likely to find a true effect when one exists. Said differently, parametric tests are less likely to commit a Type II error, or false negative, which occurs when a false null hypothesis is not rejected. Hypothesis testing, Type I and Type II errors, and statistical significance will be covered in much more detail in Unit VI.

Non-parametric tests also have assumptions that must be met, but they are far fewer and much less restrictive. For example, under non-parametric statistical procedures, there is no assumption of a normal distribution of data. For this reason, non-parametric tests are sometimes called distribution-free statistics. It should also be noted that parametric tests require continuous data, or data that are measured on an interval or ratio scale. Non-parametric tests are appropriate for data measured on a nominal or ordinal scale (Zikmund et al., 2013). Fortunately for researchers, parametric statistical procedures often have non-parametric equivalents, as seen in the table below.

If, after analyzing descriptive statistics, a researcher is still unsure whether the assumptions for parametric testing are met, there is an alternative approach. The same data can be tested with both a parametric procedure and its non-parametric equivalent. If the two tests lead to the same conclusion (e.g., both statistically significant or both statistically non-significant), there is nothing more to worry about (Norusis, 2008).

Bivariate Statistics
Bivariate statistical analysis can be simplified as looking for either relationships or differences between variables. Parametric statistical procedures that test for relationships between variables include correlation and simple regression. Chi-square is a non-parametric procedure that tests for relationships between variables. The t-test is a parametric statistical procedure that tests for differences between group means.

Pearson’s Correlation Coefficient (r)
Although many course concepts in research methods may be new and foreign, correlation may feel more familiar to most students. Correlation makes intuitive sense to most people since relationships between variables are common in everyday life, such as the relationship between social media consumption and grades, the relationship between education and income, and the relationship between temperature and energy use. Although Pearson’s r can be calculated by hand, as can other statistics, it is much more common to use statistical software that can quickly and accurately compute a correlation coefficient. Microsoft Excel provides an add-in, called the Analysis ToolPak, that can be added to an existing Excel installation at no cost by following the directions in Data Analysis ToolPak in Excel.

Pearson’s r is a standardized measure of correlation that ranges from -1 to +1. The chart below shows the degrees of correlation between the minimum of -1 and maximum of +1.

(Adapted from “Guideline for Interpreting Correlation Coefficient” by ITH Phanny, 2014)

Pearson’s r indicates both the strength and direction of the relationship between variables. A Pearson’s r of +1 indicates a perfect positive correlation, -1 indicates a perfect negative correlation, and 0 indicates no correlation at all between variables. As shown below, the closer the data points cluster in a linear fashion, the greater the correlation between variables.

(Ćurčić, 2019)
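To make the calculation concrete, the following Python sketch computes Pearson’s r from its textbook definition (the sum of cross-products divided by the square root of the product of the two sums of squares). The temperature and energy-use values are hypothetical and serve only to illustrate the calculation; the function name pearson_r is likewise an assumption of this sketch.

```python
import math

def pearson_r(x, y):
    """Return Pearson's r for two equal-length lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variability")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: daily temperature and household energy use.
temperature = [60, 65, 70, 75, 80]
energy_use = [30, 34, 36, 41, 44]

r = pearson_r(temperature, energy_use)
print(f"r = {r:.3f}")  # close to +1: a strong positive relationship
```

Reversing the order of one list would flip the sign of r without changing its magnitude, which is how the statistic separates direction from strength.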

Coefficient of Determination (R²)
Pearson’s r is very useful for understanding the strength and direction of the relationship between variables, but its use can be extended. By simply squaring r, a statistic called the coefficient of determination, notated as R², is produced. The coefficient of determination (R²) indicates the proportion of variability in one variable that is explained by the other variable, but this should not be confused with causation (Field, 2005). Most statistical software programs will automatically calculate r and R² when conducting a correlation analysis, so it is quick and easy to see the strength, direction, and explained variance between variables. For example, assume a data set for the variables a) employee turnover and b) employee morale is being tested for correlation, and the results are:

x = employee turnover
y = employee morale
r = -0.60, indicating a moderately strong negative relationship between turnover and morale.
R² = .36, indicating that 36% of the variability in morale is explained by turnover.
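The arithmetic behind this example is a single squaring step, sketched here in Python using the r value reported above. The variable names are chosen for this illustration only.

```python
# Pearson's r from the turnover and morale example.
r = -0.60

# Squaring removes the sign, so the coefficient of determination
# reflects strength only, not direction.
r_squared = r ** 2
percent_explained = round(r_squared * 100)

print(f"R squared = {r_squared:.2f}")
print(f"{percent_explained}% of the variability in morale is explained by turnover")
```

Because the sign is lost when r is squared, the direction of the relationship must still be read from r itself.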

Simple Regression Analysis
Correlation analysis, especially Pearson’s r, is a popular inferential statistical procedure. It is intuitive and easy to interpret. Notwithstanding these benefits, correlation analysis is unable to predict how the change in one variable will relate to a change in another variable. Fortunately, there is another inferential statistical procedure that can predict the change in a dependent variable (DV) from changes in an independent variable (IV). That bivariate statistical procedure is simple regression. Simple regression is a useful bivariate statistical procedure that shares many commonalities with correlation, including:

• It analyzes only two variables, an independent variable (IV) and a dependent variable (DV).
• It provides a measure of correlation between the two variables (usually Pearson’s r).
• It provides the coefficient of determination.

The difference between correlation and simple regression, however, is game changing. Simple regression provides all the benefits of correlation, plus predictive power. This is accomplished through a regression line that is fit to the data. In the graphic below, the black line is called the line of best fit. The line of best fit is the line that minimizes the sum of squared residuals, that is, the squared vertical distances between the data points and the line.

(Adapted from "Multiple Linear Regression" by James Neill, 2008)

If all of the data points fell on a straight line, it would be a perfect linear relationship, which would enable a perfect prediction of the Y axis variable by looking at the X axis variable. A perfect linear relationship is rare, so a regression model is developed using the formula for a straight line, y = a + b(x), where any value of x can be used to predict y. More specifically, the model can be used to make predictions about how a change in x will relate to a change in y. The x independent variable (IV) is sometimes referred to as a predictor variable, and the y dependent variable (DV) is sometimes referred to as the outcome or criterion variable (Norusis, 2008). The statistical software provides the fixed values for a and b, and arbitrary values of x are entered by the researcher or decision maker to predict y. A word of caution: it is very tempting to state that a change in x causes a change in y but, like correlation, regression does not imply causation in non-experimental research designs.
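The values of a and b that statistical software reports can be reproduced with the standard least-squares formulas, as in the following Python sketch. The study-hours and exam-score data are hypothetical and are unrelated to the figure above; the function names fit_line and r_squared are assumptions made for this illustration.

```python
def fit_line(x, y):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b

def r_squared(x, y, a, b):
    """Share of the variability in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_residual = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_residual / ss_total

# Hypothetical data: hours studied (IV) and exam score (DV).
hours_studied = [1, 2, 3, 4, 5]
exam_score = [30, 35, 45, 50, 65]

a, b = fit_line(hours_studied, exam_score)
fit = r_squared(hours_studied, exam_score, a, b)
print(f"y = {a:.2f} + {b:.2f}(x), R squared = {fit:.3f}")
```

In simple regression, the R squared computed this way equals the square of Pearson’s r for the same two variables.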

Using the above line of best fit for the bivariate data of salary and years of experience as an example, the model would be represented as:

y = Salary
x = Years of Experience
a = Y intercept
b = Slope
Regression model: y = a + bx
Salary = 17,017 + (8,294.80)(Years of Experience)

Therefore, if the researcher wanted to predict the annual salary of a person with 12 years of experience, the resulting estimate would be:

$116,554.60 = 17,017 + (8,294.80)(12)

Since regression has predictive power through the development of a regression model, whereas correlation does not, there are some nuances that should be noted when interpreting and explaining regression results.

• It is true that simple regression, like correlation, analyzes the relationship between only two variables, an IV and DV, but it is more accurate to state that regression analyzes the relationship between a DV and the regression model. The importance of this verbiage will become clearer when multiple regression is discussed in Unit VI.

• When using regression analysis (simple and multiple), the correlation between the dependent variable and the model is stated in statistical output as Multiple R rather than Pearson’s r.

• Regression analysis (simple and multiple) provides a coefficient of determination (R Square or R²), which is the proportion of variability in the DV that is explained by the model.
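The salary prediction worked through above can be expressed as a small Python function. The intercept and slope are the values given in this lesson; the function name predict_salary is an assumption of this sketch.

```python
# Coefficients from the lesson's salary model: y = a + bx.
INTERCEPT = 17_017.00  # a: predicted salary at 0 years of experience
SLOPE = 8_294.80       # b: predicted change in salary per additional year

def predict_salary(years_of_experience):
    """Predict annual salary from years of experience."""
    return INTERCEPT + SLOPE * years_of_experience

estimate = predict_salary(12)
print(f"${estimate:,.2f}")  # $116,554.60
```

Any other value of x can be passed to the same function, which is what the researcher or decision maker does when entering arbitrary values of x to predict y.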

The usefulness and power of regression analysis are magnified when more than one IV is included in the model to predict changes in a DV. This is the idea behind multiple regression, which is a multivariate statistical procedure that will be covered in Unit VI/Chapter 16.

Cross-Tabulation
Chi-square (χ²) is a common non-parametric inferential statistical procedure that uses cross-tabulations to test hypotheses about either nominal or ordinal variables. For example, a chi-square test would be used to test a hypothesis about cross-tabulation data, such as those displayed in a contingency table. The chi-square test analyzes the relationship between observed frequencies and expected frequencies for two nominal or ordinal variables. The chi-square statistic, χ², indicates the goodness-of-fit between the observed and expected frequencies. Cross-tabulations are easy to interpret and visually intuitive. It should be noted that cross-tabulation is flexible and can be used to analyze univariate, bivariate, or multivariate statistics depending on the number of nominal or ordinal variables that the researcher is interested in comparing (Cooper & Schindler, 2014).

t-Test
The previous bivariate statistics were used to determine if relationships existed between two variables. Bivariate statistics also include tests to determine if differences exist between variables and group means. For example, consider these possible research questions:

• Are there differences in mean AA battery life (DV) between Duracell (IV, group 1) and Energizer (IV, group 2)?
• Are there differences in mean drug efficacy (DV) between Treatment A (IV, group 1) and Treatment B (IV, group 2)?

The t-test is a bivariate inferential statistical test that could be used to test hypotheses for the above research questions to determine whether statistically significant differences exist between the group means. The t-test is effective and frequently used in social science and physical science because results can be used to make inferences about the effect the IV grouping variable, often called a factor, has on the DV (Salkind, 2010). The IV factors for the above examples are battery type and treatment type, respectively. There are, however, two types of t-tests, and the choice depends on whether two different groups are being tested or the same group is being tested over a period of time.

Independent Samples t-test (also called Between Groups test)
Using an independent samples t-test, the mean values for a DV are measured and compared between two different groups of people or objects. For example, the product manager for Duracell is interested in determining if the new Duracell AA mean battery life is greater than the new Energizer AA mean battery life. Fifty AA batteries from each brand are tested to determine the mean battery life. An independent samples t-test is then used to test the means of the two independent brands to determine if a statistically significant difference exists between them.

Paired Samples t-test (also called Within-Group, Dependent Samples, or Repeated Measure test)
Using a paired samples t-test, the mean values for a DV are measured and compared for a single group of people or objects, both before and after treatment or exposure to different conditions. For example, a pharmaceutical company is interested in determining if a new respiratory drug is effective in improving lung capacity. Before treatment, they test the lung capacity of a sample of people to determine the mean capacity. They provide drug treatment to the same sample group and retest lung capacity to determine the mean capacity. A paired samples t-test is then used to test the pretreatment and posttreatment means to determine if a statistically significant difference exists between them.

Another application of the paired samples t-test would be if the same pharmaceutical company is interested in determining if their new respiratory drug is more effective than their existing drug in improving lung capacity. They treat the sample group with the existing drug and calculate mean lung capacity. They then treat the same sample group with the new drug and calculate mean lung capacity. A paired samples t-test is then used to test the existing drug and new drug means to determine if a statistically significant difference exists between them.

The t-test and ANOVA, which will be discussed in Unit VI, are the primary inferential statistical procedures used in controlled experiments to test for mean differences between groups or treatments. However, the t-test is not reserved for experimental research designs only but is also used extensively in non-experimental research applications. Social scientists looking for real-world relationships between variables adopt multiple regression as their primary inferential statistical tool (Field, 2005).
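The difference between the two t-tests described above shows up clearly in how each test statistic is calculated. The Python sketch below uses the pooled-variance (Student’s) form for independent samples, which is one common choice and an assumption of this example, and the mean of the paired differences for paired samples. All data values are hypothetical, and p-values are omitted because they require the t distribution from statistical software.

```python
import math
import statistics

def independent_t(group_a, group_b):
    """Pooled-variance t statistic and df for two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / std_error
    return t, df

def paired_t(before, after):
    """t statistic and df for two measurements on the same subjects."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    differences = [post - pre for pre, post in zip(before, after)]
    n = len(differences)
    std_error = statistics.stdev(differences) / math.sqrt(n)
    return statistics.mean(differences) / std_error, n - 1

# Hypothetical battery life in hours for two different brands.
brand_a = [10, 12, 11, 13, 14]
brand_b = [9, 10, 8, 11, 12]
t_ind, df_ind = independent_t(brand_a, brand_b)

# Hypothetical lung capacity scores for the same five people.
pretreatment = [30, 32, 28, 35, 31]
posttreatment = [34, 33, 31, 39, 32]
t_pair, df_pair = paired_t(pretreatment, posttreatment)

print(f"independent: t({df_ind}) = {t_ind:.3f}")
print(f"paired:      t({df_pair}) = {t_pair:.3f}")
```

The independent samples version compares two group means directly, while the paired version reduces each subject to a single difference score and then tests whether the mean difference departs from zero.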

Conclusion
Researchers sometimes struggle to determine the most appropriate statistical procedure to use for their studies. The most important question they should ask themselves is whether they are testing for relationships or differences between two variables. If the answer is neither, then they likely need to use descriptive statistics or a univariate test, like a confidence interval or one-sample t-test. If the answer is relationships between two variables, then they need to use a bivariate test like correlation, simple regression, or chi-square. If the answer is differences between two group means, they need to use a t-test. This is not the end of the story though. Unit VI will introduce multivariate statistical tests and provide instruction on hypothesis testing for statistical significance.
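The decision rule summarized in this conclusion can be written as a simple lookup, sketched below in Python. The goal labels and the function name suggest_procedures are hypothetical names chosen for this illustration; the procedures listed are the ones named in this unit.

```python
# Procedures named in this unit, keyed by what the researcher is testing for.
PROCEDURES_BY_GOAL = {
    "neither": ["descriptive statistics", "confidence interval",
                "one-sample t-test"],
    "relationships": ["correlation", "simple regression", "chi-square"],
    "differences": ["t-test"],
}

def suggest_procedures(goal):
    """Return the Unit V procedures that match the research goal."""
    key = goal.strip().lower()
    if key not in PROCEDURES_BY_GOAL:
        raise ValueError(f"goal must be one of {sorted(PROCEDURES_BY_GOAL)}")
    return PROCEDURES_BY_GOAL[key]

print(suggest_procedures("relationships"))
```

The lookup narrows the choice only to a family of procedures; the final selection still depends on the measurement scale of the data and on whether the assumptions for parametric testing are met.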

References
Bell, E., Bryman, A., & Harley, B. (2022). Business research methods (6th ed.). Oxford University Press. https://online.vitalsource.com/#/books/9780192640505

Cooper, D. R., & Schindler, P. S. (2014). Business research methods (12th ed.). McGraw-Hill.

Creswell, J. W., & Creswell, J. D. (2022). Research design: Qualitative, quantitative, and mixed methods approaches (6th ed.). SAGE. https://online.vitalsource.com/#/books/9781071817964

Ćurčić, M. (2019, February 21). SCATTER PLOT: Definition and examples I BusinessQ [Chart]. BusinessQ. https://businessq-software.com/2019/02/21/scatter_plot/

Field, A. (2005). Discovering stats using SPSS (2nd ed.). SAGE.

Neill, J. (2008). Multiple linear regression [Image]. https://www.slideshare.net/jtneill/multiple-linear-regression

Norusis, M. J. (2008). SPSS 16.0 guide to data analysis. Prentice Hall.

Phanny, I. (2014). Guideline for interpreting correlation coefficient [Image]. https://www.slideshare.net/phannithrupp/guideline-for-interpreting-correlation-coefficient/2

Salkind, N. J. (2010). Encyclopedia of research design: Causal-comparative design. SAGE. http://methods.sagepub.com.libraryresources.columbiasouthern.edu/reference/encyc-of-research-design/n42.xml

Zikmund, W. G., Babin, B. J., Carr, J. C., & Griffin, M. (2013). Business research methods (9th ed.). Cengage Learning.