Measurement, Statistics, and Appraisal

By Martha Schmidt

To understand God's thoughts we must study statistics, for these are the measure of His purpose. —Florence Nightingale

Essential Questions

· Why should the BSN-prepared nurse read the details within a research article?

· How might published research be flawed?

· How can statistical analysis be demystified when critiquing research?

· Which processes are applied to critically appraise qualitative versus quantitative research?

Introduction

Is there a need for an understanding of statistics in nursing research? Statistics are used every day in nursing and health sciences, and a basic understanding of measurement, statistics, and appraisal is necessary for developing, understanding, and applying research. Think of how many times lab values are based on the normal value for the average person or how frequently patients ask about the odds of a cure being discovered for their type of cancer.

Registered nurses (RNs) often lack an understanding of measurement, statistics, and the appraisal or critiquing of research studies. Several issues contribute to this problem. Prior to the technology age, nurses were often instructed using the “see one, do one, teach one” principle, also known as the apprenticeship model. Statistics and research are not required in associate degree and diploma programs. The Essentials of Baccalaureate Education for Professional Nursing Practice (American Association of Colleges of Nursing [AACN], 2008) provides the educational framework for baccalaureate nursing programs: “Baccalaureate education provides a basic understanding of how evidence is developed, including the research process, clinical judgment, interprofessional perspectives, and patient preference as applied to practice” (AACN, 2008, p. 16). This guide includes basic applied statistics. According to Athanasakis (2013), “Nurses should be educated to another challenge they meet as researchers: the difficulty to understand statistical analysis” (p. 17).

A second issue leading to the lack of understanding of statistical analysis was that nursing did not have government support in funding research until the National Center for Nursing Research (NCNR) was established by law in 1985. Later, the NCNR became a full institute, the National Institute of Nursing Research (NINR), and funding was increased to support grants for studies into topics deemed to be the current important nursing issues.

An understanding of basic statistical terms and methods is important for the health care professional to evaluate research studies critically, and to apply research findings to nursing practice (Zeller, Boerst, & Tabb, 2007). Reviewing these principles is the purpose of this chapter.

Data Collection Process

The data collection process is determined by the type of study and the number of participants needed. The process must be defined clearly to promote quality data. Nonprobability sampling involves methods of obtaining participants that are common in qualitative studies. In purposive sampling, participants are chosen because they have a disease or experience related to the study topic; this is not a random method. A convenience sample, one that is chosen simply because the participants are available, is often used. An example would be the evaluation forms given to a group following an education seminar. Snowball sampling is a type of convenience sampling in which initial participants are asked to refer others. This could be done through an online pool of people with a disease that the researcher wants to study. Because these samples are not random, they are less likely to give an accurate representation of the whole population; however, in qualitative studies in nursing, participants typically are not placed randomly into groups.

For quantitative studies, a larger pool of research subjects is needed. Probability sampling involves random selection from a population in which each member has an equal opportunity for selection. Randomized trials were performed recently on the warming of skin preparation solutions used while patients were awake. Patients who were receiving pacemakers were randomly placed into the room temperature skin preparation solution group or the warmed skin preparation solution group. Patients reported more comfort during the preparation with the warmed solution, and the skin and body temperature measurements showed less heat loss during preparation. Each patient receiving a pacemaker was assigned to exactly one of the two groups, and because the assignment was random, a comparison between the two groups is valid.

Control of the research in quantitative studies is a tool for managing bias and improving the validity of study conclusions. Two common controls are randomization and blinding. Randomizing subjects’ group assignments strengthens quantitative research by reducing the influence of extraneous variables, which are phenomena that contaminate the research.
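Random assignment to groups can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the participant IDs, the group names, the seed value, and the helper `randomize_groups` are hypothetical and do not come from any study described in this chapter.

```python
import random


def randomize_groups(participant_ids, seed=None):
    """Randomly split participants into two groups of (nearly) equal size."""
    rng = random.Random(seed)         # a seeded generator makes the split reproducible
    shuffled = list(participant_ids)  # copy, so the original list is left unchanged
    rng.shuffle(shuffled)
    midpoint = len(shuffled) // 2
    return shuffled[:midpoint], shuffled[midpoint:]


# Hypothetical participant IDs for a two-group study
participants = [f"P{n:02d}" for n in range(1, 21)]
room_temperature_group, warmed_group = randomize_groups(participants, seed=433)

print("Room temperature group:", room_temperature_group)
print("Warmed solution group:", warmed_group)
```

Because chance alone decides each assignment, characteristics such as age or anxiety level tend to be spread across both groups rather than concentrated in one.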

Check for Understanding

Researchers wanted to study the effectiveness of a patient assessment lab for a group of nursing students. The lab was given to students who were randomly placed into two groups that met for four-hour blocks, one beginning in the morning and one in the afternoon. The morning group’s scores were significantly higher than the afternoon group’s scores.

1. Because the only difference between the groups was the time of day, what extraneous variable might have caused the difference?

Double blinding is another concept that improves data collection. It reduces bias in quantitative research by keeping the treatment unknown to both the researcher and the participant. One example is a study of the effectiveness of stem cell placement into the hearts of congestive heart failure patients. In the study, patients were randomly assigned to either the stem cell recipient group or the control group, which received no stem cells. Each patient underwent the same procedures, but neither the patients nor the cardiologists knew who had received the stem cells. After two years of collecting physiological measures of heart function and other data, the study was completed, and the patients, nurse researchers, and physicians were finally informed which intervention each patient had received. Blinding reduces the placebo effect, in which patients improve only because they think they are receiving the treatment.

Collecting data in a consistent way is key to collecting accurate data. Having a good plan for the study is important from the beginning to obtain good data. Some steps in the data collection plan include:

1. Addressing the research question. What are the variables to be measured?

2. Describing characteristics about the sample.

3. Identifying valid instruments to collect data.

4. Developing protocols or procedures for collecting data.

Identifying and analyzing these steps in the data collection plan when evaluating a study helps to determine the quality of the data collection. Quantitative research also draws on concepts of measurement theory to ascertain the quality of the research.

Concepts of Measurement Theory and Quantitative Research

Understanding the language of research helps when reading studies and making judgements about their quality. Knowledge of the following terms and concepts can aid the health care professional in making decisions regarding the quality of research and whether a specific study should be utilized to improve practice or patient outcomes.

Quantitative research involves measurements that are reproducible and accurate, also known as direct measures. Direct measures are concrete attributes that can be measured, such as heart rate, weight, or temperature. It is important for the researchers to identify what is to be measured and how it is to be measured when planning the study. To critique or develop a study, the following theories and rules have been established to guide how criteria are measured.

According to psychologist Stanley Smith Stevens’s measurement theory, developed in 1946, all measurements in science fall into one of four levels of measurement: nominal, ordinal, interval, and ratio. For research, these levels are used to measure quantitative research variables (see Table 4.1).

Table 4.1

Comparison of Levels of Measurement

Level of Measurement | Characteristics | Statistic Reported
Nominal | Naming | Mode (most frequently occurring score)
Ordinal | Rank order | Frequency counts; percentages
Interval | Equal numerical distances | Mean; averages
Ratio | Equal numerical distances and an absolute zero | Mean; averages

Simply put, nominal scale measurement places names or labels on criteria, such as diagnoses, marital status, and gender. For example, when collecting demographic data on a group of people, one question might concern marital status: married, never married, divorced, or widowed. Each person can fit into only one of these groups or categories. In a research study on the health of widowed persons, only the people who fit into the widowed category would be included in the study. Nominal scale measurements cannot be used for arithmetic such as averaging; however, the number of persons in each category can be counted, and the percentage of persons in each group can be determined mathematically. Nominal measurements are important because some criteria are needed to identify the characteristics of the population of interest.

Ordinal measurements are data assigned to ranked categories, such as higher or lower, better or worse. Though there is some order to the data, the distances between the ranks are not equal. Runners may come in first, second, or third in a race, but the ranks say nothing about how far apart their times were. In nursing research, this level of measurement is used in patient satisfaction surveys, such as those performed by Press Ganey, in which the respondent ranks care on a scale of good, fair, or poor.

Interval measurements have equal numerical distances between each item being measured, and the data is in rank order. Temperature is typically a good example, as there are equal distances between each degree mark. Ratio scale measurement is similar to interval measurement, but it also has an absolute zero as a starting point. For example, when using weight as a measurement, the starting point is always zero, and a 250-pound person weighs twice as much as a 125-pound person. A similar example would be length measured in inches, feet, or meters. It is important for the nurse to understand how the interval and ratio scales differ from the nominal and ordinal scales because the level of measurement, together with how the research question is phrased, dictates which statistical analysis is appropriate. More kinds of statistical analysis can be performed on interval/ratio data than on nominal and ordinal data.
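The link between level of measurement and the statistic reported can be illustrated with a small Python sketch. All of the data values below are invented for illustration; only the pairing of level and statistic follows Table 4.1.

```python
from collections import Counter
from statistics import mean, mode

# Nominal data: labels only, so the mode is reported (hypothetical responses)
marital_status = ["married", "widowed", "married", "divorced", "never married", "married"]
most_common_status = mode(marital_status)

# Ordinal data: ranked categories, so counts and percentages are reported
satisfaction = ["good", "fair", "good", "poor", "good", "fair", "good", "good"]
counts = Counter(satisfaction)
percentages = {rating: 100 * n / len(satisfaction) for rating, n in counts.items()}

# Ratio data: equal intervals and an absolute zero, so means and ratios are meaningful
weights_lb = [125, 150, 175, 250]
mean_weight = mean(weights_lb)
weight_ratio = max(weights_lb) / min(weights_lb)  # 250 lb is twice 125 lb

print(most_common_status, percentages, mean_weight, weight_ratio)
```

Averaging the marital status labels or the satisfaction ranks would not produce a meaningful number, which is why the level of measurement has to be identified before a statistic is chosen.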

Measurement Error

Measurement error is the difference between the measured value and the true value. Measurements are expected to be accurate. Physiological measurements are often taken in nursing, but they can sometimes be inaccurate. These inaccuracies are referred to as measurement errors. Data collected with errors can change the outcome of the research study. One type of error is random error, which occurs when there is no pattern to the differences between the measured value and the true value; some measures are higher, and some are lower, but the average is not affected. Random errors can occur when an instrument is affected by some unknown source. For example, a sensitive scale measuring minute weights may be affected by humidity or temperature.

The second type of measurement error is known as systematic error. Systematic errors occur when the differences are all in one direction, either all greater than the true value or all less than it, which does change the average score. An example is a scale that has not been calibrated. To avoid this in a weight-loss study, the instrument must be calibrated, and the participant needs to be weighed at the same time of day and with the same amount of clothing. If the scale, the daily weigh-in time, or the amount of clothing worn changes, for example with the season, systematic measurement errors will be introduced.
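The difference between the two kinds of error can be shown with a short Python sketch. The true weight, the individual error values, and the scale offset are all hypothetical numbers chosen so that the arithmetic is easy to follow.

```python
from statistics import mean

TRUE_WEIGHT_KG = 70.0   # hypothetical true value
SCALE_OFFSET_KG = 1.5   # hypothetical uncalibrated scale that always reads high

# Random error: small deviations above and below the true value with no pattern
random_errors = [0.3, -0.2, 0.1, -0.3, 0.1]

random_readings = [TRUE_WEIGHT_KG + e for e in random_errors]
# Systematic error: the same random scatter plus a constant shift in one direction
systematic_readings = [TRUE_WEIGHT_KG + SCALE_OFFSET_KG + e for e in random_errors]

# Bias = how far the average reading sits from the true value
random_bias = round(mean(random_readings) - TRUE_WEIGHT_KG, 3)
systematic_bias = round(mean(systematic_readings) - TRUE_WEIGHT_KG, 3)

print("Bias with random error only:", random_bias)
print("Bias with systematic error:", systematic_bias)
```

In this sketch the random errors cancel out, so the average stays at the true value, while the uncalibrated scale moves every reading, and therefore the average, by the same amount.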

There are three ways to reduce measurement errors. First, pilot test the instrument and obtain feedback from the respondents. Was the measure too easy or too hard, as in a test for nursing skills? Second, train the observers so that they judge these skills the same way. Third, double-check all data by entering it twice, much like double-checking entries on a balance sheet. Note that measurement errors cannot be eliminated entirely, but the goal is to come as close as possible. This leads to the concept of reliability.

Reliability is the consistency of the measurement technique, as described in the example of the weight-loss study above. This term applies to quantitative research and shows the extent to which the data is free from random error. Since there are always some random errors in measurement techniques, reliability is discussed in degrees and is usually represented as a correlation coefficient. Correlation means there is a relationship between variables: changes in one variable are accompanied by changes in the other variable. Reliability is measured most commonly through a statistic called Cronbach’s alpha, which shows the quality of the measurements of the data collected.

For example, in a test or instrument, are the items measuring the variables of interest? A coefficient of 1.00 indicates perfect reliability and 0.00 would be no reliability. When measurement tools are developed, they are tested with various groups. For a well-developed, tested measurement tool, an acceptable reliability coefficient would be greater than 0.8. If researchers develop their own instrument, then it has not been tested for reliability. A measurement tool that has not been tested would need a coefficient of 0.7 or greater. As noted in the section on measurement error, piloting the instrument or test can help reduce this error. The more a tool is tested, the stronger the reliability can become. This is one reason why instruments or tests already developed are used repeatedly. Testing for reliability needs to be performed even after translation from one language to another.
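For readers who want to see where a reliability coefficient comes from, the following Python sketch computes Cronbach’s alpha with the standard formula: the number of items k divided by k minus 1, multiplied by 1 minus the ratio of the summed item variances to the variance of the total scores. The three-item survey data and the function name `cronbach_alpha` are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])                      # number of items on the instrument
    items = list(zip(*responses))              # regroup the scores item by item
    sum_item_variances = sum(variance(scores) for scores in items)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)


# Hypothetical data: five respondents answering three items on a 1-5 scale
survey = [
    [4, 5, 4],
    [2, 3, 3],
    [5, 5, 4],
    [1, 2, 2],
    [3, 3, 4],
]

alpha = cronbach_alpha(survey)
print(round(alpha, 2))  # 0.93
```

A value of about 0.93 would exceed the 0.8 threshold described above for a well-developed tool, although a real instrument would be tested on far more than five respondents.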

Pearson’s coefficient (Pearson’s r) measures the strength of the association between two variables. A scatter plot of the two variables’ measures would be made; conventionally, the independent variable (the intervention) is plotted on the x-axis (horizontally) and the dependent variable (the outcome caused by the intervention) is plotted on the y-axis (vertically) (see Figure 4.1). The closer the points are to a straight line, the stronger the association between the two variables. This is often expressed as “if the intervention (x) occurs, then the outcome (y) happens.”
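Pearson’s r can be calculated directly from paired measurements. The sketch below uses the standard formula, the sum of the products of the deviations from each mean divided by the square root of the product of the summed squared deviations. The x and y values and the function name `pearson_r` are made up for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of paired deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical paired measurements (x = independent variable, y = dependent variable)
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]

r = pearson_r(x_values, y_values)
print(round(r, 2))  # 0.77
```

Values of r near +1 or -1 correspond to points that fall close to a straight line, and values near 0 correspond to a scattered cloud of points.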

Figure 4.1

Scatter Plot

Figure represents an example scatter plot showing a positive, strong relationship between Variable 1 and Variable 2. As income increases, so does car ownership.

Note. Adapted from “Scatterplots,” by Statistics Canada, 2013.

Validity means the instrument or tool measures what it is supposed to measure in different settings. Reliability and validity are important properties for a well-developed research study, so utilizing an established instrument or tool for a study is wise. Internal validity “is a measure of the extent to which the findings of a study are a true reflection of reality and what is actually happening, rather than arising out of some unquantified reason—such as the data collection tool not actually measuring what it claims to” (Ellis, 2015a, p. 117). External validity is a term that refers to the extent to which the findings of a study can be generalized and applied to people like those in the research study (Heale & Twycross, 2015). The evaluation of data for accuracy and validity in research studies is important for nursing research students to understand, as interest is great in applying research results from a study to a real-world clinical setting. For example, research conducted on patients in intensive care units found that tight control of blood glucose in diabetics can promote healing; however, these findings are not applicable to patients in acute care units, as the patients are in a less controlled setting. Because the RN has little control over what and when patients are eating or not eating, too many low blood-glucose readings were found. It is dangerous for patients if their glucose drops too low, and healing is slowed when their glucose remains at an elevated level.

Figure 4.2

External and Internal Validity

A diagram representing external validity; can the findings be applied to other settings or people like those in the study. Internal validity is also shown asking if the research was done correctly.

Note. Adapted from “Internal Validity,” by Indiana University Bloomington.

Correlating Terms in Qualitative Research

Different terms are used in qualitative research to show and analyze trustworthiness in the data. In 1985, Lincoln and Guba developed four criteria of rigor in qualitative research: dependability, credibility, transferability, and confirmability (Boswell & Cannon, 2014). Rigor is the standard of accuracy or precision that supports the trustworthiness of the data and its interpretation, which both quantitative and qualitative research require of the researchers.

“Qualitative researchers are not so concerned with reliability but rather with a concept called dependability” (Ellis, 2015b, p. 120). “With dependability, the emphasis is not on the stable and reproducible, rather it is about how the researcher responds to and makes sense of changing contexts” (Ellis, 2015b, p. 120). Dependability can also be described as auditability. An audit trail is essential, so other researchers can follow the investigator’s study to see if similar conclusions can be reached. Dependability is also important for replication, which is when a study is repeated with other groups to see if the study results can be transferred to another population.

Credibility is the truth value of the data and the analysis. This can be achieved by verifying the information presented by the participants throughout the study, and reviewing the coded data with other researchers and experts in the subject matter.

Confirmability is the analysis of the study for bias; it is the extent to which the findings are affected by personal interest and bias. Researcher bias and assumptions should be clearly identified and set aside. An example would be a study of a health practice about which the researcher has preconceived ideas, such as smoking or healthy eating. In a study about diabetic self-care, for example, people are often labeled as noncompliant; such opinions of the researcher would need to be kept separate during qualitative research. One way to evaluate confirmability is to determine whether the authors checked with the participants for clarity and discussed potential bias.

Theories and Concepts of the Statistical Analysis Process

Statistical theory is a mathematical approach used to analyze the relationship between things, describe something, or predict an event. This theory is used in many applications; in nursing and medicine it is used in clinical trials, pharmaceutical testing, and comparison-testing for evidence-based practice improvements. An example of comparison-testing would be comparing two methods for accessing the hubs of central lines: a standard method of manually scrubbing the hub with alcohol disinfectant prior to use is compared to the use of alcohol-impregnated caps on the hub. The outcome of interest is the reduction in central line infections.

There are two basic types of statistics: descriptive and inferential. Descriptive statistics show what the collected data looks like. One example is demographic data, such as in a study of a group of 15- to 18-year-old males and their increased risk of accidents. Inferential statistics are used to make comparisons or predictions. Understanding some of these statistical terms and how they are used in nursing research helps the RN analyze the strength of the comparisons made.

Some basic concepts of statistics are the mean, the median, and the standard deviation. The mean is also known as the average. In descriptive statistics, an example would be the average number of households that no longer have a landline telephone but instead have only cell phones. The median is the point in the data where half of the values are above and half are below. This is often used when describing incomes. Suppose one were curious about the average salary of professional baseball players. Four players make $50,000 per season, but the fifth one makes $3,000,000. The mean would be $640,000, yet most of these players do not earn nearly this much. The median salary would be $50,000 per season, the middle value when the five salaries are listed in order. The median better describes the typical salary of a professional baseball player.

A third measurement in statistics is standard deviation (SD). This measures the variation of the data around the mean or average. In the example of the baseball players’ salaries, the SD would be large because the one player with the $3 million salary lies far from the mean.
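The baseball salary example can be checked with Python’s built-in statistics module. The sketch treats the five players as the whole group of interest, so it uses the population standard deviation (`pstdev`); that choice is an assumption, and treating them as a sample from a larger league would call for the sample formula (`stdev`) and give a somewhat larger value.

```python
from statistics import mean, median, pstdev

# Salaries from the example: four players at $50,000 and one at $3,000,000
salaries = [50_000, 50_000, 50_000, 50_000, 3_000_000]

mean_salary = mean(salaries)      # pulled upward by the single very high salary
median_salary = median(salaries)  # middle value when the salaries are sorted
sd_salary = pstdev(salaries)      # spread of the salaries around the mean

print(f"Mean:   ${mean_salary:,.0f}")
print(f"Median: ${median_salary:,.0f}")
print(f"SD:     ${sd_salary:,.0f}")
```

The standard deviation here is larger than the mean itself, which signals that the data is widely spread and that the mean alone is a poor description of a typical salary.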

Probability theory addresses the likelihood that certain events will occur by chance alone in a given situation. Many events in nature cannot be described with perfect certainty. The best one can do is to say how likely the occurrence of a particular event might be. In theory, during a coin toss, there is a 50/50 chance that the coin will land head side up; in scientific research, however, a coin would actually be tossed, and the number of times it landed head side up would be counted. The number of heads would then be divided by the total number of coin tosses. Probability values are expressed with a lowercase p, and values fall on a scale between 0 (impossibility) and 1 (certainty). The conventional, though arbitrary, threshold that has been established for a statistically significant difference between two or more sample means is p < 0.05 in most nursing studies. This level of significance is referred to as α (alpha). If the p value found in the statistical analysis is less than 0.05, the experimental and comparison groups are considered significantly different (i.e., members of different populations). A p value this small means that a difference at least as large as the one observed would be expected less than 5% of the time if chance alone were at work. In a research study, knowing that the outcome is unlikely to have occurred just by chance gives the reader more confidence in the relationship between the intervention and outcomes.
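The coin-toss approach described above, counting heads and dividing by the number of tosses, can be simulated. This is a minimal sketch; the function name `estimate_heads_probability` and the seed value are arbitrary choices made for illustration.

```python
import random


def estimate_heads_probability(n_tosses, seed=None):
    """Toss a simulated fair coin n_tosses times; return heads / total tosses."""
    rng = random.Random(seed)  # seeded so the same run can be repeated
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses


# The estimate tends to settle near the theoretical 0.5 as the number of tosses grows
for n in (10, 100, 10_000):
    print(n, estimate_heads_probability(n, seed=4))

large_run = estimate_heads_probability(10_000, seed=4)
```

With only 10 tosses the observed proportion can stray well away from 0.5, which mirrors why small samples in research give less stable estimates than large ones.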

Figure 4.3

Decision Tree

Figure represents a decision tree for choosing which simple statistical test to use for different studies. The tree starts with a choice between examining relationships between variables and comparing means. On the relationships branch, the next decision is the kind of variables. If they are both normal or nominal and/or descriptive, the tree points to chi-square testing. If the variables are both interval and/or ratio, the tree points to correlational testing. On the comparing-means branch, the next decision is what is being compared. For a sample mean compared to a population mean, a yes points to the z test; if not, it points to the one-sample t-test. If a sample mean is compared to another sample mean, the next question is whether the samples are independent or paired. If independent, the independent samples t-test should be used. If they are paired, a paired samples t-test should be used.

Note. Adapted from “Decision Tree,” by M. Sajid, 2018. Copyright 2018 by StepUp Analytics.
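The branches of the decision tree in Figure 4.3 can be written as a small Python function. This is a sketch, not a reproduction of the figure: the function and parameter names are hypothetical, the chi-square branch is keyed to nominal variables, and the yes/no question that separates the z test from the one-sample t-test is assumed here to be whether the population standard deviation is known, which the figure description does not state.

```python
def choose_test(goal, variable_levels=None, comparison=None,
                population_sd_known=False, paired=False):
    """Suggest a simple statistical test by walking the branches of the tree."""
    if goal == "relationship":
        levels = set(variable_levels or ())
        if levels and levels <= {"nominal"}:
            return "chi-square test"
        if levels and levels <= {"interval", "ratio"}:
            return "correlation"
        raise ValueError("combination of variable levels not covered by this tree")
    if goal == "compare_means":
        if comparison == "sample_vs_population":
            # ASSUMPTION: the yes/no split is 'population SD known?'
            return "z test" if population_sd_known else "one-sample t-test"
        if comparison == "sample_vs_sample":
            return "paired samples t-test" if paired else "independent samples t-test"
        raise ValueError("unknown comparison")
    raise ValueError("goal must be 'relationship' or 'compare_means'")


print(choose_test("relationship", variable_levels=["interval", "ratio"]))
print(choose_test("compare_means", comparison="sample_vs_sample", paired=True))
```

Writing the tree as code makes it clear that only two facts drive the choice: what the research question asks and how the variables were measured.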

Decision theory is the process of choosing among alternatives when the available knowledge provides some information but uncertainty remains. Decision trees are used often in business, and multiple decision trees are used in nursing. One common tree used in nursing covers a Code Blue or arrest situation. The RN can follow a chart that recommends treatment in this situation.

Inference and Generalization

Inference is central to how a study is analyzed for validity and for application from a sample to other groups. An inference is a conclusion based on evidence. In research, an inference is a rational, logical conclusion based on the results of statistical analysis from a study of a sample of the population. An important part of research is being able to infer that the same results can be applied to a larger population, because testing the whole population would not be possible. RNs make inferences often throughout their workday. For example, if a patient has a blood glucose reading of 70 and shows signs of shaking, weakness, and sweating, the nurse would infer the presence of hypoglycemia and begin steps to increase this patient’s glucose intake.

Generalization is the application of the inferred findings to the broader population represented by the sample. Differences in sample populations can limit generalization. For example, the results of a study of women smokers in a large city in the Midwestern United States could be generalized to women smokers in a large city in Australia, but it would be better to replicate the same study there. Replication would strengthen generalization from the populations studied to women smokers in other large cities.

Normal Curve

The theoretical normal curve (see Figure 4.5) is the distribution of values for a very large single population. It is bell shaped and symmetrical. Many measures closely follow a normal distribution, such as heights of people, sizes of things produced by machines, grades on a test, and blood pressure or other physiological measures. The normal curve assumption is based on the measurements of many items of interest. IQ is one simple example of something that follows a normal distribution pattern. Most people are of average IQ; the numbers of people who measure somewhat higher or lower than average are smaller and roughly equal to each other; and the numbers whose IQs are extremely higher or lower than average are smaller still and again roughly equal. As more and more people are tested, the plotted data comes closer to the normal curve.

Figure 4.5

Normal Curve

Figure represents the normal (or bell) curve. In testing a large population for IQ scores, a large number (68%) will cluster around the average. A very small number of people will test very high or very low.

The data obtained from a study can be plotted on a graph, and the known characteristics of the normal curve enable the estimation of a probability of occurrence. On nursing student exams, most grades would fall around the average (or µ). About 68% of the scores would cluster within one standard deviation above and below the average, and only a few students, about 4.6%, would score very low or very high (more than two standard deviations from the average). Why is this useful? An estimation of the population distribution can be made from sample scores when they are plotted on a graph and the scores form the bell curve. Suppose a researcher is reviewing the number of adults with type 2 diabetes in two different ZIP codes and wants to know why ZIP Code A has better blood-sugar control than ZIP Code B. The samples from ZIP Code A and ZIP Code B could be plotted on the graph. One might assume that people in ZIP Code A had more diabetic education than those in ZIP Code B.
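The roughly 68% and 4.6% figures quoted above come from the areas under the normal curve, and they can be checked with Python’s `statistics.NormalDist`. The exam mean of 80 and standard deviation of 5 used at the end are hypothetical values chosen only to show how a specific question could be answered.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Share of values within one standard deviation of the mean
within_one_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)
# Share of values more than two standard deviations from the mean (both tails)
beyond_two_sd = 1 - (standard_normal.cdf(2) - standard_normal.cdf(-2))

print(f"Within 1 SD: {within_one_sd:.1%}")
print(f"Beyond 2 SD: {beyond_two_sd:.1%}")

# Hypothetical exam: mean 80, SD 5. What share of students score above 90?
exam_scores = NormalDist(mu=80, sigma=5)
share_above_90 = 1 - exam_scores.cdf(90)
print(f"Above 90:    {share_above_90:.1%}")
```

A score of 90 sits two standard deviations above the hypothetical mean, so the share above it is half of the two-tailed figure, a little over 2%.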

Figure 4.6

Comparison of Blood Glucose Levels in 2 ZIP Codes and an Average Population

Figure with three curves comparing blood glucose levels in two ZIP codes.

In plotting the numbers of diabetics and their blood-sugar control as shown hypothetically in Figure 4.6, ZIP Code A (blue) has better control, as shown by its taller curve, while the curve for ZIP Code B (red) is more widespread, showing less blood-sugar control. The black line represents the normal blood-sugar control curve of all people. When comparing diabetic education with blood-sugar control, visual data may not show any difference. When comparing ethnic background in the two ZIP code groups with blood-sugar control, the visual data may reveal a different ethnic mix between the two groups, which can have more influence on the diabetic patient than education alone.

The hypothesis is a clear predictive statement in a quantitative study of the relationship between two or more variables in a specified population. Hypothesis testing first requires that two opposing hypotheses be developed. The first is the null hypothesis, which states that nothing is happening, and the second is the alternate hypothesis, which is what the researchers expect will happen. In nursing, when hypotheses are stated, the research hypothesis is most often discussed because it is more desirable to state the researcher’s expectations, typically through a PICOT statement.

Type I and Type II Errors

Errors can be made in decision making. A Type I error is also known as a false positive. For example, a researcher testing a new cancer drug concludes that it was the drug that caused the patients' remission when the drug was not effective. A Type II error is the opposite. In this case, the researcher concludes that the cancer drug is not effective, when in fact, it is. Controlling for the risk of making a Type II error can be done by enlarging the sample size.
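Both kinds of error, and the effect of sample size on Type II error, can be seen in a small simulation. The sketch below is illustrative only: it assumes a two-sided z test with a known standard deviation of 1, a hypothetical true effect of 0.5, and 2,000 simulated studies per condition. None of these values come from the chapter.

```python
import random
from statistics import NormalDist

ALPHA = 0.05
CRITICAL_Z = NormalDist().inv_cdf(1 - ALPHA / 2)  # about 1.96 for a two-sided test


def rejects_null(true_mean, n, rng, null_mean=0.0, sigma=1.0):
    """Simulate one study of size n and report whether the z test rejects the null."""
    sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
    sample_mean = sum(sample) / n
    z = (sample_mean - null_mean) / (sigma / n ** 0.5)
    return abs(z) > CRITICAL_Z


def rejection_rate(true_mean, n, trials=2000, seed=0):
    """Fraction of simulated studies in which the null hypothesis is rejected."""
    rng = random.Random(seed)
    return sum(rejects_null(true_mean, n, rng) for _ in range(trials)) / trials


# Null is true (no effect): every rejection is a Type I error (false positive)
type_1_rate = rejection_rate(true_mean=0.0, n=30)

# Effect is real: every failure to reject is a Type II error (false negative)
type_2_small_sample = 1 - rejection_rate(true_mean=0.5, n=10)
type_2_large_sample = 1 - rejection_rate(true_mean=0.5, n=100)

print("Type I error rate:", type_1_rate)
print("Type II error rate, n = 10:", type_2_small_sample)
print("Type II error rate, n = 100:", type_2_large_sample)
```

The Type I error rate stays close to the chosen alpha regardless of sample size, while the Type II error rate drops sharply as the sample grows, which is the reason enlarging the sample is the usual remedy.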

Concepts of Measurement Strategies

Measurement strategies have been developed by statisticians and mathematicians. The type will vary depending on the type of study. This section will discuss biostatistics, physiological measures, observational measurement, screening tests, interviews, questionnaires, and scales.

Epidemiology and Biostatistics

“Epidemiology is the study of the distribution and determinants of health-related states or events in specified populations, and the application of this study to the control of health problems” (Centers for Disease Control and Prevention, 2012, pp. 1-2). Biostatistics uses statistics and research methodologies, primarily quantitative, to find the causes and risks of various diseases within populations. Biostatistics helps researchers determine whether a treatment is working and understand factors that contribute to a disease. The populations are viewed from a public health perspective and encompass communities, nations, and the world. Diseases and conditions have been studied for centuries. More recently, studies of the Zika virus have shown how the disease spreads and how it may be prevented. The findings led to warnings about travel to areas where the virus is known to exist and to the discovery that the virus could be spread through sexual intercourse. Understanding ways to reduce the spread of the virus and applying them in populations was important. Beyond disease outbreaks, the Framingham Heart Study was initiated in 1948 to discover what causes heart disease, and participants are still involved today. The Nurses’ Health Studies I and II (NHS) are among the largest ongoing prospective studies on women’s health. Members of the study are sent an annual survey questionnaire, and occasionally an additional aspect of health is reviewed, such as Alzheimer’s disease. Alzheimer’s disease is costly in its effect on patients, families, and caregivers and in opportunities lost. The study asked for volunteers to provide a buccal swab for genetic review for Alzheimer’s disease. These studies are confidential, and participants volunteer. All the aspects of ethics in research are followed. There is now an NHS III recruiting younger nurses, both female and male. Studies such as these are a great way to understand the role of epidemiology and biostatistics. To reach their conclusions, the researchers analyzed the biostatistical data accumulated from the surveys.

Concepts of Physiologic Measures

Ever wonder why “seasoned” RNs wear their watches with the face of the watch on the inside of their wrists? Many years ago, the pulse rate was counted using the second hand of a watch. Nurses were taught how to easily collect data on physiologic measures such as blood pressure; the rhythm, rate, and consistency of the heart; and lung sounds. These physiologic measures indicate how well the body is working. Today, many machines collect data and record it electronically, which has improved nurses’ ability to identify wellness or disease in a concrete, calibrated way.

Accuracy, precision, and error are concepts to address. Accuracy is the closeness of a measured value to a standard or known value, and it is important in data collection for many nursing processes. When a finding seems unusual for the average patient, the nurse will check the results again. As part of the assessment of the patient, the readings must be accurate. The same is true in data collection for research studies. If a scale or other instrument used in testing is not calibrated, the data is not accurate. Precision is the level of exactness, that is, how closely repeated measurements agree with one another. For example, the precision of manufacturing bolts and nuts for construction is exact by necessity.
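The difference between accuracy and precision can be illustrated with a short Python sketch. The readings below are hypothetical, and treating precision as the standard deviation of repeated readings is one common way to quantify it, not the only one.

```python
from statistics import mean, stdev

# Hypothetical data: five repeated readings of a 70.0 kg reference weight on two scales.
KNOWN_WEIGHT_KG = 70.0
scale_a = [70.1, 69.9, 70.0, 70.2, 69.8]  # centered on the known value
scale_b = [72.0, 72.1, 71.9, 72.0, 72.0]  # tightly grouped but consistently too high


def accuracy_error(readings, known_value):
    """Mean reading minus the known value (signed bias); near 0 means accurate."""
    return mean(readings) - known_value


def precision_spread(readings):
    """Sample standard deviation of repeated readings; smaller means more precise."""
    return stdev(readings)


for name, readings in [("Scale A", scale_a), ("Scale B", scale_b)]:
    bias = accuracy_error(readings, KNOWN_WEIGHT_KG)
    spread = precision_spread(readings)
    print(f"{name}: bias = {bias:.2f} kg, spread = {spread:.2f} kg")
```

In this made-up example, Scale B is the more precise instrument but the less accurate one, which is the pattern an uncalibrated scale tends to produce.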

Dunemn, Roehrs, and Wilson (2017) stress the use of the most valid and reliable data collection instruments to obtain trustworthy data. In addition, their guide is an excellent step-by-step resource to find and appraise quantitative measurement instruments. Errors in physiologic measurements are grouped into five categories: environment, user, subject, equipment, and interpretation. Examples of environment could be temperature and humidity. The user can improperly use the equipment by incorrectly calibrating equipment or using an instrument that needs to be replaced. Equipment or supplies may be changed during the research study, which could cause false data. Subject errors can occur when the people chosen are not appropriate for the study. An example would be choosing to compare children in a study. Because age can create large differences between children, it would be necessary to narrow the ages of the children studied. Interpretation errors can occur when an unqualified person interprets the results of an EKG. There are many procedures to think about when reviewing research studies for accurate measurements when using instruments. These ideas of accuracy and precision help nurses to review the study for appropriate measurements or collection of data.

Observational measurements include observation of the participant while performing tasks. An example would be using an observational checklist while watching staff perform hand hygiene. If the observation is done by more than one researcher, it is critical that there is inter-rater reliability, meaning that all researchers agree on the measures. For example, suppose three reviewers are grading an assignment for a nursing course. Two reviewers agree on the main grading criteria of the assignment, but one is grading differently. Thus, a norming process, or retraining of the reviewers to grade similarly, is needed. Because observational measurements are more subjective in nature, the data can be interpreted as less credible than data obtained through instruments.
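Agreement between observers can be quantified. The Python sketch below uses hypothetical checklist results from two observers and computes simple percent agreement along with Cohen's kappa, a widely used statistic that adjusts agreement for chance. The chapter does not prescribe a particular statistic, so this choice is an assumption made for illustration.

```python
from collections import Counter

# Hypothetical data: two observers rate the same 10 hand hygiene opportunities.
rater_1 = ["yes", "yes", "no", "yes", "no", "yes", "yes", "no", "yes", "yes"]
rater_2 = ["yes", "yes", "no", "yes", "yes", "yes", "yes", "no", "no", "yes"]


def percent_agreement(a, b):
    """Proportion of observations on which the two raters gave the same rating."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Agreement between two raters, corrected for agreement expected by chance.

    Undefined (division by zero) if chance agreement equals 1.
    """
    n = len(a)
    observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    categories = set(a) | set(b)
    expected = sum((counts_a[c] / n) * (counts_b[c] / n) for c in categories)
    return (observed - expected) / (1 - expected)


print(f"Percent agreement: {percent_agreement(rater_1, rater_2):.2f}")
print(f"Cohen's kappa:     {cohens_kappa(rater_1, rater_2):.2f}")
```

Kappa is lower than raw agreement here because some matching ratings would be expected by chance alone. A low value would suggest that norming or retraining of the observers is needed.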

Screening tests are another area of measurement to review for quality. Screening tests may detect potential diseases or disorders before symptoms of disease appear. The goal of screening is early detection, lifestyle changes, and early treatment if possible. Examples are mammography and the prostate-specific antigen (PSA) test. The quality of screening and diagnostic tests depends upon their specificity and sensitivity. The sensitivity of a test is the proportion of patients with the disease who test positive for the disease.

Specificity refers to the ability of the test to correctly identify those patients without the disease. False positive results cause anxiety and lead to further, more invasive and expensive testing. In a false negative mammogram, breast tissue appears normal, but cancer is present, which can result in delayed treatment. The mammogram is sensitive enough in revealing abnormalities in breast tissue that it is recommended as a regular examination. On the other hand, the PSA test, which looks for prostate cancer, can be elevated due to several issues, such as age and infection, in addition to cancer and is, thus, less specific.
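Sensitivity and specificity are both simple proportions taken from a table of screening results. The following Python sketch uses hypothetical counts for 1,000 screened people; the numbers are invented for illustration and do not describe any real test.

```python
def sensitivity(true_pos, false_neg):
    """Proportion of people WITH the disease who test positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Proportion of people WITHOUT the disease who test negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical screening results: 100 people have the disease, 900 do not.
true_pos, false_neg = 90, 10    # diseased: 90 detected, 10 missed (false negatives)
true_neg, false_pos = 810, 90   # healthy: 810 cleared, 90 flagged (false positives)

print(f"Sensitivity: {sensitivity(true_pos, false_neg):.2f}")
print(f"Specificity: {specificity(true_neg, false_pos):.2f}")
```

The 10 missed cases correspond to delayed treatment, and the 90 false alarms correspond to the anxiety and follow-up testing described above.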

Interviews are developed in several forms for qualitative research. These include questions that are open or closed, and structured, semi-structured, and unstructured interviews and scales. Interviews are either with one respondent or a group. The researcher decides in advance what format it will take. Open-ended questions and unstructured interviewing methods are often used in phenomenological qualitative research, recording in rich detail the person’s experiences. In closed-ended questioning and structured or semi-structured interviews, the questions are developed in advance, which guide the interview. The respondent’s answers are assumed to be true. In this type of data collection, bias and socially appropriate answers can flaw the data.

Questionnaires are detailed items the participant answers, typically alone and not influenced by the researcher. Choosing validated questionnaires saves the researcher’s time because they have already been tested. A questionnaire that is translated into another language would also need to be tested for content and clarity upon translation. A Likert-type scale is often used with already prepared questions. This type of questionnaire can be seen on depression scales and many other instruments. The FACES scale, used for pain, and the FLACC scale (Face, Legs, Activity, Cry, Consolability) used for infants and those with limited communication, are also used often by RNs and researchers. Reviewing the type of study and the questionnaire used is important to reduce bias and to measure what the study question seeks to measure.

There are many tested instruments available in the nursing fields. Dunemn, Roehrs, and Wilson (2017) described steps to choosing precise and valid data collection instruments. Step I involves conceptualizing the quantitative study topic. Step II is to locate the instrument or tool that best measures the concepts of the study. These tools are located by searching computerized databases, such as Medline or CINAHL; websites, such as Rand Health; journals, such as the Journal of Nursing Measurement; and books. These validated tools provide important strength to the studies in that they have been tested and validated by repeat studies, reducing errors in measurement. Nurses can review the article for mention of the tools used. Using validated instruments also saves researchers a great deal of time and energy. Step III is critical assessment. The purpose of this critical assessment process is to determine the strengths and weaknesses of the selected data collection instrument and ensure that the instrument’s makeup aligns with the needs of the proposed study. In Step IV, the final step, the researcher decides whether to use the chosen study instrument. This involves looking at the fit between the research design, the chosen instrument’s ability to measure the concepts, and whether extensive modification might be needed. If so, testing of the modified instrument might be necessary, incurring extra time and expense.

Elements of the Statistical Analysis Process

Elements of the statistical analysis process include a rich description of the study sample. Demographic variables, such as sex, marital status, and age, are all generally included along with the clinical variable of interest, such as diabetes or heart failure.

Missing data is the data not collected for the variable the study is interested in measuring. For example, a respondent in a survey may not include answers to all the questions being asked. Missing data almost always happens, and is most commonly dealt with by omitting the missing data and analyzing what is left. When the number of respondents is large, then this method would not change the study results appreciably. The best way to deal with missing data is to design the study well and carefully collect the data. The target sample size may be increased to compensate for possible missing data (Kang, 2013). Missing data does not typically influence the study outcomes, unless it is excessive.
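The two strategies mentioned above, omitting missing responses and enlarging the target sample, can be sketched in a few lines of Python. The survey scores are hypothetical, and the inflation rule shown (dividing the required sample by the expected share of complete responses) is a common rule of thumb assumed here for illustration, not a formula taken from Kang (2013).

```python
import math
from statistics import mean

# Hypothetical survey answers (pain score, 0-10); None marks a skipped question.
pain_scores = [3, 5, None, 4, 6, None, 2, 5, 4, 7]

# Omit the missing responses and analyze what is left.
complete_cases = [score for score in pain_scores if score is not None]
missing_rate = 1 - len(complete_cases) / len(pain_scores)

print(f"Complete responses: {len(complete_cases)} of {len(pain_scores)}")
print(f"Missing rate: {missing_rate:.0%}")
print(f"Mean of complete responses: {mean(complete_cases)}")


def inflated_sample_size(required_n, expected_missing_percent):
    """Participants to recruit so that about required_n complete responses remain.

    Assumed rule of thumb: required_n / (share of responses expected to be complete).
    """
    return math.ceil(required_n * 100 / (100 - expected_missing_percent))


print(inflated_sample_size(100, 20))
```

Omitting incomplete responses is only reasonable when the missing answers are few and not related to the outcome being studied; otherwise the remaining data may be biased.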

Reliability of measurement methods refers to the consistency of an instrument’s results across time and repeated use; whether the instrument measures what it is designed to measure is a question of validity. One method to measure reliability is the test-retest method, in which the test is administered at least twice to the same group. The data can also be reviewed through exploratory analysis, which is a visual representation of the data. The data in nursing research is often organized numerically (e.g., temperature, blood pressure) or categorically (e.g., gender, marital status). Bar charts and graphs are good ways to review the distributions of the data and the reasons for outliers. Measures of central tendency (e.g., mean) and dispersion (e.g., variance and standard deviation [SD]) are described.
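Measures of central tendency and dispersion can be computed with Python's standard `statistics` module. The blood pressure readings below are hypothetical, and flagging values more than two standard deviations from the mean is one simple screening convention assumed for this sketch.

```python
from statistics import mean, median, stdev, variance

# Hypothetical systolic blood pressure readings (mmHg) from eight participants.
systolic = [118, 122, 125, 130, 121, 119, 127, 158]

avg = mean(systolic)        # central tendency
mid = median(systolic)      # central tendency, less affected by extreme values
var = variance(systolic)    # dispersion (sample variance)
sd = stdev(systolic)        # dispersion (sample standard deviation)

# Assumed screening rule: flag readings more than 2 SD away from the mean.
outliers = [x for x in systolic if abs(x - avg) > 2 * sd]

print(f"Mean = {avg}, median = {mid}")
print(f"Variance = {var:.1f}, SD = {sd:.1f}")
print(f"Possible outliers: {outliers}")
```

A single extreme reading pulls the mean above the median, which is why a flagged value should be checked (for example, for a recording or equipment error) before the data are analyzed further.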

Use of Statistics for Inferential Statistical Analysis

Inferential statistical analysis generalizes from a sample to the whole population. In this section, methods to examine relationships are discussed and tests to predict outcomes are reviewed. Researchers cannot prove that their inferences or generalizations are correct, but they can examine relationships and predict outcomes. This analysis relies on the normal distribution, or bell curve, discussed earlier.

Using Statistics to Examine Relationships Among Variables or Causal Relationships

Statistical analysis is used in quantitative research to examine the relationship strength between the intervention (independent variable) and the outcome (dependent variable). This is important to the nurse because the knowledge of correlation and the strength of the cause and effect that exists between these variables helps to determine the strength of the research. Figure 4.3 shows a decision tree revealing what kind of test may be done in statistical analysis for basic research.

Figure 4.3

Decision Tree for Statistical Analysis of Research Studies

The figure represents a decision tree for choosing which simple statistical test to use for different studies. The tree starts with a choice between examining relationships between variables and comparing means. Choosing relationships between variables leads to a decision about the kind of variables involved. If both variables are nominal or ordinal (categorical), the tree points to chi-square testing. If both variables are interval and/or ratio, the tree points to correlational testing. On the comparing-means side, the tree asks what is being compared. When a sample mean is compared to a population mean, a yes points to the z test; if not, it points to the use of a one-sample t-test. If a sample mean is compared to another sample mean, the next question is whether the samples are independent or paired. If they are independent, the independent samples t-test should be used. If they are paired, a paired samples t-test should be used.

Note. Adapted from “Decision Tree,” by M. Sajid, 2018. Copyright 2018 by StepUp Analytics.
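The logic of the decision tree can also be written as a small Python function. This is a sketch of the figure description only. The function name, the argument names, and the labels are hypothetical. The description does not state the yes/no question asked on the population branch, so the sketch assumes it asks whether the population standard deviation is known, which is a common textbook criterion.

```python
def choose_test(goal, variable_level=None, comparison=None,
                population_sd_known=False, samples_paired=False):
    """Return the name of a basic statistical test, following the decision tree."""
    if goal == "relationship":
        if variable_level in ("nominal", "ordinal"):
            return "chi-square test"
        if variable_level in ("interval", "ratio"):
            return "correlation (Pearson's r)"
        raise ValueError("variable_level must be nominal, ordinal, interval, or ratio")

    if goal == "compare_means":
        if comparison == "sample_vs_population":
            # ASSUMPTION: "yes" means the population standard deviation is known.
            return "z test" if population_sd_known else "one-sample t-test"
        if comparison == "sample_vs_sample":
            return ("paired samples t-test" if samples_paired
                    else "independent samples t-test")
        raise ValueError("comparison must be sample_vs_population or sample_vs_sample")

    raise ValueError("goal must be 'relationship' or 'compare_means'")


# Example: comparing pain scores between two unrelated groups of patients.
print(choose_test("compare_means", comparison="sample_vs_sample"))
```

When reading a research article, the same questions can be asked of the methods section to judge whether the test the authors chose fits their variables and design.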

The chi-square independence test is a procedure for testing whether two categorical variables are related in some population. Note that these would be nominal- or ordinal-level data. One example where this test would be used is if the researcher were testing whether married people or never-married people had obtained a higher educational level. Pearson’s correlation (Pearson’s r) tests the strength of the relationship between two variables measured at the interval or ratio level. An example would be determining the relationship between height and weight in a group of people. It does not measure cause and effect.
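Both statistics can be calculated by hand, which helps show what statistical software reports. The Python sketch below uses hypothetical data: a 2 x 2 table of marital status by educational level, and five height and weight pairs. Statistical packages would also report a p value, which this sketch leaves out.

```python
import math


def chi_square_statistic(table):
    """Chi-square statistic for a table of observed counts (rows x columns)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def pearson_r(x, y):
    """Pearson's r for two equal-length lists of interval/ratio data."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical counts. Rows: married, never married. Columns: degree, no degree.
education_by_marital_status = [[30, 20], [20, 30]]
chi2 = chi_square_statistic(education_by_marital_status)
# For a 2 x 2 table (1 degree of freedom), the .05 critical value is 3.841.
print(f"Chi-square = {chi2:.2f}; exceeds 3.841: {chi2 > 3.841}")

# Hypothetical heights (cm) and weights (kg) for five people.
heights = [150, 160, 165, 170, 180]
weights = [52, 58, 63, 70, 80]
print(f"Pearson's r = {pearson_r(heights, weights):.2f}")
```

A chi-square value above the critical value suggests the two categorical variables are related. An r close to +1 or -1 indicates a strong linear relationship, and an r near 0 indicates little or none.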

One of the most common tests in statistics, the t-test, is used to determine whether the means of two groups are equal to each other. The assumption for the test is that both groups are sampled from normal distributions with equal variances (UC Business Analytics R Programming Guide, n.d., para. 1).
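The independent samples t-test can be worked through by hand. The Python sketch below uses hypothetical pain scores for two groups and the pooled-variance form of the test, which matches the equal-variance assumption stated above. The group names and values are invented for illustration.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(group_1, group_2):
    """Independent samples t statistic (equal variances assumed) and its df."""
    n1, n2 = len(group_1), len(group_2)
    df = n1 + n2 - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(group_1) + (n2 - 1) * variance(group_2)) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(group_1) - mean(group_2)) / standard_error
    return t, df


# Hypothetical pain scores (0-10) after usual care versus a new intervention.
usual_care = [6, 7, 5, 8, 6, 7]
intervention = [4, 5, 3, 5, 4, 3]

t, df = pooled_t_statistic(usual_care, intervention)
# For df = 10, the two-tailed .05 critical value of t is 2.228.
print(f"t = {t:.2f}, df = {df}, exceeds 2.228: {abs(t) > 2.228}")
```

A t value beyond the critical value suggests that the difference between the two group means is unlikely to be due to chance alone.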

The z test is generally used in place of the t-test when the sample size is large (commonly 30 or more) or the population standard deviation is known, while the t-test is used for smaller samples; with large samples, the two tests show essentially the same thing. Scatter plots can be used to show relationships between variables. For example, Figure 4.7 shows a high positive correlation between variables, Figure 4.8 shows a negative relationship between the variables, and Figure 4.9 shows no relationship between the variables.

Figure 4.7

Positive Relationship Between Variables

Figure represents a scatter plot of data showing the positive relationship between the temperature and the sales of ice cream. The higher the temperature, the higher the sales of the ice cream.

Note. Adapted from “Scatter Plots,” by Math is Fun, 2017.

Figure 4.8

Inverse Relationship Between Variables

Figure represents a scatter plot showing an inverse relationship; as the age of the person increases, the number of hours a person spends playing video games each day decreases.

Note. Adapted from “Scatterplots,” by Statistics Canada, 2013.

Figure 4.9

No Correlation Between Variables

Figure shows no correlation between the daily ozone levels and the day of the year.

Note. Adapted from “Effects of Transformations,” by IBM.

These scatter plots are visual examples of how a researcher can show relationships between the variables being measured. If a scatter plot shows no correlation between the data, there is no evidence of a linear relationship, and cause and effect is not indicated; however, cause and effect, or the lack of it, can never be determined with 100% certainty because extraneous variables cannot be fully controlled. A positive correlation indicates that the variables increase together, and a negative correlation shows an inverse relationship between the variables; neither, by itself, demonstrates cause and effect. The reviewer should consider evidence of validity from others who have utilized the scale.

Regression analysis builds on correlation (see Figure 4.7). It estimates the strength and form of the relationship between the independent variable and the dependent variable and is also used to predict outcomes. Regression analysis is important because it helps to further explore possible cause-and-effect factors in research studies, although regression alone does not establish causation.
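A simple linear regression with one independent variable can be fitted using the least-squares method. The Python sketch below uses hypothetical temperature and ice cream sales figures in the spirit of Figure 4.7. The values are invented and are not the data behind the figure.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x_value, slope, intercept):
    """Predicted value of the dependent variable for a given x."""
    return intercept + slope * x_value


# Hypothetical data: daily temperature (degrees C) and ice cream sales (dollars).
temperature = [14, 18, 22, 26, 30]
sales = [215, 325, 420, 510, 610]

slope, intercept = fit_line(temperature, sales)
print(f"Sales change per degree: {slope:.2f}")
print(f"Predicted sales at 24 degrees: {predict(24, slope, intercept):.2f}")
```

The slope describes how much the dependent variable is expected to change for each one-unit change in the independent variable. Predictions are most trustworthy within the range of the data that was collected.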

The Need for Critical Appraisal of Research Studies

Critical appraisal is necessary to determine the appropriateness of the research to be implemented into practice. Although nursing research must be peer reviewed to be published in journals, all studies still have some weaknesses or errors. In interpreting research outcomes, much of the information is in the discussion section of the research study. The types of results are significant and expected; nonsignificant; significant but opposite of what is expected; mixed; and unexpected. Findings are the results of the study. The significance of findings is the ability of the study to contribute to nursing knowledge. All nursing studies contribute something, but beyond the statistical significance of the findings, the clinical importance of findings needs to be evaluated to determine whether the patient’s response to treatment is enough to have implications for nursing care. While the results of statistical analysis may not demonstrate statistical significance, they may in fact demonstrate clinical significance. Examples of clinical significance are often found in the clinician’s observations and measurements, such as weight loss, a reduction in the patient’s blood pressure, or daily walks improving symptoms of congestive heart failure (CHF), and in the patient’s perception, often in relief of pain or other distress.

Limitations are also revealed in the discussion section of the report. Some limitations include a small sample size or difficulty in collecting data. Some limitations are theoretical, as in the development of the framework of the study or definitions of variables. Limitations do not disqualify the study from inclusion in practice changes, but they are important to review, and they may prompt a search for other, stronger studies that address the practice problem.

Conclusions

Conclusions are drawn from the scientific basis of the study. Was the hypothesis supported or not supported? In the study results, was the outcome expected? Generalization of the study findings from the sample to a larger population is important. Can the statistical findings of this study be applied to other populations? For example, early research on heart disease and heart attacks was done on white men. This research was inappropriately generalized to women and other ethnic groups, which led to recommendations for further research on women, African Americans, and other ethnicities.

Implications for Nursing

These findings of the critical appraisal are discussed in the results and discussion sections of the research report. These sections guide the nurse to understand the outcomes, their strengths in this study, and the clinical application to evidence-based practice. Implications for nursing are generalized from the conclusions of scientific research for application to the body of nursing knowledge, theory, or practice. Implications provide specific suggestions for implementing the findings in the nursing field, in education, practice settings, and administration.

Recommendations for further studies are revealed in the discussion section of the study, revealing the ideas the researchers have for study directions they feel would benefit nursing research on the topic studied. This may give researchers and students the ideas and opportunities for further research or the direction in which to look for further research on the specified topics. Often searching for a closely related qualitative, quantitative, or mixed methods study in the topic area leads to further knowledge about the subject.

Research, Evidence-Based Practice, and Quality Improvement

Research, evidence-based practice (EBP), and quality improvement all have a basis in the development and utilization of research studies, and all can improve clinical outcomes. Research generates new knowledge and adds to nursing’s database. EBP translates knowledge with a goal of improving practice and outcomes. Quality improvement aims to improve patient care processes and outcomes in specific health issues through research and EBP (Ginex, 2017).

Checks for Understanding:

1. Why is an understanding of basic statistical terms and methods important for the health care professional?

2. What are three ways to reduce measurement error in data collection? Choose one and discuss an application.

3. Choose a decision tree from your workplace, and discuss how it is useful in decision making when information may be limited.

4. Within data collection, what types of control are important? Why?

Reflective Summary

Research study outcomes in nursing are dependent on a clear description of the issue of interest to the researcher. In order to accept a study for evidence-based practice guidelines and implications, the nurse needs to analyze its quality through statistical applications. Thus, an understanding of the concepts and vocabulary of basic statistics is warranted. These include concepts of measurement theory, validity, reliability, and generalization. An understanding of probability theories, measurement theories, and appropriate data collection processes enables the nurse to analyze a study.

Key Terms

Chi Square: A procedure for testing whether two categorical variables are related in some population.

Confirmability: The extent to which the findings are shaped by the participants rather than by the researcher’s personal interest and bias.

Control: The process of consistent management of influences on the dependent variable of the study.

Convenience Sample: A sample made up of people who are easy to reach.

Correlation Coefficient: A statistical number that summarizes the strength of relationship between variables.

Credibility: The quality of being trustworthy or believable.

Cronbach Alpha Coefficient: A reliability index that indicates the degree to which the subparts of an instrument are all measuring the same concept.

Dependability: The findings would be repeated if the research was replicated in the same context with the same subjects.

Dependent Variable: The outcome variable. It takes on different values in response to the independent variable.

Descriptive Statistics: Statistics that describe the basic features of the data in a study. They provide simple summaries about the sample and the measures.

Direct Measures: Concrete objects that can be measured, such as heart rate, weight, or temperature.

Double-Blind Study: Experiments in which neither the participants nor the investigators know which participant will receive the experimental drug, treatment, or procedure.

External Validity: The extent to which the study results can be generalized to settings or subjects other than the ones studied.

Extraneous Variables: Variables that can influence the relationship between the independent and dependent variables. They can be controlled either through research design or statistical procedures, and they may not have been foreseen or known at the beginning of the study.

Generalization: The degree the findings can be generalized from a sample to a larger population.

Inference: A conclusion reached on the basis of evidence and reasoning.

Inferential Statistics: Statistics that support conclusions based on evidence and reasoning.

Internal Validity: The ability of the researcher to minimize external influence on the data achieved in the study.

Interval Level of Measurement: A level of measurement in which the variable has rank order and equal distances between the points on the scale.

Levels of Measurement: A system of classifying measurements according to the mathematical operations that are permissible.

Measurement Error: The difference between a measured value of a quantity and its true value.

Nominal Level of Measurement: Used to name or categorize things; the first level of measurement.

Nonprobability Sampling: A sample selected by nonrandom methods.

Ordinal Level of Measurement: A measurement level that places data in a rank order.

Pearson’s Coefficient (Pearson’s r): A correlational coefficient that measures the relationship between two measurements.

Probability Sampling: A sample chosen by random selection.

Purposive Sampling: A method of sampling in which subjects are selected because they are anticipated to add value to the study results.

Quality Improvement: Systematic and continuous actions that lead to measurable improvement in health care services and the health status of targeted patient groups.

Ratio Level of Measurement: A measurement level with equal distances between the points and a true zero point.

Random Error: Error that is always present in a measurement. It is caused by inherently unpredictable fluctuations in the readings of a measurement apparatus or in the experimenter's interpretation of the instrumental reading.

Randomized: Assignment of subjects to separate groups by chance.

Reliability: The extent to which an experiment, test, or measuring procedure yields the same results on repeated trials.

Rigor: The accuracy and consistency in data collection.

Significance: The probability that a research relationship could be caused by chance.

Snowball Sampling: Sampling method in which one subject helps to recruit or suggest another subject.

Standard Deviation (SD): A measure that is used to quantify the amount of variation or dispersion of a set of data values.

Systematic Error: Error that occurs when the instrument is not accurate, so that the data is consistently inflated or deflated in measurement.

Transferability: How applicable the results are to other subjects and in another context.

Validity: The extent to which a concept, conclusion or measurement is well-founded and corresponds accurately to the real world.

References

American Association of Colleges of Nursing. (2008). The essentials of baccalaureate education for professional nursing practice. Retrieved from http://www.aacnnursing.org/Portals/42/Publications/BaccEssentials08.pdf

Athanasakis, E. (2013). Nurses’ research behavior and barriers to research utilization into clinical nursing practice: A closer look. International Journal of Caring Sciences, 6, 16-28.

Boswell, C., & Cannon, S. (2014). Introduction to nursing research: Incorporating evidence-based practice (3rd ed.). Burlington, MA: Jones & Bartlett Learning.

Centers for Disease Control and Prevention. (2012). Principles of epidemiology in public health practice (3rd ed.). Atlanta, GA: Author.

Dunemn, K., Roehrs, C., & Wilson, V. (2017). Critical appraisal and selection of data collection instruments: A step-by-step guide. Journal of Nursing Education and Practice, 7(3), 77-82. doi: 10.5430/jnep.v7n3p77

Ellis, P. (2015a). The language of research (Part 5): Research terminology – Validity. Wounds UK, 11(2), 116-117.

Ellis, P. (2015b). The language of research (Part 7): Research terminology – Reliability. Wounds UK, 11(4), 120-121.

Ginex, P. (2017). The difference between quality improvement, evidence-based practice and research. Retrieved from https://voice.ons.org/news-and-views/oncology-research-quality-improvement-evidence-based-practice

Heale, R., & Twycross, A. (2015). Validity and reliability in quantitative studies. Evidence Based Nursing, 18(3), 66-67. doi: 10.1136/eb-2015-102129

Kang, H. (2013). The prevention and handling of the missing data. Korean Journal of Anesthesiology, 64(5), 401-406. doi: 10.4097/kjae.2013.64.5.402

Thompson, C. J. (2017). What does “critical appraisal” mean in evidence-based practice? Retrieved from https://nursingeducationexpert.com/critical-appraisal/

UC Business Analytics R Programming Guide. (n.d.). T-test: Comparing group means. Retrieved from https://uc-r.github.io/t_test

Zeller, K., Boerst, C. J., & Tabb, W. (2007). Statistics used in current nursing research. Journal of Nursing Education, 46, 55-59.
