A7 - Field Notes Interviews
Creswell, J. W., & Poth, C. N. (2025). Qualitative inquiry & research design: Choosing among five approaches (5th ed.). Sage Publications.
10 STANDARDS OF VALIDATION AND EVALUATION
Qualitative researchers strive for “understanding,” that deep structure of knowledge that comes from visiting personally with participants, spending extensive time in the field, and probing to obtain detailed meanings. During or after a study, qualitative researchers ask, “Did we get it right?” (Stake, 1995, p. 107) or “Did we publish a ‘wrong’ or inaccurate account?” (J. Thomas, 1993, p. 39). Is it possible to even have a right answer? To answer these questions, researchers need to look to themselves, to the participants, and to the readers. There are multi- or polyvocal discourses at work here that provide insight into the validation and evaluation of a qualitative research account.
In this chapter, we address two interrelated questions: Is the account valid, and by whose standards? and How do we evaluate the quality of qualitative research? Answers to these questions take us into the many perspectives on validation to emerge within the qualitative community and the multiple standards for evaluation discussed by authors with procedural, interpretive, and postmodern perspectives. Then, we examine the quality criteria for each of the five approaches to inquiry and compare standards of evaluation across the approaches. To represent the diversity of qualitative perspectives, we weave our own stances in with those of others from the literature.
VALIDATION AND RELIABILITY IN QUALITATIVE RESEARCH
Validation of qualitative research occurs throughout the research process, yet it is often only in the written study description that a reader becomes aware of the procedures researchers take to ensure the accuracy of their findings and generate evidence of credibility. It is important that we define our qualitative approach to validity and reliability. We define qualitative validity as the researcher checking for the accuracy of the findings by employing certain procedures (Creswell & Creswell, 2023). Qualitative reliability indicates that the researcher’s approach is consistent across different researchers and among different projects (Gibbs, 2018). In the sections to follow, we introduce some of the diverse perspectives that exist regarding the importance of validation in qualitative research and the terms and procedures used by researchers to describe and establish a valid account. We then discuss reliability related to the procedures qualitative researchers use to promote and assess coding consistency across multiple researchers and data sets.
Perspectives on Validation Within Qualitative Research
We begin by recognizing the diverse perspectives within the qualitative community for description of and procedures for establishing the accuracy of a research account. Our view of validation as an evolving construct means that a broad understanding of both traditional and contemporary perspectives is essential for informing the work and reading of qualitative research. In Table 10.1, we present a sample of the perspectives, arranged chronologically, available on validation in the qualitative literature. These perspectives represent positioning validation in qualitative research in a range of ways including (but not limited to) terms of quantitative equivalents, postmodern and interpretive lenses, metaphors used for visualization, and present validity use. Evidence of evolving thinking about validation in qualitative research is apparent within and across authors, and we conclude this section by advancing a rationale for our own stance.
Writers have searched for and found qualitative equivalents that parallel traditional quantitative approaches to validation. LeCompte and Goetz (1982) took this approach when they compared the issues of validation and reliability to their counterparts in experimental design and survey research. They contended that qualitative research had garnered much criticism in the scientific ranks for its failure to “adhere to canons of reliability and validation” (LeCompte & Goetz, 1982, p. 31) in the traditional sense. They applied threats to internal validation in experimental research to ethnographic research (e.g., history and maturation, observer effects, selection and regression, mortality, spurious conclusions). They further identified threats to external validation as “effects that obstruct or reduce a study’s comparability or translatability” (LeCompte & Goetz, 1982, p. 51).
Some writers argue that authors who use positivist terminology facilitate the acceptance of qualitative research in a traditionally focused quantitative world. In contrast, Ely and colleagues asserted that using quantitative terms tended to be a defensive measure that muddied the waters and that “the language of positivistic research is not congruent with or adequate to qualitative work” (1991, p. 95). Lincoln and Guba (1985) offered alternative terms that, they contended, adhere more to naturalistic research. To establish the “trustworthiness” of a qualitative study, Lincoln and Guba (1985) advanced the use of unique terms, such as credibility, authenticity, transferability, dependability, and confirmability as “the naturalist’s equivalents” for the quantitative concepts of internal validation, external validation, reliability, and objectivity (p. 300). To operationalize these new terms, they proposed techniques such as prolonged engagement in the field and the triangulation of data sources, methods, and investigators to establish credibility. They described “thick description” as necessary to ensure the transferability of findings between the researcher and those being studied. Rather than a focus on the reliability of an approach, researchers sought dependability, recognizing that the results would be subject to change and instability. The naturalistic researcher looked for confirmability rather than objectivity in establishing the value of the data. Both dependability and confirmability were established through an auditing of the research process. We find the Lincoln and Guba criteria still popular today in qualitative reports.
Rather than using the term validation, Eisner (1991) constructed standards such as structural corroboration, consensual validation, and referential adequacy as evidence for asserting the credibility of qualitative research. In structural corroboration, the researcher used multiple types of data to support or contradict the interpretation. Eisner described his purpose to “seek a confluence of evidence that breeds credibility, that allows us to feel confident about our observations, interpretations, and conclusions” (1991, p. 110). He further illustrated this purpose with an analogy drawn from detective work: The researcher compiles bits and pieces of evidence to formulate a “compelling whole.” At this stage, the researcher looked for recurring behaviors or actions and considered disconfirming evidence and contrary interpretations. Moreover, Eisner recommended that to demonstrate credibility, the weight of evidence should become persuasive. In consensual validation, the researcher sought the opinion of others, which Eisner described as “an agreement among competent others that the description, interpretation, evaluation, and thematics of an educational situation are right” (1991, p. 112). Referential adequacy suggested the importance of criticism, and Eisner described the goal of criticism as illuminating the subject matter and bringing about more complex and sensitive human perception and understanding.
Qualitative researchers also have reconceptualized validation with a postmodern perspective. Lather (1991) initially identified four types of validation (i.e., triangulation, construct validation, face validation, and catalytic validation) as a “reconceptualization of validation.” Lather commented that “paradigmatic uncertainty in the human sciences is leading to the re-conceptualizing of validation” and called for “new techniques and concepts for obtaining and defining trustworthy data which avoids the pitfalls of orthodox notions of validation” (1991, p. 66). For Lather, the character of a social science report changed from a closed narrative with a tight argument structure to a more open narrative with holes and questions and an admission of situatedness and partiality. In Getting Smart, Lather (1991) described triangulation as drawing upon multiple data sources, methods, and theoretical schemes. Construct validation recognized the constructs that existed rather than imposing theories or constructs on informants or the context. Face validation was best explained by Kidder (1982) as “a ‘click of recognition’ and a ‘yes, of course,’ instead of ‘yes, but’ experience” (p. 56). Catalytic validation energized participants to consider knowing reality to transform it.
In a later article, Lather’s (1993) terms became more unique and closely related to feminist research in “four frames of validation.” The first, ironic validation, was where the researcher presented truth as a problem. The second, paralogic validation, was concerned with undecidables, limits, paradoxes, and complexities. This validation moved away from theorizing things and toward providing direct exposure to other voices in an almost unmediated way. The third, rhizomatic validation, pertained to questioning proliferations, crossings, and overlaps without underlying structures or deeply rooted connections. The researcher should also question taxonomies, constructs, and interconnected networks in which the reader jumped from one assemblage to another and consequently moved from judgment to understanding. The fourth type was situated, embodied, or voluptuous validation, which meant that the researcher sets out to understand more than one can know and to write toward what one does not understand.
Other writers, such as Wolcott (1990a), had little use for validation, suggesting that “validation neither guides nor informs” his work (p. 136). He did not dismiss validation but rather placed it in a broader perspective. Wolcott’s (1990a) goal was to identify “critical elements” and write “plausible interpretations from them” (p. 146). He ultimately tried to understand rather than convince, and he voiced the view that validation distracted from his work of understanding what was really going on. Wolcott (1990a, 1994) claimed that the term validation did not capture the essence of what he sought, adding that perhaps someone would coin a term more appropriate for the naturalistic paradigm. But for now, he said, the term understanding seemed to encapsulate the idea as well as any other (Wolcott, 1994).
Validation has also been cast within an interpretive approach to qualitative research marked by a focus on the importance of the researcher; a lack of truth in validation; a form of validation based on negotiation and dialogue with participants; and interpretations that are temporal, located, and always open to reinterpretation (Angen, 2000). Angen suggested that within interpretive research, validation was “a judgment of the trustworthiness or goodness of a piece of research” (2000, p. 387). She espoused an ongoing, open dialogue on the topic of what made interpretive research worthy of our trust. Considerations of validation were not definitive as the final word on the topic, nor should every study be required to address them. Further, she advanced two types of validation: ethical validation and substantive validation. Ethical validation meant that all research agendas must question their underlying moral assumptions, their political and ethical implications, and the equitable treatment of diverse voices. It also required research to provide some practical answers to questions. Angen (2000) also proposed that our interpretive qualitative research should have a “generative promise” (p. 389) and raise new possibilities, open up new questions, stimulate new dialogue, and provide nondogmatic answers to the questions posed. In so doing, interpretive qualitative research must have transformative value leading to action and change. Substantive validation meant understanding one’s own topic, understandings derived from other sources, and the documentation of this process in the written study. Self-reflection contributed to the validation of the work. The interpretive qualitative researcher, as a sociohistorical interpreter, interacted with the subject matter to co-create the interpretations derived. Understandings derived from previous research gave substance to the inquiry. Interpretive research was also a chain of interpretations that must be documented for others to judge the trustworthiness of the meanings arrived at in the end. Written accounts must resonate with their intended audiences and must be compelling, powerful, and convincing.
A synthesis of validation perspectives comes from Whittemore et al. (2001), who analyzed 13 writings about validation and extracted key validation criteria from these studies. They organized these criteria into primary and secondary criteria. They found four primary criteria: credibility (Are the results an accurate interpretation of the participants’ meaning?); authenticity (Are different voices heard?); criticality (Is there a critical appraisal of all aspects of the research?); and integrity (Are the investigators self-critical?). Secondary criteria related to explicitness, vividness, creativity, thoroughness, congruence, and sensitivity. In summary, with these criteria, it seemed like the validation standard had moved toward the interpretive lens of qualitative research, with an emphasis on researcher reflexivity and on researcher challenges that included raising questions about the ideas developed during a research study.
A postmodern perspective drew on the metaphorical image of a crystal. Richardson (in Richardson & St. Pierre, 2005) described this image:
A final perspective was drawn from Lincoln et al. (2011). They captured the many perspectives developed through the years. They suggested that the question of validity criteria is not whether we should have such criteria or whose criteria the scientific community might adopt, but rather how criteria needed to be developed within the projected transformations being suggested by social scientists. To this end, they reviewed establishing authenticity but framed it within the perspectives of a balance of views, raising the level of awareness among participants and other stakeholders, and advancing action on the part of research participants and their training. Lincoln et al. (2011) also saw a role for validity in understanding hidden assumptions and ethical relationships with research participants through such standards as positioning themselves, having discourses, encouraging voices, and being self-reflective.
Given the many perspectives of validation within the qualitative research community, we acknowledge the ongoing debate regarding the many types of and terms for validation in qualitative research (e.g., Beck, 2020). We suggest the need for authors to choose the types and terms with which they are comfortable and reference the terms and strategies writers use.
To summarize our own stance, we consider “validation” in qualitative research to be an attempt to assess the “accuracy” of the findings, as best described by the researcher, the participants, and the readers (or reviewers). This view also suggests that any report of research is a representation by the author. We also view validation as a distinct strength of qualitative research by the researcher spending extensive time in the field setting, detailing thick description in their work, and remaining close to participants in the study. We use the term validation to emphasize a process (see Angen, 2000; Hayashi et al., 2019), rather than verification (which has quantitative overtones) or historical words such as trustworthiness and authenticity (recognizing that many qualitative writers do return to these words, suggesting the “staying power” of Lincoln and Guba’s, 1985, standards; see Whittemore et al., 2001). Our adoption of a “process view” to validation is based on the longstanding premise that the question of whether the account is valid cannot and should not be based on a single strategy. Instead, validation strategies generating evidence should be embedded throughout the qualitative research process and be sufficiently flexible to the multiple contexts in which the qualitative research takes place. As an illustration, we suggest Hayashi et al.’s (2019) reflections highlighting contextual influences on validation strategies throughout their research process.
The subject of validation arises in several of the approaches to qualitative research (e.g., Corbin & Strauss, 2015; Riessman, 2008; Stake, 1995), but we do not think that distinct validation approaches exist for the five approaches to qualitative research. At best, there might be less emphasis on validation in narrative research and more emphasis on it in grounded theory, case study, and ethnography, especially when the authors of these approaches want to employ systematic procedures. We recommend using multiple validation strategies regardless of the type of qualitative approach.
Our framework for thinking about validation in qualitative research is to suggest that researchers employ accepted strategies to document the accuracy of their studies. These we call validation strategies.
Validation Strategies
It is not enough for researchers to be able to define and describe the essential role of validation in qualitative research. They need to be able to translate their understandings of validation into practice as strategies. We describe nine strategies frequently used by qualitative researchers during the process of validation, adapted from the work of Creswell and Miller (2000), and provide some general guidance about how we go about implementing these strategies (see Figure 10.1).
Figure 10.1 Strategies for Validation in Qualitative Research
The strategies are not presented in any specific order of importance but are organized in three groups by the lens the strategy represents: researcher’s lens, participant’s lens, and reader’s or reviewer’s lens (Creswell & Báez, 2021).
Researcher’s Lens
Among the many roles a researcher undertakes is checking the accuracy of a qualitative account and any of the following validation strategies can assist the researcher in this effort:
Corroborating evidence through triangulation of multiple data sources. The researcher makes use of multiple and different sources, methods, investigators, and theories to provide corroborating evidence (Bazeley, 2021; Ely et al., 1991; Erlandson et al., 1993; Gibbs, 2018; Glesne, 2016; Lincoln & Guba, 1985; Miles & Huberman, 1994; Patton, 1980, 1990, 2015; Yin, 2017). Typically, this process involves corroborating evidence from different sources or researchers to shed light on a theme or perspective. When qualitative researchers locate evidence to document a code or theme in different sources of data, they are triangulating information and providing validity to their findings. For this validation strategy, we begin considering how various data sources can be used in tandem when planning the study. Then, as data is collected, we further explore evidence of corroboration and use these insights in our interpretation and writing. See Goodrum et al. (2022; Appendix E) for a description of their data triangulation efforts involving multiple components and official documents.
Discovering a negative case analysis or disconfirming evidence. The researcher refines working hypotheses as the inquiry advances in light of negative or rival evidence (Ely et al., 1991; Lincoln & Guba, 1985; Miles & Huberman, 1994; Patton, 1980, 1990, 2015; Yin, 2017). Not all evidence will fit the pattern of a code or a theme. It is necessary then to report this negative evidence, and in doing so, the researcher provides a realistic assessment of the phenomenon under study. In real life, not all evidence is either positive or negative; it is some of both. For this validation strategy, we both admit that we tend to be attentive to such evidence and make a point of following what we call “points of intrigue” throughout the study. We find these points are often our key points of discussion in our writing.
Clarifying researcher bias or engaging in reflexivity. The researcher discloses their understandings about the biases, values, and experiences that they bring to a qualitative research study from the outset of the study so that the reader understands the position from which the researcher undertakes the inquiry (Hammersley & Atkinson, 1995, 2019; Merriam & Tisdell, 2015). In this clarification, according to Weiner-Levy and Popper-Giveon (2013), the researcher illuminates what they call the “dark matter” that is often omitted in qualitative research by commenting on past experiences, biases, prejudices, and orientations that have likely shaped the interpretation and approach to the study. For this validation strategy, we embed opportunities throughout a study for writing and discussing connections that emerge with our past experiences and perspectives. See Trip et al. (2019; Appendix C) for their description of how memoing helped them engage in reflexivity.
Participant’s Lens
Participants can play an important role in the following validation strategies:
Member checking or seeking participant feedback. The researcher solicits participants’ views of the credibility of the findings and interpretations (Bazeley, 2021; Ely et al., 1991; Erlandson et al., 1993; Glesne, 2016; Lincoln & Guba, 1985; Merriam & Tisdell, 2015; Miles & Huberman, 1994). This technique is considered by Lincoln and Guba (1985) to be “the most critical technique for establishing credibility” (p. 314). This approach, writ large in most qualitative studies, involves taking data, analyses, interpretations, and conclusions back to the participants so that they can judge the accuracy and credibility of the account. According to Stake (1995), participants should “play a major role directing as well as acting in case study” research (p. 115). They should be asked to examine rough drafts of the researcher’s work and to provide alternative language, “critical observations or interpretations” (Stake, 1995, p. 115). In so doing, participants play a critical role because they are asked “how well the ongoing data analysis represents their experience” (Hays & Singh, 2012, p. 206). For this validation strategy, we often convene a focus group made up of participants in the study either in person or virtually and ask them to reflect on the accuracy of the account. We do not take back to participants the transcripts or the raw data, but the preliminary analyses consisting of description or themes. We are interested in their views of these written analyses as well as what was missing. Another option is to use e-mail to share a summary of the preliminary findings with focus group participants and seek written feedback. See Richards (2021) for practical guidance about how to interpret participant feedback. Also see Chance (2022; Appendix B) for her description of member checking with participants.
Having a prolonged engagement and persistent observation in the field. The researcher makes field-based decisions about what is salient to study, relevant to the purpose of the study, and of interest for focus. Researchers build rapport with participants and gatekeepers, learn the culture and context, and check for misinformation that stems from distortions introduced by themselves or informants (Ely et al., 1991; Erlandson et al., 1993; Glesne, 2016; Lincoln & Guba, 1985; Merriam & Tisdell, 2015). Fetterman (2019) contended that “participant observation requires close, long-term contact with the people under study” (p. 50). For this validation strategy, we spend as much time in the field as feasible during the study. We begin data collection by familiarizing ourselves with the site and participants. Throughout the study we also reflect upon our observations and documents as our understandings evolve and emerge. See Mac an Ghaill and Haywood (2015; Appendix D) for their description of their 3-year ethnographic work and how access to the community was “greatly enabled by our being known for our social commitment to the local area, working with families in the local community” (p. 111).
Collaborating with participants. The researcher embeds opportunities for participants to be involved throughout the research process in varying ways and degrees. Among the various ways is involvement in key research decisions such as developing data collection protocols and contributing to data analysis and interpretation. The degree to which participants are involved can vary along a continuum from minimal to extensive. Participant involvement is based on the idea (and ever-growing body of research) that the study is more likely to be supported and findings used when participants are involved (Patton, 2011, 2015). For this validation strategy, we are often guided by community-based participatory research practices that involve participants as co-researchers in the study (see, e.g., Hacker, 2013; Wallerstein et al., 2018, for further discussions).
Reader’s or Reviewer’s Lens
Others beyond the researcher and those involved in the research contribute to the following validation strategies:
Enabling external audits. The researcher facilitates auditing by securing the services of an external consultant, the auditor, to examine both the process and the product of the account to assess their accuracy (Erlandson et al., 1993; Lincoln & Guba, 1985; Merriam & Tisdell, 2015; Miles & Huberman, 1994). This auditor should have no connection to the study. In assessing the study, the auditor examines whether or not the findings, interpretations, and conclusions are supported by the data. Lincoln and Guba (1985) compared this, metaphorically, with a fiscal audit, and the procedure provided a sense of interrater reliability to a study. This process can be assisted by the creation of documentation, sometimes referred to as an audit trail and described by Silver and Lewins (2014) as “comprising a log of all the processes followed, describing the small analytic leaps contributing to the analysis as a whole” (p. 140). For this validation strategy, we engage in two processes: First, we create a tracking document at the beginning of a study on which we detail our key decisions including rationale and potential consequences. Second, when resources permit, we use an auditor to review our process and findings.
Generating a rich, thick description. The researcher allows readers to make decisions regarding transferability because the writer describes in detail the participants or setting under study (Erlandson et al., 1993; Lincoln & Guba, 1985; Merriam & Tisdell, 2015). With such a detailed description, the researcher enables readers to transfer information to other settings and to determine whether the findings can be transferred “because of shared characteristics” (Erlandson et al., 1993, p. 32). Thick description means that the researcher provides details when describing a case or when writing about a theme. According to Stake (2010), “a description is rich if it provided abundant, interconnected details” (p. 49). Detail can emerge through physical description, movement description, and activity description. It can also involve describing from the general ideas to the narrow, interconnecting the details, and using strong action verbs and quotes. For this validation strategy, we devote time to revisiting our raw data soon after its collection to add further description that might be helpful during the analysis—for example, contextual descriptions like atmosphere. See Chan (2010; Appendix A) for her description of how she was able to generate such a detailed narrative story about an individual’s lived experiences from her varied interactions over 2 years that took place at home, at school, with those around her, and the use of extensive field notes.
Having a peer review or debriefing of the data and research process. The researcher seeks an external check by “someone who is familiar with the research or the phenomenon explored” (Creswell & Miller, 2000, p. 129), in much the same spirit as interrater reliability in quantitative research (Ely et al., 1991; Erlandson et al., 1993; Glesne, 2016; Lincoln & Guba, 1985; Merriam & Tisdell, 2015). Lincoln and Guba (1985) defined the role of the peer debriefer as a “devil’s advocate,” an individual who kept the researcher honest; asked hard questions about methods, meanings, and interpretations; and provided the researcher with the opportunity for catharsis by sympathetically listening to the researcher’s feelings. For this validation strategy, we involve colleagues and students as reviewers (and for our students we play this role) in what Lincoln and Guba (1985) called peer debriefing sessions during which both the reviewers and the researcher keep written accounts of the sessions.
Examining these nine validation strategies as a whole (just discussed and outlined in Figure 10.1), we advise that researchers engage in at least two of them in any given qualitative study. Unquestionably, procedures such as triangulating among different data sources (assuming that the investigator collects more than one), writing with detailed and thick description, and taking the entire written narrative back to participants in member checking all are reasonably easy procedures to conduct. They also are the most popular and cost-effective procedures. Other procedures, such as peer reviews and external audits, are more time-consuming in their application and may also involve substantial costs to the researcher. We also point out that differences among the validity lenses (i.e., researchers, participants, readers, and reviewers) can be attributed to the philosophical orientation of the researcher and may therefore impact their use of specific validation strategies (see Creswell & Báez, 2021, for further discussion).
Reliability Perspectives and Procedures
Reliability can be addressed in qualitative research in several ways (Bazeley, 2021). Richards (2021) described the need for conveying to readers transparent details about the data-handling processes. This information assures the reliable examination of data records and consistent application of codes and categories. Reliability can be enhanced if the researcher keeps detailed field notes and checks the accuracy and completeness of transcriptions of recordings. This completeness includes mentioning the trivial, but often crucial, pauses and overlaps. Further coding checks can be done by researchers using a codebook (possibly held by individuals not previously involved in the study). Bazeley (2021) emphasized the need for training a research team and documenting coding and the reliability of coding through computer QDAS programs.
Our focus on reliability here will be on intercoder agreement based on the use of multiple coders to analyze transcript data. In qualitative research, reliability often refers to the stability of responses by multiple coders of data sets. It is important to develop codes and assess the reliability among coders as part of the analysis process (Kuckartz & Rädiker, 2023; Richards & Morse, 2013; Saldaña, 2021). Saldaña (2021) described intercoder agreement as providing a “crowd-sourcing reality check” for each other in the data analysis process (p. 53).
We find this practice especially used in qualitative health science research and within the form of qualitative research in which inquirers want an external check on the highly interpretive coding process. What seems to be largely missing in the literature (with the exception of Armstrong et al., 1997; Bernard, 2017; Campbell et al., 2013; Miles & Huberman, 1994; and Miles et al., 2014, 2019) is a discussion about the procedures of actually conducting intercoder agreement checks. One of the key issues is determining what exactly the coders agree on, whether they seek agreement on code names, the coded passages, or the same passages coded the same way. We also need to decide on whether researchers seek agreement based on codes, themes, or both codes and themes (see Armstrong et al., 1997). Finally, we need to carefully interpret our findings—as Richards (2021) wisely advised, we cannot expect to find complete consistency in coding over time or between coders. Undoubtedly, there is flexibility in the process, and researchers need to fashion an approach consistent with the resources and time to engage in coding.
Drawing upon our experiences working with multiple coders, we propose the following procedures for assessing intercoder agreement in coding of qualitative research (see Figure 10.2):
Establish a common platform for coding and develop a preliminary code list. The researchers decide which software program or paper-based methods they will use; a common platform is essential to be able to easily share the results of their initial read and preliminary coding efforts. In our recent work, we have often used computer-assisted qualitative data analysis software packages (MAXQDA, ATLAS.ti, or NVivo, depending on familiarity across researchers) and have implemented training sessions at the beginning of the study. Each researcher then reads several transcripts independently and develops a list of preliminary codes. As discussed in Chapter 8, computer software programs have features that facilitate the creation of these lists.
Develop and share the initial codebook among coders. The researchers develop a shared understanding of codes to create a codebook that is stable and represents the coding analysis of four independent coders. To do this, after coding, say, three transcripts (i.e., A, B, and C), we then meet and examine the codes, their names, and the text segments that each researcher codes. We begin developing an initial codebook of the major codes. This codebook contains a definition of each code and the text segments that we assign to each code. It includes main codes and subcodes, but we focus on the main codes we are finding in the data rather than forming an exhaustive list. Additional codes are added as the analyses proceed.
Apply the codebook to additional transcripts and compare coding across multiple researchers. The researchers independently apply the shared codebook to additional transcripts and then compare their coding to assess consistency. To do this, the researchers agree upon a type of data segment to compare, whether it be a phrase, sentence, or paragraph, and then each researcher independently codes three additional transcripts (i.e., D, E, and F). With no clear guidance in the literature as to what the appropriate data segment should be for coding, consensus can be difficult to reach (Hruschka et al., 2004). To streamline the process in their exploratory study, Campbell and colleagues (2013) suggested having the lead researcher determine the type of data segments to be coded. We feel it is more important to have agreement on the type of data segments we assign to codes than to have the same, exact passages coded. The decision about the type of data segment is crucial because it becomes the basis on which intercoder agreement is defined when we are ready to actually compare our coding. Thus, intercoder agreement means that when we assign a code word to a passage, we all assign this same code word to the passage. It does not mean that we all code the same passages, an ideal that we believe would be hard to achieve because some people code short passages and others longer passages. Nor does it mean that we all bracket the exact same lines for our code word, another ideal difficult to achieve.
Assess and report intercoder agreement among researchers. The researchers define individual instances of intercoder agreement before assessing overall intercoder agreement. In our work, we tend to take a realistic stance, look at the passages that are coded across researchers, and ask ourselves whether we all assign the same code word to the passage based on our tentative definitions in the codebook. The decision is either a yes or a no, and we calculate the percentage of agreement among all researchers on the passages that we all code. We seek to establish a high level of agreement on coding of these passages, informed by the 80%-to-90% range that many treat as a minimal benchmark (Miles & Huberman, 1994; Saldaña, 2021). This range is similar to the recommendation by Miles et al. (2014, 2019). We suggest keeping in mind that there is no agreed-upon standard to indicate a high level of intercoder agreement and that the size and range of the coding scheme means it can be difficult (or even impossible) for researchers to reach the 80%-to-90% range of agreement. Many QDAS packages include features to facilitate the calculation of agreement among multiple coders. Other researchers might actually calculate a kappa reliability statistic on the agreement, but we feel that a percentage is sufficient to report in our published studies (see also Creswell & Báez, 2021, for practical guidance about three different approaches to intercoder agreement).
Revise and finalize the codebook to inform further coding. The researchers review and refine the codebook to further differentiate code definitions. In our work, after the process continues through several more transcripts, we then revise the codebook and conduct anew an assessment of passages that all researchers code to determine if we apply the same or different codes. With each phase in the intercoder agreement process, we hope to achieve a higher percentage of agreed-upon codes and themes for data segments. Then we can collapse codes into broader themes and conduct the same process with themes, to see whether the researchers consistently assign the same theme to the passages they all coded.
Figure 10.2 Procedures for Reliability of Intercoder Agreement in Qualitative Research
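The arithmetic behind the percentage of agreement, and behind the kappa statistic that other researchers report, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a procedure taken from the sources cited above: the code labels, the three coders, and the ten passages are hypothetical; each list position stands for one passage that all coders coded; and the kappa shown is Cohen's kappa for two coders, chosen here only as one common form of a kappa reliability statistic.

```python
from collections import Counter


def percent_agreement(*coders):
    """Share of passages on which every coder assigned the same code."""
    n = len(coders[0])
    if n == 0 or any(len(c) != n for c in coders):
        raise ValueError("all coders must code the same, non-empty set of passages")
    # A passage counts as agreement only if all coders used one identical code.
    agreed = sum(1 for codes in zip(*coders) if len(set(codes)) == 1)
    return agreed / n


def cohen_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders: (p_o - p_e) / (1 - p_e)."""
    n = len(coder_a)
    p_o = percent_agreement(coder_a, coder_b)  # observed agreement
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Agreement expected by chance, from each coder's own code frequencies.
    p_e = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if p_e == 1:
        return 1.0  # both coders used a single identical code throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical codes assigned by three coders to the same ten passages.
coder_1 = ["barrier", "barrier", "support", "support", "coping",
           "coping", "barrier", "support", "coping", "barrier"]
coder_2 = ["barrier", "barrier", "support", "coping", "coping",
           "coping", "barrier", "support", "support", "barrier"]
coder_3 = ["barrier", "support", "support", "support", "coping",
           "coping", "barrier", "support", "coping", "barrier"]

pair_pct = percent_agreement(coder_1, coder_2)
all_pct = percent_agreement(coder_1, coder_2, coder_3)
kappa = cohen_kappa(coder_1, coder_2)

print(f"Coders 1 and 2 agree on {pair_pct:.0%} of passages")
print(f"All three coders agree on {all_pct:.0%} of passages")
print(f"Cohen's kappa for coders 1 and 2: {kappa:.2f}")
```

With these made-up data, coders 1 and 2 assign the same code to 8 of the 10 passages (80%), all three coders agree on 7 of 10 (70%), and kappa for coders 1 and 2 is about 0.70 because it discounts the agreement expected by chance. The gap between the percentage and kappa is one reason researchers should state which figure they report.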
EVALUATION CRITERIA FOR QUALITATIVE RESEARCH
In reviewing validation in the qualitative research literature, we are struck by how validation is sometimes used in discussing the quality of a study (e.g., Angen, 2000). Although validation is certainly an aspect of evaluating the quality of a study, other criteria are also useful. In reviewing evaluation criteria, we find that here, too, the standards vary within the qualitative community (see Creswell & Guetterman, 2019, who contrast three approaches to qualitative evaluation). To discuss evaluation, we reprise the features of a “good” qualitative study introduced in Chapter 3, and review three standards (i.e., general standards, interpretive and postmodern standards, and publication standards), and then turn to specific criteria within each of our five approaches to qualitative research.
In Figure 10.3, we present the criteria of a “good” qualitative study presented earlier in our Chapter 3 discussion about the characteristics of qualitative research. Also in Figure 10.3, we reference key content covered in different chapters of this book for each of the “good” features. In this way, readers can review detailed content for each feature.
Figure 10.3 Reprising the Features of a “Good” Qualitative Study
Other writers provide additional thoughts about helpful evaluation criteria. A methodological perspective comes from Howe and Eisenhardt (1990), who suggested that only broad, abstract standards are possible for qualitative (and quantitative) research. Moreover, whether a study is, for example, a good ethnography cannot be determined apart from whether the study contributes to our understanding of important questions. Silverman (2022) advanced four criteria that good research must satisfy. The following five standards are adapted from Howe and Eisenhardt (1990) and are put in question form for researchers to apply to their work:
Does the research question drive the data collection and analysis? (That is, rather than the reverse?) Silverman (2022) states similar criteria about good research using “methods which are demonstrably appropriate to the research problem” (p. 425).
To what extent are the data collection and analysis techniques competently applied? (That is, in a technical sense?) Silverman (2022) states similar criteria about good research developing “empirically sound, reliable, and valid findings” (p. 425).
Are the researcher’s assumptions made explicit? (That is, such as the researcher’s own subjectivity?)
Does the study have overall warrant? (That is, is it robust, does it use respected theoretical explanations, and does it discuss disconfirmed theoretical explanations?) Silverman (2022) states similar criteria about good research thinking “theoretically through and with data” (p. 425).
Does the study have value both in informing and improving practice in an ethical manner? (That is, does it contribute to the “so what?” question and is the study conducted ethically? Are participants protected from harm by taking all measures to protect the confidentiality, privacy, and truth telling of participants?) Silverman (2022) states similar criteria about good research contributing “where possible, to practice and policy” (p. 425).
A postmodern, interpretive framework forms a second perspective. Lincoln (1995) thought about the quality issue in terms of emerging criteria. She tracked her own thinking (and that of her late colleague, Guba). Her thinking extended from early approaches of developing parallel methodological criteria (Lincoln & Guba, 1985) to establishing the criteria of “fairness” (a balance of stakeholder views), to sharing knowledge and fostering social action (Guba & Lincoln, 1989), and on to her current stance. Her new emerging approach to quality was based on three new commitments: emergent relations with respondents, a set of stances, and a vision of research that enables and promotes justice. Based on these commitments, Lincoln (1995) then proceeded to identify several standards of quality:
The standard is set in the inquiry community, such as by guidelines for publication. These guidelines admit that within diverse approaches to research, inquiry communities have developed their own traditions of rigor, communication, and ways of working toward consensus. These guidelines, she also maintains, serve to exclude legitimate research knowledge and social science researchers.
The standard of positionality guides interpretive or qualitative research. Drawing on those concerned about standpoint epistemology, this means that the text should display honesty or authenticity about its own stance and about the position of the author.
Another standard is under the rubric of community. This standard acknowledges that all research takes place in, is addressed to, and serves the purposes of the community in which it is carried out. Such communities might be feminist thought, Black scholarship, Native American studies, or ecological studies.
Interpretive or qualitative research must give voice to participants so that their voice is not silenced, disengaged, or marginalized. Moreover, this standard requires that alternative or multiple voices be heard in a text.
Critical subjectivity as a standard means that the researcher needs to have heightened self-awareness in the research process and create personal and social transformation. This “high-quality awareness” enables the researcher to understand his or her psychological and emotional states before, during, and after the research experience.
High-quality interpretive or qualitative research involves reciprocity between the researcher and those being researched. This standard requires that intense sharing, trust, and mutuality exist.
The researcher should respect the sacredness of relationships in the research-to-action continuum. This standard means that the researcher respects the collaborative and egalitarian aspects of research and “make[s] spaces for the lifeways of others” (Lincoln, 1995, p. 284).
Sharing of the privileges acknowledges that in good qualitative research, researchers share their rewards with persons whose lives they portray. This sharing may be in the form of royalties from books or the sharing of rights to publication.
A final perspective uses interpretive standards for conducting qualitative research. Drawing from their own experience reviewing papers or monographs submitted for social science publication, Richardson (in Richardson & St. Pierre, 2018) suggests the following criteria:
Substantive contribution. Does this piece contribute to our understanding of social life? Does the writer demonstrate a deeply grounded (if embedded) social scientific perspective? Does this piece seem “true”—a credible account of a cultural, social, individual, or communal sense of the “real”?
Aesthetic merit. Does this piece succeed aesthetically? Does the use of creative analytical practices open up the text and invite interpretive responses? Is the text artistically shaped, satisfying, complex, and not boring?
Reflexivity. How has the author’s subjectivity been both a producer and a product of this text? Is there adequate self-awareness and self-exposure for the reader to make judgments about the point of view? Does the author hold himself or herself accountable to the standards of knowing and telling of the people he or she has studied?
Impact. Does this piece affect me emotionally or intellectually? Does it generate new questions or move me to write? Does it move me to try new research practices or move me to action? (p. 823)
As applied research methodologists, we prefer the methodological standards of evaluation, but we can also support the postmodern and interpretive perspectives. We also agree with Flick (2014), who stated, “the problem of how to assess qualitative research has not yet been solved” (p. 480). Still, we return to our criteria for a “good” qualitative study as mentioned in Figure 10.3. Our features highlight the choice of a methodological approach (i.e., one of the five approaches mentioned in this book) and strong methods for data collection and analysis in a qualitative study. Next we discuss specific evaluative criteria for each of the five approaches.
EVALUATION CRITERIA SPECIFIC TO EACH OF THE FIVE APPROACHES
In the section that follows, we provide practical guidance on evaluation criteria specific to each of the five approaches of qualitative inquiry. What standards of evaluation, beyond those already mentioned, would signal a high-quality narrative study, a phenomenology, a grounded theory study, an ethnography, and a case study?
Narrative Research
In discussing what makes for a good narrative study, both Riessman (2008) and Clandinin (2013, 2023) pointed to seeking coherence of participants’ narratives yet acknowledged that it may not always be possible. To that end, Riessman (2008) proposed questions for assessing coherence (p. 189):
Do episodes of a life story hang together?
Are sections of a theoretical argument linked and consistent?
Are there major gaps and inconsistencies?
Is the interpreter’s analytic account persuasive?
In 2013, Clandinin proposed a way for “judging and responding to narrative inquiries” (p. 211) by describing what she and Vera Caine defined as touchstones.
Clandinin (2023, p. 146) listed the following 12 touchstones and considered them as evolving (see also Clandinin & Caine, 2013, for in-depth touchstone descriptions):
1. Relational responsibilities
2. In the midst
3. Negotiation of relationships
4. Narrative beginnings
5. Negotiating entry to the field
6. Moving from field to field texts
7. Moving from field texts to interim and final research texts
8. Representing narratives of experience in ways that show temporality, sociality, and place
9. Relational response communities
10. Justifications—personal, practical, social
11. Attentive to multiple audiences
12. Commitment to understanding lives in motion
When writing an interpretive biography, Denzin (1989) was primarily interested in the problem of “how to locate and interpret the subject in biographical materials” (p. 26). Denzin (1989) advanced several guidelines for writing:
Thus, within a humanistic, interpretive stance, Denzin (2001) identified “criteria of interpretation” as a standard for judging the quality of a biography. These criteria were based on respecting the researcher’s perspective as well as thick description. Denzin (2001) advocated for the ability of the researcher to illuminate the phenomenon in a thickly contextualized manner (i.e., thick description of developed context) so as to reveal the historical, processual, and interactional features of the experience. Also, the researcher’s interpretation engulfed what was learned about the phenomenon and incorporated prior understandings while always remaining incomplete and unfinished.
This focus on interpretation and thick description was in contrast to criteria established within the more traditional approach to biographical writing. For example, Plummer (1983) asserted three sets of questions for a good life history study:
Is the individual representative? Edel (1984) asked a similar question: How has the biographer distinguished between reliable and unreliable witnesses?
What are the sources of bias (about the participant, the researcher, and the participant–researcher interaction)? Or, as Edel (1984) questions, how has the researcher avoided making himself or herself simply the voice of the subject?
Is the account valid when subjects are asked to read it, when it is compared to official records, and when it is compared to accounts from other participants?
In Figure 10.4, we advance five criteria for guiding what we would look for in a narrative study.
Figure 10.4 Guiding Evaluative Criteria for a Narrative Study
Phenomenological Research
What criteria should be used to judge the quality of a phenomenological study? From the many readings about phenomenology, one can infer criteria from the discussions about steps (Giorgi, 1985) or the “core facets” of transcendental phenomenology (Moustakas, 1994, p. 58). We have found direct discussions of the criteria to be missing, but perhaps Polkinghorne’s (1989) discussion of whether the findings are “valid” (p. 57) and van Manen’s (2014) outline of validation and evaluative criteria come the closest in our readings.
For Polkinghorne, validation referred to the notion that an idea was well grounded and well supported. He asked, “Does the general structural description provide an accurate portrait of the common features and structural connections that are manifest in the examples collected?” (Polkinghorne, 1989, p. 57). He then proceeded to identify five questions that researchers might ask themselves:
Did the interviewer influence the contents of the participants’ descriptions in such a way that the descriptions do not truly reflect the participants’ actual experience?
Is the transcription accurate, and does it convey the meaning of the oral presentation in the interview?
In the analysis of the transcriptions, were there conclusions other than those offered by the researcher that could have been derived? Has the researcher identified these alternatives?
Is it possible to go from the general structural description to the transcriptions and to account for the specific contents and connections in the original examples of the experience?
Is the structural description situation specific, or does it hold in general for the experience in other situations? (Polkinghorne, 1989)
Van Manen (2014) pointed to questions as a way to “test [a phenomenology’s] level of validity” (p. 350).
Is the study based on a valid phenomenological question? In other words, does the study ask, “What is this human experience like?” “How is this or that phenomenon or event experienced?” A phenomenological question should not be confused with empirical studies of a particular population, person(s), or group of people at a particular time and location. Also, phenomenology cannot deal with causal questions or theoretical explanations. However, a particular individual or group may be studied for the understanding of a phenomenological theme—such as a gender phenomenon, a sociopolitical event, or the experience of a human disaster.
Is the analysis performed on experientially descriptive accounts, transcripts? (Does the analysis avoid empirical material that mostly consists of perceptions, opinions, beliefs, views, and so on?)
Is the study properly rooted in primary and scholarly phenomenological literature—rather than mostly relying on questionable secondary and tertiary sources?
Does the study avoid trying to legitimate itself with validation criteria derived from sources that are concerned with other (non-phenomenological) methodologies (van Manen, 2014, pp. 350–351)?
Van Manen (2014) also provided criteria for evaluative appraisal of phenomenological studies.
Heuristic questioning: Does the text induce a sense of contemplative wonder and questioning attentiveness—ti estin (the wonder [of] what this is) and hoti estin (the wonder that something exists at all)?
Descriptive richness: Does the text contain rich and recognizable experiential material?
Interpretive depth: Does the text offer reflective insights that go beyond the taken-for-granted understandings of everyday life?
Distinctive rigor: Does the text remain constantly guided by a self-critical question of distinct meaning of the phenomenon or event?
Strong and addressive meaning: Does the text “speak” to and address our sense of embodied meaning?
Experiential awakening: Does the text awaken prereflective or primal experience through vocative and presentative language?
Inceptual epiphany: Does the study offer us the possibility of deeper and original insight, and perhaps, an intuitive or inspirited grasp of the ethics and ethos of life commitments and practices? (pp. 355–356)
In Figure 10.5, we advance our five standards for assessing the quality of a phenomenology.
Figure 10.5 Standards for Assessing the Quality of a Phenomenology
Grounded Theory Research
Strauss and Corbin (1990) identified criteria by which one judges the quality of a grounded theory study. They described seven criteria related to the general research process and six criteria related to the empirical grounding of a study. Corbin and Strauss (2015) advanced the term checkpoint in place of criteria, saying they “dislike using the word criteria because that makes the evaluative process seem so dogmatic, an ‘all [or] nothing’ approach to evaluation” (p. 350, emphasis in original).
Corbin and Strauss (2015) described 16 checkpoints for guiding researchers and reviewers in evaluating the “methodological consistency” of a grounded theory study:
1. What was the target sample population? How was the original sample selected?
2. How did sampling proceed? What kinds of data were collected? Were there multiple sources of data and multiple comparative groups?
3. Did data collection alternate with analysis?
4. Were ethical considerations taken into account in both data collection and analysis?
5. Were the concepts driving the data collection arrived at through analysis (based on theoretical sampling), or were concepts derived from the literature and established before the data were collected (not true theoretical sampling)?
6. Was theoretical sampling used, and was there a description of how it proceeded?
7. Did the research demonstrate sensitivity to the participants and to the data?
8. Is there evidence or examples of memos?
9. At what point did data collection end or a discussion of saturation end?
10. Is there a description of how coding proceeded along with examples of theoretical sampling, concepts, categories, and statements of relations? What were some of the events, incidents, or actions (indicators) that pointed to some of these major categories?
11. Is there a core category, and is there a description of how that core category was arrived at?
12. Were there changes in design as the research went along based on findings?
13. Did the researcher(s) encounter any problems while doing the research? Is there any mention of a negative case, and how was that data handled?
14. Are methodological decisions made clear so that readers can judge their appropriateness for gathering data (theoretical sampling) and doing analysis?
15. Was there feedback on the findings from other professionals and from participants? And were changes made in the theory based on this feedback?
16. Did the researcher keep a research journal or notebook? (Corbin & Strauss, 2015, pp. 350–351)
Corbin and Strauss (2015) also advanced 17 checkpoints for researchers and reviewers to evaluate the “quality and applicability” of a grounded theory study:
1. What is the core category, and how do the major categories relate to it? Is there a diagram depicting these relationships?
2. Is the core category sufficiently broad so that it can be used to study other populations and similar situations beyond this setting?
3. Are each of the categories developed in terms of their properties and dimensions so that they show depth, breadth, and variation?
4. Is there descriptive data given under each category that brings that theory to life so that it provides understanding and can be used in a variety of situations?
5. Has context been identified and integrated into the theory? Conditions and consequences should not be listed merely as background information in a separate section but woven into the actual analysis with explanations of how they impact and flow from action–interaction in the data.
6. Has process been incorporated into the theory in the form of changes in action–interaction in relationship to changes in conditions? Is action–interaction matched to different situations, demonstrating how the theory might vary under different conditions and therefore be applied to different situations?
7. How is saturation explained, and when and how was it determined that categories were saturated?
8. Do the findings resonate or fit with the experience of both the professionals for whom the research was intended and the participants who took part in the study? Can participants see themselves in the story even if not every detail applies to them?
9. Are there gaps, or missing links, in the theory, leaving the reader confused and with a sense that something is missing?
10. Is there an account of extremes or negative cases?
11. Is variation built into the theory?
12. Are the findings presented in a creative and innovative manner? Does the research say something new or put old ideas together in new ways?
13. Do findings give insight into situations and provide knowledge that can be applied to develop policy, change practice, and add to the knowledge base of a profession?
14. Do the theoretical findings seem significant, and to what extent? It is entirely possible to complete a theory-generating study, or any research investigation, yet not produce findings that are significant.
15. Do the findings have the potential to become part of the discussion and ideas exchanged among relevant social and professional groups?
16. Are the limitations of the study clearly spelled out?
17. Are there suggestions for practice, policy, teaching, and application of the research? (Corbin & Strauss, 2015, pp. 351–352)
Charmaz (2014) reflected on the quality of the theory developed in a grounded theory study. She suggested that grounded theorists look at their theory and ask themselves the following evaluative questions:
Are the definitions of major categories complete?
Have I raised major categories to concepts in my theory?
How have I increased the scope and depth of the analysis in this draft?
Have I established strong theoretical links between categories and between categories and their properties, in addition to the data?
How have I increased understanding of the studied phenomenon?
How does my grounded theory study make a fresh contribution?
With which theoretical, substantive, or practical problems is this analysis most closely aligned? Which audiences might be most interested in it? Where shall I go with it?
What implications does this analysis hold for theoretical reach, depth, and breadth? For methods? For substantive knowledge? For actions or interventions? (Charmaz, 2014, pp. 337–338)
See also Charmaz (2014, pp. 337–338) for guiding questions for assessing criteria for grounded theory studies organized by four categories: credibility, originality, resonance, and usefulness.
In Figure 10.6, we describe features of the general process and a relationship among the concepts we look for when evaluating a grounded theory study.
Figure 10.6 Features for Evaluating a Grounded Theory Study
Ethnographic Research
Few ethnographic writers identify criteria for quality ethnographies. Instead, the preference, it seems, for ethnographers is to describe the “basics” of ethnographical studies as prolonged fieldwork that generates thick and contextual descriptions reflective of triangulating multiple data sources (Fetterman, 2010, 2019; Wolcott, 2008a, 2010). An exception is Spindler and Spindler (1987).
The ethnographers Spindler and Spindler (1987) emphasized that the most important requirement for an ethnographic approach was to explain behavior from the “native’s point of view” (p. 20) and to be systematic in recording this information using note taking, tape recorders, and cameras. This required that the ethnographer be present in the situation and engage in constant interaction between observation and interviews. These points were reinforced in Spindler and Spindler’s (1987) nine criteria for a “good ethnography”:
Criterion I. Observations are contextualized.
Criterion II. Hypotheses emerge in situ as the study goes on.
Criterion III. Observation is prolonged and repetitive.
Criterion IV. Through interviews, observations, and other eliciting procedures, the native view of reality is obtained.
Criterion V. Ethnographers elicit knowledge from informant-participants in a systematic fashion.
Criterion VI. Instruments, codes, schedules, questionnaires, agenda for interviews, and so forth are generated in situ as a result of inquiry.
Criterion VII. A transcultural, comparative perspective is frequently an unstated assumption.
Criterion VIII. The ethnographer makes explicit what is implicit and tacit to informants.
Criterion IX. The ethnographic interviewer must not predetermine responses by the kinds of questions asked. (p. 18)
This list, grounded in fieldwork, leads to a strong ethnography. Moreover, as Lofland (1974) contended, the study was located in wide conceptual frameworks; presented the novel but not necessarily new; provided evidence for the framework(s); was endowed with concrete, eventful interactional events, incidents, occurrences, episodes, anecdotes, scenes, and happenings without being “hyper-eventful”; and showed an interplay between the concrete and analytical and the empirical and theoretical.
In Figure 10.7, we advance seven evaluative criteria for an ethnography.
Figure 10.7 Criteria for Evaluating an Ethnography
Case Study Research
Yin (2017) reflected on the quality of the description presented in a case study. He described several characteristics for an exemplary case study.
Significant: Has the researcher focused the case(s) as “unusual and of general public interest” or underlying issues as “nationally important—either in theoretical terms or in policy or practical terms” (p. 242)?
Complete: Has the researcher clearly defined the case(s) boundaries, collected extensive evidence, and conducted the study absent “of certain artefactual conditions” (p. 244)—for example, if a time or resource constraint unwittingly ended the study?
Consider alternative perspectives: Has the researcher considered rival propositions and sought to collect evidence from differing perspectives in the case(s)?
Display sufficient evidence: Has the researcher reported the case(s) in such a way that a reader can “reach an independent judgment regarding the merits” (p. 246)?
Composed in an engaging manner: Has the researcher presented the case(s) in a way that communicates “the results widely”—either in writing or performance (p. 247)?
G. Thomas (2021) asked case study researchers to consider the following three questions:
1. How well has the case been chosen? Thomas’s discussions of quality are aptly focused on appropriate sampling and procedural decisions.
2. How well has the context for the study been explained and justified? Thomas pointed to the need for detailed contextual descriptions to be able to assess decisions justifying how analysis has been undertaken.
3. How well have arguments been made? Thomas explained the role of rival explanations to provide “space to think about alternative storylines for the same plot” (p. 81).
Stake (1995) described evaluative criteria for a case study using a rather extensive “critique checklist.” In so doing, Stake (1995) shared criteria for assessing a good case study report:
Is the report easy to read?
Does it fit together, each sentence contributing to the whole?
Does the report have a conceptual structure (i.e., themes or issues)?
Are its issues developed in a serious and scholarly way?
Is the case adequately defined?
Is there a sense of story to the presentation?
Is the reader provided some vicarious experience?
Have quotations been used effectively?
Are headings, figures, artifacts, appendixes, and indexes used effectively?
Was it edited well—and then again with a last-minute polish?
Has the writer made sound assertions, neither over- nor misinterpreting?
Has adequate attention been paid to various contexts?
Were sufficient raw data presented?
Were data sources well-chosen and in sufficient number?
Do observations and interpretations appear to have been triangulated?
Is the role and point of view of the researcher nicely apparent?
Is the nature of the intended audience apparent?
Is empathy shown for all sides?
Are personal intentions examined?
Does it appear that individuals were put at risk? (p. 131)
In Figure 10.8, we describe our criteria for evaluating a case study.
Figure 10.8 Evaluative Criteria for a Case Study
COMPARING EVALUATION STANDARDS ACROSS THE FIVE APPROACHES
The standards discussed for each approach differ slightly depending on the procedures of the approaches. Certainly, less is mentioned about narrative research and its standards of quality, and more is available about the other approaches. From within the major books used for each approach, we have attempted to extract the evaluation standards recommended for their approach to research. To these, we have added our own standards that we use in our qualitative classes when we evaluate a project or study presented within each of the five approaches. We can compare these standards across the five approaches in relation to the five dimensions summarized in Table 10.2. At the most fundamental level, the five differ in the focus of the study. A couple of potential similarities should be noted. Phenomenology, grounded theory, and ethnography typically focus the study on a singular phenomenon, process (or action or interaction), and culture-sharing group, respectively. One may also focus narrative research or a case study on a single individual or case, but it is also possible to focus on two or three individuals or multiple cases. The study procedures offer distinguishing features for each approach: some research features the collection of stories (i.e., narrative), the specification of cultural themes (i.e., ethnography), or the selection of cases (i.e., case study). Others are distinguishable by an initial conveyance of an understanding of the philosophical tenets (i.e., phenomenology) or by the integration of analysis into data collection activities and the use of memoing (i.e., grounded theory). When examining a study, the approach can be identified by its presentation and outcomes: a story chronology for narrative research, theoretical diagramming for grounded theory research, an explanation of how a culture-sharing group works for ethnography, and assertions for a case study.
Common across all the qualitative approaches is the use of reflexive and self-disclosing practices for embedding what the researcher brings to the study.
Table 10.2 Comparing the Evaluation Standards Across the Five Qualitative Approaches
SUMMARY
In this chapter, we discussed validation, reliability, and standards of evaluation in qualitative research. Validation approaches vary considerably, such as strategies that emphasize using qualitative terms comparable to quantitative terms, the use of distinct terms, perspectives from postmodern and interpretive lenses, syntheses of different perspectives, descriptions based on metaphorical images, or some combination of these perspectives on validity. Reliability is used in qualitative research in several ways, one of the most popular being the use of intercoder agreements when multiple coders analyze and then compare their code segments to establish the reliability of the data analysis process. A detailed procedure for establishing intercoder agreement is described in this chapter. Also, diverse standards exist for evaluating the quality of qualitative research, and these criteria are based on methods perspectives, general research perspectives, postmodern perspectives, and interpretive perspectives. Within each of the five approaches to inquiry, specific standards also exist; these were reviewed in this chapter. Finally, we advanced standards that we use to assess the quality of studies presented in each approach, and we compared the evaluation standards across the five approaches.
Descriptions of Images and Figures
At the center lies “Strategies for Validation,” surrounded by nine strategies arranged in a circular sequence and grouped under three lenses. “Corroborating evidence through triangulation,” “Discovering a negative case analysis or disconfirming evidence,” and “Clarifying researcher bias or engaging in reflexivity” come under “Researcher’s Lens.” “Member checking or seeking participant feedback,” “Having a prolonged engagement and persistent observation in the field,” and “Collaborating with participants” come under “Participant’s Lens.” “Enabling external audits,” “Generating a rich, thick description,” and “Having a peer review or debriefing of the data and research process” come under “Reader’s or Reviewer’s Lens.”
The flowchart includes from top to bottom:
Establish a common platform for coding and develop a preliminary code list.
Define and share the initial codebook among coders.
Apply the codebook to additional transcripts and compare coding across multiple researchers.
Assess and report intercoder agreement among researchers.
Revise and finalize the codebook to inform further coding.
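As a rough illustration of the step “Assess and report intercoder agreement among researchers,” the sketch below computes two commonly used statistics for a pair of coders: simple percent agreement and Cohen's kappa, which adjusts for agreement expected by chance. The choice of these two statistics, the function names, the code labels, and the sample data are all assumptions made for illustration; they are not taken from the chapter, and a research team may prefer other measures or a qualitative software package.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Share of text segments to which both coders assigned the same code."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must code the same non-empty set of segments.")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(coder_a)
    observed = percent_agreement(coder_a, coder_b)
    counts_a = Counter(coder_a)
    counts_b = Counter(coder_b)
    # Chance agreement: probability that both coders pick the same code
    # if each assigned codes independently at their own observed rates.
    expected = sum(counts_a[code] * counts_b[code] for code in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both coders used a single identical code throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes assigned by two researchers to the same ten segments.
coder_a = ["trust", "access", "trust", "rapport", "access",
           "trust", "rapport", "trust", "access", "trust"]
coder_b = ["trust", "access", "rapport", "rapport", "access",
           "trust", "trust", "trust", "access", "trust"]

print(f"Percent agreement: {percent_agreement(coder_a, coder_b):.2f}")
print(f"Cohen's kappa:     {cohens_kappa(coder_a, coder_b):.2f}")
```

In this made-up example the two coders match on 8 of the 10 segments. Kappa comes out lower than the raw percentage because some of that agreement would be expected by chance alone.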
Does the qualitative study do the following?
Meet the assumptions and distinguishing features of the qualitative approach to research?
• The author provides evidence of the defining characteristics described in Table 1.1.
Provide evidence of an ethical study?
• The author provides evidence of having anticipated and addressed the ethical issues described in Tables 3.1, 7.2, 8.3, and 9.1.
Use procedures of a recognizable approach to inquiry?
• The author may refer to the procedures described in Figures 4.3 (narrative), 4.5 (phenomenology), 4.7 (grounded theory), 4.9 (ethnography), and 4.11 (case study).
Explore a single focus or concept?
• The author describes the study focus using research purpose statements such as those in Examples 6.1–6.5.
Employ rigorous data collection procedures?
• The author provides evidence of data collection activities described in Figure 7.1 and Table 7.1.
Detail rigorous data analysis methods?
• The author provides evidence of data spiral activities described in Figure 8.1 and Table 8.5.
Present evidence of multiple levels of abstraction?
• The author provides evidence of the validation strategies in Figure 10.1 and Table 10.2.
Convey the reader experience of “being there” persuasively?
• The author provides evidence of writing structures described in Table 9.2.
Present the researcher’s situated role?
• The author provides evidence of their guiding philosophical assumptions and interpretive frameworks described in Tables 2.1, 2.2, and 2.3.
Does the narrative study do the following?
Focus on an individual?
• The author may focus on one or more individuals.
Collect stories about a significant issue?
• The author may focus on the stories told by the individual or individuals.
Develop a chronology?
• The author may use a chronology to connect different phases or aspects of a story.
Tell a story?
• The author may, through the story, report what was said (themes), how it was said (unfolding story), or how speakers interact or perform the narrative.
Embed reflexivity?
• The author may use reflexive thinking and writing to bring himself or herself into the study.
Does the phenomenology do the following?
Articulate a clear “phenomenon” to study in a concise way?
• The author may use a phenomenological question to guide the study.
Convey an understanding of the philosophical tenets of phenomenology?
• The author may ground the study in primary and scholarly phenomenological literature.
Use procedures of data analysis in phenomenology?
• The author may refer to procedures recommended by Moustakas (1994) or van Manen (1990, 2023).
Communicate the overall essence of the experience of the participants?
• The author may describe the context in which the experience occurred.
Embed reflexivity throughout the study?
• The author may explain the process and outcomes of reflexive thinking.
Does the grounded theory study do the following?
Focus on the study of a process, an action, or an interaction as the key element in the theory?
• The author may focus on the steps that unfold when studying a central phenomenon as a process, action, or interaction among individuals.
Integrate a coding process that works from the data to a larger theoretical model?
• The author may describe the data collection as alternating with data analysis to build a theoretical model.
Present the theoretical model in a figure or diagram?
• The author may use innovative means for presenting the theory.
Advance a story line or proposition connected with the categories in the theoretical model that presents further questions to be answered?
• The author may refer to the overall picture emerging in the current study as a springboard for future directions of research.
Use memoing through the process of research?
• The author may describe the different types of memos or ways of recording emerging ideas throughout the process of conducting the study.
Embed evidence of reflexivity for self-disclosure by the researcher about their study “positioning”?
• The author may describe how a research journal documented their reflexive thinking during the study.
Does the ethnography do the following?
Convey evidence of clear identification of a culture-sharing group?
• The author may describe the group in some detail—how the group was selected, how access to the group was facilitated including any involvement of gatekeepers, how the group interacts, how the group communicates, and so forth.
Specify a cultural theme that will be examined in light of this culture-sharing group?
• The author may identify a cultural theme and the rationale for choosing it.
Describe the cultural group in detail?
• The author may use creative analytical practices to convey these descriptions.
Communicate themes derived from an understanding of the cultural group?
• The author may organize a thematic narrative or tale.
Identify issues that arose during fieldwork that reflect on the relationship between the researcher and the participants, the interpretive nature of reporting, and sensitivity and reciprocity in the co-creating of the account?
• The author may describe a working set of rules or generalizations as to how the culture-sharing group functions.
Explain how the culture-sharing group works overall?
• The author may describe a working set of rules or generalizations as to how the culture-sharing group functions.
Integrate self-disclosure and reflexivity by the researcher about their position in the research?
• The author may describe their background experiences with the group and describe their reflections about interactions with the group.
Does the case study do the following?
Identify the case studied?
• The author may identify the boundaries and time parameters of a single case or multiple cases.
Present a rationale for the case(s) selection?
• The author may identify rationales for selecting the case, such as the intent to understand a research issue or describe intrinsic merit.
Describe the case(s) in detail?
• The author may begin with a rich description of the case(s) and its setting or context(s).
Articulate the themes identified for the case(s)?
• The author may focus on a few key thematic issues for an individual case(s) or across cases.
Report assertions or generalizations from the case analysis?
• The author may interpret how the case(s) provides insight into the issue, or the findings can be generalized to other cases. Sometimes this may take the form of a summary statement or a vignette.
Embed researcher self-disclosure about their position in the study?
• The author may use reflexive practices and writing throughout the study.