
Journal of Biomedical Informatics 43 (2010) 782–790


Complementary methods of system usability evaluation: Surveys and observations during software design and development cycles

Jan Horsky a,b,c,*, Kerry McColgan a, Justine E. Pang a, Andrea J. Melnikas a, Jeffrey A. Linder a,b,c, Jeffrey L. Schnipper a,b,c, Blackford Middleton a,b,c

a Clinical Informatics Research and Development, Partners HealthCare, Boston, USA
b Division of General Medicine and Primary Care, Brigham and Women’s Hospital, Boston, USA
c Harvard Medical School, Boston, USA

Article info

Article history: Received 11 December 2009; available online 26 May 2010

Keywords: Health information technology; Clinical information systems; Usability evaluations; Design and development; Adoption of HIT

1532-0464/$ - see front matter © 2010 Elsevier Inc. All rights reserved. doi:10.1016/j.jbi.2010.05.010

* Corresponding author at: Clinical Informatics Research and Development, Partners Healthcare, 93 Worcester St., Wellesley, MA 02481, USA. Fax: +1 781 416 8771.

E-mail address: [email protected] (J. Horsky).

Abstract

Poor usability of clinical information systems delays their adoption by clinicians and limits potential improvements to the efficiency and safety of care. Recurring usability evaluations are, therefore, integral to the system design process. We compared four methods employed during the development of outpatient clinical documentation software: clinician email response, online survey, observations and interviews. Results suggest that no single method identifies all or most problems. Rather, each approach is optimal for evaluations at a different stage of design and characterizes different usability aspects. Email responses elicited from clinicians and surveys report mostly technical, biomedical, terminology and control problems and are most effective when a working prototype has been completed. Observations of clinical work and interviews inform conceptual and workflow-related problems and are best performed early in the cycle. Appropriate use of these methods consistently during development may significantly improve system usability and contribute to higher adoption rates among clinicians and to improved quality of care.

1. Introduction

There is a broad consensus among healthcare researchers, practitioners and administrators that although health information technology has the potential to reduce the risk of serious injury to patients in hospitals, significant differences remain among the multitude of electronic health record (EHR) systems with respect to their ability to achieve high safety, quality and effectiveness benchmarks [1–4]. In many instances, the intrinsic potential of EHRs for preventing and mitigating errors continues to be only partially realized and some implementations may, paradoxically, expose clinicians to new risks or add extra time to many routine interactions [5,6].

Research evidence and published reports on the successes, failures, best-practices, lessons learned and barriers overcome during implementation efforts have had only limited effect so far on accelerating the adoption of electronic information systems [7]. According to conservative estimates, at least 40% of systems either are abandoned or fail to meet business requirements, and fewer than 40% of large vendor systems meet their stated goals [8]. A recent national study reported that only 4% of physicians used a fully functional, advanced system and that 13% used systems with only basic functions [9].

Transition from paper records to electronic means of information management is an arduous process at large institutions and private practices alike. It introduces new standards and reshapes familiar practices, often in ways unintended or unanticipated by the stakeholders. Clinicians object to forced changes in established workflows and familiar practices, long training times, and excessive time spent serving the computer rather than providing care [10,11].

Although the initial decline in efficiency generally improves with increased skills and sufficient time to adjust to new routines [12], systems themselves rarely evolve to better meet the demands and requirements of the clinical processes they need to support. A recent survey found an increase in the availability of EHRs over two years in one state, but the researchers also reported that routine use of ten core functions remained relatively low, with more than one out of five physicians not using each available function regularly [13]. An observational study of 88 primary care physicians identified key information management goals, strategies, and tasks in ambulatory practice and found that nearly half were not fully supported by available information technology [14].


J. Horsky et al. / Journal of Biomedical Informatics 43 (2010) 782–790 783

Developing highly functional, versatile clinical information systems that can be efficiently and conveniently used without extensive training periods is predicated on incorporating rigorous and frequent usability evaluations into the design process. Iterative development methodology for graphical interfaces suggests evaluating and revising successive prototypes in a cyclical fashion until the product attains required characteristics. There are several common techniques that can be used to perform the evaluations that are either carried out entirely by usability experts or involve the input of intended users. Equally important is to see usability evaluation as situated within the context of challenges imposed by complex socio-technical systems [15] and within broader conceptual frameworks for design and evaluation such as those based on the theory of distributed cognition and work-centered research [16].

The broad objective of this study was to compare data gathered by four usability evaluation methods and discuss their respective utility at different stages of the software development process. We hypothesized that no single method would be equally effective in characterizing every aspect of the interface and human interaction. Rather, an approach that employs a set of complementary methods would increase their cumulative explanatory value by applying them selectively for specific purposes. Our narrower goal was to formulate recommendations for designers and evaluators of health information systems on the effective use of common usability inspection methods during the design and development cycle.

This report expands a brief discussion of methods used in the design, pilot testing, and evaluation of the Smart Form in a previous publication [17].

2. Background

The reasons why one system may be preferred over another by clinicians and perform closer to expectations are often complex, vary with local conditions and almost always include financing, leadership, prior experience and training. Among the core predictors of quick adoption and successful implementation are the design quality of the graphical user interface and functionality, along with socio-technical factors [7]. Usability has a strong, often direct relationship with clinical productivity, error rate, user fatigue and user satisfaction that are critical for adoption. The system must be fast and easy to use, and the user interface must behave consistently in all situations [18]. At the same time, the system must support well all relevant clinical tasks so that a clinician working with the computer can achieve higher quality of care. The Healthcare Information and Management Systems Society (HIMSS) considers poor usability characteristics of current information technology as one of the major factors, and “possibly the most important factor” hindering its widespread adoption [19].

Historically, developers and designers have failed to tap the experiential expertise of practicing clinicians [20]. The lack of a systematic consideration of how clinical and computing tasks are performed in the situational context of different clinical environments often results in designs that are off the intended mark and fail to deliver improvements in safety and efficiency. For example, in an experiment that examined the interactive behavior of clinicians entering a visit note, researchers compared the sequence and flow of items on an electronic note form that was implied by the designed structure to actual mouse movements and entry sequences recorded by tracking software and found a substantial difference between the observed behavior and prior assumptions by the designers [21].

Existing usability studies mainly employ research designs such as expert inspection, simulated experiments, and self-reported user satisfaction surveys. Unfortunately, a large body of research indicates that self-reports can be a highly unreliable source of data, often context-dependent, and even minor changes in question wording, format or order can profoundly affect the obtained results [22].

While analyses that rely predominantly on a single method may produce incomplete or unreliable results, there is considerable evidence of the effectiveness of comprehensive approaches that combine two or more methods, as important redesign ideas rarely emerge as sudden insights but may evolve throughout the work process [23,24]. For example, during the development of a decision support system, designers employed field observations, structured interviews, and document analyses to collect and analyze users’ workflow patterns, decision support goals, and preferences regarding interactions with the system, performed think-aloud analyses and used the technology acceptance model to direct evaluation of users’ perceptions of the prototype [25]. A careful workflow analysis could lead to the identification of potential breakdown points, such as vulnerabilities in hand-offs, and communication tasks deemed critical could be required to have a traceable electronic receipt acknowledgment [26]. The advantage of informing the design from its conception with close insights into local needs and actual practices the software will support is reflected in the fact that “home-grown” systems show a higher relative risk reduction than commercial systems [1].

Iterative development of user interfaces involves the steady refinement of the design based on user testing and other evaluation methods [27]. The complexity and variability of clinical work requires correspondingly complex information systems that are virtually impossible to design without usability problems in a single attempt. Experts need to create a situation in which clinicians can instill their knowledge and concern into the design process from the very beginning [28]. Changing or redesigning a software system as complex as an EHR after it has been developed (or implemented) is enormously difficult, error-prone, and expensive [29,30]. Iterative evaluations early in the process allow larger conceptual revisions and refinements to be done without excessive effort and resources [31].

The software developed, tested and deployed in a pilot program in this study, the Coronary Artery Disease (CAD) and Diabetes Mellitus (DM) Smart Form (Fig. 1), was a prototype of an application intended to assist clinicians with documenting and managing the care of patients with chronic diseases [17]. Integrated within an outpatient electronic record, it allowed direct access to laboratory and other coded data for expedient entry into new visit notes. The Smart Form also aggregated reviewing of prior notes and laboratory results to create disease-relevant context for the planning of care, and provided actionable decision support and best-practices recommendations. The anticipated benefit to clinicians includes savings in time required to look up, collect, interpret and record clinical data into a note, and an increase in the quality and completeness of documentation that may contribute to improved patient care.

In the planning stage of the development, two experts, includ- ing a physician, conducted focus groups with approximately 25 physicians who described their usual workflows, methods for acute and chronic disease management, attitudes towards decision support, and their wants and needs, and summarized emerging themes [17].

3. Methods

We have conducted four different studies of usability and human–computer interaction that were intended to collect two types of data: comments elicited directly from clinicians working with the Smart Form, and findings derived from formal evaluations by usability experts. We rigorously maintained distinctions between direct, free-style comments made by clinicians and objective findings by usability experts. Comments were always direct expressions of clinicians that originated either spontaneously or in response to a question, written or verbal. Findings, on the other hand, were expert opinions and recommendations based on field notes, interviews, focus groups and on direct observation of clinicians interacting with the Smart Form.

Fig. 1. Screenshot of Smart Form.

We chose to count and compare comments and findings rather than actual problems because it is uncertain whether any two or more user reports describe identical problems: comments may sometimes be vague, too general or without the proper context to match them to unique problems. Since we could not differentiate all problems in a consistent manner, we decided to report the comments and findings themselves as approximations to actual problems.

In the first study, clinicians sent their comments by email during a 3-month pilot period in which they used the module for the documentation of actual visits. Another set of comments, in the second study, was entered in an online survey at the end of the pilot. We also extracted direct quotes of clinicians from transcripts of interviews and think-aloud protocols that were completed as parts of usability evaluation in the remaining two studies. The findings, in contrast, were formulated entirely by usability experts as the result of a series of evaluation studies (third and fourth) and published in technical reports.

Each comment and finding was assigned to a usability heuristic category independently by two researchers. The classification scheme was specific to the healthcare domain and its development is described in detail in a section below. The number of comments and findings in each category was compared to assess the descriptive power of each data collection method for specific usability characteristics. For example, we would contrast the different proportion of comments from each source that contributed to the total number of observations in each category.
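The comparison described above amounts to computing, for each heuristic category, the share of comments contributed by each source. The following is a minimal sketch of that calculation, using counts for three categories as reported in Table 1. The dictionary layout and the function name `source_shares` are illustrative assumptions, not part of the study's tooling.

```python
# Comment counts per source for three heuristic categories (values from Table 1).
# The data layout and the function name are illustrative assumptions.
COUNTS = {
    "Biomedical": {"Email": 21, "Survey": 0, "Evaluation": 1, "Interview": 4},
    "Control": {"Email": 17, "Survey": 4, "Evaluation": 5, "Interview": 2},
    "Workflow": {"Email": 1, "Survey": 3, "Evaluation": 8, "Interview": 6},
}


def source_shares(counts_by_source):
    """Return each source's percentage share of one category's comments."""
    total = sum(counts_by_source.values())
    if total == 0:
        # An empty category has no meaningful shares; report zeros.
        return {source: 0.0 for source in counts_by_source}
    return {source: 100.0 * n / total for source, n in counts_by_source.items()}


if __name__ == "__main__":
    for category, by_source in COUNTS.items():
        shares = source_shares(by_source)
        row = ", ".join(f"{source} {pct:.0f}%" for source, pct in shares.items())
        print(f"{category}: {row}")
```

Computing shares within a category, rather than within a source, keeps the large volume of email from masking categories in which email contributed little, such as Workflow.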

The four data collection methods are described in detail below. Think-aloud studies were conducted by a usability expert at our institution and walkthroughs and evaluations by independent professional evaluators on a contract basis.

3.1. Email via an embedded link

The Smart Form was integrated within the outpatient clinical records system and used by 18 clinicians for 3 months (March to May 2006) in the course of their regular clinical work to write visit notes for patients with coronary artery disease and diabetes. They had the option of opening a free-text window on their desktops at any time by clicking on a link embedded in the application and typing in their comments. The messages were collected in a database and logged with a timestamp and the sender’s name.
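The paper does not describe how the comment database was implemented. Purely as an illustration of the logging step described above (a free-text message stored with a timestamp and the sender's name), a hypothetical store could look like the sketch below. The table name, column names, function names and the sample message are all assumptions made for this example.

```python
import sqlite3
from datetime import datetime, timezone


def open_store(path=":memory:"):
    """Open a hypothetical feedback store; the schema is assumed, not the study's."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS feedback ("
        "id INTEGER PRIMARY KEY, "
        "sender TEXT NOT NULL, "
        "sent_at TEXT NOT NULL, "
        "body TEXT NOT NULL)"
    )
    return conn


def log_comment(conn, sender, body, sent_at=None):
    """Store one free-text comment with the sender's name and a timestamp."""
    sent_at = sent_at or datetime.now(timezone.utc)
    cursor = conn.execute(
        "INSERT INTO feedback (sender, sent_at, body) VALUES (?, ?, ?)",
        (sender, sent_at.isoformat(), body),
    )
    conn.commit()
    return cursor.lastrowid


if __name__ == "__main__":
    store = open_store()
    # The sender and message below are made-up sample values.
    log_comment(store, "clinician_a", "Lab values did not load in the note.")
    print(store.execute("SELECT sender, sent_at, body FROM feedback").fetchall())
```

Recording the timestamp at submission lets evaluators line a report up with the system state at the time, which matters because short spontaneous messages often omit that context.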

3.2. Online survey

Fifteen participants received an email with a link to an online survey in May 2006. Questions about satisfaction, frequency of use and problems had multiple-choice responses and were accompanied by two open-ended questions, “What changes could be made to the Smart Form that would make you more likely to use it?” and “What improvements can be made to the Smart Form before you would recommend it to other clinicians?” Completion was voluntary and rewarded with a $20 gift certificate.

3.3. Think-aloud study and observations

We recruited six primary care physicians and specialists (four women) to participate in usability and interaction studies. Evaluations were conducted in the clinicians’ offices at six different clinics and lasted 30–45 min. Subjects were asked to complete a series of interactive tasks described in a previously developed clinical scenario. A researcher played the role of a patient during each session to provide a realistic representation of an office visit. Medical history, current medications and the presence of diabetes and CAD were included in a narrative paragraph that was accompanied by supporting electronic documentation of prior visits, lab results, vitals and demographic information in a simulated patient record.

Subjects were instructed to verbalize their thoughts (to think aloud) as they were completing the tasks and interacting with the Smart Form. Video and audio recordings of each session were made with Morae [32] usability evaluation software installed on portable computers. The verbal content was transcribed for analysis to be used together with the resulting screen captures. In a debriefing period after completion, subjects were asked follow-up questions to elaborate or elucidate their actions and reasoning. The results of this study were compiled in a technical report.

3.4. Walkthroughs, expert evaluations and interviews

A team of professional health informatics consultants independently carried out usability assessments and walkthroughs and conducted interviews with six primary care physicians and specialists (two women) whose experience with the application ranged from novice to expert. The results of the evaluation were presented in a technical report.

3.5. The development of heuristic usability assessment scheme

Four sets of usability heuristics with a substantial theoretical overlap have been generally accepted and are widely used in professional evaluations: Nielsen’s 10 usability heuristics [33] (derived from the results of a factor analysis of about 250 problems), Shneiderman’s Eight Golden Rules of Interface Design [34], Tognazzini’s First Principles of Interaction Design [35], and a set of principles based on Edward Tufte’s visual display work [36]. These approaches were recently integrated into a single Multiple Heuristics Evaluation Table by identifying overlaps and combining conceptually related items [37].

These general heuristics sets have been used to evaluate healthcare-related applications [38–41] and consumer-health websites [42]. A set aggregating Nielsen’s and Shneiderman’s heuristics was proposed by Zhang and colleagues [43] for HIT and applied to the evaluation of an infusion pump [44] and a clinical web application [45]. However, the categories and guidelines do not specifically address biomedical or clinical concepts. Our goal was to formulate additional categories to increase their cumulative explanatory power.

To this end we analyzed all 155 statements about usability problems collected during the study to identify emergent themes following the grounded theory principles [46]. Two researchers then independently assigned the statements into heuristic categories, either general or modified according to newly identified themes. Several iterative coding sessions and discussions ensued, and as a result of extensive comparison and refinement, 12 heuristic categories were formulated (Table 3).

3.6. Participants

All data were collected from 45 clinicians within Partners Healthcare practice network who participated in either part of the study (with a small overlap). Most were primary care physicians (73%), about half were female (53%), and the mean age of the group was 48 years.

Table 1. Comments by heuristic category and source.

Heuristic category   Email N (%)   Survey N (%)   Evaluation N (%)   Interview N (%)   Totals N (%)
Biomedical           21 (81)       0              1 (4)              4 (15)            26 (17)
Cognition            12 (46)       3 (12)         4 (15)             7 (27)            26 (17)
Control              17 (61)       4 (14)         5 (18)             2 (7)             28 (18)
Customization        7 (29)        5 (28)         1 (6)              5 (28)            18 (12)
Fault                16 (94)       1 (6)          0                  0                 17 (11)
Speed                3 (43)        3 (43)         1 (14)             0                 7 (5)
Terminology          4 (100)       0              0                  0                 4 (3)
Transparency         4 (36)        1 (9)          6 (55)             0                 11 (7)
Workflow             1 (6)         3 (17)         8 (44)             6 (33)            18 (12)
Totals               85 (55)       20 (13)        26 (17)            24 (15)           155 (100)

4. Results

Analyses were performed separately on comments by clinicians and on findings by usability experts. Results are presented in the following sections and contrasted.

4.1. Comments by clinicians

Results for comments are summarized in Table 1. There were 155 comments from 36 clinicians obtained either in the form of written communication (email and survey) or transcribed from direct verbal quotes (interview and evaluation). We received 85 emails from nine clinicians (reflecting a 50% response rate), and 20 free-text comments were entered in the online survey by 15 clinicians (54% response). Six clinicians who participated in usability evaluations made 26 comments and another six clinicians made 24 distinct comments during interviews.

Over half of all responses (55%) were emails, and roughly equal numbers were obtained from the survey, evaluations and interviews (13%, 17% and 15%, respectively). The most common form of response, constituting about a third of the collected data (N = 54), was an email classified in the Biomedical, Control or Fault category. Comments from the other three sources were most likely to be classified in the following categories: Customization and Control for survey (N = 9, 45%), Transparency and Workflow for evaluations (N = 14, 54%), and Cognition and Workflow for interviews (N = 13, 54%). Overall, the Control, Cognition and Biomedical categories described about a half of all data (52%), and about a third (35%) was classified in the Customization, Workflow and Fault categories. There were no Consistency or Context comments.
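The per-source figures in the paragraph above can be recomputed from the counts in Table 1. The sketch below finds, for a given source, the two categories with the most comments and their combined share of that source's comments. The names `TABLE_1` and `top_categories` are illustrative assumptions; only the counts come from the paper.

```python
# Comment counts from Table 1, keyed by source and then by heuristic category.
TABLE_1 = {
    "Email": {"Biomedical": 21, "Cognition": 12, "Control": 17,
              "Customization": 7, "Fault": 16, "Speed": 3,
              "Terminology": 4, "Transparency": 4, "Workflow": 1},
    "Survey": {"Biomedical": 0, "Cognition": 3, "Control": 4,
               "Customization": 5, "Fault": 1, "Speed": 3,
               "Terminology": 0, "Transparency": 1, "Workflow": 3},
    "Evaluation": {"Biomedical": 1, "Cognition": 4, "Control": 5,
                   "Customization": 1, "Fault": 0, "Speed": 1,
                   "Terminology": 0, "Transparency": 6, "Workflow": 8},
    "Interview": {"Biomedical": 4, "Cognition": 7, "Control": 2,
                  "Customization": 5, "Fault": 0, "Speed": 0,
                  "Terminology": 0, "Transparency": 0, "Workflow": 6},
}


def top_categories(counts, k=2):
    """Return the k largest categories, their combined count, and their share (%)."""
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)[:k]
    names = [name for name, _ in ranked]
    combined = sum(n for _, n in ranked)
    share = 100.0 * combined / sum(counts.values())
    return names, combined, share


if __name__ == "__main__":
    for source in ("Survey", "Evaluation", "Interview"):
        names, combined, share = top_categories(TABLE_1[source])
        print(f"{source}: {' and '.join(names)} (N = {combined}, {share:.0f}%)")
```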

Although email was the most prevalent form of communication in the set, its proportion was different within each heuristic category (Fig. 2). For example, it added up to 80% or more in three categories (Terminology, Fault and Biomedical) and to a majority (61%) in the Control category, but only one email was classified as related to Workflow. Written response was more likely to be used for the reporting of technical, biomedical and interaction problems (e.g., Fault, Biomedical, Terminology, Control), while verbal comments often related to Workflow or Transparency difficulties. For example, almost 90% of comments made during evaluations were clustered in just four categories and a similar distribution was found in data from interviews.

4.2. Findings by usability evaluators

The results are summarized in Table 2. There were 47 findings extracted from expert reports. Over two thirds were classified into just three categories: Cognition, Customization and Workflow. In contrast, none were in the Fault, Speed or Terminology categories and only one was classified as Biomedical. Technical and biomedical concepts were generally not represented in the evaluations.

4.3. Comments and findings comparison

We contrasted all 47 findings with a subset of 105 comments that included only email and survey. Findings were derived from reports of evaluation and interviews that already contained reinterpreted verbal comments of the subjects. We therefore excluded comments made during evaluations from the comparison.

Table 2. Findings by heuristic category and source.

Heuristic category   Evaluation N   Interview N   Total findings N (%)
Biomedical           1              0             1 (2)
Cognition            10             6             16 (34)
Control              2              4             6 (13)
Customization        2              7             9 (19)
Consistency          0              1             1 (2)
Context              1              1             2 (4)
Transparency         5              0             5 (11)
Workflow             7              0             7 (15)
Totals               28             19            47 (100)

Table 3. Description of heuristic evaluation categories.

Consistency: Hierarchy, grouping, dependencies and levels of significance are visually conveyed by systematically applied appearance characteristics, perceptual cues, spatial layout, text formatting and pre-defined color sets. Behavior of controls is predictable. Language in commands, labels and warnings is standardized.

Transparency: The current state is apparent and possible future states are predictable. Action effects, their closure and failure are indicated.

Control: The interruption, resumption and non-linear or parallel task completion is possible. Direct access to data across levels of hierarchy, backtracking, recovery from unwanted states and reversal of actions are possible.

Cognition: Content avoids extraneous information and excessive density. Representational formats allow perceptual judgment and unambiguous interpretation. Cognitive effort is reduced by minimalistic design, formatting and use of color, allowing fast visual searches. Recognition is preferred over recall. Conceptual model corresponds to work context and environment.

Context: Terms, labels, symbols and icons are meaningful and unambiguous in different system states. Alerts and reminders perceptually distinguish between general (disease, procedure, guidelines) and patient-specific content.

Terminology: Medical language is meaningful to users in all contexts of work, compatible with local variations and established terms.

Biomedical: Biomedical knowledge used in rules and decision support is current and accurate, reflecting guidelines and standards. It is evident how suggestions are derived from data and what decision logic is followed.

Safety: Complex combinations of medication doses, frequencies, units and time durations are disambiguated by appropriate representational formats and language, entries are audited for allowed value limits. Omissions are mitigated by goal and task completion summary views. Errors are prevented from accumulating and propagating through the system.

Customization: Preferred data views, organization, sorting, filtering, defaults, basic screen layout and behavior are persistent over use sessions and can be defined individually or according to role.

Fault: Software failures and functional errors are minimal, do not compromise safety and prevent the loss of data.

Speed: Minimal latency of screen loads and high perceived speed of task completion.

Workflow: Navigation, data entry and retrieval does not impede clinical task completion and the flow of events in the environment.

Comments and findings showed divergent trends in characterizing usability aspects of the Smart Form (Fig. 3). Comments were more likely to describe discrete, clearly manifested and highly specific problems and events, such as software failures or concerns about medical logic or language (e.g., Control, Biomedical, Fault, Terminology). Findings derived from usability evaluation, on the other hand, tended to explain conceptual problems related to overall design and the suitability of the electronic tool to clinical work (e.g., Consistency, Context, Workflow). Both methods contributed about equally to the description of problems with human interaction (e.g., Cognition, Customization).

4.4. Implementation of design changes to a revised prototype

Individual comments and findings most often referred to single, discrete problems. Some problems were reported by several clinicians or were identified by multiple methods. The 155 analyzed comments and findings reported 120 unique problems (77% ratio), and 12 problems were simultaneously described by more than one method (10% ratio). We have iteratively implemented design changes into the prototype on the basis of 56 reported problems (47%). Most of the problems that led to subsequent changes (34) were reported by email.
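The ratios quoted in this paragraph follow from simple division. The sketch below reproduces them from the reported counts; the variable names and the helper `pct` are illustrative assumptions.

```python
# Counts reported in Section 4.4; variable names are illustrative.
ANALYZED_STATEMENTS = 155   # analyzed comments and findings
UNIQUE_PROBLEMS = 120       # distinct problems they described
MULTI_METHOD = 12           # problems described by more than one method
LED_TO_CHANGES = 56         # problems that prompted design changes


def pct(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


if __name__ == "__main__":
    print("unique problems per statement:", pct(UNIQUE_PROBLEMS, ANALYZED_STATEMENTS), "%")
    print("problems found by several methods:", pct(MULTI_METHOD, UNIQUE_PROBLEMS), "%")
    print("problems leading to changes:", pct(LED_TO_CHANGES, UNIQUE_PROBLEMS), "%")
```

Note the change of denominator: the first ratio is taken over the 155 statements, while the other two are taken over the 120 unique problems.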

5. Discussion

Our data analysis has identified the relative strengths and weaknesses of the four evaluation approaches, their distinct utility and appropriateness for characterizing different usability concepts, and their cumulative explanatory power as a set of complementary methods used at specific points of the development lifecycle. The large number of comments that clinicians provided was a rich source of reports on software failures, slow performance and potential conflicts and inconsistencies in biomedical content, while usability experts generally gave comprehensive assessments of problems related to human interaction and workflow, including characterizations of problems with interface design and layout that negatively affect cognitive and perceptual saliency of displayed information. The core principles, attributes and expected results for each method are summarized in Table 4 and discussed in depth in the following sections.

5.1. Email

An email link embedded in the application is available to everyone and at all times, allowing almost instantaneous reporting of problems as they occur. Informaticians and computer technology specialists can learn from these comments how the software performs in authentic work conditions and how well it supports clinicians in complex scenarios that commonly arise from the combination of personal workflows and preferences, unexpected events, and unusual, idiosyncratic, unplanned or non-standard interaction patterns. The wide range of conditions that affect performance and contribute to errors and failures would be impossible to anticipate and simulate in the laboratory. Performance measures in actual settings also give evidence of the technical and conceptual strengths of the design. Insights from these reports give designers a unique opportunity to make the application more robust and tolerant of atypical interaction, more effective in managing and preventing errors, and more appropriate for the clinical task it supports.

The large number and variety of email reports and their often fragmentary content often make them hard to interpret. For example, it is difficult for clinicians to recall accurately the relevant and descriptive details of errors that were made or problems that were encountered during complex interactions with multi-step or interleaving tasks, and to convey a meaningful description of the event. However, informaticians may need details about the system state, work context or preceding actions that are often lacking in spontaneous and short messages to evaluate how a problem originated and its potential consequences. The usually large volume of emails accumulated over time also contains repetitive, idiosyncratic and inaccurate reports that may be of little value and need to be excluded. A self-selection bias among respondents (e.g., novice users may be underrepresented) may accentuate marginal problems or conceal more serious ones. Difficulties of more conceptual character may be only rarely reported through comment messages, as was evident from the analysis of our data (e.g., the distribution of comments in heuristic categories, Fig. 2).

Fig. 2. Proportions of comments by heuristic and source. [Chart: x-axis, evaluation heuristic; y-axis, comments (%); series: Email, Survey, Evaluation, Interview.]

Fig. 3. Proportion of comments and findings by heuristic. [Chart: x-axis, evaluation heuristic; y-axis, comments (N); series: Findings, Comments.]

Among the most significant advantages of embedded email response links are their inexpensive implementation, network-wide availability, real-time response and continuous, active data collection. These characteristics make email an excellent data collection method during pilot testing of release candidate versions and after the release of full versions. There is a high probability of quickly discovering technical problems, an opportunity to review medical logic for decision support tools that may not have been tested in complex scenarios (e.g., a patient with multiple comorbidities and drug prescriptions), and a likelihood of finding inconsistencies in terminology or ambiguities in language and expressions. For example, of the 56 changes and corrections we implemented in the prototype, 36 (64%) addressed such problems reported and identified in emails.

This method requires the software to be in the stage of a fully functional prototype or in its final release form. It may therefore be too laborious or expensive to make significant conceptual changes in design at that point. However, our data suggest (e.g., the proportions of comments to specific concepts in Fig. 2) that most email-reported problems concern specific biomedical content, terminology and technical glitches that may be relatively easy to correct without large-scale changes in the code and screen layout.

Table 4. Comparison of clinician response and formal usability evaluation results.

Heuristic focus
Email: Biomedical, Cognition, Control, Customization, Fault, Speed, Terminology
Survey: Control, Customization, Speed
Usability studies: Cognition, Context, Consistency, Control, Customization, Safety, Transparency, Workflow

Evaluated aspects
Email: Software problems, medical logic, decision support, use of terms, perceived speed, interaction difficulties, desired functions
Survey: Satisfaction, perceived speed of completion, qualitative assessments, desired functions, personal preferences, use context
Usability studies: Design concepts, actual and anticipated errors, cognitive load, workflow fit, cognitive model, skilled and novice performance

When to perform
Email: Pilot release, shortly after full release
Survey: After pilot release, after full release, periodically
Usability studies: Early in design cycle, iteratively during prototyping, planning stage before new design

Advantages
Email: Can identify rare and complex use situations, immediate response when problem occurs, everyone can comment
Survey: Allows comparison over time, broad reach, can be web-based, ongoing
Usability studies: Describes human error, mental models, strategy, structured and reliable, rich detail, insights into workflow integration

Limitations
Email: Often missing context, may not be intelligible, does not capture human error, self-selection bias
Survey: Relatively low reliability, reflective, subjective, may be hard to interpret
Usability studies: Laborious, expertise is required, describes only few use cases, needs expensive physician time

Source of data
Email: Clinicians
Survey: Clinicians
Usability studies: Usability experts

Timeframe
Email: Continuous
Survey: Periodic
Usability studies: Episodic

Sample quotes
Email: “I saved the note, then tried to Sign, and the system just froze”; “There appears to be a problem with logic when the creatinine is too low”; “I found it challenging to find signature location. Spent extra time just looking at the screens”
Survey: “Allow for easier management of insulin and titration”; “Needs a faster medication entry format”; “I find it cumbersome to my workflow”
Usability studies: She did not notice the Save icon and searched for a Save button at the bottom of the window. He knew where to look for vitals, but had to enter new values manually. Hide non-essential icons. Create a dynamic right-click context menu

5.2. Survey

The survey is another form of direct clinician response that we used in this study and it shares several characteristics with email communication, such as a potentially wide reach, economy of administration, a tendency for self-selection bias, relatively low response rate and the brevity of its form. Unlike email links, surveys are structured and contain a pre-determined set of questions to elicit responses and opinions on narrow topics of interest. They do not allow reporting problems in real time, however, and require respondents to recall and interpret past events at the time the survey is completed. This may be difficult: our data suggest that free-text answers to open-ended questions did not contain references to specific and detailed biomedical and technical problems, the most frequent categories represented in emails (see Table 1). Rather, clinicians tended to describe more broadly defined difficulties with screen control, navigation and customization.

The content in surveys, as in other direct forms of communication, is often subjective, reflecting personal opinion, and therefore of lower descriptive value and accuracy than data gathered in professional evaluations [22]. A substantial period of time needs to be allowed for potential survey respondents to work with a fully working prototype or the completed application before they can form meaningful opinions and gain a measure of proficiency.

Surveys can be administered periodically for comparisons over time and can be timed to coincide with important events such as technology or procedure updates that may affect the way the system is used. They can also be targeted to specific groups, such as primary care physicians, pediatricians and other specialists.

5.3. Usability evaluations and interviews

The most telling indicators of conceptual flaws in the design come from the observation of human interaction errors [47]. They can provide insights into discrepancies between expected and actual behavior and identify inappropriate and ambiguous representational formats of information on the screen that impair its accurate interpretation [48]. Errors are rarely reported directly in emails or in surveys, as the responders are often not aware of their own mistakes. For example, observation experts in our study reported that a clinician during a simulated task “could not tell whether the patient was taking Aspirin, assumed that urinalysis could only be ordered on paper and did not notice the save button,” an insight that would not be gained by introspection and recall.

Usability inspection methods in which experts alone evaluate the interface, such as the cognitive walkthrough and heuristic evaluation, provide predominantly normative assessments. In other words, they report how well the interface supports the completion of a standardized task that can be reasonably expected to be performed routinely, and measure the extent to which the design adheres to general usability standards. These methods produce reference models of interaction that can be compared to evidence from field observations.

Ethnographic and observational methods such as think-aloud studies, on the other hand, derive data from analyzing unscripted and natural interactions with the software by non-experts with various levels of computer and task-domain skills. They are therefore inherently descriptive and analytic and allow researchers to make inferences about the clarity and suitability of the design to the task from observed competencies and errors. Usability experts can integrate findings about interaction errors with interface evaluations, cognitive walkthroughs and heuristic evaluations into a comprehensive analysis and formulate optimal strategies for making modifications to the interface. Normative and descriptive methods together constitute a comprehensive evaluation of design in progress that can be repeated iteratively early in the process to refine data representation and interaction concepts in each successive version.

Findings from experts in this study have been clearly focused on conceptual and interaction-related aspects of the Smart Form (Table 2). The structured format of think-aloud studies follows pre-defined clinical scenarios that generally contain validated biomedical data and unambiguous terminology that do not represent potential problems to be reported in evaluations. Comments from clinicians working with the software in real settings, however, are more descriptive of specific factual, technical and biomedical errors that observational studies frequently do not capture. The relative proportions of expert findings and clinicians’ comments in each heuristic category and their respective tendency to describe different aspects of the software are clearly evident in Fig. 3.

Experts can also more easily capture positive aspects of the design and confirm successful trends. For example, an evaluator reported that “the subject seemed comfortable navigating around and understands how to update medications in the system.” Email responses are often initiated at the time of a failure or when an error is encountered, but rarely when the system is working well. In effect, successful performance is characterized by uneventful and well-progressing work which is apparent to observers but not often reported back to designers by clinicians.

Interviews with clinicians are usually done in conjunction with observations to elucidate aspects of collected data that require proper context for interpretation, and also as “debriefings” at the end of think-aloud studies. The results of expert evaluations commonly incorporate insights and findings from interviews into comprehensive reports.

Expert evaluations are indispensable during the initial design stages when even significant corrections and reconceptualizations are still possible without incurring steep penalties in time and development effort.

6. Conclusion

This study has been conducted to characterize and compare four usability evaluation methods that were employed by the research team during the design and pilot testing of new clinical documentation software. We have also formulated a classification scheme of heuristic usability concepts that incorporates established principles and extends them for evaluations specific to the clinical software domain.

Our results suggest that no single method describes all or most usability problems better than the others, but rather that each is optimally suited for evaluations at different points of the design and deployment process, and that they all characterize different aspects of the interface and human interaction. The studies and assessments we have performed were embedded in the design process and spanned the entire development cycle.

Heuristic evaluations and ethnographic observations of actual clinical work by usability experts inform and guide conceptual and workflow-related changes and need to be performed iteratively early in the design cycle so that they can be incorporated without excessive effort and time. Responses elicited directly from clinicians and other users through email links and surveys report mostly technical, biomedical, terminology and control problems that may occur in a wide variety of workflows and idiosyncratic use patterns.

The evaluations were conducted on the relatively small scale of a pilot study. However, the smaller size may be typical of many software development efforts at large academic and healthcare centers. The findings and lessons learned in this study may, therefore, be of interest to information system designers, developers and research and development centers affiliated with hospitals and directly related to their experiences with the design and improvement of clinical information systems. We have outlined a methodological approach that is applicable to most development processes of software intended for healthcare information systems.

We plan to formally validate and possibly revise the set of heuristics we formulated and apply it to the evaluation of an information system in its entirety that will also include judgments about safety that were not performed in this pilot study.

Health information technology is still in its nascent state today. Order entry systems, for example, still represented only a second generation technology in 2006 and had many limitations that precluded their meaningful integration into the process of care [49]. Applications not appropriately matched to clinical tasks tend to be chronically underused and may be eventually abandoned [21].

Acknowledgments

The Smart Form research was supported by Grant 5R01HS015169-03 from the Agency For Healthcare Research And Quality. We wish to thank Alan Rose, Ruslana Tsurikova, Lynn Volk and Svetlana Turovsky for their contribution and expertise in data collection and initial interpretation, and all clinicians who participated in the four studies as subjects.

References

[1] Ammenwerth E, Schnell-Inderst P, Machan C, Siebert U. The effect of electronic prescribing on medication errors and adverse drug events: a systematic review. J Am Med Inform Assoc 2008;15:585–600.

[2] Linder JA, Ma J, Bates DW, Middleton B, Stafford RS. Electronic health record use and the quality of ambulatory care in the United States. Arch Intern Med 2007;167:1400–5.

[3] Chaudhry B, Wang J, Wu S, Maglione M, Mojica W, Roth E, et al. Systematic review: impact of health information technology on quality, efficiency, and costs of medical care. Ann Intern Med 2006;144:742–52 [see comment].

[4] Kaushal R, Shojania KG, Bates DW. Effects of computerized physician order entry and clinical decision support systems on medication safety: a systematic review. Arch Intern Med 2003;163:1409–16.

[5] Koppel R, Metlay JP, Cohen A, Abaluck B, Localio AR, Kimmel SE, et al. Role of computerized physician order entry systems in facilitating medication errors. JAMA 2005;293:1197–203.

[6] Horsky J, Kuperman GJ, Patel VL. Comprehensive analysis of a medication dosing error related to CPOE. J Am Med Inform Assoc 2005;12:377–82.

[7] Ludwick DA, Doucette J. Adopting electronic medical records in primary care: lessons learned from health information systems implementation experience in seven countries. Int J Med Inform 2009;78:22–31.

[8] Kaplan B, Harris-Salamone KD. White paper: Health IT project success and failure: recommendations from literature and an AMIA workshop. J Am Med Inform Assoc 2009;16:291–9.

[9] DesRoches CM, Campbell EG, Rao SR, Donelan K, Ferris TG, Jha AK, et al. Electronic health records in ambulatory care: a national survey of physicians. N Engl J Med 2008;359:50–60.

[10] Smelcer JB, Miller-Jacobs H, Kantrovich L. Usability of electronic medical records. J Usability Stud 2009;4:70–84.

[11] Harrison MI, Koppel R, Bar-Lev S. Unintended consequences of information technologies in health care: an interactive sociotechnical analysis. J Am Med Inform Assoc 2007;14:542–9.

[12] Pizziferri L, Kittler AF, Volk LA, Honour MM, Gupta S, Wang SJ, et al. Primary care physician time utilization before and after implementation of an electronic health record: a time-motion study. J Biomed Inform 2005;38:176–88.

[13] Simon SR, Soran CS, Kaushal R, Jenter CA, Volk LA, Burdick E, et al. Physicians’ use of key functions in electronic health records from 2005 to 2007: a statewide survey. J Am Med Inform Assoc 2009;16:465–70.

[14] Weir CR, Nebeker JJR, Hicken BL, Campo R, Drews F, LeBar B. A cognitive task analysis of information management strategies in a computerized provider order entry environment. J Am Med Inform Assoc 2007;14:65–75.

[15] Vicente KJ. Work domain analysis and task analysis: a difference that matters. In: Schraagen JM, Chipman SF, editors. Cognitive task analysis. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.; 2000. p. 101–18.

[16] Zhang J, Butler K. UFuRT: A work-centered framework and process for design and evaluation of information systems. HCI International Proceedings; 2007.

[17] Schnipper JL, Linder JA, Palchuk MB, Einbinder JS, Li Q, Postilnik A, et al. “Smart Forms” in an electronic medical record: documentation-based clinical decision support to improve disease management. J Am Med Inform Assoc 2008;15:513–23.

[18] Sittig DF, Stead WW. Computer-based physician order entry: the state of the art. J Am Med Inform Assoc 1994;1:108–23.

[19] HIMSS EHR usability task force. Defining and testing EMR usability: principles and proposed methods of EMR usability evaluation and rating. HIMSS; 2009.

[20] Ball MJ, Silva JS, Bierstock S, Douglas JV, Norcio AF, Chakraborty J, et al. Failure to provide clinicians useful IT systems: opportunities to leapfrog current technologies. Methods Inf Med 2008;47:4–7.

[21] Zheng K, Padman R, Johnson MP, Diamond HS. An interface-driven analysis of user interactions with an electronic health records system. J Am Med Inform Assoc 2009;16:228–37.

[22] Schwarz N, Oyserman D. Asking questions about behavior: cognition, communication, and questionnaire construction. Am J Eval 2001;22:127.

[23] Jaspers MWM. A comparison of usability methods for testing interactive health technologies: methodological aspects and empirical evidence. Int J Med Inform 2009;78:340–53.

[24] Uldall-Espersen T, Frokjaer E, Hornbaek K. Tracing impact in a usability improvement process. Interact Comput 2008;20:48–63.

[25] Peleg M, Shachak A, Wang D, Karnieli E. Using multi-perspective methodologies to study users’ interactions with the prototype front end of a guideline-based decision support system for diabetic foot care. Int J Med Inform 2009;78:482–93.

[26] Sittig DF, Singh H. Eight rights of safe electronic health record use. JAMA 2009;302:1111–3.

[27] Nielsen J. Iterative user interface design. IEEE Comput 1993;26:32–41.

[28] Gould JD, Lewis C. Designing for usability: key principles and what designers think. Commun ACM 1985;28:300–11.

[29] Walker JM, Carayon P, Leveson N, Paulus RA, Tooker J, Chin H, et al. EHR safety: the way forward to safe and effective systems. J Am Med Inform Assoc 2008;15:272–7.

[30] Leveson NG. Intent specifications: an approach to building human-centered specifications. IEEE Trans Software Eng 2000;26:15–35.

[31] Wachter SB, Agutter J, Syroid N, Drews F, Weinger MB, Westenskow D. The employment of an iterative design process to develop a pulmonary graphical display. J Am Med Inform Assoc 2003;10:363–72.

[32] Morae. 3.1 ed., Okemos, MI: TechSmith Corporation; 2009.

[33] Nielsen J, Mack RL. Usability inspection methods. New York: John Wiley & Sons; 1994.

[34] Shneiderman B. Designing the user interface. Strategies for effective human–computer-interaction. 4th ed. Reading, MA: Addison Wesley Longman; 2004.

[35] Tognazzini B. Tog on interface. Reading, Mass.: Addison-Wesley; 1992.

[36] Tufte ER. The visual display of quantitative information. 2nd ed. Cheshire, Conn.: Graphics Press; 2001.

[37] Atkinson BF, Bennet TO, Bahr GS, Nelson MM. Development of a multiple heuristics evaluation table (MHET) to support software development and usability analysis. In: Universal access in human–computer interaction: coping with diversity. Berlin/Heidelberg: Springer; 2007.

[38] Thyvalikakath TP, Schleyer TK, Monaco V. Heuristic evaluation of clinical functions in four practice management systems: a pilot study. J Am Dent Assoc 2007;138:209–10.

[39] Scandurra I, Hagglund M, Engstrom M, Koch S. Heuristic evaluation performed by usability-educated clinicians: education and attitudes. Stud Health Technol Inform 2007:205–16.

[40] Lai TY. Iterative refinement of a tailored system for self-care management of depressive symptoms in people living with HIV/AIDS through heuristic evaluation and end user testing. Int J Med Inform 2007;76:S317–24.

[41] Tang Z, Johnson TR, Tindall RD, Zhang J. Applying heuristic evaluation to improve the usability of a telemedicine system. Telemed J E Health 2006;12:24–34.

[42] Choi J, Bakken S. Heuristic evaluation of a web-based educational resource for low literacy NICU parents. Stud Health Technol Inform 2006:194–9.

[43] Zhang J, Johnson TR, Patel VL, Paige DL, Kubose TK. Using usability heuristics to evaluate patient safety of medical devices. J Biomed Inform 2003;36:23–30.

[44] Graham MJ, Kubose TK, Jordan DA, Zhang J, Johnson TR, Patel VL. Heuristic evaluation of infusion pumps: implications for patient safety in intensive care units. Int J Med Inform 2004;73:771–9.

[45] Allen M, Currie LM, Bakken S, Patel VL, Cimino JJ. Heuristic evaluation of paper-based web pages: a simplified inspection usability methodology. J Biomed Inform 2006;39:412–23.

[46] Corbin JM, Strauss AL. Basics of qualitative research: techniques and procedures for developing grounded theory. 3rd ed. Los Angeles, Calif.: Sage Publications, Inc.; 2008.

[47] Hall JG, Silva A. A conceptual model for the analysis of mishaps in human-operated safety-critical systems. Saf Sci 2008;46:22–37.

[48] Johnson CM, Turley JP. The significance of cognitive modeling in building healthcare interfaces. Int J Med Inform 2006;75:163–72.

[49] Ford EW, McAlearney AS, Phillips MT, Menachemi N, Rudolph B. Predicting computerized physician order entry system adoption in US hospitals: can the federal mandate be met? Int J Med Inform 2008;77:539–45.
