Remote Evaluation Methods

Although lab-based formative usability evaluation is frequently and effectively applied to improving the usability of software user interfaces, it has limitations. Several factors have led to the development of remote usability evaluation methods:

  • the Web, the Internet, and remote work settings have become intrinsic parts of usage patterns and this work context can be difficult or impossible to reproduce in a laboratory setting
  • most software applications have a life cycle extending well beyond the first release
  • project teams want more relevant usability data – more representative of real world usage
  • transporting users to a usability lab or developers to remote locations can be very costly

Fortunately, deployment of an application creates an additional source of usability data: real usage. Remote evaluation uses the network itself as a bridge to take interface evaluation to a broad range of users in their natural work settings, allowing formative evaluation to continue downstream, after implementation and deployment, with data from real users performing real tasks. There are several types of remote usability evaluation, as described and compared in an upcoming paper (stay tuned here for a link).

Sometimes developers think that alpha and beta testing are adequate for post-deployment usability testing, but these kinds of testing usually do not qualify as formative usability evaluation. Typical alpha and beta testing in the field asks users to report problems they encounter and to comment on what they think of the application. This kind of post hoc data (e.g., from questionnaires and surveys) is useful in determining user satisfaction and overall impressions of the software. It is not, however, a substitute for the detailed data observed during usage and associated closely with specific task performance. This detailed usage data, perishable if not captured immediately and precisely as it arises during usage, is essential for isolating specific usability problems within the user interaction design.

Relevance of Critical Incident Data

The detailed observational data described just above is exactly the kind of data one obtains from the usability lab, especially in the form of critical incident data. Despite numerous variations in procedures for gathering and analyzing critical incidents, researchers and practitioners agree about the definition of a critical incident in the context of formative usability evaluation: A (negative) critical incident is an event or occurrence observed within task performance that is likely to be an indicator of one or more usability problems. Critical incident data are arguably the most important kind of data for finding and fixing usability problems in formative evaluation.

Our user-reported critical incident method

Some remote usability methods use the Internet as an extension of usability lab video and audio cables, sending video and audio of usage sessions over the Internet. This approach, however, is expensive in terms of bandwidth required and does nothing to aid data analysis. We wanted to develop a remote evaluation method that is cost effective for both data capture and analysis. Because of the importance of critical incident data and the opportunity for users to capture it, our primary goal was to capture critical incident data satisfying the following criteria:

  • users self-report their own critical incidents
  • tasks are performed by real users
  • users are located in normal working environments
  • data are captured in day-to-day task situations
  • no direct interaction is needed between user and evaluator during an evaluation session
  • data capture is cost-effective
  • data are high quality and therefore relatively easy to convert into usability problems

Our method is based on self-reporting of critical incidents by remote users via a Web-based reporting tool, a software tool residing on the user’s computer. Critical incident reports are augmented with task context in the form of screen-sequence video clips and evaluators analyze these contextualized critical incident reports, transforming them into usability problem descriptions.
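As a concrete illustration of the kind of record the evaluators work from, a contextualized critical incident report might be modeled as below. This is our own sketch; the field names and severity scale are assumptions, not the actual schema of the reporting tool.

```python
from dataclasses import dataclass

# Hypothetical sketch of a user-reported critical incident record;
# field names and the severity scale are illustrative assumptions,
# not the original tool's schema.
@dataclass
class CriticalIncidentReport:
    task_description: str       # what the user was trying to do
    incident_description: str   # the user's account of what went wrong
    severity: int               # user-rated severity, e.g. 1 (minor) to 4 (severe)
    video_clip_path: str        # screen-sequence clip giving task context

    def is_high_severity(self) -> bool:
        return self.severity >= 3

# An evaluator later transforms reports like this into usability
# problem descriptions.
report = CriticalIncidentReport(
    task_description="Saving a draft message",
    incident_description="Clicked Save but the draft disappeared",
    severity=3,
    video_clip_path="clips/incident_0042.avi",
)
print(report.is_high_severity())  # True
```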

The good news from our evaluation study is that users with no background in software engineering or human-computer interaction, and with the barest minimum of training in critical incident identification, can effectively perform those activities essential to the method: Users can identify, report, and rate the severity level of their own critical incidents.

The bad news is that we found the point in time when users initiate a critical incident report is often significantly delayed after the onset of the critical incident. Because video clips used for context are composed of screen activity captured just before critical incident reporting, the delay in reporting destroys the relevance of the clips, nullifying their value to usability problem analysis. This outcome led to redesign of the video capture method, based on a "declaration of awareness" of a critical incident by the user to trigger video clip capture, separate from critical incident reporting.
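The redesigned capture scheme can be sketched as follows (our own illustration; the frame rate, the 10-second window, and all names are assumptions): screen frames are written continuously into a fixed-size ring buffer, and the user's declaration of awareness snapshots the buffered pre-incident frames as the context clip, independent of when the full report is eventually filed.

```python
from collections import deque

# Illustrative sketch: continuous screen capture into a ring buffer so
# that a "declaration of awareness" can snapshot the screen activity
# leading up to a critical incident. The frame rate, window length,
# and names are assumptions, not the original tool's design.
FRAME_RATE = 5        # frames per second (assumed)
WINDOW_SECONDS = 10   # how much pre-incident context to keep (assumed)

class ContextRecorder:
    def __init__(self):
        # deque with maxlen silently discards the oldest frames
        self.buffer = deque(maxlen=FRAME_RATE * WINDOW_SECONDS)

    def on_frame(self, frame):
        """Called continuously as screen frames are captured."""
        self.buffer.append(frame)

    def declare_awareness(self):
        """User signals a critical incident: freeze the buffered
        pre-incident frames as the context clip, separate from the
        (possibly delayed) written report."""
        return list(self.buffer)

recorder = ContextRecorder()
for i in range(200):               # simulate 40 seconds of capture
    recorder.on_frame(f"frame-{i}")
clip = recorder.declare_awareness()
print(len(clip))                   # 50: only the last 10 seconds kept
```

Decoupling clip capture from report writing is the point of the design: the clip is anchored to the moment of awareness, so a delayed report no longer destroys the relevance of the context.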

More recently, Omar Vasnaik has explored extending the user-reported critical incident method by adding mechanisms for "freezing" and "thawing" critical incident contexts. Not every user can report a critical incident immediately. Sometimes users are occupied with tasks (e.g., serving customers) and must wait until they are less occupied to do the reporting. However, retrospective reporting is subject to loss of detail due to the effect of proactive interference on human memory. In response, we created an initial design of a mechanism to freeze some detail of the context of a critical incident (a screen image and some textual comments or a brief narrated video clip of screen action) so that it can be thawed to partially restore memory of the details for later reporting. We have yet to perform any studies on whether the freezing and thawing mechanisms can refresh the user's memory enough to partially compensate for this loss.
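The freeze/thaw idea might be sketched as follows (a hypothetical illustration; the class, method, and field names are our assumptions, not the actual design): freezing stores a lightweight memory cue under an incident id at the moment of awareness, and thawing retrieves it later when the user is free to write the full retrospective report.

```python
import time

# Hypothetical sketch of "freezing" and "thawing" critical incident
# context; names and structure are our assumptions, not the design
# described in the text.
class IncidentContextStore:
    def __init__(self):
        self._frozen = {}
        self._next_id = 0

    def freeze(self, screen_image, user_note):
        """Capture a lightweight memory cue now, while the user is busy."""
        incident_id = self._next_id
        self._next_id += 1
        self._frozen[incident_id] = {
            "screen_image": screen_image,
            "user_note": user_note,
            "frozen_at": time.time(),
        }
        return incident_id

    def thaw(self, incident_id):
        """Retrieve the cue later to help the user reconstruct details
        for a full retrospective report."""
        return self._frozen.pop(incident_id)

store = IncidentContextStore()
iid = store.freeze("screenshot.png", "dialog vanished while serving a customer")
# ... later, when the user is less occupied ...
context = store.thaw(iid)
print(context["user_note"])
```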


Our work on the user-reported critical incident method began with a 1997 Masters thesis, The User-Reported Critical Incident Method for Remote Usability Evaluation, by Jose Castillo.

An early case study is reported in [Hartson, H.R., Castillo, J.C., Kelso, J., Kamler, J., and Neale, W.C. (1996). Remote Evaluation: The Network as an Extension of the Usability Laboratory. Proceedings of CHI'96 Human Factors in Computing Systems, 228-235].

Building on the lessons of the case study, we developed a new version of the remote usability evaluation method and tool and evaluated its effectiveness [Castillo, J.C., Hartson, H.R., and Hix, D. (1998). Remote Usability Evaluation: Can Users Report Their Own Critical Incidents? Summary of CHI'98 Human Factors in Computing Systems, 253-254].

An evaluation study and summary of the work to date is in [Hartson, H.R., and Castillo, J.C. (1998). Remote Evaluation for Post-Deployment Usability Improvement. Proceedings of the Working Conference on Advanced Visual Interfaces (AVI'98), 22-29].

A journal article, including a discussion of the comparative advantages and disadvantages, costs, and benefits of various approaches to remote usability evaluation is now in progress (stay tuned for a link).

Collaborative remote usability evaluation site

Visit a collaborative remote usability evaluation site {link} that we established, separate from this research site. If you would like to have your work included in this site, send us your information, papers, studies, and other material related to remote usability evaluation methods.