Usability testing

From Wikipedia, the free encyclopedia

Jump to: navigation, search

human-computer interaction]]

Contents

[edit] History of usability testing

A Xerox Palo Alto Research Center (PARC) employee wrote that PARC used extensive usability testing in creating the Xerox Star, introduced in 1981.[1] Only about 25,000 were sold, leading many to consider the Xerox Star a commercial failure.

The Google Book Search preview, of the Inside Intuit book, says (page 22, 1984), "... in the first instance of the Usability Testing that later became standard industry practice, LeFevre recruited people off the streets... and timed their Kwik-Chek (Quicken) usage with a stopwatch. After every test... programmers worked to improve the program."[2]) Scott Cook, Intuit co-founder, said, "... we did usability testing in 1984, five years before anyone else... there's a very big difference between doing it and having marketing people doing it as part of their... design... a very big difference between doing it and having it be the core of what engineers focus on.[3]

Cook may not have known of the PARC work, but it sounds more like he knew it only related to marketing design, as opposed to engineering and re-engineering decisions based on direct user input. In any event, at the time of this writing Google seems to have no Usability Testing projects between the PARC work and Quicken, but many after Quicken became a top commercial seller.

[edit] Goals of usability testing

  • Performance -- How much time, and how many steps, are required for people to complete basic tasks? (For example, find something to buy, create a new account, and order the item.)
  • Accuracy -- How many mistakes did people make? (And were they fatal or recoverable with the right information?)
  • Recall -- How much does the person remember afterwards or after periods of non-use?
  • Emotional response -- How does the person feel about the tasks completed? Is the person confident, stressed? Would the user recommend this system to a friend?

[edit] What usability testing is not

Simply gathering opinions on an object or document is market research rather than usability testing. Usability testing usually involves systematic observation under controlled conditions to determine how well people can use the product. 1

Rather than showing users a rough draft and asking, "Do you understand this?", usability testing involves watching people trying to use something for its intended purpose. For example, when testing instructions for assembling a toy, the test subjects should be given the instructions and a box of parts. Instruction phrasing, illustration quality, and the toy's design all affect the assembly process.

[edit] Methods

think aloud protocol and eye tracking.

[edit] Hallway testing

Hallway testing (or hallway usability testing) is a specific methodology of software usability testing. Rather than using an in-house, trained group of testers, just five to six random people, indicative of a cross-section of end users, are brought in to test the software (be it an application, web site, etc.); the name of the technique refers to the fact that the testers should be random people who pass by in the hallway. The theory, as adopted from Jakob Nielsen's research, is that 95% of usability problems can be discovered using this technique.

[edit] Remote testing

Remote usability testing (also known as unmoderated or asynchronous usability testing) involves the use of a specially modified online survey, allowing the quantification of user testing studies by providing the ability to generate large sample sizes. Additionally, this style of user testing also provides an opportunity to segment feedback by demographic, attitudinal and behavioural type. The tests are carried out in the user’s own environment (rather than labs) helping further simulate real-life scenario testing. This approach also provides a vehicle to easily solicit feedback from users in remote areas.

[edit] How many users to test?

In the early 1990s, Jakob Nielsen, at that time a researcher at Sun Microsystems, popularized the concept of using numerous small usability tests -- typically with only five test subjects each -- at various stages of the development process. His argument is that, once it is found that two or three people are totally confused by the home page, little is gained by watching more people suffer through the same flawed design. "Elaborate usability tests are a waste of resources. The best results come from testing no more than 5 users and running as many small tests as you can afford." 2. Nielsen subsequently published his research and coined the term heuristic evaluation.

The claim of "Five users is enough" was later described by a mathematical model[1] which states for the proportion of uncovered problems U

U = 1 − (1 − p)n

where p is the probability of one subject identifying a specific problem and n the number of subjects (or test sessions). This model shows up as an asymptotic graph towards the number of real existing problems (see figure below).

Image:Virzis Formula.PNG

In later research Nielsen's claim has eagerly been questioned with both empirical evidence 3 and more advanced mathematical models [2]. Two key challenges to this assertion are: (1) since usability is related to the specific set of users, such a small sample size is unlikely to be representative of the total population so the data from such a small sample is more likely to reflect the sample group than the population they may represent and (2) Not every usability problem is equally easy-to-detect. Intractable problems happen to decelerate the overall process. Under these circumstances the progress of the process is much shallower than predicted by the Nielsen/Landauer formula [3].

Most researchers and practitioners today agree that, although testing 5 users is better than not testing at all, a sample size larger than five is required to detect a satisfying amount of usability problems.

[edit] See also

[edit] References

  1. ^ Virzi, R.A., Refining the Test Phase of Usability Evaluation: How Many Subjects is Enough? Human Factors, 1992. 34(4): p. 457-468.
  2. ^ Caulton, D.A., Relaxing the homogeneity assumption in usability testing. Behaviour & Information Technology, 2001. 20(1): p. 1-7
  3. ^ Schmettow, Heterogeneity in the Usability Evaluation Process. In: M. England, D. & Beale, R. (ed.), Proceedings of the HCI 2008, British Computing Society, 2008, 1, 89-98

[edit] External links

Personal tools