There are significant methodological and philosophical differences between ethnographic processes and laboratory-based processes in the product development cycle. All too frequently, proponents of these data collection methods are set at odds, with members of both sides pointing fingers and declaring the shortcomings of the other’s methods. Methodological purity, ownership and expertise are debated, with both ends of the spectrum becoming so engrossed in justifying themselves that the fundamental issue of product development is compromised: will the product work, in the broadest sense of the term? One side throws out accusations of a lack of measures and scientific rigor. The other side levels accusations about the irrelevance of a sterile, contextually detached laboratory environment. At the end of the day, both sides make valid points and the truth, such as it is, lies somewhere between the two extremes of the debate. We therefore suggest that, rather than treating usability and exploratory work as separate projects, teams use a mixed approach.
So why bridge methodological boundaries? Too frequently, final interface design and product planning begin after testing in a laboratory setting has yielded reliable, measurable data. The results are taken to confirm or refute that a product functions as intended and to expose any errors that occur during task execution. Error and success rates are tabulated and tweaks are made to the system in the hopes of increasing performance and rooting out major problems that may delay the product or site release or undermine user satisfaction. The problem is that while lab tests produce copious amounts of data and legitimate design changes ensue, those data are not necessarily valid in a real-life context. They are reliable in a controlled situation, but may not hold when seen in context. It is perfectly possible to obtain perfect reliability with no validity when testing. Perfect validity, on the other hand, would assure perfect reliability, because every test observation would yield the complete and exact truth. Unfortunately, neither perfection nor quantifiable truth exists in the real world, at least as it relates to human performance. Reliable data must be supported with valid data, which can best be found through field research.
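The gap between reliability and validity can be made concrete with a small sketch. The numbers below are invented purely for illustration, and the sketch assumes that field observations are a reasonable stand-in for real-world performance, which is itself a simplification. Spread across repeated lab sessions serves as a rough proxy for reliability, and the distance between the lab mean and the field mean as a rough proxy for (in)validity.

```python
from statistics import mean, stdev

# Hypothetical task-completion times in seconds; invented for illustration only.
lab_times = [12.1, 12.3, 11.9, 12.0, 12.2]    # repeated sessions in a controlled lab
field_times = [19.5, 27.0, 22.4, 31.2, 24.9]  # the same task performed in context


def spread(samples):
    """Sample standard deviation: low spread suggests consistent (reliable) results."""
    return stdev(samples)


def bias(samples, reference):
    """Difference between the mean of samples and the mean of a reference set."""
    return mean(samples) - mean(reference)


lab_spread = spread(lab_times)
lab_bias = bias(lab_times, field_times)

print(f"lab spread: {lab_spread:.2f} s")
print(f"lab mean minus field mean: {lab_bias:.2f} s")
```

In this made-up case the lab results barely vary from session to session, yet they sit far from what happens in the field: consistent measurements of the wrong situation.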
Increasingly, people have turned to field observations as an effective way of checking validity. Often, an anthropologist or someone using the moniker of “ethnographer” enters the field and spends enough time with potential users to understand how environment and culture shape what they do. Ideally, these observations lead to product innovation and improved design. At this point, unfortunately, the field expert is dropped from the equation and the product or website moves forward with little cross-functional interaction. The experts in UI take over and the “scientists” take charge of ensuring the product meets measures that are, often, somewhat arbitrary. The “scientists” and the “humanists” do not work hand in hand to ensure the product works as it should in the hands of users going about their daily lives.
Often the divide stems from the argument that the lack of a controlled environment destroys the “scientific value” of research (a similar argument is made over the often small sample size), and it is true that qualitative research, by its very nature, always has a degree of subjectivity. But to be fair, small performance changes are often given statistical relevance they do not deserve. In fact, any and all research involves degrees of subjectivity and personal bias. We’re not usually taught this epistemological reality by our professors when we learn our respective trades, but it is true nonetheless. Indeed, the history of science offers countless examples of hypothesis testing and discovery that would be considered less than scientifically ideal if we applied the rules of scientific method as most people understand them. James Lind’s discovery of a cure for scurvy and Henri Becquerel’s discovery of radioactivity serve as two such examples. Bad science from the standpoint of sample size and environmental control, brilliant science if you’re one of the millions of people to have benefited from these discoveries.
The underlying problem is the belief that testing can exist in a pure state and that testing should be pristine. Unfortunately, if we miss the context we usually overlook the real problem. A product may conform to every aspect of anthropometrics, ergonomics, and established principles of interface design. It may meet every requirement and have every feature potential consumers asked for or commented on during the various testing phases. You may get an improvement of a second in reaction time in a lab, but what if someone using the interface is chest deep in mud while bullets fly overhead? Suddenly something that was well designed in a lab becomes useless because no one accounted for shaking hands, the decline in computational skills under physical and psychological stress, or the fact that someone is lying on their belly as they work with the interface.
Context, and how it affects performance with a web application, software application, or any kind of UI, now becomes supremely important, and knowing the right question to ask and the right action to measure becomes central to usability.
So what do we do? We combine elements of ethnography and means-based testing, of course, documenting performance and the independent variables as part of the evaluation process. This means detaching ourselves from a fixation with controlled environments and the subconscious (sometimes conscious) belief that our job is to yield the same sorts of material that would be used in designing, say, the structural integrity of the Space Shuttle. The reality is that most of what we design depends more on context and environment than on being able to increase performance speed by 1%. Consequently, for field usability to work, the first step is being honest about what we can do. A willingness to adapt to new or unfamiliar methodologies is one of the principal requirements for testing in the field, and is one of the primary considerations when determining whether a team member should be directly involved.
The process begins with identifying the various contexts in which a product or UI will be put to use. This may involve taking the product into participants’ homes and having them use it with all the external stresses going on around them. It may mean performing tasks as bullets fly overhead and sleep deprivation sets in. The point is to define the settings where use will take place, catalog stresses and distractions, then learn how these stresses impact performance, cognition, memory, etc. For example, if you’re testing an electronic reading device, such as the Kindle, it would make sense to test it on the subway or when people are lying in bed (and thus at an odd angle), because those are the situations in which most people read. External variables are then included in the final analysis and recommendations. Does the position in bed influence necessary lumens or button size? Do people physically shrink in on themselves when using public transportation, and how does this impact use? The idea is simply to test the product under the lived conditions in which it will find use. Years ago I did testing on an interface to be used in combat. It worked well in the lab, but under combat conditions the interface was essentially useless. What were seemingly minor issues dramatically changed the look, feel, and logic of the interface. Is it possible to document every variable and context in which a product or application will see use? No. However, the bulk of these situations will be uncovered, and those that remain unaddressed frequently produce the same physiological and cognitive responses as the ones that were uncovered. Of course, we do not suggest forgoing measurement of success and failure, time on task, click path or anything else. These are still fundamental to usability. We are simply advocating understanding how the situation shapes usability and designing with those variables in mind.
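One way to keep the usual measures and the context together is to record them in the same observation and summarize by setting. The sketch below is only an illustration: the `Observation` fields, the context labels, and every number in it are hypothetical, not a prescribed logging format.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Observation:
    participant: str
    context: str            # where the task was performed, e.g. "lab" or "subway"
    stressors: tuple        # catalogued stresses and distractions in that setting
    completed: bool         # task success or failure
    seconds_on_task: float
    errors: int


def summarize_by_context(observations):
    """Group observations by context and compute basic usability measures."""
    grouped = defaultdict(list)
    for obs in observations:
        grouped[obs.context].append(obs)

    summary = {}
    for context, rows in grouped.items():
        summary[context] = {
            "n": len(rows),
            "success_rate": mean(1.0 if r.completed else 0.0 for r in rows),
            "mean_seconds": mean(r.seconds_on_task for r in rows),
            "mean_errors": mean(r.errors for r in rows),
        }
    return summary


# Invented observations, for illustration only.
observations = [
    Observation("P1", "lab", (), True, 40.0, 0),
    Observation("P2", "lab", (), True, 44.0, 1),
    Observation("P1", "subway", ("crowding", "vibration"), True, 70.0, 2),
    Observation("P2", "subway", ("crowding",), False, 90.0, 4),
]

summary = summarize_by_context(observations)
for context, stats in summary.items():
    print(context, stats)
```

Keeping the stressors alongside each measurement means the final analysis can ask not only whether performance dropped in a given setting, but which catalogued conditions were present when it did.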
Once the initial test is done, we usually leave the product with the participant for about two weeks, then come back and run a different series of tests. This allows the testing team to measure learnability and gives test participants time to catalog their experience with the product or application. During this time, participants are asked to document everything they can about not only their interaction with the product, but also what is going on in the environment. Once the research team returns, participants walk us through behavioral changes that have resulted from the product or interface. There are times when a client gets everything right in terms of usability, but the user still rejects the product because it is too disruptive to their normal activities (or simply isn’t relevant to their condition). In that case, you have to rethink what the product does and why.
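Learnability can be quantified in many ways. As one simple possibility, and not necessarily the measure used in the studies described here, the fractional drop in time on task between the initial session and the follow-up can be computed per participant. The function name and the times below are hypothetical.

```python
def learnability_gain(first_seconds: float, followup_seconds: float) -> float:
    """Fractional reduction in time on task between the first and follow-up session.

    Positive values mean the participant got faster; negative values mean slower.
    """
    if first_seconds <= 0:
        raise ValueError("first_seconds must be positive")
    return (first_seconds - followup_seconds) / first_seconds


# Invented times for one participant, for illustration only.
gain = learnability_gain(first_seconds=80.0, followup_seconds=60.0)
print(f"time on task changed by {gain:.0%}")
```

A number like this is only meaningful when read next to the participant's own notes about what was going on around them during the two weeks.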
Finally, there is the issue of delivery of the data. Nine times out of ten the reader is looking for information that is quite literal and instructional. Ambiguity and/or involved anecdotal descriptions are usually rejected in favor of what is more concrete. The struggle is how to provide this experience-near information. It means doing more than providing numbers. Information should be broken down into a structure such that each “theme” is easily identifiable within the first sentence. More often than not, specific recommendations are preferred to implications and must be presented to the audience in concrete, usable ways. Contextual data and its impact on use need the same approach.
A product or UI design’s usability is only relevant once the product is taken outside the lab. Rather than separating exploratory and testing processes into two activities that have minimal influence on each other, a mixed field method should be used in most testing. In the final analysis, innovation and great design stem not from one methodological process, but from a combination of the two.