Users play cards. We keep score. Magic results!
Introduction 
Our use of Microsoft's Product Reaction Cards over numerous usability studies has given us a window into users' experience that is eye-opening and remarkably consistent within each study. We describe how we use the cards and the results we obtain, focusing on several case studies, and we offer a strategy for using them in your own studies.
Desirability is Hard to Measure 
Interest in understanding the "desirability factor" in user experience continues to grow, yet measuring desirability with post-test questionnaires remains problematic.
The ISO definition of usability (ISO 9241-11) contains three elements for gauging usability: effectiveness, efficiency, and satisfaction. The first two can be readily observed and measured, but satisfaction is elusive and hard to gauge. This is especially true because users, when asked, tend to describe their experience more positively than observers' notes would suggest.
Microsoft Creates a Desirability Toolkit
Microsoft created a "desirability toolkit" to address the elusive desirability factor, and presented their methodology and results in 2002 (Benedek & Miner) and 2004 (Williams, Kelly, Anderson, Zavislak, Wixon, & de los Reyes).
The original desirability toolkit had two parts:
- A faces questionnaire, in which participants were asked to look at a photograph of a face and rate how closely the facial expression matched their experience with performing a task.
- Product reaction cards, a deck of 118 cards with 60% positive and 40% negative or neutral words, from which participants chose the words that reflected their feelings about the experience.
The faces questionnaire confused some participants and didn't produce consistent results, so it was abandoned. The product reaction cards worked well, so they were refined before the final card deck was determined. Microsoft later presented a different strategy for using the cards in the rollout of MSN 9 (Williams et al., 2004).
Since then, a few others have reported sporadically and sometimes anecdotally about their use of the product reaction cards (Rohrer, 2009; Travis, 2008; Tullis & Stetson, 2004).
We Got on Board in 2006
Our interest in using the cards was driven by our desire to understand the desirability factor, which we felt post-task and post-test questionnaires didn't reveal very well. Because most of our studies are qualitative, we wanted to explain users' experiences qualitatively. In particular, we wanted users to describe how they felt about the experience in their own words, rather than through the fixed format of a questionnaire.
The 118 cards in the product reaction card deck offered us a way to see if this would work. And it did . . . beyond our wildest expectations.
Here's how we use them in our studies:
- After completing a study (and in some cases, after certain scenarios within a study), we ask the participant to go to a table with the cards spread out in a random pattern.
- We ask each participant to look over the cards and pick up three, four, or five cards that match their experience of working with the product. We phrase the request loosely so that the number of cards stays flexible.
- We then ask the participant to bring the cards back to the desk, place them under our document camera so that we can record them, and tell us what each card means to the participant.
- We record the comments made by the participant, both on video and in our log (for analysis in the findings meeting).
- We return the cards to different places on the table, so that they are arranged differently for the next selection round, whether that selection is made by the same or a different participant.
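The figures reported in the case studies below come down to two tallies: the share of picks that were positive, and how often the same card was picked. The following is a minimal sketch in Python. It assumes each participant's selections are logged as a list of card words, and it uses a hypothetical POSITIVE_CARDS set that holds only a few illustrative words. A real analysis would classify all 118 cards in the deck.

```python
from collections import Counter

# Hypothetical, partial list of positive words; the full deck has 118 cards,
# and a real analysis would classify every one of them.
POSITIVE_CARDS = {"useful", "organized", "efficient", "clear", "time-saving"}


def summarize(selections: list[list[str]]) -> tuple[float, Counter]:
    """Return the percentage of positive picks and a count of each card picked.

    `selections` holds one list of chosen card words per participant.
    """
    # Flatten all participants' picks into one list, normalizing case.
    picks = [card.lower() for participant in selections for card in participant]
    if not picks:
        return 0.0, Counter()
    positive = sum(1 for card in picks if card in POSITIVE_CARDS)
    return 100 * positive / len(picks), Counter(picks)


if __name__ == "__main__":
    # Illustrative data only: three participants, three cards each.
    session = [
        ["useful", "organized", "clear"],
        ["useful", "slow", "time-saving"],
        ["useful", "organized", "efficient"],
    ]
    pct, counts = summarize(session)
    print(f"{pct:.0f}% positive")
    # Report only the cards that more than one participant picked.
    for card, n in counts.most_common():
        if n > 1:
            print(card, n)
```

The same percentage applies to the education study described below: 17 positive cards out of 18 works out to 100 × 17 / 18, or about 94% positive.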
Case Studies Prove the Point of the Cards' Effectiveness 
Since 2006, our studies have spanned a variety of industries and applications. We've tested basic web-based applications for education, telephone-based interactive voice response (IVR) systems for the telecommunications and airline industries, complex web-based corporate applications for the hospitality industry, and even TV weather stations' websites.
In our early work with the cards, we were amazed by two outcomes:
- How clearly participants' card choices indicated an overall negative or positive assessment.
- How often different participants in a study picked the same card.
What follows is an overview of several of our usability studies in which we used the product reaction cards to add depth to our findings.
Education Website
In a small study of a teacher education website, a total of 18 cards were chosen by the six participants. Seventeen of those cards were positive; thus, the choices reflected an unequivocal endorsement of the site. What truly captivated us and confirmed that we should keep using the cards were the individual card selections.
All six participants chose the card useful to describe their experience on the teacher education website. Three participants chose the card organized. We were intrigued, to say the least.
Interactive Voice Response
We wanted to know whether the cards would work well in iterative development. That is, would we get meaningful results if we tested Version 1.0 and then retested several weeks later, after the client had used the results to develop Version 2.0?
We got our chance to test that and more in a study on an interactive voice response (IVR) system for a telecommunications provider. The baseline test on Version 1.0 yielded a dramatically negative assessment: only 41% of the cards selected were positive. We tested Version 2.0 several weeks later with new participants, who also tested the IVR the company currently had in use.
Version 2.0 garnered 85% positive card choices. The card efficient was selected four times; business-like, convenient, high-quality, and easy-to-use were each selected twice.
By way of comparison, the company's current IVR, also tested in this study, was not nearly as popular: only 48% of the card selections reflected a positive sentiment.
TV Station Weather Sites
In another study, we compared three TV station weather websites. The goal was to determine which style of presentation and which features participants preferred. We asked participants to complete the same set of tasks on each site, and we randomized the presentation order to prevent order bias. Across Stations A, B, and C, there was a clear winner in the opinion of our participants.
When we looked at the basic negative and positive card choices for Station B, we saw that 89% of the cards selected were positive. The positive card choices for Station A (67%) and Station C (58%) indicated these stations' formats were not as well liked.
When we broke participants' choices down to individual card selections, Station A had the following repeated positive card choices: useful (4), time-saving (3), and clear (3). Appealing, clean, connected, helpful, and satisfying were selected twice each.
Participants disliked the least popular station, Station C, chiefly because they found it confusing. Figure 1 compares their other negative assessments of Station C with their negative assessments of Stations A and B.

Figure 1: A comparison of participants' negative product reaction card choices for Stations A, B, and C.
Hospitality Industry
We have also assessed the product reaction cards in a study with a significant longitudinal span and an aggressive iterative development cycle. A major global hotel group created a web-based application to help their hotel properties define, measure, and implement various environmental initiatives.
Their first version was not received well by the study's participants. With only 42% of the card choices reflecting a positive assessment, it was clear the application was problematic. Few positive cards were picked more than once; only comprehensive, professional, and usable were selected twice each. What did emerge from participants' choices were themes: the cards they selected clustered into broad groups.
On the positive end of the scale, the themes of Quality, Appearance, Ease-of-Use, and Motivation emerged from participants' card choices.
Negative card choices told a different story about this first version of the application. Several negative cards were chosen repeatedly, most notably time-consuming (6) and hard-to-use (5). Here, too, two strong themes emerged from the card choices: Ease-of-Use and Speed. Eighteen individual card choices fell into the Ease-of-Use theme, and eight landed clearly in Speed.
The application was, in the eyes of participants, neither easy to use nor fast. Version 1.0 was scrapped entirely, and eight months later we were testing a Version 2.0 prototype.
The results this time were like night and day. In this second iteration, 82% of the card choices reflected positive language. The Ease-of-Use theme, which had only seven positive cards in the first iteration, now had 25. The card useful was selected five times; usable was selected four times.
With respect to negative themes and card choices, the theme of Speed contained only two cards: time-consuming and slow. Previously, it had contained eight.
Fast-forward another eight months, and we were testing the pilot of Version 2.0. Astoundingly, no negative cards were chosen: 100% of the participants' card choices were positive. The theme of Speed reversed its polarity as well. Speed was now a positive attribute of the application, with fast selected three times and time-saving selected twice.
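Grouping cards into themes, as in this case study, is an analyst's judgment call, and the theme assignments are not listed card by card. The following minimal sketch therefore assumes a hypothetical CARD_THEMES mapping and a hypothetical NEGATIVE_CARDS set, both purely illustrative. It counts how many picks land in each theme, separately for negative and positive cards.

```python
from collections import Counter, defaultdict

# Hypothetical card-to-theme mapping; real assignments are the analysts' call.
CARD_THEMES = {
    "time-consuming": "Speed",
    "slow": "Speed",
    "fast": "Speed",
    "time-saving": "Speed",
    "hard-to-use": "Ease-of-Use",
    "confusing": "Ease-of-Use",
    "usable": "Ease-of-Use",
    "easy-to-use": "Ease-of-Use",
    "professional": "Quality",
    "comprehensive": "Quality",
}

# Hypothetical polarity labels. Any card not listed here is treated as
# positive, a simplification, since the real deck also has neutral words.
NEGATIVE_CARDS = {"time-consuming", "slow", "hard-to-use", "confusing"}


def theme_counts(picks: list[str]) -> dict[str, Counter]:
    """Count picks per theme, split into 'positive' and 'negative' tallies.

    Cards missing from CARD_THEMES are grouped under 'Unthemed'.
    """
    result: dict[str, Counter] = defaultdict(Counter)
    for card in picks:
        polarity = "negative" if card in NEGATIVE_CARDS else "positive"
        result[polarity][CARD_THEMES.get(card, "Unthemed")] += 1
    return dict(result)


if __name__ == "__main__":
    # Illustrative picks pooled across participants for one version.
    picks = ["time-consuming", "time-consuming", "hard-to-use", "slow",
             "usable", "professional", "confusing"]
    for polarity, themes in theme_counts(picks).items():
        print(polarity, dict(themes))
```

Running the same tally on each version's picks would show a theme such as Speed shrinking on the negative side and growing on the positive side across iterations.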
Conclusion 
Our work with the product reaction cards makes a strong case for their inclusion in usability studies. As a way for users to tell their story, the cards are invaluable to our work and strengthen the findings from other data collection sources.
References 
- Benedek, J., & Miner, T. (2002). Measuring desirability: New methods for measuring desirability in the usability lab setting. Retrieved from http://www.microsoft.com/usability/UEPostings/DesirabilityToolkit.doc

- Rohrer, C. (2009, January 14). Desirability studies. Retrieved from http://www.xdstrategy.com/blog

- Travis, D. (2008, March 3). Measuring satisfaction: Beyond the usability questionnaire. Retrieved from http://www.userfocus.co.uk/articles/satisfaction.html

- Tullis, T. S., & Stetson, J. N. (2004). A comparison of questionnaires for assessing website usability. Usability Professionals' Association Conference. Retrieved from http://home.comcast.net/~tomtullis/publications/UPA2004TullisStetson.pdf

- Williams, D., Kelly, G., Anderson, L., Zavislak, N., Wixon, D., & de los Reyes, A. (2004). MSN9: New user-centered desirability methods produce compelling visual design. Proc. CHI 2004, ACM Press, 959-974.
As Professor of Information Design and Director for graduate studies in Information Design and Communication, Carol Barnum teaches information design and usability testing. As co-founder (1994) and Director of the Usability Center at Southern Polytechnic, she works with clients to understand their users' experience. Barnum's latest book is Usability Testing Essentials: Ready, Set . . . Test! (Morgan Kaufmann, 2011).
Carol M. Barnum
Professor, Information Design and
Director, Usability Center
Southern Polytechnic State University
cbarnum@spsu.edu
Laura Palmer is an Assistant Professor in the Information Design and Communication program and Senior Associate at the Usability Center. Laura has been actively involved in studies at the Usability Center since 2008; she specializes in the visual presentation of qualitative usability results.
Laura A. Palmer
Assistant Professor, Information Design
Southern Polytechnic State University
lpalmer2@spsu.edu