A FULL LIFE-CYCLE METHODOLOGY FOR STRUCTURED USE-CENTERED QUANTITATIVE USABILITY REQUIREMENTS SPECIFICATION AND USABILITY EVALUATION OF WEBSITES

Except where reference is made to the work of others, the work described in this dissertation is my own or was done in collaboration with my advisory committee. This dissertation does not include proprietary or classified information.

Guoqiang Hu

Certificate of Approval:
Richard O. Chapman, Associate Professor, Computer Science and Engineering
Juan E. Gilbert, Associate Professor, Computer Science and Engineering
Kai-Hsiung Chang (Chair), Professor, Computer Science and Engineering
George T. Flowers, Dean, Graduate School

A Dissertation Submitted to the Graduate Faculty of Auburn University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

Auburn, Alabama
December 18, 2009

Permission is granted to Auburn University to make copies of this dissertation at its discretion, upon request of individuals or institutions and at their expense. The author reserves all publication rights.

Signature of Author
December 18, 2009 (Date of Graduation)

DISSERTATION ABSTRACT

Guoqiang Hu
Doctor of Philosophy, December 18, 2009
(M.S., Peking University, 1993)
(B.S., Shenyang Institute of Technology, 1986)
199 Typed Pages
Directed by Kai-Hsiung Chang

The World Wide Web has gained a dominant status in the delivery of online information and services in recent years. But how to specify website usability requirements, and how to evaluate and improve website usability against a usability requirements specification, remain open problems for all stakeholders. To help solve this problem, we propose a website usability requirements specification and usability evaluation methodology that features a structured, use-centered, quantitative, full life-cycle method. A validation experiment was designed and conducted to prove the validity of the proposed methodology, QUEST (Quantitative Usability Equations SeT). Its principle is to show that QUEST has stronger website usability evaluation capability than the most typical existing usability evaluation methods. If QUEST's website usability evaluation capability is established, then its usability metrics can be used to quantitatively specify upfront user usability requirements for websites. In the validation experiment, 7 usability experts and 20 student subjects were recruited to perform 4 tasks on 2 open-source calendar websites, WebCalendar 1.0.5 and VCalendar 1.5.3.1. Four sets of usability data were collected, corresponding to the following 4 usability evaluation methods: expert usability review, traditional user usability testing, SUS (System Usability Scale), and QUEST.
According to the experiment results: both the expert usability review and the traditional user usability testing were inconclusive on which of the 2 target websites had better usability; although SUS rated the overall usability of WebCalendar 1.0.5 at 66.00 and VCalendar 1.5.3.1 at 61.75, it was subjective and vague about usability problems; in contrast, QUEST not only rated the overall usability of WebCalendar 1.0.5 at 56.59 and VCalendar 1.5.3.1 at 35.97, but also revealed where the usability problems were and how severe each usability problem was, in a quantitative manner. In conclusion, it can clearly be stated that QUEST has stronger website usability evaluation capability than the other 3 most typical existing usability evaluation methods. So, the proposed methodology has been validated by the experiment results.

ACKNOWLEDGMENTS

This dissertation is whole-heartedly dedicated to the author's always loving, always caring, and always supporting family.

Style manual or journal used: Communications of the ACM
Computer software used: Microsoft Office Word 2003

TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES
1 INTRODUCTION
2 BACKGROUND
  2.1 The history of research in usability
  2.2 Usability engineering
  2.3 Website usability engineering
  2.4 Related work and our methodology's potential contributions
    2.4.1 Current practice in measuring usability
    2.4.2 Measuring usability in a single score
    2.4.3 Usability in User-Centered Design (UCD)
    2.4.4 Potential contributions of the proposed methodology
3 PRINCIPLE OF THE METHODOLOGY
  3.1 Use features
  3.2 Efficiency
  3.3 The origin of usability problems
  3.4 The solution
  3.5 Problems with the existing definitions of usability
  3.6 Principle of the methodology
  3.7 More thoughts on the proposed methodology
4 SOME FEATURES OF WEBSITES
  4.1 The general architecture of WWW
  4.2 Some features of websites
    4.2.1 Unification of functional services and contents
    4.2.2 Contentized navigation
    4.2.3 Extensive utilization of short-cuts
    4.2.4 High dynamicity and unchanging usability expectance
5 WEBSITE USE FEATURES
  5.1 General terms
  5.2 Website goal-task use features
    5.2.1 Presentation and its basic use features
    5.2.2 Interaction and its basic use features
    5.2.3 Efficiency
    5.2.4 Effectiveness and its basic use features
    5.2.5 Satisfaction
    5.2.6 Usability of a goal-task
  5.3 Website navigation use features
    5.3.1 Presentation and its basic use features
    5.3.2 Interaction and its basic use feature
    5.3.3 Efficiency
    5.3.4 Effectiveness and its basic use feature
    5.3.5 Satisfaction
    5.3.6 Usability of navigation system
  5.4 Website universal consistency use features
    5.4.1 Goal-task consistency and its basic use features
    5.4.2 Navigation consistency and its basic use features
    5.4.3 Website consistency
  5.5 Website usability
  5.6 User usability requirements specification
6 VALIDATION EXPERIMENT
  6.1 Introduction
    6.1.1 Design
    6.1.2 Target websites and test tasks
    6.1.3 Expert usability evaluation
    6.1.4 Traditional user usability testing
    6.1.5 SUS
    6.1.6 Think-Aloud Protocol
    6.1.7 Pilot study
    6.1.8 Setup
  6.2 Expert usability evaluation results
    6.2.1 Expert usability evaluation reports
    6.2.2 Discussion and sub-conclusion
  6.3 Traditional user usability testing results
    6.3.1 User performance data
    6.3.2 Discussion and sub-conclusion
  6.4 SUS results
    6.4.1 SUS data
    6.4.2 Discussion and sub-conclusion
  6.5 QUEST results
    6.5.1 QUEST data
      6.5.1.1 Goal-task usability
        6.5.1.1.1 WebCalendar 1.0.5 goal-task usability
        6.5.1.1.2 VCalendar 1.5.3.1 goal-task usability
        6.5.1.1.3 Some comments
      6.5.1.2 Navigation usability
        6.5.1.2.1 WebCalendar 1.0.5 navigation usability
        6.5.1.2.2 VCalendar 1.5.3.1 navigation usability
      6.5.1.3 Website consistency
      6.5.1.4 Website usability
    6.5.2 Discussion and sub-conclusion
  6.6 Conclusions and discussion
7 CONCLUSION AND FUTURE WORK
BIBLIOGRAPHY
APPENDIX A TRADITIONAL USABILITY TESTING DATA
APPENDIX B QUEST EXPERIMENT DATA

LIST OF FIGURES

Figure 1.1 The methodology illustrated in the Waterfall model
Figure 2.1 Structured and fully quantitative definition of usability
Figure 3.1 Two hammers
Figure 3.2 The homepage of the Auburn University TigerMail website
Figure 3.3 The "amount of time" or the "speed"?
Figure 3.4 The efficiency of route
Figure 3.5 Mental models' schism and the distance adjustment
Figure 3.6 Norman's "stages of action" model
Figure 3.7 Usability hierarchy
Figure 4.1 The general architecture of WWW
Figure 5.1 Goal-task presentation and its basic use features
Figure 5.2 Goal-task interaction and its basic use features
Figure 5.3 Goal-task effectiveness and its basic use features
Figure 5.4 Navigation and goal-tasks
Figure 5.5 Conceptually-simplified navigation and goal-tasks
Figure 5.6 Navigation presentation and its basic use features
Figure 5.7 Goal-task consistency and its basic use features
Figure 5.8 Navigation consistency and its basic use features
Figure 7.1 Possible relationship between usability and its budgetary impact

LIST OF TABLES

Table 6.1 Expert usability evaluation report 1
Table 6.2 Expert usability evaluation report 2
Table 6.3 WebCalendar 1.0.5 usability testing SUS data
Table 6.4 VCalendar 1.5.3.1 usability testing SUS data
Table 6.5 Composites for WebCalendar 1.0.5 task 1, task 2, task 3, and task 4
Table 6.6 Composites for VCalendar 1.5.3.1 task 1, task 2, task 3, and task 4
Table 6.7 Comparisons of usability aspects on both websites (Case 1)
Table 6.8 Comparisons of usability aspects on both websites (Case 2)
Table 6.9 P_gt for locating WebCalendar 1.0.5 task 1, task 2, task 3, and task 4
Table 6.10 I_gt for locating WebCalendar 1.0.5 task 1, task 2, task 3, and task 4
Table 6.11 C_gt^nav for locating WebCalendar 1.0.5 task 1, task 2, task 3, and task 4
Table 6.12 Composites for WebCalendar 1.0.5 navigation system
Table 6.13 Composites for VCalendar 1.5.3.1 navigation system
Table 6.14 Capability comparisons between the 4 methods
CHAPTER 1
INTRODUCTION

Today, the Internet has reached almost every corner of the earth, and it connects all of us together [1][2]. On the Internet, the World Wide Web (WWW) [3][4] has become one of the most powerful and influential Internet applications [5]. Now, just like the air we breathe, the WWW is everywhere. The WWW has rapidly gained its dominant status in online information and services delivery through its simplicity, platform-independence, extensibility, flexibility, and versatility. It is hard to imagine a kind of information service or application that cannot be built on the Web; and except for physical objects, it is also hard to imagine a kind of object that cannot be delivered through it. The WWW has become not only an indispensable social mechanism of our society but also an essential daily necessity for most people. Through its great impact on people's lives, the WWW has changed, to a great extent, the way people think about computing technology. The importance of the WWW to the proper functioning of human society is beyond what words can say [1][2][5][6].

The WWW consists of tens of millions of Web sites or Web-based applications¹ [7] distributed all over the world. Because of the WWW's significant value to all of us, how to specify website usability requirements, and how to evaluate and improve website usability according to its usability requirements specification, are big concerns to all the stakeholders. However, there currently exist no good ways to address this issue.

To help solve this problem, we propose a website usability requirements specification and usability evaluation methodology that features a structured, use-centered, quantitative, full life-cycle method. Here, use refers to a real use of a designed task of a website by an end user; use-centered simply stresses the view² that because usability issues originate from use, usability study should be not only based on use but also focused on use, and that usability should be engineered for use and evaluated by use. In other words, usability study should be from use, on use, for use, and by use, and thus be use-centered.

Our approach is: a system's usability is quantitatively defined in terms of its goal-tasks'³ usabilities; in turn, a goal-task's usability is quantitatively defined in terms of its 5 major usability aspects; and further, each major usability aspect is quantitatively defined in terms of its basic use features. In this way, a structured and quantitative usability engineering framework for websites is set up.

¹ For convenience, Web sites and Web-based applications will be uniformly referred to as websites in this dissertation.
² When usability is concerned, in contrast to user-centered, the term use-centered is more appropriate and more accurate: first, usability problems occur during uses rather than on users; and further, use-centered takes into consideration the users, the task, and the interaction between them at the same time.
³ A system can be divided into tasks. Because in an implemented system each task is designed to achieve a certain goal and each goal is accomplished through a specific task, in this dissertation the term goal-task is used to simultaneously represent a goal and the activities required to achieve the goal. The goal-task is a basic research object of this usability study.
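To make the shape of this three-level composition concrete, the following sketch writes it in a generic weighted-average form. The symbols (weights w, v, u and the particular aggregation shown) are illustrative assumptions only; the actual defining formulas are developed in Chapter 5.

    % Illustrative only: a generic weighted-average reading of the
    % three-level composition (the actual defining formulas are in Chapter 5).
    %   w_i   : weight of goal-task i (relative importance, use frequency)
    %   U_i   : usability of goal-task i
    %   A_ij  : score of major usability aspect j of goal-task i
    %   b_ijk : value of basic use feature k under aspect j
    \begin{align*}
      U_{\mathrm{system}} &= \sum_{i=1}^{t} w_i\, U_i, & \sum_{i=1}^{t} w_i &= 1 \\
      U_i &= \sum_{j=1}^{5} v_{ij}\, A_{ij}, & \sum_{j=1}^{5} v_{ij} &= 1 \\
      A_{ij} &= \sum_{k} u_{ijk}\, b_{ijk}, & \sum_{k} u_{ijk} &= 1
    \end{align*}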
The process of this methodology is as follows. At the system analysis stage, after goal-task analysis, each goal-task's user usability requirements can be assigned by quantitatively specifying the desired value for each of its major usability aspects' basic use features. At the same time, each goal-task's weight and use frequency in the target system can also be specified according to its relative importance and use frequency in the current system. Then, with the above quantitative specifications obtained, each goal-task's composite use features and the entire system's usability can be easily derived through their respective defining formulas. Finally, all the above information put together as a package forms the usability requirements specification for the entire system.

It should be recognized that the user usability requirements have equal status with other traditional user requirements, such as user functional requirements. So, at all the other stages of the website's life-cycle, each time a review or testing is performed, the usability requirements specification should also be tested against to see whether it has been satisfied, just as the functional requirements specification has always been. The only difference between them is the testing method used: for the functional requirements specification, the testing method is traditional software testing; for the usability requirements specification, the testing method is usability testing by use.

Apparently, the user usability requirements specification should be agreed upon between the system analyst and the end user(s) (sometimes the system procurer in lieu of the end users). The key point to be considered here is its economic, or budgetary, implication, because as quality requirements, the higher the usability requirements are, the more expensive it will be for the target system to satisfy them.

This quantitative usability methodology is independent of, and can therefore be seamlessly integrated into, any engineering methodologies, processes, and techniques. For example, this methodology can be integrated into the Waterfall model as illustrated in Figure 1.1.

Figure 1.1 The methodology illustrated in the Waterfall model

In order to prove the validity of the proposed methodology, a validation experiment was designed and conducted. The principle of the validation experiment is to prove that the proposed methodology has stronger website usability evaluation capability than the following 3 most typical existing usability evaluation methods: expert usability evaluation, traditional usability testing, and SUS (System Usability Scale) [41]. Here, website usability evaluation capability contains the following 3 aspects: overall website usability evaluation, usability comparison between websites, and usability problem diagnosis for a website. Apparently, if the proposed methodology's website usability evaluation capability is established, then its usability metrics can be used to quantitatively specify upfront user usability requirements for websites.

The entire validation experiment was a double-blind and multi-control-group design. In the validation experiment, 7 usability experts and 20 student subjects were recruited to perform 4 tasks on 2 open-source calendar websites, WebCalendar 1.0.5 and VCalendar 1.5.3.1, which were hosted locally. Four sets of usability data were collected, corresponding to the 4 usability evaluation methods respectively.
According to the results of the validation experiment, it can be concluded that the proposed methodology has been validated. The details of the entire validation experiment are presented in Chapter 6.

Although this dissertation focuses on how this methodology can be applied to the website usability engineering process, the approach of defining a structured and fully quantitative usability framework for websites can also be applied to any other human-tool interaction systems. The difference lies in the specific use features that have to be considered for a particular kind of human-tool interaction system. The advantage of this kind of quantitative usability framework is that no matter what particular kind of human-tool interaction system it is applied to, all the resulting usabilities are comparable with each other. In other words, the usability of a hammer can be compared with the usability of a website. Unfortunately, any further discussion of this expanded topic is beyond the scope of this dissertation.

The rest of this dissertation is organized as follows. Chapter 2 presents the literature review, compares the proposed methodology with related work, and highlights the potential contributions of the proposed methodology. Chapter 3 defines the concept of use feature, explains the mental model schism theory, identifies the existing problems that the proposed methodology intends to solve, and presents the principle of the proposed methodology. Chapter 4 introduces the architecture of the World Wide Web and points out some important features of websites that are critical for understanding our approach to website usability study. Chapter 5 presents the entire set of structured and fully quantitative website use feature definitions, and illustrates how to use these use features to specify upfront user usability requirements for websites. Chapter 6 introduces the design and setup of the validation experiment of the methodology, presents the 4 sets of experiment data corresponding to the 4 usability evaluation methods, compares the 4 sets of experiment results, and concludes the validation experiment of the methodology. Chapter 7 concludes the research, gives more discussion about the methodology, and points out future work. The complete set of traditional usability testing experiment data is given in Appendix A, and the complete set of QUEST experiment data is given in Appendix B.

CHAPTER 2
BACKGROUND

2.1 The history of research in usability

Usability is about how effectively, efficiently, and easily things can be used by human beings. Research in usability has a long history. In its early stage, under terms like Ergonomics [8] and Human Factors [9], research in usability was mainly concerned with how to match the physical capabilities of humans and devices so that they could interact most effectively, efficiently, and safely.

After computer systems became widely used in the early 1980's, a new discipline, or inter-discipline, called Human-Computer Interaction (HCI) [10] emerged to specifically take on the issues related to the interaction between humans and computers. Compared to traditional Ergonomics and Human Factors, HCI stresses how to match the mental and physical capabilities of humans and computers.
The research scope of HCI covers the intersection of disciplines such as Human Cognition, Human Perception, Human Intelligence, Anthropometry, Biomechanics/Kinesiology, Sociology, Philosophy, Behavioral Science, Computer Science, and Software Engineering. Closely related to HCI, in 1986 the term Usability Engineering was coined to name only the subset of the research in usability that specializes in the usability of computer systems, especially software systems. It should be noted that, with time going by and technologies advancing, all the terms mentioned above have taken on new meanings. Because the evolving history of these disciplines is beyond the concern of this dissertation, this chapter will focus only on usability engineering, or more specifically, website usability engineering.

The rest of this chapter is organized as follows. Section 2.2 briefly reviews the general accomplishments in usability engineering. Section 2.3 focuses on the achievements in website usability engineering. Section 2.4 contrasts our work with other related work and describes the potential contributions of the proposed methodology.

2.2 Usability engineering

In the early 1980's, software usability became a big concern in software engineering because people found out that there were many software products that were simply not very usable. Researchers [11][12][13][14][15] discovered that software usability problems were caused by designers who took a computer- and/or designer-centered view and were not considerate of their end users. So very soon, user-friendly [16][17][18] became a buzzword in the computer technology community. But in order to be more accurate and stress the shift of focus from computers and designers to end users, the term user-friendly was banned in favor of user-centered in User Centered System Design by Norman and Draper (1986) [19], and this practice has been broadly accepted ever since [20].

Different definitions of usability for software systems have been given by Miller [21], Shackel [22][23][24][25], Bennett [26][27], Shneiderman [28][29][30], Nielsen [20], Bevan [31], Löwgren [32], Dix [33], Quesenbery [34][35], etc. In 1998, ISO 9241-11 [36] defined usability as: "The extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use." It is the ISO 9241-11 usability definition that has been recognized as authoritative and has been widely adopted. Because our methodology is deeply related to the ISO 9241-11 usability definition, its details will be further presented and evaluated in Chapter 3.

Based on the usability definitions mentioned above, a variety of usability metrics [20][37][38][39][40][41][42] have been suggested. Hornbæk provided a comprehensive review of current practices in measuring usability [84], which will be further introduced in subsection 2.4.1. There are many techniques [20][38][39][43][44][45][46][47] for usability evaluation, inspection, and testing.
Usability evaluations carried out during a development cycle in order to improve the usability of the product under development are called formative usability evaluations; usability evaluations carried out at the end of a development cycle in order to show the overall usability of the evaluated product are called summative usability evaluations. Also, usability evaluations "involving evaluating parts or aspects, either as a means to an overall evaluation or without the final synthesis" are called analytic evaluations [48]; usability evaluations aiming at "the allocation of a single score/grade/evaluation to the overall" usability of the evaluated product are called global evaluations [48].

It is very expensive to perform strict traditional usability evaluation and testing. To solve this problem, in 1989, Nielsen [49] proposed discount usability engineering, because it was found that 80% of usability problems could be detected with 4 to 5 participants [50][51]. Discount usability engineering is based on the following 4 techniques:
- User and task observation
- Scenarios
- Simplified thinking aloud
- Heuristic evaluation

In order to extend usability to as many user groups as possible, universal usability was proposed. It is believed that "universal usability will be met when affordable, useful, and usable technology accommodates the vast majority of the global population: this entails addressing challenges of technology variety, user diversity, and gaps in user knowledge in ways only beginning to be acknowledged by educational, corporate, and government agencies." [52]

With all the above research achievements in software usability being systematically put together [20][38][39][53][54][55][56][57][58], Usability Engineering as a discipline was formally established. In 1986, Good et al. [59] defined it as: "Usability Engineering is a process, grounded in classical engineering, which amounts to specifying, quantitatively and in advance, what characteristics and in what amounts the final product to be engineered is to have. This process is followed by actually building the product, and demonstrating that it does indeed have the planned-for characteristics. Engineering is not the process of building a perfect system with infinite resources. Rather, engineering is the process of economically building a working system that fulfills a need. Without measurable usability specifications, there is no way to determine the usability needs of a product, or to measure whether or not the finished product fulfills those needs. If we cannot measure usability, we cannot have Usability Engineering. Usability Engineering has the following steps: 1. Define usability through metrics, 2. Set planned levels of usability, 3. Analyze the impact of design solutions, 4. Incorporate user-derived feedback, and 5. Iterate until the planned usability levels are achieved."

Because poor usability is costly and good usability can mean increased revenue, usability engineering is cost-justifiable [20][26][38][39][49][60]. User-Centered Design (UCD) [19][53][54][55][56][57][58][59][66] is currently the main methodology adopted in usability engineering to address the software usability issue.

2.3 Website usability engineering

Websites are mainly web-based software. Because websites have their own features (see Chapter 4 for more details) that separate them from traditional software, and the number of existing and potential websites is huge, special efforts [39][61][62][63][64][65][66] have been made to specifically address the usability issues of websites.
Heuristic usability evaluation guidelines for websites [67][68][69][70] have been developed; [67] and [70] are two such examples. In [67], Keevil collected a heuristic checklist organized into "usability categories or metrics". A designer or end user can choose from it the categories and items believed applicable to a target website, then ask each chosen item as a question and answer "Yes" or "No" according to his/her experience of the target website. The total number of "Yes" answers divided by the total number of chosen items is the target website's "Usability Index" (in percentage). Similarly, in [70], Nielsen suggested a list of 113 heuristic guidelines focusing on the usability of website homepages. A designer or end user can choose from it the guidelines believed applicable to a target website's homepage, score each one as 0, 1/2, or 1 for no-compliance, partial-compliance, and full-compliance respectively according to his/her experience of the target website's homepage, and then divide the final count by the total number of applicable guidelines. If the resulting "usability compliance rate" is greater than 80%, the target website's homepage is in "good shape but may need a few minor fixes"; if between 50% and 80%, it is bad enough that one should "start a redesign"; if less than 50%, "abandon it and start over from scratch".

Automatic website usability evaluation tools [71] have also been developed. These tools can track a user's time, pages requested, errors that occurred, response time, traffic information, etc. They are most effective in navigation analysis, webpage-level usability evaluation, and standards and guidelines review. Suggested webpage-level usability metrics can be found in [72][73][74][75][76]. Chi et al. [77][78] developed a simulating system to simulate a real user's navigating behavior based on the information scent on the pages of a website, so that the usability of a website can be evaluated without having to use a real user. This approach is questionable, because usability is more of a user-experience issue than just following links and counting the number of clicks. For the same reason, automatic website usability evaluation tools should only be used to facilitate usability evaluation but never to substitute for user-based usability testing.

Websites are inherently suited to remote usability testing [79][80][81][82][83]. The World Wide Web Consortium (W3C)'s website [4] provides information about Federal and other web accessibility standards, evaluation tools, filter and transform tools, repair tools, the markup validator, and other validators.
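Before moving on, it may help to make the arithmetic of the two checklist methods above concrete. The following is a minimal sketch; the scoring rules and thresholds are those just described, while the example answers are hypothetical placeholders, not real evaluation data.

    # Minimal sketch of the checklist arithmetic described above.
    # The answers below are hypothetical placeholders, not real data.

    def keevil_usability_index(yes_no_answers):
        """Keevil [67]: percentage of applicable items answered 'Yes'."""
        return 100.0 * sum(yes_no_answers) / len(yes_no_answers)

    def nielsen_compliance_rate(scores):
        """Nielsen [70]: items scored 0 (none), 0.5 (partial), 1 (full)."""
        rate = 100.0 * sum(scores) / len(scores)
        if rate > 80:
            verdict = "good shape, may need a few minor fixes"
        elif rate >= 50:
            verdict = "bad enough to start a redesign"
        else:
            verdict = "abandon it and start over from scratch"
        return rate, verdict

    # Hypothetical example: 8 of 10 applicable items answered 'Yes'
    print(keevil_usability_index([True] * 8 + [False] * 2))  # 80.0
    # Hypothetical example: mixed compliance on 6 applicable guidelines
    print(nielsen_compliance_rate([1, 1, 0.5, 0.5, 0, 1]))   # ~66.7, redesign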
2.4 Related work and our methodology's potential contributions

2.4.1 Current practice in measuring usability

Whether or not it has a complete, systematic, and reasonable set of quantitative metrics has long been considered an indicator of an academic discipline's maturity. In this regard, usability engineering should be no exception. But on this front, it has to be admitted that usability researchers have encountered big challenges. This fact can be clearly seen in [84]. In order to have a better understanding of the research findings in [84], it is necessary first to have some basic knowledge about the current usability-defining frameworks on which the existing usability metrics are based.

There are presently three major usability-defining frameworks from which most of the existing usability metrics have originated. The first one, which is also the most influential, is the ISO 9241 standard for usability [36], which suggests the following three aspects of usability to be measured: effectiveness (further defined by accuracy and completeness), efficiency, and satisfaction. The second one is Shneiderman's usability definition [28][29][30], which recommends measuring time to learn, speed of performance, rate of errors by users, retention over time, and satisfaction. The third one is Nielsen's usability definition [20], which recommends measuring learnability, efficiency, memorability, rate of errors, and satisfaction.

Although each of these three major usability-defining frameworks claims that it defines usability, they differ in what aspects or dimensions usability consists of and in how different usability metrics are categorized into the corresponding aspects or dimensions. With these definitional differences put aside, it is not difficult to find that practices strictly following them will suffer in two respects. Firstly, direct and specific measurements of the usability of the interaction process will fall short. Because of this problem, in practice it is very hard to link each particular usability problem discovered to a specific part of a particular interaction process. In other words, the practicing process needs to be more formative. Secondly, none of these definitions define what the overall usability of a target system is, nor to what extent and in which way each usability aspect affects the overall usability of the target system. Because of this problem, in practice, the overall usability of a target system over its life-cycle, and the overall usability between different systems or different versions of the same system, cannot be meaningfully compared. In other words, the practicing process needs to be more summative.

The research by Hornbæk in [84] chose ISO 9241 as its foundation. Hornbæk reviewed 180 usability studies published in core HCI journals and proceedings in recent years as to how the different usability measures or metrics were used in them. He critically concluded that measures of the quality of outcome of interaction were used in only 16% of the studies; measures of the interaction process had not been given separate attention; measures of usability over time were very rare; the measurement of satisfaction seemed to be in a state of disarray; and "despite more than 20 years of research into usability, current practice in measuring usability suggests that choosing usability measures is difficult".
Although there are few universal and convincing summative usability ev s, in this section, several existing summative usability evaluation methods will be briefly reviewed. 18 , they cannot identify specific usability problem on users? perception In [73], Babiker et al presented a metric for evaluating usability of hypertext systems. First, their hypertext usability metric was based on three attributes that were common in any hypertext system: access and navigation, orientation, and user interaction. Further, each of the three attributes was computed based on user performance time, key stroke time, and error rate. Finally, the overall usability ? the metric ? was computed through a weighted formula to combine the three attributes into a single measure. As introduced in 2.3, Keevil [67] proposed a method to assess website usability, and Nielsen [70] proposed a method to assess the usability of a website?s homepage. The two methods roughly assess usability based on their respective heuristic guidelines. There are many questionnaire-based methods, such as SUMI [85][86], CSUQ [87][88], CUSI [89][90], MUMMS [91], PSSUQ [92][93], QUIS [94][95], SUS [41], WAMMI [96], etc., that claim to be able to assess the overall usability of a system by a single measure based on users' perception of the usability of the system. Some of these questionnaires are free, but others are commercial and require a license to use. A common problem with the usability questionnaire-based methods is that except providing a subjective global assessment of system usability s. McGee [97] proposed a usability measurement method called Master Usability Scaling (MUS), which was based on Usability Magnitude Estimation (UME) [98] and Master Scaling [99]. UME is a subjective measure of usability based 19 rovide the users an objective usability definiti ate them into one single usability measure, the statistics unit sigma ated usability metric? through an equa of usability. In practice, first, usability engineers p on; then, according to this definition, the users make ratio usability estimates in terms of the usability of reference tasks; and finally, an averaging procedure is used to normalize the ratio usability estimates and form a single ratio scale of usability. In order for all the ratio scales of usability to be comparable across practices, the objective usability definition and the reference tasks used should be consistent among all practices. Sauro et al [100] proposed a method to ?simplify? all usability aspects into ?a single, standardized, and summated usability metric (SUM)?. In order to solve the problem that different usability aspects are currently measured on different scales, which makes it difficult to summ (?) from Six Sigma is used as the universal unit for all scales. Now that all the different usability aspects are now expressed in sigma as standardized ?quality level? percentages (Z-scores), the different usability aspects are deemed not only comparable with each other but also combinable into one single ?summ l-weighted scheme. For the same reason, the SUM values of different systems are deemed to be comparable with each other [101]. In [102], Gupta and Gilbert proposed a Speech Usability Metric (SUM) to evaluate the usability of spoken language systems. The SUM metric is actually a weighted scheme to combine some usability aspects, for example, user satisfaction, accuracy, task completion time, etc., into a single usability measure. 
20 bility o acro mos he issue of how to enable end users to of v is dealt with in UCD usab more of a philosophy or principle than just a methodology. in th upfr usab perf here, a task is defined as ?clear, precise, repeatable instructions?) of a system as benchmark tasks. Example benchmark tasks include common tasks (i.e., those 2.4.3 Usability in User-Centered Design (UCD) The accomplishments of the usability measurement and evaluation practices reviewed in the above two subsections are very limited. There are two reasons for this comment. First, most of these practices are only aiming at how to evaluate the usa f a single system, and their usability evaluation results normally are not comparable ss random systems or even between different versions of the same system. Second, t of these practices have not attempted to address t specify upfront usability requirements for a system. From usability engineering?s point iew, the latter is a bigger problem. In this subsection, how this problem UCD will be examined. As stated before, UCD is currently the main methodology in usability engineering. emphasizes users? center role in software engineering process and incorporates ility engineering activities into the traditional software life-cycle. UCD is said to be Indeed, in contrast to the usability measuring and evaluation practices introduced e prior two subsections, UCD does try to base some of its usability evaluation on ont specified usability requirements for a system. In dealing with the problem of user ility requirements specification, it defines the representative and frequently ormed tasks ( 21 ta miss system are identified, they will be used a s, and ?avoiding errors?; e, learnability, retainability, and initial impression; ber titions of failed commands?, ?number of times user , or etermine a metric?s target level, ilable) in an existing system, or a prior version of the sam sks that are 20% in number, but account for 80% usage in a system), and business- or ion-critical tasks. Once the benchmark tasks of a s usability ?measuring-instruments?. For each benchmark task, its ?interaction design? usability requirements are to be specified in terms of the following aspects: z Usability goal: The high-level objectives for a user class in terms of usability and design of user interaction, for example, ?walk-up-and-use? for new users, ?power performance? for expert z Usability aspects: The general usability characteristic to be measured, for example, initial performanc z Metrics: The values to be measured, for example, ?time to complete task?, ?num of errors?, ?frequency of help and documentation use?, ?time spent in errors and recovery?, ?number of repe expresses frustration or satisfaction?, and ?number of commands, mouse-clicks other user actions to perform task?; z A metric?s baseline level: The starting point to d normally coming from the level of user performance of the same or similar measuring-instrument (if ava e system, or a competitor system, or even from trying out some users on early prototype; 22 the above approach to specifying user usability requirements for benchm ser performance of the same or similar z good that the system can be used as z stem? z behind each metric?s different levels? z z the q unfo z A metric?s target level: The minimum acceptable level of user performance, usually an improvement over the metric?s baseline level. To say the least, ark tasks is questionable. 
There are many reasons for this comment, for example: z Why should the level of u measuring-instrument in an existing system be used as the starting point to determine a metric?s target level? Is the usability of the existing system already so a model system? What is the exact relationship between the target system and the existing sy z How much improvement a metric?s target level should be made over its baseline level? And why? What exactly is the budgetary implication z How thoroughly can the chosen metrics measure the usability of a benchmark task? What is the overall usability of a benchmark task? How much will a metric?s particular improvement affect the overall usability of its respective benchmark task? Because all the above questions have not been addressed appropriately in UCD, uantitative usability goals set forth for the benchmark tasks can only be said to be an unded guesswork. In fact, UCD also admits that the bottom line of usability 23 requ argu usab ith the UCD approach as well, for example: ely said that they are user usa as its usability def not im irements specification is that ?this is not an exact science? [103][104][105]. But this ment should not be made as the justification for using some guesswork as purported ility requirements. Actually, besides the above problems, there are some other problems w z The specified usability requirements cannot be legitimat bility requirements because real users normally do not understand them. In fact, in practice it is not real users who specify them. z Because UCD directly adopts the major usability defining frameworks inition, UCD suffers the same problems as stated in subsection 2.4.1. z If UCD directly adopts the summative usability evaluation techniques as have been introduced in subsection 2.4.2, inevitably the problems with those summative evaluation methods will still be present. z Except the attempt to specify limited usability requirements for benchmark tasks, all other usability issues in a target system are tackled through iterations of sorts of usability reviews, evaluations, and testings by involving end users and/or usability experts. This is not to say that these techniques do not work, or they are portant. Instead, it is just to say that this approach will wrongfully subject the usability of a target system only to the good-will and/or good-luck of designers and usability experts rather than to a contractual user usability requirements specification 24 in terms o f the goal-tasks of the system. In turn, the usability of a goal-task quantitatively defined in terms of the following 5 major usability aspects: use teraction process interface and presentation aptness (presentation, for short), use teraction process aptness (interaction, for short), efficiency, satisfaction, and ffectiveness. Further, each major usability aspect is quantitatively defined in terms of its asic use features. In this way, as shown in Figure 2.1, a usability engineering framework set up, and a structured and fully quantitative definition of usability is established. In this framework, the usability of a system, the usability of a goal-task, and the 5 ajor usability aspects of the usability of a goal-task are all called composite or derivative ssively in that is specified upfront by end users and has to be tested against at the end of a development project. 
In fact, we believe that, it is this kind of poor practices that have caused the situation that the end users have to grapple with many usability problems in many existing systems and this situation is totally unacceptable. 2.4.4 Potential contributions of the proposed methodology The proposed methodology may solve the above problems. Its principles, details, and validation experiment will be presented in the following chapters. In this subsection, only its main features and potential contributions will be briefly described. In the proposed methodology, the usability of a system is quantitatively defined f the usabilities o is in in e b is m use features of the system. They are derived, or built up, succe reverse Figure 2.1 St order starting from the basic use features. Among the 5 major usability aspects, presentation and interaction focus on the quality of use interaction, with the former focusing on the quality of the intera resen the quality of the choreography of resource-consumption of use; satisfaction serves as a catch-up b other general usability facets th usability aspects, for exam usefulness of a content, etc. Apparently details) is clearer and more prac Goal- Task 1 Usability Presentation Interaction Basic Use Feature 1 Basic Use Featur ction interface and p 25 ructured and fully quantitative definition of usability the interaction process; efficiency effectiveness focuses on the quality of ag to capture users? feelings at are hard to define and not captured by the other 4 major ple, the users? feelings about the quality o , this framework (See Chapters 3, 4, and 5 for tical than the vague ISO 9241-1 System Usability Goal-Task 2 Usability ? ? ? ? ? ? Efficiency Satisfaction e 2 Basic Use Featur? ? ? ? ? ? ? ? ? ? ? ? tation and the latter on focuses on the quality of outcome of use; and about the quality of all the f a content or the 1 usability definition. Goal-Task t Usability Effectiveness ? ? ? ? ? ? e f 26 mparable with no conver and summative but also both analytical and global. This methodology is also discount usability engineering friendly and scalable. This is supported at least in the following two ways. First, the usability of a system can be estimated by measuring the usability of some selected tasks in the system and then scale the usability results up to the entire-system level. Second, a system?s usability engineering practices can be done incrementally and over time. So, this methodology fits any project in terms of scale and budget situation. In this methodology, the value of each use feature is expressed as a ratio (in percentage) to measure the perfectness of the use feature (100% = the best, and 0% = the worst), so the values of all use features are inherently and naturally normalized. They are directly comparable with each other not only for the full life-cycle of a product but also across any kinds of products. For example, the usability of a website and the usability of a hammer can be easily compared without any confusion. In other words, this methodology makes the usability of any products inherently co sion ever needed. Further, because each basic use feature works like a usability problem probe probing directly into every aspect of use, the usability of a product is directly linked to the root of each usability problem of the product. In other words, this methodology also makes the usability of a product very diagnostic or analytical. 
All in all, it is fair to say that this methodology is not only both formative 27 The most important contribution of this methodology is that it makes it possible for end users to be able to specify upfront, natural, easy to understand, and contractual usability requirements for a target asic use features. Because of this capability, user us t only become a her kinds of user requirements specification for uarantees that the desired user usability requirements will eventually be satisfie system via its b ability requirements specification for a system has no reality but also gained equal status with ot the system. This g d just like other kinds of user requirements have always been. Apparently, the core ingredients that have made all the above potential contributions possible are the new concept of use feature and how the perfectness of a use feature in terms of usability is quantified or measured. Among all use features, it is especially worth noting that the new definition of the use feature efficiency in this methodology is unique. For details of the above, see Chapter 3. 28 CHAPTER 3 PRINCIPLE OF THE METHODOLOGY 3.1 Use features It makes sense that whenever you begin to talk about the usability of a tool, first you must specify the context of its use, the goal for which it is to be used, and how you would expect or want it to be used to achieve the goal. The context of use defines the characteristics of the users and the organizational and physical environments of use. The goal of use defines the intended outcome of use. The ?how you would expect or want it to be used? defines what features of the tool you expect or want you could make use of, i.e., the interface and capability presentation of the tool, and in what possible procedural orders you expect or want you could make use of those features, i.e., the interaction choreography or implementation of the tool. For example, both humans and lions have hands or paws, but the human hands and lion paws are normally used in different contexts, for different goals, and with different presentations and interactions. Apparently, the usability of either the human hands or the lion paws is very good in their own contexts of use with their own presentations and interactions to meet their own goals, but probably not vice versa. 29 ny feature of a tool that is essential or significant for the tool?s use is called a use feature of the tool. A use feature that does not consist of other use features is called a basic use feature. A use feature that consists of other use features is called a composite or derivative use feature. A tool can only be used through the use features it provides. In order to understand the concept of use feature, let?s examine some use feature examples of some familiar tools. Figure 3.1 illustrates two hammers. Apparently, the two hammers are intended to be used in different contexts and for different goals, and they have different presentations and interactions. Definitely, for each hammer, its context of use, goals of use presentation, and interaction ar all these features are essential for its use. But among these use features, the presentation and the interaction of each hammer are both composite use features because they both consist of other use features. 
For example, the presentation of a hammer consists of at least the following component basic use features: the hardness of its hitting surface, the size of its hitting surface, the shape of its hitting surface, the weight of its head, the length of its handle, the shape of its handle, and the stiffness of its handle.

It should be pointed out that not every feature of a hammer is a use feature of the hammer. For example, for some aesthetic effects, the hammer on the right in Figure 3.1 has some funny pictures on its head and also some color patterns on its handle, but because these features are not essential or significant for this hammer's use, they are not this hammer's use features.

Figure 3.1 Two hammers

Figure 3.2 shows the homepage of the Auburn University TigerMail website. Like the 2 hammers, this homepage also has context of use, goals of use, presentation, and interaction use features, because all these features are essential for its use. For its presentation, at least the following component basic use features can be identified: the theme ratio (i.e., the ratio between the displayed space occupied by the theme of a page and the total displayed content space of a browser); the number of misleading or confusing items; the number of items that have bad readability; the number of distracting items; the number of items that have inappropriate layout or grouping; the number of items that have inconsistent appearances or properties; the number of necessary but missing methods; the number of links that cannot be easily identified to be links; the number of links that do not follow visitation color-coding; the number of links that are broken; and the quality of page help. Apparently, all these component use features are essential for the usability of the homepage's presentation.

Figure 3.2 The homepage of Auburn University TigerMail website

It should be pointed out that, when the usability of a tool is at concern, besides the 4 top level use features of the tool mentioned above (i.e., context of use, goal of use, presentation, and interaction), the other 3 top level use features of the tool, i.e., effectiveness, efficiency, and satisfaction, also have to be considered. Effectiveness is the accuracy and completeness with which users can achieve specified goals by using the tool. Efficiency is the resources that have to be expended in relation to the accuracy and completeness with which users achieve specified goals. Satisfaction is users' feeling about the freedom from discomfort when using the tool and the degree of users' positive attitude toward the use of the tool.
Among all these top level use features, while the context of use and the goal of use delimit the boundary of the discussion of the usability of the tool (i.e., the context of use can be considered as a pre-condition of use, and the goal of use can be regarded as an ideal post-condition of use), the rest of them form the body of the definition or evaluation of the usability of the tool.

3.2 Efficiency

In software usability studies, the efficiency of a task is normally considered as the amount of time spent on the task by users. But in our opinion, this definition of efficiency is controversial. In order to clear this issue up and make it right, in this subsection, cases will be analyzed and our new definition of efficiency of use will be presented.

When measuring the efficiency of a task (or use of a task) is at concern, naturally, obtaining either the "absolute amount of time spent on the task by users" or the "speed" (i.e., the average achievement per unit of time with which users finish the task) seems to be the right way to go. Actually, this is exactly the case in most existing software usability studies, especially the former one.

Let's first take a look at how the "absolute amount of time spent on the task by users" approach fares. Let's regard each task literally as a straight route. In Case 1 on the left of Figure 3.3, let's assume the 2 users, User1 and User2, travel at the same speed v. User1 is supposed to travel through the AB route that has length L, and User2 through the CD route that has length 10L. Apparently, if User1 takes time t from A to B, User2 needs time 10t from C to D. Then, which one is more efficient, User1 with time t or User2 with time 10t? Apparently in this case, the "absolute amount of time spent on the route" cannot be used to tell which one is more efficient, because User1 and User2 have traveled through 2 different routes with different lengths (i.e., 2 different situations) respectively. This approach sounds silly, but it has long been widely used to measure and compare task efficiency. In fact, it is not difficult to tell that both User1 and User2 have the same efficiency, because they have traveled at the same speed.

If the above scenario for Case 1 is not good enough to tell the truth, let's change the scenario a little bit: let's assume everything else is the same except that User2 would travel at speed 10v. In this new scenario, apparently both User1 and User2 will take the same time t to reach their respective destinations. So, which one is more efficient, User1 with time t or User2 also with time t? In this new scenario, the "absolute amount of time spent on the route" approach still sounds silly. In fact, it is easy to tell that User2 is 10 times more efficient than User1.

Figure 3.3 The "amount of time" or the "speed"? (Case 1: User1 at speed v on route AB of length L, User2 at speed v on route CD of length 10L; Case 2: User1 at speed 10v and User2 at speed v on the same route AB of length L)

Figure 3.4 The efficiency of route (Case 3: User1 at speed v and User2 at speed 2v, each traveling route AB of length L and route ACB of length 2L; Case 4: User1 at speed v on straight route AB of length L, User2 at speed v on route CD of the same theoretical length L but with many unsigned crossroads)

Now that the "absolute amount of time spent on the task by users" approach does not work, measuring the "speed" might be the right way to go. Case 2 on the right of Figure 3.3 illustrates how the "speed" approach might work. In Case 2, let's assume User1 and User2 travel at speeds 10v and v respectively, along the same route AB that has length L. If User1 takes time t from A to B, then User2 needs time 10t. So, which one is more efficient, User1 with time t or User2 with time 10t? Indeed, in this case, the "speed" approach seems to have worked perfectly, both from the view of the "absolute amount of time spent on the route" and from the view of the "speed". The reason why the "absolute amount of time spent on the route" approach also seems to have worked in Case 2 is that only the same single route is at concern here and there is no intention to compare it with any other routes.

Although the "speed" approach seems to have worked perfectly in Case 2, unfortunately, performing a task is not really the same thing as traveling along a route. It is really difficult to quantify the achievement, or the achievement per unit of time, of a task-performing. Perhaps this is the reason why this approach has rarely found use in existing software usability studies. But if examined further, it can be found that, even if the quantification of achievement of a task-performing were not an issue, the "speed"
approach has actually measured the efficiencies of the wrong targets, i.e., the efficiencies of users instead of the efficiency of the route. Then, which one should have been at concern in the first place, the efficiencies of users or the efficiency of the route? Certainly, it should have been the latter rather than the former. In this new light, suddenly it is not difficult to see that the efficiencies of users are not relevant any more, because the same route should always have the same efficiency no matter who is riding on it. In other words, in Case 2, even though User1 is 10 times faster than User2, the efficiency of route AB is the same for both of them. From this case, we should recognize that it is always the usability of tools rather than the ability of users that should be evaluated in a usability study. So, we need to figure out how to measure the efficiency of the route instead of trying to measure the efficiencies of users.

Case 3 on the left of Figure 3.4 illustrates how the efficiency of a route could be measured. Let's assume there are 2 routes from A to B: one is AB with length L, and the other is ACB with length 2L. 2 users, User1 and User2, travel at speeds v and 2v respectively, and each will travel along the 2 routes once at a time from A to B. If User1 takes time t via route AB, then s/he needs time 2t via route ACB. Apparently, User2 needs time 0.5t via route AB and needs time t via route ACB. Let's define the efficiency of route as (T - T_w) / T. Here, T is the total amount of time spent, and T_w is the amount of time wasted that has been imposed upon the users by the route. Then for both User1 and User2, the efficiency of route AB is 100% and the efficiency of route ACB is 50%. In fact, for whatever users, the efficiency of route ACB is always half of the efficiency of route AB, because route ACB always forces the users to travel double the length of route AB.

Actually, our new definition of efficiency of route can be applied to any cases or scenarios. If it is applied to all the scenarios in Case 1 and Case 2 presented above, the efficiencies for both routes AB and CD in Case 1 and for route AB in Case 2 can all be found to be 100%. Apparently, all the route efficiencies obtained this way are directly comparable with each other regardless of the lengths or curvatures of the routes.
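As a minimal computational sketch of this definition (our own illustration, not part of the original QUEST tooling), the following Python fragment computes the efficiency of a route, or task, from a log of observed time segments, where each segment is marked as necessary or as imposed waste; all segment labels and durations are hypothetical:

# A minimal sketch of the efficiency-of-route measure: efficiency = (T - T_w) / T,
# where T is the total time spent and T_w is the time wasted that the route
# imposed on the user. All durations below are hypothetical.

def route_efficiency(segments):
    """segments: list of (seconds, is_wasted) pairs observed in one session."""
    total = sum(seconds for seconds, _ in segments)
    wasted = sum(seconds for seconds, is_wasted in segments if is_wasted)
    return (total - wasted) / total if total > 0 else 1.0

# Case 3: route ACB always doubles the necessary travel, so half of the total
# time is imposed waste regardless of the user's speed.
user1_on_acb = [(60.0, False), (60.0, True)]  # slower user: t necessary, t wasted
user2_on_acb = [(30.0, False), (30.0, True)]  # 2x-faster user, same proportions
print(route_efficiency(user1_on_acb))  # 0.5
print(route_efficiency(user2_on_acb))  # 0.5 -- a property of the route, not the user

Note how the measure stays at 50% for both users: it characterizes the route (the task design), not the travelers.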
Case 4 on the right of Figure 3.4 illustrates a generic scenario showing how our new definition of efficiency of route works. In Case 4, User1 is supposed to travel from A to B via route AB, and User2 is supposed to travel from C to D via route CD. Route AB can be considered as a straight route, and its efficiency of route is 100%. In contrast, route CD can be theoretically considered as a straight route but with many crossroads along the way. On route CD, all along it and at each of the crossroads, there is no sign telling any directions or giving any hints. Let's also assume that User2 has no idea that s/he can reach D by traveling the entire way straight forward from C. In other words, route CD is in fact a labyrinth. Although routes AB and CD have the same theoretical length L, the users' experiences travelling along them would be different. Straight route AB does not impose any difficulty on its users, so the users would not experience any wasted time; for route CD, because of its bad usability, users would experience much wasted time that is imposed on them. In order to find the efficiency of route CD, we need to use the Think-Aloud Protocol [125] to help identify all the wasted times (see 6.1.6 for a brief introduction). The following kinds of wasted times along route CD can be expected: 1) at each crossroad, the time wasted on determining which direction to take next; 2) the time wasted on taking detours; 3) the time wasted on forming unnecessary loops. Because of such imposed wasted times, the efficiency of route CD is less than 100%.

Just as stated at the beginning, a route in the above discussion is actually a figurative representation of a task or use of a task. So if route is substituted with task or use, the definition of the efficiency of route presented above is actually our new definition of efficiency of task or use. The unique advantages of our new definition are that, first, it is the true efficiency of a task; second, if followed, all the resulting efficiencies of any tasks are guaranteed to be directly comparable with each other regardless of the kind and size of a task. Although identifying the wasted times of a task or use of a task seems daunting, on a high note, it is definitely doable. Keep in mind that all the tasks of any man-made tools are intentionally designed to be as efficient as possible, so how each task should be done or implemented should never be like a blackbox, or a labyrinth, or even a mystery. In other words, all the wasted times of use are identifiable and justifiable.

3.3 The origin of usability problems

Why do usability problems exist? In our opinion, usability problems originate from the mental model difference between the designers and the end users of a product. A mental model is simply a person's view of something experienced, its function, and the person's expectations about it. Everybody forms a mental model about everything experienced, and for a variety of reasons, rarely would two persons form exactly the same mental model about one thing. The difference between two different mental models is called a mental model schism. In this dissertation, only the mental model schism between the designers and the end users of a product is of our interest.

In their relationship, the role of a designer and the role of an end user of a product are not equal. A designer designs a product that is to be used by an end user; an end user has to use a product designed by a designer. The designer's mental model of the product, also called the conceptual model, is a model of the product that the designer wants the end user to understand, and the product is simply a concrete embodiment of the designer's mental model of the product. The end user's mental model of the product is forced to match the designer's conceptual model in order for the end user to be able to understand and use the product. Unless the designer is also the end user, there will exist a mental model schism between the designer and the end user of the product. It is this mental model schism that has caused the product's usability problems.

Let's call the width of a mental model schism the distance between the two mental models at concern. Figure 3.5 illustrates both the relationship between the mental model of a designer and the mental model of an end user of a product and the change of the distance over the product's usability engineering process.
It is fair to say that the bigger the distance, the bigger the usability problems. The ideal situation should be that the two mental models overlap as much as possible. Unfortunately, because the cognitive and psychological mechanism behind a mental model is still not well known, exactly what has caused the mental model schism is not clear. Therefore, the distance between two mental models cannot be directly measured and shortened. Fortunately, the distance can be indirectly measured and shortened by testing end users using the product and then correcting the product's usability problems reported.

Figure 3.5 Mental models' schism and the distance adjustment (upper panel: the original designer's and end user's mental models, the design expectance, and the distance between them; lower panel: the mental models and the distance after testing-correction and adaptation, driven by the Usability Testing Report and the User Usability Requirements Specification)

For example, after testing-correction, the distance as shown in the upper part of Figure 3.5 can be shortened to the distance as shown in the lower part. In Figure 3.5, it is also shown that an end user's mental model can somewhat adapt to the designer's mental model. In other words, the end user learns to understand the designer's mental model. But this adaptation should not be expected to be much. This phenomenon simply reflects the fact that an end user's familiarity with a product can improve his/her perception of the usability of the product over time, but real and hard usability problems cannot be compensated for or eliminated just through the end user's familiarity with the product. For example, a fuzzy label for a button may be misleading or confusing at the beginning, but after a user understands its true meaning through trials and errors, the usability problem caused by the fuzzy label may become negligible to some degree for that particular user. In contrast, if a user is working on a long list, and every time after the user has performed some operation on an item that is a little far away from the beginning, the user is automatically brought back to the very beginning of the list (let's say to the very top of the first page of that list), sooner or later this imposed-upon usability problem may very well drive the user up a wall. The former kind of usability problems are soft-cored usability problems, which usually are tolerable and compensatable by end users; the latter kind of usability problems are hard-cored usability problems, which usually are intolerable and incompensatable by end users. But both soft-cored and hard-cored usability problems are real usability problems.

Meanwhile, it should be noted that the above-mentioned end user's mental model adaptation phenomenon also reflects the fact that sometimes the end user even must make a mental model transition toward a designer's mental model. An extreme but good example is that end users' old paper-based application mental model is forced to transition to a modern computer-based application mental model when their old paper-based application is computerized.
Because of the computerization of the old paper-based application, many originally non-existent or impossible concepts and operations in the old mental model now become existent and possible in the modern mental model. But this kind of examples should never be used to justify that designers can count on forcing end users' mental models to transition in order to solve real usability problems.

In order to eliminate, or at least alleviate, the usability problems of man-made tools, the following points need to be stressed:

• It should be the end users, rather than the designers, who have the center role and the final say on the usability of a designed product. As stated above, because the distance between the mental models of designers and end users cannot be directly measured and shortened, the distance can only be indirectly measured and shortened by testing end users using the product and then correcting the reported usability problems accordingly. In other words, the usability of a product can only be revealed by testing end users instead of just being calculated through some formulas. Testing is to measure the distance case by case; correcting is to shorten the distance case by case. Because of this reason, end users must be guaranteed to have the center role and the final say in the software engineering process (certainly including the usability engineering process)4 of a product.

• In order for end users to have the center role and the final say on the usability of a product, they also must have the first say on the usability of the product. In other words, an upfront contractual user usability requirements specification is necessary in making sure that end users will really have the final say on the usability of the product. In fact, if end users do not have the first say, designers can act like they have gotten a carte blanche from end users in the beginning and do not have to worry about being held accountable in the end. It should have long been realized that, besides the mental model schism problem, the immunity or amnesty on the usability of products provided to designers by this kind of practices has been a major source of bad usability for many products today. Now, it is time for this loophole to be closed.

4 In our opinion, usability engineering should always be part of software engineering instead of being a stand-alone discipline as it is now, and each software engineer should also be a usability engineer or expert.

Actually, it is not difficult to see that the points emphasized above are consistent with the philosophy or principle advocated by User-Centered Design.

3.4 The solution

As stated in 3.3, in order to eliminate, or at least to alleviate, the usability problems of a designed product, an upfront contractual user usability requirements specification for the product is the solution. But, are user usability requirements valid user requirements? Intuitively, the answer is "Yes!". Life experiences tell us that it should be considered wrong to begin designing and building a house without first knowing how the inhabitants would like to use it. Just imagine the difficulty and the mess that the inhabitants of the house may have to overcome to make things right after the fact (if it is ever possible). Unfortunately, such an answer cannot be found in the current textbooks of software engineering. Although software engineering emphasizes the importance of accurate acquisition of user requirements at the very beginning of any project, usability requirements have rarely been considered as valid user requirements that need to be collected from users at the beginning of a project and then tested against in the end.
The void of methodology for dealing with usability requirements in the regular engineering doctrine seems to have its reasons: first, since a product to be built does not yet exist, it is hard for its future users to specifically demand upfront how it should be used; second, usability issues seem to be subjective, and they are hard to describe objectively and quantitatively. So, it seems impossible for usability requirements to be specified in such an objective and quantitative manner that they can be tested against to see if they have been met.

In fact, the predicament in dealing with usability requirements in software engineering has made "make it work first, then make it better" a practical guidance for many practitioners. Most practitioners believe that after making a product work first, they can make it better later. But, can this "practical guidance" really work in reality? This doubt can be justified by the following reasoning. It is well known that accurate and complete user requirements are extremely important to the success of any project and must be carefully dealt with at the very beginning; otherwise, the undertakers of the project will be punished heavily later. This hard-learned lesson applies to all user requirements. So, if user usability requirements are supposed to be valid user requirements, the above practitioners are doomed to have big trouble in the end! In our opinion, this is exactly the situation all the practitioners have been facing. Covering up the inability of the existing engineering methodologies on usability issues will not make usability issues disappear. What we need is a methodology that can uncover the usability issues and make good usability not just an undetermined gift from the developers but a users' right that is guaranteed via a contractual requirement that can be implemented, verified, and satisfied.

In fact, because of its contractual power, user functional requirements specification has been a successful controlling factor on the quality assurance of software products. Software functional issues have been solved pretty well through the relentless efforts in software engineering thus far. It is time for software engineering to take on the usability issues. We believe that the only way out of this usability predicament is that end users are enabled to specify upfront contractual usability requirements in an explicit and quantitative manner. Just like the role played in the "old" software engineering by functional requirements specification, in this new usability assurance campaign, an upfront contractual user usability requirements specification for any product is the solution.

3.5 Problems with the existing definitions of usability

After we have talked so much about usability, it is wise for us to take a break to re-examine the definition of usability before we proceed further. As mentioned in Chapter 2, different definitions of usability for software systems have been given by Miller [21], Shackel [22][23][24][25], Bennet [26][27], Shneiderman [28][29][30], Nielsen [20], Bevan [31], Löwgren [32], Dix [33], Quesenbery [34][35], etc. In 1998, ISO 9241-11 [36] gave out its own definition of usability. Now, it is the ISO 9241-11 definition of usability that has been recognized as authoritative and become widely adopted. There are problems with the existing definitions of usability. In this dissertation, we only focus on evaluating the problems of the ISO 9241-11 definition of usability.
It should be noted that the major conclusions about the ISO 9241-11 definition of usability also apply to the other existing definitions. ISO 9241-11 defines usability as:

Context of use: characteristics of the users, tasks and the organizational and physical environments.
Goal: intended outcome.
Task: activities required to achieve a goal.
Effectiveness: the accuracy and completeness with which users achieve specified goals.
Efficiency: the resources expended in relation to the accuracy and completeness with which users achieve specified goals.
Satisfaction: freedom from discomfort, and positive attitude to the use of the product.
Usability: the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use.

As pointed out below, in this definition there exist ambiguities and vagueness that have caused usability problems for the definition itself in practice. We think the ISO 9241-11 definition needs to be improved or extended in at least the following five major aspects.

Firstly, the ISO 9241-11 definition does not differentiate between the goal and task of designers and the goal and task of end users. According to the mental model schism theory presented in section 3.3, we believe that it is important to make a clear differentiation between the two in the definition of usability, and the focus should be on the goal and task of the intended end users. The designers can also have a goal and task in mind, but their goal and task should try to match as closely as possible those of the intended end users. There is no doubt that the bigger the difference between the two, the more severe the resulting usability problems will be.

Secondly, the ISO 9241-11 definition does not specify how to measure effectiveness, efficiency and satisfaction; also, it does not specify how to combine these measures into a single aggregate measure of usability for the entire system. Because of this problem, in practice, it is impossible to quantify usability. The big downside of the inability to quantify usability is that if you cannot measure it, then you cannot manage and control it. In other words, the usabilities of a system over its life-cycle and the usabilities between different systems or different versions of the same system cannot be meaningfully compared; and also, there is no way to determine to what extent and in which way each specific usability aspect affects the overall usability of a system.

Thirdly, the ISO 9241-11 definition defines efficiency as an absolute amount of "resources expended in relation to the accuracy and completeness with which users achieve specified goals". As discussed in 3.2, we believe that this is not a good way to define efficiency, because efficiency defined in this way is not comparable across tasks, and it does not provide any insight into the quality of the amount of resources expended on a task by users. Here, let's examine this issue in detail. We can assume that each absolute amount of resource expended has at least two portions: one portion that is rightfully expended in relation to the accuracy and completeness with which users achieve specified goals, and the other that has been wasted but imposed upon users by an awkward design. Take time as an example. It does not make much sense to measure the absolute amount of time expended on a task as the efficiency of that task.
The reason is that any work needs some time to finish, and depending on the complexity of the work, the time needed can be very long or very short. It does not matter how much time has to be expended, but it does matter how much time of the total is rightfully expended. Let's assume the total amount of time expended is T, the portion of T that is rightfully expended is T_n (time necessary), and the other portion that is wasted (because of mistakes or awkwardness imposed by design) is T_w (time wasted); then T = T_n + T_w. As a measure of the efficiency of time expended on a task, T_n / T makes much more sense than T, because: first, it measures the efficiency of the task; second, it is comparable across tasks, no matter how big or small a task is; and third, it provides us insight into the quality of the total time spent on a task. So, we believe efficiency should be defined as a ratio between a part and the total instead of just an absolute total amount.
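To make the ratio measure concrete, here is a small worked instance with hypothetical numbers, showing how two tasks of very different sizes become directly comparable:

\[
T = T_n + T_w, \qquad \text{efficiency} = \frac{T_n}{T} = 1 - \frac{T_w}{T}
\]
\[
\text{Task A: } T = 50\text{ s},\ T_w = 10\text{ s} \ \Rightarrow\ \frac{T_n}{T} = \frac{40}{50} = 80\%;
\qquad
\text{Task B: } T = 500\text{ s},\ T_w = 100\text{ s} \ \Rightarrow\ \frac{T_n}{T} = \frac{400}{500} = 80\%
\]

The absolute times differ by a factor of 10, yet both tasks exhibit the same 80% efficiency.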
Fourthly, the ISO 9241-11 definition does not pay explicit, direct, and specific attention to measuring the usability of a goal-task's human-tool interaction process and its interaction interface. Actually, from Norman's "stages of action" model [111], which is illustrated in Figure 3.6, it is obvious that the choreography of the interaction process and the presentation (including feedback presentation) of the interaction process are two key components of a successful interaction design. So, if only effectiveness, efficiency, and satisfaction are to be measured as defined in the ISO 9241-11 definition, then the specific usability problems related to the interaction process and the interaction interface cannot be directly reflected in the usability evaluation. This will make the usability evaluation too abstract and empty. We believe that the quality of the interaction process and the quality of the interaction interface need to be directly included in the definition of usability, because: first, they are the real sources of most usability problems; second, they determine the easiness of use; and third, they have much to do with users' cognitive feeling about a goal-task. Actually, from the usability engineering process's point of view, the usability evaluation stage of a goal-task is also the right time and place to expose detailed usability problems related to the goal-task's interaction process and interaction interface. In fact, interaction design has so much to do with the usability of a product that some usability researchers began to call it user experience design [106][107][108][109].

Figure 3.6 Norman's "stages of action" model (Goals; Intention to act; Sequence of actions; Execution of the action sequence; The world; Perceiving the state of the world; Interpreting the perception; Evaluation of interpretations)

Lastly, the ISO 9241-11 definition does not state anything about the relationship between functional correctness and usability. We believe that, on the one hand, usability is based on the functional correctness of a system, because without functional correctness a system is not usable at all; on the other hand, a functionally totally correct system can be rendered totally useless by improper usability design. So, in the engineering process of any system, the following 3 points are important: first, there should be usability requirements just like there have been functional requirements; second, usability requirements should be as important as functional requirements have been; third, usability requirements should also be tested against in the end to see if they have been satisfied, just like functional requirements have always been. Meanwhile, functional correctness and usability are two totally different aspects of a system. Being functionally correct is a precondition of usability but not part of usability. In other words, functional correctness should be out of concern in a usability study. So, for each system, functional requirements and usability requirements are apparently orthogonal to each other.

3.6 Principle of the methodology

In our opinion, any system consists of its goal-tasks, and so does its usability. In fact, as shown in Figure 2.1, we define the usability of a system as a usability hierarchy: the usability of a system consists of the usabilities of its goal-tasks; in turn, the usability of a goal-task consists of its 5 top level composite use features: presentation, interaction, efficiency, satisfaction, and effectiveness; further, each of the 5 top level composite use features consists of its corresponding component basic use features.

Here, it should be pointed out that the above presentation of a goal-task is actually the part of its presentation that only concerns the specific semantics of the goal-task. But, in any system that consists of more than one goal-task, the presentation of each goal-task must also conform to a set of system-level presentation consistency rules that have nothing to do with the specific semantics of any particular goal-task but are critical to the universal look-and-feel and usability of the entire system. In contrast, the part of the presentation of the goal-task that only concerns its conformance to the set of system-level presentation consistency rules is called the aptness of use universal consistency (consistency, for short). Because consistency and presentation are actually two facets of the presentation of a goal-task, they share their top level composite use feature status.

Also, it should be pointed out that, in any system, there is always a special system goal-task that is to be used by end users to locate each available end user goal-task in the system. This special system goal-task is called the system navigation (navigation, for short). Although system navigation is unique in many ways, it is still considered as just another end user goal-task in the system, because it is always the first goal-task that end users have to use when they use the system.

Now, it is time for us to consider how to choose the basic use features for each of the top level composite use features. We have already discussed the concept of use feature in 3.1, but when we try to determine the appropriate basic use features for a composite use feature, the following two rules need to be considered.

The first rule: each basic use feature should represent the collective quality (in percentage) of all interface items of a goal-task in a corresponding usability aspect. Let's use the homepage of the Auburn University TigerMail website in Figure 3.2 as an example to explain why this rule is necessary. Apparently, all the interface items on the homepage are composite use features.
It is a feature that users have to use when they try to login into their e feature, the ?login? button is a use feature of the homepage. At the same time, the ?login? on has many usability aspects, such as: if it has proper layout or grouping; if it has er labeling; if it has proper size; etc. Each of these usability aspects is significant for use of the ?login? button. According to the definition of composite use feature, the in? button is also a comp ite use feature of the homepage. In fact, this observation lies to all the other interface items on the homepage 5 . Normally, there are many such rface items in any goal-task, and it does not make much sense to just consider each of them individually. So instead, we on y consider the collective qua ty (in entage) of all interface items of a goal task in a corresponding usability aspect as an ropriate basic use feature. For example, the presentation of the ?login into email unt? goal-task can have such basic use features like: percentage of interface items e of the interface items that are seemingly unrelated with the homepage are also the use features of the homepage 5 Som in the sense that they are essential or significant in influencing or affecting the use of the homepage, for example, distracting end users' attention from their current goal-task, or messing up the theme of the homepage, etc. 54 that oper prob show usab easurement (as quality level of use 1. have improper layout or grouping; percentage of interface items that have impr or misleading labels; etc. The second rule: each basic use feature should focus on one aspect of usability lems. Our purpose is to identify and evaluate usability problems. Just as already n in the above examples, when determining basic use features, we only focus on ility problems. It should be pointed out that the percentage m feature) has the following advantages: It can be used to measure the distance between two mental models. According to the mental model schism theory, usability problems are caused by the mismatches between designers? mental model and end users? mental model. Let?s assume, in a goal-task, the total number of involved items is n , and because of the mismatches of the two mental models, m items present some aspect of usability problem to end users. Then, for that aspect of usability problem, its distance can be expressed as n m , which means m among n items present that aspect of usability problem. In contrast, its usability can be expressed as n mn? or n m ?1 , which means mn? among n items do not present that aspect of usability problem. In other words, n m ?1 measures the quality level of the goal-task in that aspect of usability. Apparently, the total distance between two mental models of the goal-task can now be considered as the aggregation of the distances of all the basic use features of the 55 2. 3. ange meant ore or goal-task. Based on this observation, the usability of a goal-task is defined at the end of this section. It can be used as a meaningful severity indicator without having to refer to the total amount involved. For example, it is reported that the fire accidents caused by rats account for 25% of total fire accidents. This report makes perfect sense by just using a percentage number rather than a total number to represent the severity of the fire accidents caused by rats. In fact, in this case, a total number, even if possible, makes much less sense than just a percentage number. It can be used to compare the quality level of things both over time and across kinds. 
For example, when the Dow Jones Index was at the 100-point level, a 3-point change meant a 3% up or down move from that level. Now, let's assume the Dow Jones Index is at the 10,000-point level; then a 3% move will mean a 300-point up or down move. It does not make any sense to compare the Dow Jones Index's daily moves or performances over time in absolute numbers of points. In contrast, its percentage moves compare meaningfully. Meanwhile, all the markets around the world are now known to be interrelated with each other. Because each market has its own absolute point level, the correlation between the markets can only be manifested by using their percentage changes on a particular day. For example, if China's Shanghai Stock Index made a 5% up move on a Friday, the Dow Jones Index would very probably make a more or less similar move the same day (considering the time difference). When it comes to usability, the percentage measurements of use features reflect usability levels in such a normalized way that they can be directly compared without conversion. For example, a goal-task with 10 items can have 90% usability by making 9 of its 10 items match end users' expectations in all aspects of usability, while another goal-task with 100 items can also reach the same usability by making 90 of its 100 items match. Although the efforts (costs) to achieve the same usability level are different, both of them can now be known as having the same level of usability regardless of their kinds and sizes. It should be pointed out that, just as the absolute point level of the Dow Jones Index only makes sense when it signifies the scale of the entire market, an absolute item amount, i.e., the scale of the goal-task, only makes sense when it accounts for the total efforts needed to build the goal-task.

4. It is intuitive and easy to understand. For example, it is easy for both designers and users to understand the meaning of a usability requirement like "for goal-task gt1, no more than 10% of interface items can have misleading or confusing labels". Because a complete set of use features covers every usability aspect of a goal-task, users can easily specify upfront usability requirements in such a form: a desired quality level (in percentage) for a specific usability aspect of a particular goal-task. Hence, the upfront user usability requirements specification predicament mentioned before will no longer exist. The detailed definitions of all the use features for websites and how to use them to specify usability requirements are presented in Chapter 5.

5. It is independent of design and implementation. For example, different designs or implementations of a goal-task may very probably have different interface items or different numbers of interface items, but each design or implementation can be specified to meet the same level of usability (although definitely you can specify different ones if you really want to).
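To make the first advantage above concrete, here is a small worked instance with hypothetical counts:

\[
n = 20,\ m = 3 \ \Rightarrow\ \text{distance} = \frac{m}{n} = 15\%, \qquad \text{usability} = 1 - \frac{m}{n} = 85\%
\]

That is, if 3 of the 20 items involved in a goal-task present a given aspect of usability problem, the distance between the two mental models in that aspect is 15%, and the corresponding quality level is 85%.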
The following is the structured and fully quantitative definition of the usability framework presented above.

Let's assume P is the top level composite use feature presentation of a goal-task, P_1, P_2, ..., P_k are P's k component basic use features, and w_{P_1}, w_{P_2}, ..., w_{P_k} are these basic use features' weights respectively, with 0 \le P_i \le 1 and 0 \le w_{P_i} \le 1 for i = 1 ... k, and \sum_{i=1}^{k} w_{P_i} = 1. We define:

P = \sum_{i=1}^{k} w_{P_i} P_i    (3-1)

Similarly, let's assume I is the top level composite use feature interaction of the goal-task, I_1, I_2, ..., I_h are I's h component basic use features, and w_{I_1}, w_{I_2}, ..., w_{I_h} are these basic use features' weights respectively, with 0 \le I_i \le 1 and 0 \le w_{I_i} \le 1 for i = 1 ... h, and \sum_{i=1}^{h} w_{I_i} = 1. We define:

I = \sum_{i=1}^{h} w_{I_i} I_i    (3-2)

Similarly, let's assume E is the top level composite use feature efficiency of the goal-task, E_1, E_2, ..., E_q are E's q component basic use features, and w_{E_1}, w_{E_2}, ..., w_{E_q} are these basic use features' weights respectively, with 0 \le E_i \le 1 and 0 \le w_{E_i} \le 1 for i = 1 ... q, and \sum_{i=1}^{q} w_{E_i} = 1. We define:

E = \sum_{i=1}^{q} w_{E_i} E_i    (3-3)

Similarly, let's assume S is the top level composite use feature satisfaction of the goal-task, S_1, S_2, ..., S_m are S's m component basic use features, and w_{S_1}, w_{S_2}, ..., w_{S_m} are these basic use features' weights respectively, with 0 \le S_i \le 1 and 0 \le w_{S_i} \le 1 for i = 1 ... m, and \sum_{i=1}^{m} w_{S_i} = 1. We define:

S = \sum_{i=1}^{m} w_{S_i} S_i    (3-4)

Similarly, let's assume R (short for Results) is the top level composite use feature effectiveness of the goal-task, R_1, R_2, ..., R_n are R's n component basic use features, and w_{R_1}, w_{R_2}, ..., w_{R_n} are these basic use features' weights respectively, with 0 \le R_i \le 1 and 0 \le w_{R_i} \le 1 for i = 1 ... n, and \sum_{i=1}^{n} w_{R_i} = 1. We define:

R = \sum_{i=1}^{n} w_{R_i} R_i    (3-5)

Similarly, let's assume C_{gt} is the top level composite use feature consistency of the goal-task, C_1, C_2, ..., C_v are C_{gt}'s v component basic use features, and w_{C_1}, w_{C_2}, ..., w_{C_v} are these basic use features' weights respectively, with 0 \le C_i \le 1 and 0 \le w_{C_i} \le 1 for i = 1 ... v, and \sum_{i=1}^{v} w_{C_i} = 1. We define:

C_{gt} = \sum_{i=1}^{v} w_{C_i} C_i    (3-6)

Let's assume U_{gt} is the usability of the goal-task, and w_P, w_I, w_E, and w_S are the weights of presentation (P), interaction (I), efficiency (E), and satisfaction (S) respectively, with 0 \le w_P, w_I, w_E, w_S \le 1 and w_P + w_I + w_E + w_S = 1. We define:

U_{gt} = (w_P P + w_I I + w_E E + w_S S) R    (3-7)

(3-7) means: 1. U_{gt} will be 100% only if P, I, E, S, and R are all 100%; 2. if R < 1, then R is a discount factor of U_{gt} (in particular, if R = 0, then U_{gt} = 0).
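A minimal computational sketch of equations (3-1) through (3-7) follows; it is our own illustration, the function name is ours, and all weights and basic use feature values are hypothetical:

# A sketch of equations (3-1) through (3-7): every composite use feature is a
# weighted average of its components, and effectiveness R acts as a discount
# factor on the goal-task usability. All values below are hypothetical.

def weighted_average(values, weights):
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(v * w for v, w in zip(values, weights))

# Component basic use features, each already normalized to [0, 1].
P = weighted_average([0.9, 0.8, 1.0], [0.4, 0.3, 0.3])  # presentation,  (3-1)
I = weighted_average([0.7, 0.9], [0.5, 0.5])            # interaction,   (3-2)
E = weighted_average([0.8], [1.0])                      # efficiency,    (3-3)
S = weighted_average([0.6, 0.8], [0.5, 0.5])            # satisfaction,  (3-4)
R = weighted_average([1.0, 0.9], [0.5, 0.5])            # effectiveness, (3-5)

# Goal-task usability, equation (3-7): U_gt = (wP*P + wI*I + wE*E + wS*S) * R
wP, wI, wE, wS = 0.3, 0.3, 0.2, 0.2
U_gt = (wP * P + wI * I + wE * E + wS * S) * R
print(f"U_gt = {U_gt:.2%}")  # -> U_gt = 76.95%; note R = 0 would force U_gt = 0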
The advantage of this kind of quantitative usability framework is that no matter what kind of human-tool interaction system it is applied to, all the resulted usabilities are comparable with each other. In other words, the usability of a hammer can be easily 6 The reason why consistency of system is used as a discount factor is because bad consistency severely affects the overall usability of any system, and it has no reason to exist at all. 61 website. Our concern in this dissertation is to apply it to the usa As a first endeavor to provide a structured, fully quantitative, and full lifecycle usability engineering framework, this methodology is still at its infancy stage, so all aspects are open for improvement. Because the set of quantitative usability equations presented in 3.6 are subject to optimization and evolution according to their uses in compared with the usability of a bility engineering of websites. System Usability (U ) Figure 3.7 Usability hierarchy Goal-Task 1 Usability ( gt U ) 1 ? ? ? ? ? ? Presentation ( P ) ? ? ? ? ? ? Basic Use Feature 1 ( P ) 1 ? ? ? ? ? ? Goal-Task t Usability Navigation Usability ( gt U ) t ( nav U ) Consistency gt C ) ? ? ? ? ? ? ( Interaction ) ( I Efficiency ) Satisfaction ) Effectiveness ( E ( R( S ) Basic Use Feature k ? ? ? ? ? ? ( P ) k 62 n he ab quan l (Version#). system with 95% usability using version 1.0 of QUEST can be noted as: Usability: 95 (QUEST v1.0), or Usability: 95% (QUEST v1.0). A good analogy of this methodology is the methodology adopted for the evaluation of credit worthiness of people: a QUEST number is like a credit score; the structured and fully quantitative definition of usability is like the structured and fully quantitative definition of credit worthiness; the usability testing report is like the actual credit worthiness data collected. Like a credit score, although a sole quantitative usability value of a system is meaningful already, it cannot tell it all. The best way to publish the usability information of a system is to list the following contents in a structured way: the usability value of the system along with its all or at least the major use features; the listed use features? values; their respective allocated weights; and the usability problems associated with each listed use feature. This practice will serve well for the system?s usability engineering purpose. As mentioned before, this methodology is discount usability engineering friendly and scalable. One of the techniques is goal-task grouping, i.e., similar goal-tasks can be practice, in order to avoid any future confusion, we will give a version number to each set of quantitative usability equatio s. T ove set of quantitative usability equations can be named Quantitative Usability Equations SeT version 1.0 (QUEST v1.0). The titative usability value of a system should be stated along with the QUEST version number. The format can be like Usabi ity: U For example, a 63 grouped together so that only the group level usability evaluations and the group weights will appear in the system level QUEST. In each group, all or selected goal-tasks? usabilities will be evaluated, and a valuation is the simple average of the usabilities of go hnique is randomly or selectively sampling, i.e., only random or selected goal-tasks? usabilities will ld then be scaled up to the entire system level. The sca to respectively, and we only choose to sability test gt1 and gt2. After usability testing, we get their usability evaluations: for gt1, and for gt2. 
We can then assume that the entire system just consists of these two sks, with their new weights of group?s usability e al-tasks evaluated in the group. Another tec sampling be evaluated. Their usability results wou ling up process can be done like this: let?s say there are 10 goal-tasks in a system: gt1 to gt10 with their weights 1gt w 10gt w 1 gt U 2 gt U goal-ta 21 1 gtgt gt ww u w 21 2 gtgt gt ww w + + and respectively. 64 ATURES OF WEBSITES CHAPTER 4 SOME FE 4.1 The general architecture of WWW Generally speaking, WWW is a Client/Server Model-based application built upon Internet as illustrated in Figure 4.1. Figure 4.1 The general architecture of WWW The basic information unit on the WWW is a hypertext document marked up in Hypertext Markup Language (HTML) [122], which is often simply called a webpage. Normally, a webpage contains the following information: content, page layout information, content presentation information, and hyperlinks. Among them, the 65 hyperlinks are what make the WWW as the Web we know. A hyperlink is described by a RL) [123], which, besides containing other kinds of inform and formed a worldwide web of infor ation upon the Internet. n request dynamically generated by, som Universal Resource Locator (U ation, denotes the address of other information resource on the Internet, such as the address of another webpage, the address of some multimedia information resource, or even just a place within the document itself. Through using URL as hyperlink, all the Internet resources become globally addressable and are contained within a universal addressing space. It is in this simple way that almost all the computerized information resources all over the world have been connected together m Normally, all the webpages are stored in, or upo e website hosted on some web server located somewhere around the globe. Now, there are tens of millions of websites distributed all over the world. To use the web, a web user, via a web client, normally a web browser located in a local computer (in this dissertation, we are only interested in web browsers as web clients), connects to a web server and requests a webpage; the web server returns the webpage, and the web browser presents or displays it to the web user. Over the years, many client side and server side web technologies have been developed, which keep the web technology evolving at a whizzing speed, overwhelming even many professional web application developers. Some technologies just come and go, but some stay over time. Among those useful extensions is the three-tiered or n-tiered 66 beyond the sco 4.2.1 Unification of functional services and contents Generally speaking, no matter what its purposes are, any website is a nonlinear composition of functional service items and browsing items that are presented on webpages and linked together through hyperlinks. Here, a functional service item means a complete piece of functional service; a browsing item means a complete piece of content. Different from traditional applications that strictly differentiate between application architecture. From website usability?s point of view, as to what web technologies to use, on the client side, apparently maximum cross-browser supportability should always overrule. In this spirit, we assume cross-browser supportability is pe of this dissertation. On the server side, regardless of the technologies used, webpage request response time should always be our top concern. 
4.2 Some features of websites Nowadays, websites have become the major means of information and services delivery over the Internet. Most websites have been built mainly for two purposes: 1. information publication and retrieval; 2. Web-based functional services (applications) delivery. Compared to traditional software, websites have distinctive features. Because these important unique features are critical to understanding our approach to website usability engineering, we will introduce them one by one. 67 d documents (i.e., data), websites do not distinguish betwee ems can be given a at can be used by sk of a website. In contrast to normal CICI?s, the navigation system of a website is a unique means commands (i.e., functions) an n functional service items (i.e., implementations or presentations of commands or functions) and browsing items (i.e., implementations or presentations of documents or contents) at all. Both kinds of items are presented in the same way as a series of web pages. Because of this phenomenon, on the surface, a functional service item and a browsing item are not that different on a website. This distinctive feature of websites is called the all-purpose composability 7 of the World Wide Web, which makes websites extremely flexible and has proved to be a major beauty and strength of websites. As a result, on the Web, both functional service items and browsing it unified term: Conceptually Independent Composing Item (CICI) (read as kick), which means they are conceptually independent, complete, and indivisible. Typical instances of CICI?s are things like: a complete online article or book, a complete web-based transaction, etc. Abstractly, each CICI consists of a series of webpages that are put together for a purpose. On each of its webpages, besides its presentation, a CICI can be associated with methods, which are normally presented as links or buttons, th users to operate on it. In essence, a CICI is simply a designed goal-ta designed goal-task that is solely for gluing the entire website together and providing a of navigation between and beyond the CICI?s of the website to end users. 7 It should be kept in mind that abusing the ?all-purpose composability? of a website can severely damage its usability. 68 and ephemeral navigation. Fixed n 4.2.2 Contentized navigation The navigation system of a website is analogous to the menu system of traditional software. Because of the World Wide Web?s ?all-purpose composability?, a website?s navigation (organization) architecture can often be so contentized or expanded that the traditional clear distinctions between navigating items (menus) and data (real contents) become blurred or even disappeared. For example, each ?menu? of a website can be a very descriptive or verbose webpage, which resembles or even mingles in the presentation of real content. Even so, the main purpose of a website?s navigation is still to provide an efficient means of reaching the CICI?s of the website to end users. There are two flavors of navigation: fixed navigation avigation means each CICI of a website can be directly reached through the website?s main navigation. Ephemeral navigation means some CICI?s can only be reached through the links embedded in other CICI?s. Ephemeral navigation by nature is context-dependent and easy to get lost. Because ephemeral navigation can cause severe usability problems, it should be avoided altogether or be replaced by short-cuts. An extreme example of contentized navigation on the Web is the sitemap. 
4.2.3 Extensive utilization of short-cuts

Because of the World Wide Web's "all-purpose composability" and the rich presentation space of each webpage, visualized short-cuts are extensively used on the WWW. A short-cut is a redundant alternative navigation method that is provided outside the regular navigation and embedded in some webpage as a convenient way to efficiently reach some CICI on or off the current website. Compared to regular website navigation, short-cuts, on one hand, have the drawback of uncertainty, i.e., it is not guaranteed that a particular short-cut will be there when it is needed; on the other hand, they have the advantage of efficient navigation, i.e., they can extremely shorten the reaching distance of the referenced CICI's. If properly used, short-cuts can provide important alternative methods to navigate efficiently on the WWW. A good usage of short-cuts is to easily provide immediate cross-referencing between CICI's. But just as with anything good, abusing short-cuts can also adversely affect the usability of a website.

Although short-cuts and ephemeral navigations look similar, it is important to understand their difference. Short-cuts are intended to provide the pure convenience of reaching the referenced CICI's efficiently, and they are redundant alternative navigation methods with no intention to be part of the regular navigation of a website. In contrast, ephemeral navigations provide accesses to some CICI's in such an obscure way that the referenced CICI's are conceptually disconnected from the regular navigation of a website.

4.2.4 High dynamicity and unchanging usability expectance

Websites are extremely dynamic. Some websites can be updated many times a day. The user populations of websites can also be very dynamic: the kinds of users of a website are simply unpredictable; a specific user may only be interested in a specific small portion or topic of a website; and some users may only visit a specific website once in their lifetime. However, walk-up-and-use for everybody is a default and unchanging usability expectance for almost all websites.

CHAPTER 5

WEBSITE USE FEATURES

5.1 General terms

Designed Goal (G_d): Designed goal G_d is the outcome of a designed goal-task that is intended for end users to achieve by the designers.

End Users' Goal (G_u): End users' goal G_u is the outcome of a designed goal-task that is anticipated by end users.

Designed Goal-Task: A designed goal-task is a procedural sequence of steps and actions designed by the designers to be taken by end users to achieve the designed goal.

End Users' Goal-Task: An end users' goal-task is a procedural sequence of steps and actions anticipated by end users to take to achieve the end users' goal.

Use: A use is an improvised real execution of a designed goal-task by an end user, and it is a human-tool interaction process that consists of a sequence of use steps and actions taken by the end user to achieve the end user's goal.

Use Feature: A use feature of a goal-task is any feature of the goal-task that is essential or significant for the use of the goal-task. A goal-task can only be used through its use features.

Basic Use Feature: A basic use feature is a use feature that does not consist of other use features.

Composite or Derivative Use Feature: A composite or derivative use feature is a use feature that consists of other use features. While each component use feature of a composite use feature measures the perfectness of a particular usability aspect of the composite use feature, the composite use feature measures the comprehensive perfectness of all its component use features in the usability aspect represented by itself.
Distance Of A Use Feature: The distance of a use feature is the distance between the actual value of the use feature in a designed goal-task and the ideal value anticipated by end users, and it is expressed as a ratio (in percentage) to measure the imperfectness of the use feature in terms of the use feature itself (100% = the worst, and 0% = the best).

Result Of Use (R_set): Result of use R_set is a use feature that signifies the set of items achieved through a use.

Designed Context Of Use (C_d): Designed context of use C_d is a use feature that signifies the set of quantified or enumerable ranges of characteristics of the end users, the designed goal-task, and the organizational and physical environments that are specified as restrictions of use by the designers. For example, the designed context of use of a (bank account) balance transfer goal-task can be specified as:

    C_d = {
      users {
        range of gender: male/female;
        range of age: ≥ 18;
        range of language: English;
        range of level of expertise: all levels;
        range of permission: registered in the system };
      computers {
        range of hardware configuration: all;
        range of operating system: all;
        range of internet connection: all;
        range of browser: any, with 128-bit cipher strength }}

Actual Context Of Use (C_a): Actual context of use C_a is a use feature that signifies the set of actual values in a use for those characteristics that are in, or at least sufficiently implied in, the designed context of use.

Satisfied Context Of Use: When all the actual values of the characteristics in an actual context of use are within the ranges of the corresponding characteristics in the designed context of use, this actual context of use is called a satisfied context of use. If a characteristic in the designed context of use is not applicable in a use, that characteristic can be regarded as satisfied.
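As an illustration only, a designed context of use such as the balance-transfer example above can be encoded as a nested data structure together with a satisfied-context check. The following Python sketch is hypothetical (the key names are our own); it follows the rule above that an inapplicable characteristic counts as satisfied:

    # Ranges are either sets of allowed values or predicates over a value.
    designed_context = {
        "users": {
            "gender": {"male", "female"},
            "age": lambda v: v >= 18,
            "language": {"English"},
            "level of expertise": lambda v: True,   # all levels
            "permission": {"registered in the system"},
        },
        "computers": {
            "hardware configuration": lambda v: True,  # all
            "operating system": lambda v: True,        # all
            "internet connection": lambda v: True,     # all
            "browser": lambda v: v.get("cipher_bits", 0) >= 128,
        },
    }

    def is_satisfied(actual: dict, designed: dict) -> bool:
        # Satisfied when every applicable characteristic's actual value falls
        # within the designed range; missing (inapplicable) ones are satisfied.
        for group, characteristics in designed.items():
            for name, allowed in characteristics.items():
                value = actual.get(group, {}).get(name)
                if value is None:  # characteristic not applicable in this use
                    continue
                ok = allowed(value) if callable(allowed) else value in allowed
                if not ok:
                    return False
        return True

    actual_context = {
        "users": {"gender": "female", "age": 34, "language": "English"},
        "computers": {"browser": {"cipher_bits": 128}},
    }
    print(is_satisfied(actual_context, designed_context))  # True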
5.2 Website goal-task use features

5.2.1 Presentation and its basic use features

The presentation composite use feature of a goal-task measures the comprehensive aptness (in percentage) of all the interfaces and presentations involved in the use. We define the following 9 basic use features for it, each of which measures its imperfectness in one particular usability aspect.

P_1, Confusing-Misleading Interface Items Ratio: P_1 is defined as P_1^f, the number of confusing, misleading, or too-constrictive interface items involved in the use, divided by P_1^b, the total number of interface items involved in the use.

Note: A confusing interface item means end users cannot understand it by its label. A misleading interface item means end users misunderstand it by its label. A too-constrictive interface item is an input interface item that has a shorter than reasonable input length. Interface items are counted according to the following rules (unless noted otherwise, these rules apply to the other basic use features where interface-item counting is involved):

Rule 1: Nested interface items, such as radio buttons, selection lists, etc., should be counted by the nested computation method, i.e., a whole nested interface item is counted as 1. For example, if a selection list has 10 member items and one of them is confusing or misleading, then the whole selection list should be counted as 1/10.

Rule 2: An interface item and its label are two separate interface items. For example, an input field labeled "First name" should be counted separately from its label.

Rule 3: Interface items on repeated pages should be counted only once.

P_2, Inappropriate Theme-Ratio Pages Ratio: Each page should have a theme. The ratio between the displayed space occupied by the theme and the total displayed content space of a browser is the page's theme-ratio. P_2 is defined as P_2^f, the number of pages involved in the use whose theme-ratio is less than 65%, divided by P_2^b, the total number of pages involved in the use.

Note: Repeated pages should be counted only once (unless noted otherwise, this rule applies to the other basic use features where page counting is involved).

P_3, Methods-Insufficient Pages Ratio: Each page should provide sufficient necessary methods to end users. For example, in a list of submitted banking transfers, each transfer should have methods to view, edit, or delete it. P_3 is defined as P_3^f, the number of pages involved in the use that have insufficient methods, divided by P_3^b, the total number of pages involved in the use.

P_4, Memory-Exacting Pages Ratio: P_4 is defined as P_4^f, the number of pages involved in the use that force end users to accurately remember facts from previous pages in order to finish the actions on the current page, divided by P_4^b, the total number of pages involved in the use.

Note: Repeated pages should be counted accumulatively.

P_5, Distracting Pages Ratio: P_5 is defined as P_5^f, the number of pages involved in the use that have severely distracting extra features, divided by P_5^b, the total number of pages involved in the use.

P_6, Inappropriate Layout or Item-Grouping Pages Ratio: P_6 is defined as P_6^f, the number of pages involved in the use that have inappropriate layout or item-grouping, divided by P_6^b, the total number of pages involved in the use.

P_7, Bad Feedback Pages Ratio: P_7 is defined as P_7^f, the number of pages involved in the use that do not present appropriate feedback to actions performed on the previous page, divided by P_7^b, the total number of pages involved in the use.

Note: Repeated pages should be counted accumulatively.

P_8, No/Bad Page Level Help Pages Ratio: Each page should provide page-level help methods. P_8 is defined as P_8^f, the number of pages involved in the use that have no or bad page-level help, divided by P_8^b, the total number of pages involved in the use.

P_9, Bad Readability Pages Ratio: P_9 is defined as P_9^f, the number of pages involved in the use that have bad readability, divided by P_9^b, the total number of pages involved in the use.

Let's assume the 9 basic use features have equal weights. Then, according to formula (3-1), the aptness of presentation of a goal-task should be:

    P = 1 − (1/9) Σ_{i=1}^{9} P_i    (5-1)

Figure 5.1 illustrates the relationship between presentation and its basic use features.
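For readers who prefer code to notation, formula (5-1) under the equal-weights assumption is a one-line computation. The sketch below (illustrative, not part of the methodology's tooling) uses the gt1 requirement figures worked through in the next example; each ratio is a fraction in [0, 1], e.g. 0.05 for 5%:

    def presentation_aptness(ratios):
        # P = 1 - (1/9) * sum(P_i) over the nine basic presentation use features.
        assert len(ratios) == 9
        return 1.0 - sum(ratios) / 9.0

    p = presentation_aptness([0.05, 0.05, 0.00, 0.05, 0.05, 0.05, 0.00, 0.05, 0.00])
    print(f"{p:.2%}")  # 96.67%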
As an example, to specify the user usability requirement for the presentation of goal-task gt1, end users can simply demand that:

gt1's confusing-misleading interface items ratio should be less than 5%;
gt1's inappropriate theme-ratio pages ratio should be less than 5%;
gt1's methods-insufficient pages ratio should be no more than 0%;
gt1's memory-exacting pages ratio should be less than 5%;
gt1's distracting pages ratio should be less than 5%;
gt1's inappropriate layout or item-grouping pages ratio should be less than 5%;
gt1's bad feedback pages ratio should be no more than 0%;
gt1's no/bad page level help pages ratio should be less than 5%;
gt1's bad readability pages ratio should be no more than 0%.

Then, according to formula (5-1), we get:

    P = 1 − (1/9)(5% + 5% + 0% + 5% + 5% + 5% + 0% + 5% + 0%) = 96.67%

So, 96.67% is the user usability requirement for the presentation of gt1.

[Figure 5.1 Goal-task presentation and its basic use features: presentation (P) comprises P_1 through P_9.]

5.2.2 Interaction and its basic use features

The interaction composite use feature of a goal-task measures the comprehensive aptness (in percentage) of all the interactions involved in the use. We define the following 4 basic use features for it, each of which measures its imperfectness in one particular usability aspect.

I_1, Mistake-Error Intolerant Actions Ratio: I_1 is defined as I_1^f, the number of actions involved in the use that cannot be corrected, undone, or cancelled, divided by I_1^b, the total number of possible actions involved in the use.

Note: An action means an input action or a command method. An action that cannot be corrected, undone, or cancelled means that the action has already caused a failure; in other words, in order to accomplish the goal-task, the goal-task has to be started all over again. Repeated actions should be counted only once.

I_2, Mistake-Error Actions Ratio: I_2 is defined as I_2^f, the number of actions involved in the use that have caused mistakes or errors DUE TO the design, divided by I_2^b, the total number of actual actions involved in the use.

Note: Actions are counted according to the following rules:
Rule 1: Mistake-error actions due to the user's own reasons should not be counted.
Rule 2: Repeated actions should be counted accumulatively.

I_3, Imposed-Upon Awkward Actions Ratio: I_3 is defined as I_3^f, the number of actions involved in the use that are unnecessary, unreasonable, or awkwardly designed, divided by I_3^b, the total number of actual actions involved in the use.

Note: An action that is unnecessary, unreasonable, or awkwardly designed means that the action is out of place or order, not straightforward, not logical, or not necessary, but is forced upon the user by the design. Repeated actions should be counted accumulatively.

I_4, Unsuccessful Users Ratio: I_4 is defined as I_4^f, the number of users who cannot finish the goal-task, divided by I_4^b, the total number of users who have tried to accomplish the goal-task.

Let's assume the 4 basic use features have equal weights. Then, according to formula (3-2), the aptness of interaction of the goal-task should be:

    I = 1 − (1/4) Σ_{i=1}^{4} I_i    (5-2)
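Formula (5-2) transcribes the same way; this sketch (again illustrative only) plugs in the gt1 requirement values used in the example that follows:

    def interaction_aptness(ratios):
        # I = 1 - (1/4) * sum(I_i) over the four basic interaction use features.
        assert len(ratios) == 4
        return 1.0 - sum(ratios) / 4.0

    i = interaction_aptness([0.00, 0.00, 0.00, 0.01])  # 1% unsuccessful users
    print(f"{i:.2%}")  # 99.75%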
Figure 5.2 illustrates the relationship between interaction and its basic use features.

As an example, to specify the user usability requirement for the interaction of goal-task gt1, end users can simply demand that:

gt1's mistake-error intolerant actions ratio should be no more than 0%;
gt1's mistake-error actions ratio should be no more than 0%;
gt1's imposed-upon awkward actions ratio should be no more than 0%;
gt1's unsuccessful users ratio should be less than 1%.

Then, according to formula (5-2), we get:

    I = 1 − (1/4)(0% + 0% + 0% + 1%) = 99.75%

So, 99.75% is the user usability requirement for the interaction of gt1.

[Figure 5.2 Goal-task interaction and its basic use features: interaction (I) comprises I_1 through I_4.]

5.2.3 Efficiency

For website goal-task efficiency, currently we only consider the time that is spent on a goal-task by a user, so efficiency is a basic use feature by itself. As explained in section 3.2, in contrast to the old ways, we define the efficiency of a goal-task, E, as the ratio (in percentage) between the amount of time expended on the goal-task that is perceived necessary and the total amount of time expended on the goal-task. Let's assume the total amount of time expended on a goal-task is T, and the amount of wasted time that is imposed upon the user by the design is T_w; then:

    E = (T − T_w) / T    (5-3)

It should be noted that the amount of time wasted on a goal-task due to users' personal reasons should be excluded from both parts of the above ratio. In order to identify the amount of wasted time that is imposed upon users by the design, the Think-Aloud Protocol should be used.

As an example, to specify the user usability requirement for the efficiency of goal-task gt1, end users can simply demand that: gt1's efficiency should be at least 95%.
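A minimal sketch of formula (5-3), assuming T and T_w are measured in the same unit and that time wasted for the user's own reasons has already been excluded from both quantities (the numbers are hypothetical):

    def efficiency(total_time: float, design_imposed_waste: float) -> float:
        # E = (T - T_w) / T
        return (total_time - design_imposed_waste) / total_time

    # Hypothetical numbers: 6 of 120 seconds wasted by the design -> 95%.
    print(f"{efficiency(120.0, 6.0):.0%}")  # 95%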
5.2.4 Effectiveness and its basic use features

The effectiveness composite use feature of a goal-task measures the comprehensive completeness and accuracy (in percentage) with which users achieve their goals through the use. Theoretically, we define the following 2 basic use features for it, each of which measures its perfectness in one particular usability aspect. (In practice, the value of effectiveness can be obtained by questionnaires from the end users tested: end users assess the effectiveness of a goal-task based on their accomplishments of uses, and the average of their assessments is used as the value of the effectiveness of the goal-task, as if it were computed in the way introduced in this section. This dissertation takes this practical approach in assessing the effectiveness of a goal-task.)

R_1, Result Completeness: For each item in an end user's goal G_u, assign a weight to it according to its relative importance among all the expected items in G_u, such that the sum of the weights for all the items in G_u equals 1. Apply the weight of each item in G_u to its corresponding item in the result of the use: only those items that are present both in G_u and in R_set get their weights; other items get 0 as their weights. Then, R_1 equals the sum of the weights of all the items in R_set.

R_2, Result Accuracy: For each item that is present in both G_u and R_set, if its value in R_set is less than its value in G_u, divide its value in R_set by its value in G_u; the result is the accuracy of this item. Otherwise, its accuracy is 1. Then, R_2 equals the weighted sum of the accuracies of all the items that are present in both G_u and R_set.

Note: The weight used for each item's accuracy is the same as the weight allocated to that item in the definition of R_1.

Let's assume the 2 basic use features have equal weights. Because both of them are effectiveness's positive basic use features, differently from formula (3-5), we define the effectiveness of a goal-task, R, as:

    R = (1/2) Σ_{i=1}^{2} R_i    (5-4)

Figure 5.3 illustrates the relationship between effectiveness and its basic use features.

[Figure 5.3 Goal-task effectiveness and its basic use features: effectiveness (R) comprises result completeness (R_1) and result accuracy (R_2).]

In practice, the value of effectiveness can be obtained by questionnaires from end users. End users assess the effectiveness of a goal-task according to their accomplishments of uses, and the average of the assessments is then used as the value of the effectiveness of the goal-task. In this approach, effectiveness is regarded as a basic use feature by itself. We take the practical approach.

As an example, to specify the user usability requirement for the effectiveness of goal-task gt1, end users can simply demand that: gt1's effectiveness should be 100%.

5.2.5 Satisfaction

Satisfaction measures the comprehensive degree (in percentage) of users' general feelings of freedom from discomfort in the use and positive attitude toward the use. As one of the top 5 major usability aspects of a goal-task, it serves as a catch-all bag to capture users' feelings about the quality of all the other general usability facets that are hard to define and are not captured by the other 4 major usability aspects, for example, users' feelings about the quality of a content or the usefulness of a content.

In practice, satisfaction is regarded as a basic use feature by itself and is obtained from end users through questionnaires.

As an example, to specify the user usability requirement for the satisfaction of goal-task gt1, end users can simply demand that: gt1's satisfaction should be no less than 90%.

5.2.6 Usability of a goal-task

Usability of a goal-task (U_gt) is a composite use feature that measures the comprehensive quality (in percentage) of the goal-task under a satisfied context of use in the following 5 usability aspects: presentation (P), interaction (I), efficiency (E), effectiveness (R), and satisfaction (S). Let's assume P, I, E, and S have equal weights. Then, according to formula (3-7), the usability of a goal-task should be:

    U_gt = (1/4)(P + I + E + S) × R    (5-5)

As an example, using the user usability requirements for P, I, E, R, and S of goal-task gt1 (see the user usability requirements specification examples in sections 5.2.1 ~ 5.2.5 for details) in formula (5-5), we get:

    U_gt = (1/4)(96.67% + 99.75% + 95% + 90%) × 100% = 95.36%

So, 95.36% is the user usability requirement for the usability of gt1.
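Formula (5-5) combined with the running gt1 figures can be checked mechanically; a small sketch with the values hard-coded from the examples above:

    def goal_task_usability(p, i, e, s, r):
        # U_gt = ((P + I + E + S) / 4) * R
        return (p + i + e + s) / 4.0 * r

    u = goal_task_usability(0.9667, 0.9975, 0.95, 0.90, 1.00)
    print(u)  # ≈ 0.95355, i.e., 95.36% after rounding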
5.3 Website navigation use features

The navigation system of a website is analogous to the menu system of traditional software. Although it is unique in many ways, it is just a designed goal-task that is solely for gluing the entire website together and providing end users a means of reaching the CICI's of the website. Structurally, it is a single-entrance, multi-exit functionality.

Figure 5.4 illustrates the relationship between the navigation and the normal goal-tasks on a website. In Figure 5.4, the inner nodes are "sub-menus", and the leaf nodes are normal goal-tasks. Conceptually, Figure 5.4 can be transformed into Figure 5.5 to demonstrate the simplified relationship between the navigation and each goal-task.

[Figure 5.4 Navigation and goal-tasks: a website's navigation tree whose inner "sub-menu" nodes lead to leaf goal-tasks 1 through 11.]

[Figure 5.5 Conceptually-simplified navigation and goal-tasks: the website mapped to pairs of "navigation for goal-task i" followed by "goal-task i".]

Because navigation is the first goal-task that end users have to use when they use a website, its usability is important. Since navigation is just another goal-task, we can still use formula (3-7) to evaluate its usability. But because it is also unique when compared to other normal goal-tasks, the use features defined above for normal goal-tasks must be customized to fit this unique goal-task's special situation.

5.3.1 Presentation and its basic use features

The presentation composite use feature of navigation measures the comprehensive aptness (in percentage) of all the interfaces and presentations in the navigation system. We define the following 5 basic use features for it on a per-goal-task basis, each of which measures its imperfectness in one particular usability aspect on a per-goal-task basis.

P_1^gt, Confusing-Misleading Navigation Methods Ratio: P_1^gt is defined as P_1^gt,f, the number of confusing, misleading, or illegible navigation methods on all the navigation pages involved in the navigation process leading to the desired goal-task, divided by P_1^gt,b, the total number of navigation methods on all the navigation pages involved in that navigation process.

Note: Navigation methods on repeated pages should be counted only once.

P_2^gt, Inappropriate Theme-Ratio Pages Ratio: P_2^gt is defined as P_2^gt,f, the number of navigation pages involved in the navigation process leading to the desired goal-task whose theme-ratio is less than 65%, divided by P_2^gt,b, the total number of navigation pages involved in that navigation process.

P_3^gt, Distracting Pages Ratio: P_3^gt is defined as P_3^gt,f, the number of navigation pages involved in the navigation process leading to the desired goal-task that have severely distracting extra features, divided by P_3^gt,b, the total number of navigation pages involved in that navigation process.

P_4^gt, Inappropriate Layout or Item-Grouping Pages Ratio: P_4^gt is defined as P_4^gt,f, the number of navigation pages involved in the navigation process leading to the desired goal-task that have inappropriate layout or item-grouping, divided by P_4^gt,b, the total number of navigation pages involved in that navigation process.

P_5^gt, No/Bad Page Level Help Pages Ratio: P_5^gt is defined as P_5^gt,f, the number of navigation pages involved in the navigation process leading to the desired goal-task that have no or bad page-level help, divided by P_5^gt,b, the total number of navigation pages involved in that navigation process.
Let's assume the 5 basic use features have equal weights. Then, according to formula (3-1), the aptness of presentation of navigation in locating the desired goal-task, P_gt, should be:

    P_gt = 1 − (1/5) Σ_{i=1}^{5} P_i^gt    (5-6)

Figure 5.6 illustrates the relationship between presentation and its basic use features.

[Figure 5.6 Navigation presentation and its basic use features: presentation (P_gt) comprises P_1^gt through P_5^gt.]

As an example, to specify the user usability requirement for the presentation of navigation in locating goal-task gt1, end users can simply demand that:

the confusing-misleading navigation methods ratio in locating gt1 should be 0%;
the inappropriate theme-ratio pages ratio in locating gt1 should be less than 5%;
the distracting pages ratio in locating gt1 should be less than 5%;
the inappropriate layout or item-grouping pages ratio in locating gt1 should be less than 5%;
the no/bad page level help pages ratio in locating gt1 should be less than 5%.

Then, according to formula (5-6), we get:

    P_gt = 1 − (1/5)(0% + 5% + 5% + 5% + 5%) = 96%

So, 96% is the user usability requirement for the presentation of navigation in locating gt1.

Let's assume (this assumption holds for the rest of this chapter) a website consists of t goal-tasks, and w_gt1, w_gt2, ..., w_gtt are their weights respectively, where 0 ≤ w_gti ≤ 1 for i = 1, ..., t, and Σ_{i=1}^{t} w_gti = 1. Assume P_gt1, P_gt2, ..., P_gtt are respectively the presentations of navigation in locating these goal-tasks. We define the presentation of the entire navigation system, P_nav, as:

    P_nav = Σ_{i=1}^{t} w_gti × P_gti    (5-7)

If we assume the example website has only 1 goal-task, gt1, then its weight is 100%. According to formula (5-7), we get:

    P_nav = 100% × 96% = 96%

So, 96% is the user usability requirement for the presentation of the entire navigation system.

5.3.2 Interaction and its basic use feature

The interaction composite use feature of navigation measures the comprehensive aptness (in percentage) of all the interactions in the navigation system. We define only the following 1 basic use feature for it, on a per-goal-task basis.

I_1^gt, Unsuccessful Users Ratio: I_1^gt is defined as I_1^gt,f, the number of users who cannot locate the desired goal-task, divided by I_1^gt,b, the total number of users who have tried to locate the desired goal-task.

Apparently, the weight for I_1^gt is 100%. According to formula (3-2), the aptness of interaction of navigation in locating the desired goal-task, I_gt, should be:

    I_gt = 1 − I_1^gt    (5-8)

As an example, to specify the user usability requirement for the interaction of navigation in locating goal-task gt1, end users can simply demand that: the unsuccessful users ratio in locating gt1 should be 0%. Then, according to formula (5-8), we get:

    I_gt = 1 − 0% = 100%

So, 100% is the user usability requirement for the interaction of navigation in locating gt1.
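Formula (5-7) above and formula (5-9) below share one weighted-sum shape, so a single hypothetical helper covers both; for the single-goal-task example website the sum reduces to the lone per-goal-task value:

    def navigation_aggregate(weights, per_goal_task_values):
        # X_nav = sum(w_gti * X_gti), where X is P (presentation) or I (interaction).
        assert abs(sum(weights) - 1.0) < 1e-9  # weights must sum to 1
        return sum(w * x for w, x in zip(weights, per_goal_task_values))

    print(f"{navigation_aggregate([1.0], [0.96]):.0%}")   # P_nav = 96%
    print(f"{navigation_aggregate([1.0], [1.00]):.0%}")   # I_nav = 100%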
Let's assume I_gt1, I_gt2, ..., I_gtt are respectively the interactions of navigation in locating the t goal-tasks. We define the interaction of the entire navigation system, I_nav, as:

    I_nav = Σ_{i=1}^{t} w_gti × I_gti    (5-9)

Because the example website has only 1 goal-task, gt1 (i.e., gt1's weight is 100%), according to formula (5-9), we get:

    I_nav = 100% × 100% = 100%

So, 100% is the user usability requirement for the interaction of the entire navigation system.

5.3.3 Efficiency

Instead of time, the efficiency of navigation is better considered in terms of the human physical effort needed to reach a desired CICI through the navigation architecture of a website. Specifically, the human physical effort means how many levels an end user has to click through in the navigation architecture in order to reach the desired CICI. If we name the top level of a navigation architecture level 1, then we can define the reaching distance of a particular CICI as the level at which the CICI can be located. In other words, a CICI's reaching distance is simply the least number of mouse clicks for the CICI to be reached.

Let's assume a CICI i has an access probability p_i and a reaching distance d_i, and the total number of reachable CICI's is n. We define the average probability reaching distance, D_ap, as:

    D_ap = Σ_{i=1}^{n} p_i × d_i    (5-10)

In order to have the best efficiency, a website needs to have an optimal average probability reaching distance. Besides D_ap, another factor that can affect the efficiency of navigation is the breadth of the navigation architecture. Breadth, W_max, is normally defined as the maximum number of navigation items at the same level of any branch of the navigation architecture. It is believed that a navigation architecture is most efficient when D_ap = 1, and any navigation architecture with D_ap ≥ 5 should be avoided [65][112][113][114][115][116]. It is also believed that W_max has much less effect on the efficiency of navigation than D_ap [117][118][119][120], but it is normally suggested that W_max should not be more than nine [121]. (Sometimes, this limitation is not practical on the WWW. In extreme situations, the number of items on one level of the navigation architecture of some websites can easily run up to the order of thousands or even millions, for example, the topic lists on some forum websites, or the search result lists of web search engines.) In other words, an efficient navigation architecture should be shallow and wide, but not too wide.

According to the above discussion, we define the efficiency of navigation, E_nav, as:

    E_nav = 1 − (90% × d_e/4 + 10% × w_e/9)    (5-11)

In (5-11), d_e/4 is the inefficiency caused by D_ap, and w_e/9 is the inefficiency caused by W_max, where d_e is derived from D_ap and w_e from W_max.
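As an illustration of formula (5-10), the sketch below computes D_ap for a toy site; the access probabilities and reaching distances are invented for the example, and the piecewise derivations of d_e and w_e in formula (5-11) are not sketched here:

    def average_probability_reaching_distance(access_probabilities, reaching_distances):
        # D_ap = sum(p_i * d_i) over all n reachable CICI's.
        return sum(p * d for p, d in zip(access_probabilities, reaching_distances))

    # Three CICI's: one at level 1 (p = 0.5), two at level 2 (p = 0.3 and 0.2).
    dap = average_probability_reaching_distance([0.5, 0.3, 0.2], [1, 2, 2])
    print(f"{dap:.1f}")  # 1.5 clicks on average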