SISE: A Novel Approach to Individual Effort Estimation

by

Russell Thackston

A dissertation submitted to the Graduate Faculty of Auburn University in partial fulfillment of the requirements for the Degree of Doctor of Philosophy

Auburn, Alabama
August 3, 2013

Keywords: effort, estimation, individual

Copyright 2013 by Russell Thackston

Approved by

David Umphress, Chair, Associate Professor of Computer Science and Software Engineering
James Cross, Professor of Computer Science and Software Engineering
Dean Hendrix, Associate Professor of Computer Science and Software Engineering

Abstract

Author's note: This dissertation has been prepared following a manuscript style. Each chapter has been constructed as a stand-alone manuscript, suitable for separate publication. Therefore, each chapter contains an abstract and introductory subject material, as well as some overlapping content.

The software engineering discipline is filled with many varied examples of software process methods and tools focused on the team or organization. In recent years, the agile approach to software engineering has increased the focus of software process on small teams and individuals; however, not all aspects of software process have been deeply or fully addressed. The majority of effort estimation models, traditional and agile alike, focus on teams or groups of software engineers. The discipline is rife with examples of team-based models, including Wideband Delphi, Planning Poker, function point analysis, and COCOMO. The few examples of effort estimation models focused on the lone software engineer are limited to traditional mathematical models with (relatively) substantial complexity and required time investment; the Personal Software Process (PSP) contains one such model: PROBE. The discipline lacks a truly agile model based on a minimal combination of empirical data and expert judgment.

The SISE model under development at Auburn University's microISV Research Lab is a simple-to-understand, lightweight, and agile effort estimation model that specifically targets individual software engineers. SISE combines an individual's personal, empirical data with his or her expert judgment and experience to produce relatively accurate estimates with a minimal investment of training and time.

The SISE model rests on two foundational principles. First, software engineers are capable of identifying the larger of a pair of tasks based solely on their descriptions. Second, a software engineer who is presented with a future work activity is capable of identifying two historical tasks, one larger and one smaller, which may serve as a prediction of the future activity's size.

The name "SISE" is an acronym for the model's four basic steps: Sort, Identify, Size, and Evaluate. The first step, Sort, involves ordering historical data by the actual effort required to complete each activity. The second step, Identify, involves choosing two tasks from the historical data set: one confidently known to be smaller, one confidently known to be larger, and both relatively close in size to the future work. Once the practitioner has chosen a pair of tasks, the third step, Size, produces a rough prediction interval of the future activity's size using the actual effort values for the two completed tasks. The final step, Evaluate, involves shifting or resizing the prediction interval to account for any historical bias.
This last step is optional and is only applied if the estimator is dissatisfied with the precision, accuracy, or confidence level of his or her estimate.

Validation of the SISE model included two major steps. First, the foundational principle that relative task sizing by software engineers is suitably accurate was validated. The validation took the form of a survey, presented to over 100 software engineering students, in which respondents were shown a series of task pairs and asked to identify the larger task of each pair. Some of the pairs had a known, verifiable size difference based on ten years of time logs provided by students in the Software Process course, while some of the pairs did not. The results indicated that, on average, a majority of software engineers were able to identify the larger task, while not typically misidentifying the smaller. When presented with tasks demonstrating no significant difference in size, the respondents were typically swayed by the wording, format, or word count.

The second phase of validation involved a series of Software Process students who were asked to identify where a future activity should be placed in the ordered list of their completed tasks. In addition, the students were asked to construct a PCSE estimate. The results indicated that SISE predictions were no more or less accurate than the PCSE model's estimates. In addition, the students indicated that SISE, in their opinion, took less time and was based on a less complex model. In summary, SISE appears capable of producing results of equal quality, in less time, and with less training.

Acknowledgments

I would like to acknowledge David Umphress for his extensive contributions, both as a mentor and editor. I would also like to thank Laura Thackston, my wife, my peer, and my best friend, for keeping me grounded and on track.

Table of Contents

Abstract
Acknowledgments
List of Figures
List of Tables
1 Individual Effort Estimating: Not Just for Teams Anymore
  1.1 Abstract
  1.2 Introduction
  1.3 Rise of the one-person team
  1.4 The plight of the individual estimator
  1.5 Estimation landscape
  1.6 Estimation for the one-person team
  1.7 A "Better than Guessing" Agile Approach
  1.8 Conclusion
  1.9 Further Information
2 Support for Individual Effort Estimation
  2.1 Abstract
  2.2 The Importance of Effort Estimation
  2.3 Effort Estimation and the Individual
  2.4 Effort Estimating Tools, Techniques, and Approaches
  2.5 Common Estimation Approaches
  2.6 Common Estimation Tools and Techniques
  2.7 Measuring Estimation Quality
  2.8 Estimation Techniques for the Individual
  2.9 Summary and Conclusions
3 SISE: A Novel Approach to Individual Effort Estimation
  3.1 Abstract
  3.2 Introduction
  3.3 The SISE Steps
    3.3.1 Step 1: Sort
    3.3.2 Step 2: Identify
    3.3.3 Step 3: Size
    3.3.4 Step 4: Evaluate
  3.4 Getting Started with SISE
  3.5 Accuracy, Precision, and Confidence Level
  3.6 SISE Example
  3.7 Validation of SISE
  3.8 Conclusion
4 Validation of the SISE Model
  4.1 Abstract
  4.2 Introduction
  4.3 SISE: An Agile Estimation Model
  4.4 Relative Sizing
    4.4.1 Hypothesis
    4.4.2 Survey
    4.4.3 Metrics
    4.4.4 Participants
    4.4.5 Questions and Presentation
    4.4.6 Results
    4.4.7 Individual Respondents
    4.4.8 Conclusions
  4.5 Accuracy
    4.5.1 Hypothesis
    4.5.2 Experiment
    4.5.3 Metrics
    4.5.4 Participants
    4.5.5 Questions and Presentation
    4.5.6 Results
    4.5.7 Conclusions
  4.6 Time Investment and Perceived Value
    4.6.1 Hypothesis
    4.6.2 Survey
    4.6.3 Metrics
    4.6.4 Participants
    4.6.5 Questions and Presentation
    4.6.6 Results
    4.6.7 Conclusions
  4.7 Summary
5 Conclusions and Additional Research
  5.1 Summary
  5.2 Conclusions
  5.3 Additional Research
Bibliography
Appendices
  A Output Listings
  B Themes in Relative Sizing Rationale
  C Relative Sizing Survey Questions
  D Attitudinal Survey Questions

List of Figures

4.1 Distribution of actual construction times (all data).
4.2 Distribution of actual construction times (values < 1,000 minutes).
4.3 Relative sizing survey results.
4.4 Distribution of percentage of correct answers (task pairs 1-4).
4.5 Sample PROBE calculation using the assignment spreadsheet.

List of Tables

2.1 Common approaches to estimation.
2.2 Common approaches to estimation.
2.3 Sample MARE values.
3.1 One completed activity.
3.2 Two completed activities.
3.3 Three completed activities.
3.4 Four completed activities.
3.5 Ten completed activities.
3.6 Adjusting for width bias.
4.1 Average construction effort (in minutes).
4.2 Average construction time comparison for assignment pairs.
4.3 Relative Sizing Survey Results.
4.4 Task pair 1 distribution of themes in respondents' rationales.
4.5 Task pair 2 distribution of themes in respondents' rationales.
4.6 Task pair 3 distribution of themes in respondents' rationales.
4.7 Task pair 4 distribution of themes in respondents' rationales.
4.8 Task pair 5 distribution of themes in respondents' rationales.
4.9 Task pair 6 distribution of themes in respondents' rationales.
4.10 Task pair 7 distribution of themes in respondents' rationales.
4.11 Task pair 8 distribution of themes in respondents' rationales.
4.12 Task pair 9 distribution of themes in respondents' rationales.
4.13 Sample assignment summary.
4.14 Summary of survey results.

Chapter 1
Individual Effort Estimating: Not Just for Teams Anymore

Author's note: This manuscript was originally published in the May/June 2012 edition of CrossTalk: Journal of Defense Software Engineering.

1.1 Abstract

Truly viable software (mobile device apps, services, components) is being written by one-person teams, thus demonstrating the need for engineering discipline at the individual level. This article examines effort estimation for individuals and proposes a lightweight approach based on a synthesis of a number of concepts derived from existing team estimation practices.

1.2 Introduction

Software engineering is usually portrayed as a team activity, one where a myriad of technical players are choreographed to build a software solution to a complex problem. With such a perspective, estimating the effort required to write software involves a top-down view of development projects. Effort is forecasted based on a characterization of previous projects that is intended to represent teams performing to the statistical norm. While large projects are predominant, especially in the DoD environment, the fact remains that teams are made of unique individuals, each of whom writes software at a different tempo and with different properties. At some point in a project, the top-down prediction of required effort must be tempered with a bottom-up frame of reference in which the team fades from being a group of generic members to being a collection of distinct persons.

Historically, estimation methods have focused on effort at the team level. Recent agile software development practices have shed light on taking individuals, acting as part of a team, into consideration. Emerging trends in software development for mobile devices suggest that effort estimation methods can be employed for one-person endeavors and that those methods can benefit teams, but also that such methods are still very much in their infancy. This paper examines effort estimation from the one-person team perspective and describes a lightweight effort estimation technique in that light.

1.3 Rise of the one-person team

One-person software companies, historically referred to as shareware authors but more recently labeled "micro Independent Software Vendors," or microISVs, are on the rise, fueled in part by the proliferation of free and open source tools and new ecosystems, such as the Apple App Store and the Android Market. In fact, metrics gathered by AdMob, a mobile advertising network, estimated the market for paid mobile apps in August of 2009 at $203 million [1]. That is an estimated $2.4 billion per year, for mobile apps alone, where the greatest barrier to entry is the cost of the mobile device.
For example, publishing a game app in the Apple App Store requires an investment of around $100, not including the device, as the only requirement is the $99/year fee to join the Apple iOS Developer Program [4]. Publishing an Android app is even less costly, since the developer membership fee is only $25 [3]. Both platforms boast generous support documentation, which is freely available on the Apple and Google websites, respectively. Furthermore, many books are available for app authors wishing to dive deeper into the technologies of these platforms.

In addition to the mobile app market, opportunities exist in other markets, such as the traditional downloadable software venue as well as the relatively new concept of Software as a Service (SaaS) websites. Gartner Research measured worldwide SaaS revenue for 2010 at $10 billion and projected it to reach $12.1 billion in 2011 [14].

MicroISVs encapsulate the entire spectrum of company activities in a single individual. As a result, the successful microISV focuses only on those activities that provide clear, measurable benefit and contribute directly to the bottom line; all other activities are deemed unimportant and are, as a result, discarded. It is through this natural selection process that successful microISV founders learn what is essential to the operation of a business and what is not. Unfortunately, the benefits derived from certain activities may not be directly visible. Many business activities, such as planning and risk management, are proven to provide measurable benefit, but may seem unimportant to a microISV operator confronted with the daily operations of a business. Too often, this category includes non-technical activities, such as taking the time to understand the size and scale of upcoming work (e.g., estimating effort).

1.4 The plight of the individual estimator

MicroISVs are not the only one-person software development operations in action. Regardless of the size of the organization (enterprise, company, firm, etc.), software development still boils down to the individual on the front line: the developer. No matter what label is given to the developer (microISV owner, consultant, or team member), it is his or her responsibility to help craft estimates and to manage his or her own time. The motivation varies, depending on the circumstances. For example, microISV owners might primarily use estimates to plan release cycles and orchestrate business and technical tasks. Consultants might use personal estimates to provide billing estimates. Team members could use personal estimates to validate requested delivery dates and coordinate time allocations among multiple activities and projects. However, individual software engineers remain generally ill equipped to construct personal estimates. This leaves the individual developer faced with a choice among guesswork, formal models, and team-oriented techniques.

Unfortunately, team members, too, fall into behaviors which either avoid giving proper attention to estimating effort or fall back on one of the least accurate approaches: "gut feel" estimates. Why do individual software developers not value good effort estimates? An informal survey of members of the Association of Software Professionals (ASP) reveals the perception that developing an estimate provides no direct value, either to microISV owners or their customers, the primary reason being that microISV product requirements are too fluid and are, therefore, not a good basis for an estimate [5].
Some microISV owners indicate that effort estimates are unnecessary unless a customer or client is holding them accountable; however, this is not often the case for microISVs, which tend to focus on shrink-wrapped products. Many of the same arguments expressed by microISVs apply to individual developers acting as team members. Developers on a team may not be asked for effort estimates; someone in a senior position may directly provide deadlines to them. In the event that a developer is asked for an estimate, it is likely he or she is not properly equipped to provide a good one, or does not view this non-technical activity as an interesting and engaging problem, like coding. This can lead to an estimating rule of thumb, such as "make a guess and multiply by three." Regardless of the circumstances and rationale, many individual developers are not equipped to construct, and derive benefit from, personal effort estimates.

The spectrum of effort estimation approaches is broad, often overwhelmingly so. At the near end of the spectrum lies guesswork; at the far end lie formal models. The former is quick, but difficult to tune; the latter involves complex mathematical calculations to model past performance and requires a heavy investment of time. The agile software development movement strives to strike a conciliatory balance that takes advantage of individual expert opinions adjusted by the collective wisdom of multiple participants. Planning Poker, for example, exemplifies this philosophy of guessing effort individually, then attenuating the range of guesses through team dialog [9]. Up to this point, there is no single-person equivalent to Planning Poker (i.e., no "Planning Solitaire").

Additionally, developers do not see a benefit to themselves in constructing an effort estimate. This is due in part to the tools and techniques currently available to the individual, which are either too heavyweight or non-existent. Either the process of constructing the estimate takes too much time and effort (heavyweight) or it produces low-quality results (lightweight guessing). In either case, the ROI doesn't fit the circumstances. What is needed is a lightweight effort estimating process, focused on the individual, capable of constructing an estimate with a reasonable degree of accuracy.

1.5 Estimation landscape

The process of estimating the cost of future software development efforts, in terms of time and resources, is a complex issue. Researchers have designed and tested models for predicting effort using a variety of approaches, techniques, and technologies. For example, while researchers found that half of all published research into effort estimation (up through 2004) utilized some form of regression technique, predicting future effort based on past effort, a good deal of research was still going into other techniques such as function point analysis, expert judgment, and analogy [22]. These models are capable of predicting effort in a variety of environments with varying degrees of success and accuracy.

Most approaches share a common thread of complex mathematical models requiring calibration and tuning. For example, COCOMO, one of the most well-known models, uses 15 cost drivers (or "attributes") to estimate the size of the product to be created [7]. Organizations must accurately rate each cost driver, as well as determine software project coefficients that characterize their environment.
Function point analysis employs a model based on determining the number and complexity of files internal to the system, files external to the system, inputs, outputs, and queries. This allows an organization to estimate effort by comparing the number and type of function points to historical data from past projects. Like COCOMO, function point analysis requires adjusting the estimate based on a variety of technical and operational characteristics, such as performance and reusability.

Despite the large amount of research focused on producing accurate effort estimates, the typical software project is on time, on budget, and fully functional only about one-third of the time [37]. Clearly, the factors behind this statistic are poorly enumerated and vaguely understood, at best, given the number of variables involved. Despite the complexity of the problems facing the software cost estimating discipline, a wide variety of research into the problem has been conducted, focusing on such aspects as estimating approaches and models [22].

Perhaps one of the contributing factors lies at the bottom of the hierarchy, with software engineers who are ill equipped to provide estimates and validate deadlines imposed by their managers. While there are many approaches to estimating effort, it is clear that the available approaches focus on the team or enterprise. Few approaches deal with individual effort estimation, such as at the task level. Furthermore, few approaches can be characterized as "lightweight" and suitable to the agile environments of one-person software development teams.

1.6 Estimation for the one-person team

In 1995, Humphrey introduced the Personal Software Process (PSP), which defined the first published software process tailored for the individual [17]. PSP defines a highly structured approach to measuring specific aspects of an individual's software development work. With respect to effort estimation, PSP employs a proxy-based approach referred to as PROBE (Proxy-Based Estimating). In general, a proxy is a generic stand-in for something else of a similar nature. With regard to software estimating, a proxy is a completed task of similar size to an incomplete task; it can therefore be assumed that the time to complete the second task should be similar to that of the first. PROBE specifically uses a lengthy, mathematical process for determining anticipated software lines of code (SLOC), which are used to compare tasks and find an appropriate proxy from which a time estimate is derived.

PSP has demonstrated popularity in both academic and business environments. Businesses, such as Motorola and Union Switch & Signal, Inc., have adopted PSP and claimed varying degrees of success. Many universities have integrated PSP into software engineering courses in an effort to demonstrate and measure the value of a structured personal development process. A University of Hawaii case study explored some of the benefits and criticisms of PSP [20]. The study demonstrated that students using PSP developed a stronger appreciation for utilizing a structured process. However, the study also noted PSP's "crushing clerical overhead," which could result in significant data recording errors.

Interestingly, Humphrey's Team Software Process (TSP) [18] corroborates the value of fusing individual estimation efforts into a team-level estimate. Individual members of TSP teams practice PSP and employ individual PSP-gathered measures in making team-level estimates.
Researchers at Auburn University have developed a lightweight alternative to PSP, known as Principle-Centered Software Engineering (PCSE) [40]. PCSE uses a proxy-based approach to estimating software lines of code, which is clearly lighter weight, yet still relies on non-trivial mathematical models to produce a result. On one hand, the lightweight nature of PCSE calculations overcomes the heavy mathematical models of PROBE. On the other hand, the use of software lines of code limits the usefulness of PCSE in graphical or web-based development, which may include graphics, user interface widgets, etc.

The informal survey of ASP members reveals anecdotal evidence that software engineers typically opt for a "gut-feel" approach, where effort estimation is based on no more than educated guesses [5]. This impression is especially true of software engineers working outside the constraints of an organization, which typically mandates the use of formal tools and processes. Since developers' estimates typically run 20-30% lower than actual effort [41], this would explain why the gut-feel approach is typically heavily padded.

1.7 A "Better than Guessing" Agile Approach

A void exists in the spectrum of effort estimating tools, specifically a lack of lightweight tools for the individual. SISE, which stands for Sort-Identify-Size-Evaluate, represents a new approach to forecasting future software development effort by combining a standard regression model with relative task sizing. SISE introduces an agile approach to sizing by the individual in much the same way Planning Poker introduced agile sizing to the team. Specifically, the SISE model guides the estimator through the process of organizing future tasks, not by matching them to historical analogies, but by size ranking, relative to historical tasks with known effort measurements. The SISE model then derives its results based on a simple principle: if the perceived size of a future task falls in between the actual size of two historical tasks, then the actual effort of the future task should also lie between the actual effort of the two historical tasks.

For example, assume two tasks have been completed on a project by a software developer: the first task in four hours and the second in six hours. A third task is then assigned to the developer, who estimates the relative size of the task to be somewhere between the first two tasks. It can be assumed, then, that the actual effort for the third task is somewhere between four and six hours. Note that this approach does not necessarily produce a single-value estimate, but rather an upper and lower bound for the estimate (i.e., a prediction interval).

If a single-value estimate is required, it can be extracted from the prediction interval in a number of ways. One approach to deriving a specific value for the estimate is to take the upper bound value. This produces a high-confidence estimate, yet strongly resembles the practice of "padding" the estimate (i.e., playing it safe). Another approach involves simply averaging the upper and lower values and relying on the law of averages to even out the errors. In many instances, this approach may produce a specific value that is "good enough" for the circumstances with a minimal amount of effort and complexity. A third approach relies on a simple statistical analysis of the developer's historical data to produce a weighting factor which is applied to the upper and lower bounds to produce a single value. Simply put, the weighting factor represents where, on average, the developer's actual effort for all tasks fell between the upper and lower bounds of the estimates for those tasks. For example, a developer who, on average, completes tasks exactly at the midpoint between the upper and lower bounds of the estimate will possess a weighting factor of 0.5.
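The arithmetic behind the prediction interval and the three single-value strategies is small enough to sketch directly. The following Python fragment is illustrative only: the function names, the history format, and the sample data are assumptions made for this example, not artifacts of the SISE model itself.

```python
# Minimal sketch of the SISE sizing arithmetic described above.
# All names and data here are illustrative, not part of the model.

def sise_interval(smaller_actual, larger_actual):
    """Prediction interval: the future task's effort is expected to fall
    between the actual efforts of the two bracketing historical tasks."""
    return (smaller_actual, larger_actual)

def weighting_factor(history):
    """Average position of actual effort within past prediction intervals,
    given (low, high, actual) triples; 0.5 means actuals tend to land at
    the midpoint of the estimated bounds."""
    positions = [(actual - low) / (high - low) for low, high, actual in history]
    return sum(positions) / len(positions)

def single_value(interval, strategy="average", weight=0.5):
    """Collapse a prediction interval to one number using one of the
    three strategies described in the text."""
    low, high = interval
    if strategy == "upper":          # high-confidence, padded estimate
        return high
    if strategy == "average":        # midpoint of the bounds
        return (low + high) / 2
    if strategy == "weighted":       # apply the historical weighting factor
        return low + weight * (high - low)
    raise ValueError(f"unknown strategy: {strategy}")

# The example from the text: completed tasks of four and six hours
# bracket the new task, so its effort should fall in [4, 6].
interval = sise_interval(4, 6)
w = weighting_factor([(3, 5, 4), (2, 6, 4)])   # hypothetical past estimates
print(single_value(interval, "average"))        # -> 5.0
print(single_value(interval, "weighted", w))    # -> 5.0, since w == 0.5
```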
In its simplest form, the SISE model is specifically targeted at individual software developers, such as microISV operators, independent consultants, and team members. The approach's strength lies in the fact that individuals may develop reasonably accurate personal estimates based on their own historical data, while excluding team-level factors such as communication and overhead costs. Although the factors involved in a team-level effort estimate are more numerous, project-level estimates may also be derived with the assistance of the SISE model by combining individual team members' estimates and applying existing, proven approaches to adjusting for overhead. Another, more radical application of SISE might involve treating entire projects as tasks and deriving an estimate in the same manner as used by individual developers. This type of application would be most useful in organizations that insist on estimates extremely early in the development cycle (i.e., before requirements are fully elaborated and understood).

Obviously, the SISE approach requires a calibration period, during which a historical data pool is constructed for deriving future estimates. Fortunately, many organizations and individuals already track effort expenditures, via time sheets or project management tools, providing a ready source of historical data. This leaves a historical gap for new developers and for environments lacking historical records. Fortunately, the calibration period can be relatively short, depending on the typical size variations in tasks; organizations with task sizes that vary widely will require a longer calibration period to derive a suitable data pool of historical values. Although not recommended, in the absence of historical data it is possible to use another software engineer's actuals as a surrogate data pool and then rotate the surrogate data out as actual developer data becomes available.

A variety of other factors exist which may affect the accuracy or precision of an estimate. These factors, such as programming environment changes, statistical outliers in the data set, data recording accuracy, and granularity, must be addressed in a consistent manner when implementing the SISE model in an organization's environment. The key to successfully applying this approach lies in the individual's ability to recognize and adjust for these factors.

1.8 Conclusion

The emergence of software written by one-person teams (mobile device apps, services, components) renders inaccurate the portrayal of software engineering exclusively as a team activity. It brings a vanguard of exciting concepts in which the individual plays a pivotal role not only in creating viable software, but in controlling the development process. Systematic effort estimation, once the province of teams, has benefit for individuals as well; however, it, like so many other software methods, has to be stripped of unnecessary tasks if individual developers are to reap that benefit. It has to be intuitive, usable, and produce results that are more accurate than outright guessing.
1.9 Further Information

The SISE model is currently under development at Auburn University by IT veteran and graduate student Russell Thackston. The model is currently under evaluation by Auburn University's Computer Science and Software Engineering program.

Chapter 2
Support for Individual Effort Estimation

2.1 Abstract

Estimation models play an important role in developing software, both within teams and for individuals, as they are used to validate deadlines, design project plans, and schedule work. These models rely on approaches such as regression and analogy to provide practitioners with the tools to derive useful estimates. The perception of the quality of these models is often defined by the accuracy of their estimates. However, many practitioners contend that the true value of an estimate lies in its usefulness as a project-planning tool, not in its ability to predict the future.

Despite the wide variety of existing estimation techniques available to teams, individuals have few realistic choices in building estimates: PSP's PROxy-Based Estimating (PROBE); PCSE's component-based estimating; and expert judgment (i.e., guesswork). The structures of PSP's and PCSE's estimation models arguably demand a level of time commitment well above that of guesswork. Anecdotal evidence suggests that a subset of practitioners choose guesswork as an alternative to established models due to the perceived overhead associated with using such a model. The space between guesswork and established models (such as PROBE and PCSE) represents a gap that can be filled with a lighter-weight, reasonably accurate model tailored to the individual and suited to estimation of individual tasks.

2.2 The Importance of Effort Estimation

The exact benefits of accurate software effort estimates vary from organization to organization. However, some common threads exist regardless of the environment. First, a software estimate predicts the scale, from small to massive, of the work being undertaken. From this estimate of scale, predictions for staffing, computing resources, milestones, and deadlines are derived. Without a reasonably accurate estimate, many project management activities could not be conducted without resorting to guesswork. For example, an accurate estimate allows the project manager to predict milestone and delivery dates. These milestones assist in predicting project staffing or outsourcing needs, which may require interaction and scheduling with external entities. This leads to a proper project schedule and efficient resource management; without a reasonable estimate, resources may be brought in too early, resulting in slack time, or too late, resulting in delays. Risk management is also heavily impacted in the absence of a good estimate, since the purpose of an estimate is to enable efficient work management, not to predict exact schedules [26].

Activities outside project management may also be affected by the lack of an effort estimate. For example, consulting firms who competitively bid on projects must rely on the estimate to properly gauge their time and resource investment and, based on that estimate, how much they will bill their clients. Furthermore, most clients will wish to establish and work to delivery dates based on the initial estimates. Inaccurate or absent estimates require that a level of guesswork be employed and may result in cost overruns or excessively high, non-competitive bids for work.
The importance of effort estimates has led to the development of a wide variety of estimating approaches, tools, and techniques. Naturally, much of the research focuses on projects and teams; team projects present some of the most complex challenges in estimating, as they must address myriad factors such as overhead and communication. However, few approaches and tools attempt to address the challenge of estimates at the level of the individual.

2.3 Effort Estimation and the Individual

A wide variety of organizations exist for the purpose of creating software. Ultimately, these organizations employ one or more software engineers, regardless of team size or composition, for the purpose of building the product. In addition to the well-established role of "team member," software engineers can also be found in roles such as independent consultant, where the lone consultant performs all the functions and duties of a small team. In the extreme, software engineers are also found operating one-person software companies, known as micro Independent Software Vendors (microISVs).

Software engineers, in all their incarnations, are faced with deriving, validating, and being managed to estimates. For example, software engineers working on teams are regularly given assignments with attached deadlines. A proper analysis of the allotted time (estimate) allows the software engineer to plan his or her schedule accordingly, or provide useful feedback to his or her superiors in order to adjust the deadline. Independent consultants must competitively bid projects. An accurate estimate is essential to deriving a high-quality bid; if the estimate is inaccurate or incomplete, the consultant may underbid the project and incur a loss, or may overbid and fail to secure the contract. MicroISV owners rely on estimates in selecting where best to spend their limited time: development, marketing, support, etc.

Only two established methodologies, discussed later, present individual software engineers with reasonable alternatives to guesswork, an all-too-common alternative to established methods. The remaining challenge for consultants, team members, and microISVs is to implement these models in their own personal process in a non-invasive manner with minimal overhead and time investment.

2.4 Effort Estimating Tools, Techniques, and Approaches

Extensive research in the field of effort estimation has produced a wide variety of tools and techniques. These tools and techniques typically implement one or more underlying approaches to solving the problem. To differentiate among tools, techniques, and approaches, consider Mountain Goat Software's implementation of Planning Poker [29]. In Planning Poker, a group of experts discuss the requirements behind the tasks to be estimated. For each task, the individual estimators select a number from a Fibonacci sequence to represent the complexity of the task, though not necessarily in hours or days. All estimators simultaneously reveal their estimates to the other members of the group. If the estimates differ significantly, a discussion is held in which the outliers describe the reasoning behind their selections. The process then repeats until the estimates converge on a common value.

The Planning Poker model may be described in terms of the tool used to implement the model as well as the underlying technique and approach. To clarify, a tool is defined as "an instrument or apparatus" and a technique as "the manner in which technical details are treated" [27].
The tool used to implement Planning Poker is a simple deck of playing cards. The technique can be most succinctly described as a group of experts iteratively developing, presenting, and discussing individual estimates of effort for tasks, ending when consensus is reached. At a more detailed level, the technique behind Planning Poker involves aspects such as informality, speed, and simplicity.

Planning Poker does not, in fact, implement a single approach to estimation; like most estimation models, Planning Poker combines multiple approaches: expert opinion, analogy, and work breakdown (disaggregation). These approaches may be used in different combinations to produce a variety of techniques. For example, the Wideband Delphi estimation model uses the same approaches as Planning Poker, and the descriptions of the two models sound very similar. In contrast, the Wideband Delphi process involves formal meetings, paper forms, and anonymous estimates. Furthermore, the process takes days or weeks, as opposed to minutes or hours.

The specific tools used to implement a technique can vary based on the needs of the organization or its structure. For example, Planning Poker could be implemented for geographically dispersed teams using computers, voting software, and video chat, without sacrificing the core principles of the technique.

2.5 Common Estimation Approaches

There are a wide variety of approaches to addressing the problem of effort estimation [22]. Table 2.1 lists some common approaches with a brief description. This list is not intended to be comprehensive; rather, it is designed to present a general cross-section of some widely used approaches.

Regression, also known as regression analysis, involves modeling and analyzing multiple variables that are assumed to be interdependent. In software effort estimation, regression typically involves "sizing" a project in terms of some known or estimated quantity (features, inputs/outputs, or lines of code), then calculating the effort from that quantity. For example, COCOMO is a series of models designed to compute software development effort as a function of size in estimated software lines of code (SLOC) [7] [8]. The SLOC is translated into effort based on industry data and, depending on the COCOMO model employed, the project type, cost drivers, or phase.

Another common approach to estimating effort involves drawing analogies. Analogy-based reasoning involves drawing conclusions about a future occurrence based on the details of a similar, past occurrence. Estimating software effort by analogy involves four factors, which directly influence the accuracy of the estimate: the availability of an appropriate analogue; the soundness of the selection strategy; how the differences between the analogue and target are considered and adjusted for; and the accuracy of the available data points [42].

A third approach involves the expert judgment of one or more members of the project team. Simply put, expert judgment involves relying on an individual, or group of individuals, to gather, evaluate, discuss, and analyze data concerning a target project [21]. Instead of inputting the data into a formal analytical model and publishing the result, the estimators produce an estimate based on their knowledge of the work to be performed and the environment. The estimators may follow a checklist or set of guidelines, but no mathematical model is employed to derive the final numbers.
  Analogy: Drawing conclusions about a future occurrence based on the details of a similar, past occurrence.
  Artificial Neural Networks: Application of massively parallel, computer-simulated, biological neurological systems to predict outputs through the use of complex dependent and independent input variables.
  Classification and Regression Trees: Building a binary tree with branches representing possible effort values for each estimation characteristic, then locating the "optimal" sub-tree. Traversing the sub-tree from terminal node to root allows for the calculation of an effort estimate.
  Expert Judgment: Relying on an individual, or group of individuals, to gather, evaluate, discuss, and analyze data concerning a target project to build an estimate.
  Function Point: The use of system inputs, outputs, and persistent data as a measure of the amount of functionality required by a system. The functionality is expressed as "function points," which can be used to derive effort.
  Mathematical Models: A mathematical formula that predicts effort (output) based on multiple inputs, such as team productivity and project scale.
  Proxy-Based: Using a known or predicted unit of size (screens, lines of code, etc.) for a task to infer required future effort for the task.
  Regression: Modeling and analyzing multiple variables that are assumed to be interdependent; for example, predicting a task's duration based on the estimated lines of code.
  Simulation: A computer model that attempts to simulate the abstract model of the required work effort for a set of activities.
  Work Breakdown: Decomposition of an effort into individual tasks. Also known as "disaggregation."

Table 2.1: Common approaches to estimation.

Some experts characterize expert judgment as a "gut-feel" alternative to established models. However, when viewed as an approach, expert judgment can be characterized as one aspect of a larger tool or technique. In fact, aspects of expert judgment find their way into most tools and techniques, simply because their input process is managed by individuals making decisions about how to divide, structure, and organize work.

Work breakdown, as an approach, is fundamental to many estimation tools and techniques, as well as to project management in general. In simple terms, a work breakdown is a decomposition of an effort into individual tasks. The level of decomposition required is defined by many factors, such as the overall size of the project; the size and structure of the project team; and the type of project. One major benefit of creating a thorough work breakdown is that it helps reveal all the individual tasks involved in the effort, reducing the chance of leaving out small but important steps [26]. The work breakdown also serves as a tool for comparing a future project to past projects that have been similarly decomposed into smaller tasks. The disadvantage of the work breakdown approach is that it requires a more complete knowledge of the system to be built at the time the estimate is derived.

Another approach, known as function point analysis, involves counting the number and complexity of functions performed by a software product [2]; such functions include file operations internal to the system, file operations external to the system, inputs, outputs, and queries. The number of function points may be translated into an estimate of SLOC, which in turn may be used to estimate effort in terms of time commitment, based on the historical productivity of the project team.
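As a rough illustration of the counting-and-translation flow just described, consider the sketch below. The complexity weights, the SLOC-per-function-point ratio, and the productivity figure are invented placeholders rather than published function point constants; a real analysis would use calibrated, organization-specific values.

```python
# Illustrative sketch of function point counting and translation.
# Weights, SLOC-per-FP ratio, and productivity are hypothetical.

FP_WEIGHTS = {            # assumed average complexity weights
    "internal_files": 10,
    "external_files": 7,
    "inputs": 4,
    "outputs": 5,
    "queries": 4,
}

def unadjusted_fp(counts):
    """Sum each counted function type times its complexity weight."""
    return sum(FP_WEIGHTS[kind] * n for kind, n in counts.items())

def effort_hours(fp, sloc_per_fp=50, sloc_per_hour=25):
    """Translate function points to SLOC, then SLOC to effort using
    the team's historical productivity."""
    return fp * sloc_per_fp / sloc_per_hour

counts = {"internal_files": 3, "external_files": 1, "inputs": 8,
          "outputs": 5, "queries": 4}
fp = unadjusted_fp(counts)          # 30 + 7 + 32 + 25 + 16 = 110
print(fp, effort_hours(fp))         # 110 function points -> 220.0 hours
```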
Proxy-based estimation approaches involve identifying and counting known features of a task or effort, such as the number of screens, lines of code, number of functions/procedures, etc., then inferring future effort based on those "proxies." For example, a high-level architectural design may describe the number of components that must be created to complete the software product. By combining the number of components, their individual estimated complexity, and historical data on productivity, an estimator can infer the amount of effort required to build the new components.

Artificial neural networks (ANN) are software models inspired by the architecture of biological neural networks [43]. ANNs represent a novel approach to estimation by utilizing a massively parallel network of interconnected nodes (representing virtual, biological neurons), each with a series of inputs, and each generating an output when the sum of the inputs exceeds a predefined threshold. Neural networks with a feedback, or learning, mechanism improve performance by fine-tuning the weighting of the inputs and/or the threshold for producing the output. ANNs are particularly suited to modeling software estimates when a non-linear relationship exists between the inputs (e.g., size) and outputs (e.g., effort) [13].

Estimation approaches based on mathematical models are also common. Researchers have attempted to derive mathematical models and formulas to represent the relationship between size and effort. The simplest model of this relationship is linear: as size increases, effort also increases at a steady rate. Linear models, however, are not suitable for estimating non-trivial projects in large and complex environments; therefore, more complex models were developed, such as Putnam's model, which is based on a Rayleigh distribution [31]. These models, in their various forms, attempt to compensate for factors such as increased overhead and communications as the size of the project and/or team increases.

In addition to the aforementioned approaches, researchers and practitioners have developed a wide variety of approaches that are best described as "academic exercises" intended to explore novel theories or highly specialized situations [22]. While these additional approaches add significant value to the discipline, they are highly specialized and are beyond the scope of this analysis.

2.6 Common Estimation Tools and Techniques

The tools and techniques derived from various approaches may be categorized in several ways. For example, some tools are highly structured, requiring the practitioner to follow specific steps in a prescribed order. Some tools use a strict mathematical formula to calculate their output, while others rely more on expert judgment or human reasoning.

COCOMO is a well-structured, formal approach to developing software estimates [7]. COCOMO, in its original form (COCOMO 81), addressed the software development practices of the day, such as mainframe overnight batch processing. In 2000, COCOMO II was published in a revised form to reflect more recent changes, such as software reuse and off-the-shelf software components [8]. Four major steps comprise the foundation of COCOMO. In the first step, the nominal effort is determined based on estimating the number of "source instructions," or lines of code. Next, a series of fifteen cost drivers, relating to the product, environment, and hardware, are evaluated and each is assigned a weight or value. In the third step, the product of these "effort multipliers" is used to derive an effort value, usually in man-months, from the nominal effort of step 1. Lastly, the estimator adjusts for additional factors beyond the first three steps. Boehm's stated goal in developing the COCOMO model is to "help people understand the cost consequences of the decisions they will make in commissioning, developing, and supporting a software product" [7].
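A minimal sketch of these four steps follows, using the intermediate COCOMO 81 constants for organic-mode projects (a = 3.2, b = 1.05). The product size and the three cost driver ratings shown are hypothetical, standing in for the full set of fifteen, and the fourth step's manual adjustments are omitted.

```python
# Sketch of the COCOMO steps described above (intermediate COCOMO 81,
# organic mode). Size and driver ratings are hypothetical.

def cocomo_effort(kloc, effort_multipliers, a=3.2, b=1.05):
    nominal = a * kloc ** b          # Step 1: nominal effort from size (KSLOC)
    eaf = 1.0
    for m in effort_multipliers:     # Step 2: each rated cost driver is
        eaf *= m                     # multiplied into one adjustment factor
    return nominal * eaf             # Step 3: person-months, before step 4

# A hypothetical 32 KSLOC product with three slightly unfavorable drivers.
print(round(cocomo_effort(32, [1.15, 1.06, 0.91]), 1))  # ~135.1 person-months
```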
The Putnam Model is another formal technique based on historical data and mathematical analysis [31]. The Software Lifecycle Model (SLIM) is a proprietary tool released by QSM, Inc., a company founded by Putnam. The Putnam model follows the idea that historical project data such as time, effort, and size can be mapped to a consistent distribution of data, or curve on a graph. Therefore, an effort estimate, usually in man-months, may be derived for a future project by fitting it to the curve based on the project's estimated size (e.g., lines of code).

The term function point analysis refers to both an approach (described earlier) and a technique. Function point analysis is a highly structured technique that begins with a thorough review of project requirements to uncover a comprehensive list of software "function points." These function points comprise a list of inputs, inquiries, outputs, internal files, and external interfaces. The function points are counted, adjusted for factors such as complexity, and summed. The resulting value is a dimensionless number representing a relative measure of the number of functions defined by the requirements. By comparing this number to past projects and their corresponding values, an estimate of required effort can be proposed. Various tools exist, such as the Construx Estimate tool from Construx Software Builders [10] and ESTIMACS (known as CA-Estimacs) from Computer Associates [25], to assist in deriving software estimates based on function points.

PROxy-Based Estimating (PROBE), a component of the Personal Software Process (PSP), is both a technique and a tool for deriving estimates [17]. As a technique, PROBE draws on several approaches, including proxy-based estimation, regression, and analogy. Like the preceding techniques, the PROBE tool is a well-defined model; however, its estimates are derived through the use of proxies, or objects, that are used to estimate the size of the software product. Each object is assigned a type, which loosely defines its relative complexity, and an estimated number of methods. The combination of type and method count defines the estimated size of the object. Based on the proxy list and historical data (past estimates and actuals), the overall size of the project is estimated.

PROBE is not the only proxy-based technique for deriving estimates. The Principle-Centered Software Engineering (PCSE) process also derives estimates through the use of proxy-based estimating. While both PROBE and the PCSE estimation approach are proxy-based, the techniques differ significantly in terms of complexity and number of steps. PCSE's technique derives an estimate of effort through the estimation of the number of "parts" required to build the software, which is translated into an estimate of software lines of code. The required effort is then inferred from the combination of lines of code and historical productivity.

Case-based reasoning, a superset of analogy-based reasoning, involves constructing a model of a problem, retrieving an appropriate analogue, transferring the solution used for the analogue to the target instance, mapping attributes between the analogue and the target, then adjusting the estimate to account for attribute differences. Estor is one example of a software product that combines the concepts of function points and case-based reasoning to derive estimates [42].

The Analogy Software Tool (ANGEL) is another software product that applies analogy-based reasoning to developing estimates [36]. Unlike Estor, which uses function points, ANGEL allows the estimator to define and input variables describing the features of the project to be estimated. From the user-defined variables, a subset is selected, and ANGEL locates the "closest" match between the target project and historical projects based on the Euclidean distance between the variable sets.
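The Euclidean matching at the heart of this approach is easy to sketch. In the fragment below, the feature variables (screens, interfaces, tables) and all data values are invented for illustration; ANGEL itself additionally selects which variable subset to use and supports adjusting the analogue's effort for differences between projects.

```python
# Illustrative nearest-neighbor matching in the spirit of ANGEL's
# analogy-based approach: find the historical project closest to the
# target in Euclidean distance over user-chosen feature variables.

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def closest_analogue(target, history):
    """history: list of (features, actual_effort) pairs."""
    return min(history, key=lambda h: euclidean(target, h[0]))

history = [
    ((12, 3, 40), 210),   # (screens, interfaces, tables) -> person-days
    ((25, 6, 90), 480),
    ((8, 2, 25), 150),
]
features, effort = closest_analogue((10, 3, 35), history)
print(effort)  # 210: the first project is the closest match
```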
Web Objects is an approach designed to address the many disparate elements that make up web-based applications, such as static and dynamic pages, JavaScript, cascading style sheets, images, etc. [33]. Web Objects calculates size using the language-independent Halstead equation [15], which focuses on the operands and operators involved in the proposed application to determine the "volume of work" from which an estimate of effort is obtained. Within the Web Objects context, an operator is something done to an operand. The Web Objects approach accounts for complexity by weighting the operators/operands as low, medium, or high.

The Wideband Delphi method presents a formal approach to deriving estimates based on expert judgment [7]. In this approach, a group of experts follows a structured series of steps, managed by a coordinator, to reach a consensus on an estimate for a project or series of tasks. First, each expert is presented with the specification and forms on which to share his or her estimates. The experts meet, discuss the issues involved in completing the work, and then anonymously provide their estimates using the provided forms. If the estimates vary significantly, the coordinator calls another meeting in which the issues are further discussed. The approach repeats the discussion/estimation steps until the estimates converge, forming the basis for the final estimate.

Planning Poker is an agile approach to estimation, similar to Wideband Delphi, which involves a group of experts iteratively engaged in discussion and estimate presentation in an attempt to reach consensus [29]. Unlike Wideband Delphi, however, no forms are used and the discussion is not anonymous. Furthermore, the process occurs in a single meeting, not over a period of days. In addition, the estimates are presented in terms of complexity, expressed in units such as hours, days, or story points; a single number chosen from a deck of cards, usually made up of the values from a Fibonacci sequence, represents the expert's estimate of complexity. As with the Wideband Delphi approach, significant differences in estimates are discussed; however, unlike the Wideband Delphi method, the discussion is immediate and public. Once a consensus is reached, the results form the basis of the full estimate.

Table 2.2 lists the previously discussed tools and techniques, cross-referenced with the underlying approaches demonstrated by each technique. Note that expert judgment plays some role in all techniques/tools, since the inputs to the processes are constructed or guided by a human at some point.
2.7 Measuring Estimation Quality

Despite the large number of estimation approaches and techniques available, the software engineering discipline does not formally recognize a single "standard" quantitative measurement for the quality of a software effort estimation model. However, most research assumes the quality of a model is directly related to the model's ability to accurately predict future effort. This accuracy, or lack thereof, may be measured in a variety of ways.

Most quality measurements begin by comparing the estimated effort to the actual effort. For example, the estimated effort for a task may be ten hours less than the actual effort. However, the value of "ten hours" has little meaning outside the context of the project. Ten hours could be a significant error in the context of a two-hour task; on the other hand, ten hours would be virtually insignificant in the context of a five person-year project. Therefore, the relative error (RE) is used as a foundation for measuring estimation accuracy (see Equation 2.1). For example, if a 100-hour project is underestimated by ten hours, the relative error is 0.1 (10%), or ten divided by 100. This may produce either a positive number (underestimation) or a negative number (overestimation). In many cases, the magnitude of relative error (MRE) is used to eliminate negative numbers and simplify calculations (see Equation 2.2).

\[ \mathrm{RE} = \frac{\text{actual} - \text{estimate}}{\text{actual}} \tag{2.1} \]

\[ \mathrm{MRE} = \frac{|\text{actual} - \text{estimate}|}{\text{actual}} \tag{2.2} \]

The RE and MRE values are useful for determining the accuracy of a single estimate. However, a single large-scale estimate is typically composed of multiple smaller estimates; in such cases, the Mean Absolute Relative Error (MARE) is used [11]. The MARE value (see Equation 2.3) may be referred to as the Mean Magnitude of Relative Error (MMRE) or the average magnitude of relative error. Variations of MARE and MMRE also include the Median MRE.

\[ \mathrm{MARE} = \frac{1}{n} \sum_{i=1}^{n} \frac{|\text{actual}_i - \text{estimate}_i|}{\text{actual}_i} \tag{2.3} \]

where n is the number of individual tasks in the set.
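In code, Equations 2.1 through 2.3 reduce to a few lines. The sketch below applies them to hypothetical (actual, estimate) pairs measured in hours.

```python
# Direct translation of Equations 2.1-2.3.
def relative_error(actual, estimate):
    # RE: positive means underestimate, negative means overestimate.
    return (actual - estimate) / actual

def magnitude_of_relative_error(actual, estimate):
    # MRE: the absolute value of RE.
    return abs(actual - estimate) / actual

def mare(pairs):
    # Mean absolute relative error over (actual, estimate) pairs.
    return sum(magnitude_of_relative_error(a, e) for a, e in pairs) / len(pairs)

tasks = [(100, 90), (20, 30), (8, 8)]   # hypothetical (actual, estimate) hours
print(relative_error(100, 90))          # 0.1 -> underestimated by 10%
print(mare(tasks))                      # mean of 0.1, 0.5, 0.0 -> 0.2
```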
The MARE produced by a particular model is sensitive to the environment in which the model is applied. Certain models have demonstrated widely varying MARE values when applied to different projects in different organizations with different teams. Furthermore, the calibration process appears to be one of the most significant influencing factors. For example, one study found MARE values for all tested models varying from an average of 57 percent to almost 600 percent [30]. Table 2.3 lists several MARE values from various researchers and studies.

Table 2.3: Sample MARE values.

Jeffery and Low, 1990 [19]
  CLAIR: 79% (average of three organizations, with values ranging from 43% to 117%)
  Function Points: 108% (average of six organizations, with values ranging from 39% to 132%)
Kemerer, 1987 [25]
  SLIM: 772% (overestimated in all cases)
  COCOMO: 601% (overestimated in all cases)
  Function Points (FP): 102% (function points to person-months)
  Function Points (SLOC): 167% (SLOC to person-months)
  Function Points (FP to SLOC): 38% (large negative bias)
  ESTIMACS: 85% (low (92%) confidence level)
Mukhopadhyay et al., 1992 [30]
  Expert Judgment: 31%
  Estor: 53%
  Function Points: 103%
  COCOMO: 619%
Ruhe et al., 2003 [32]
  Function Points: 33%
  Web Objects: 24%
  Expert Opinion: 37%
Schoedel, 2006 [34]
  PROBE: 14% (limited study of one student over 10 SQL programs, with values ranging from 1% to 67%)
Yenduri et al., 2007 [44]
  Expert Judgment: 59% (14 projects)
  COCOMO II: 35% (49 projects)

Practitioners should note that the MARE value represents the difference between an estimate of effort for a task and the actual effort expended to complete the task. A MARE value cannot reflect the complex environmental, architectural, and social factors that inevitably influence the time spent by an individual or team working to complete a task or project [24]. This is why many estimates should be given in the form of a low and high value, as opposed to a single value; these low and high values represent the reasonable best and worst cases. Furthermore, an estimate, in its truest form, is designed to facilitate project management controls, not predict the future. Therefore, while an "accurate" estimate is desirable, an "inaccurate" estimate does not necessarily indicate a failure of the estimation model.

Whereas the MRE and MARE values represent the "accuracy" of an estimate, practitioners must also take into consideration a variety of other factors, such as confidence interval, prediction interval, and estimation bias. The confidence interval indicates the estimator's confidence that the actual effort will fall within the range of the estimate, assuming the estimate is composed of a low and high value, such as "90 to 110 person-hours." A confidence interval of 90% or higher is recommended for project planning models [28]. Therefore, an estimator's goal is to produce an estimate in which he or she has a high level of confidence.

The prediction interval (PI) represents the low and high bounds of the estimate. For example, an estimate of "90 to 110 person-hours" for an activity has a prediction interval of [90, 110]. A natural correlation exists between the confidence interval of an estimate and the prediction interval: as the prediction interval expands, the confidence in the estimate should increase. For example, a prediction interval of "between 10 and 1,000 person-hours" is more likely to be correct than a prediction interval of "between 99 and 101 person-hours." Therefore, an estimate should include both the prediction interval and the confidence interval.

The quality of an estimation model may be evaluated over a period of time by analyzing the hit rate, width-accuracy correlation, and estimation bias. The hit rate is defined as the percentage of time the actual effort falls within the prediction interval [23]. Equation 2.4 shows the formula for calculating the hit rate. This value can be compared to the confidence interval; a hit rate that is lower than the confidence interval indicates overconfidence in the estimate, and vice versa.

\[ \mathrm{HitRate} = \frac{1}{n} \sum_{i=1}^{n} h_i, \qquad h_i = \begin{cases} 1, & \text{min}_i \le \text{actual}_i \le \text{max}_i \\ 0, & \text{actual}_i > \text{max}_i \ \lor\ \text{actual}_i < \text{min}_i \end{cases} \tag{2.4} \]

where min_i and max_i are the minimum and maximum values, respectively, of the PI for the effort estimate of task i; actual_i is the actual effort of task i; and n is the number of estimated tasks.
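Equation 2.4 translates directly into code; the records below are hypothetical (min, max, actual) triples in hours.

```python
# Equation 2.4 in code: the fraction of tasks whose actual effort fell
# inside the estimated prediction interval.
def hit_rate(records):
    hits = sum(1 for low, high, actual in records if low <= actual <= high)
    return hits / len(records)

records = [   # (min_i, max_i, actual_i) in hours, hypothetical
    (10, 15, 12),
    (12, 16, 18),
    (2, 5, 5),
]
print(hit_rate(records))  # 2 of 3 intervals contained the actual -> 0.667
```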
The width-accuracy correlation is a measurement of quality that attempts to determine whether the estimates produced by the model are the result of informed analysis or wild guessing. It can be assumed that informed analysis will produce estimates with a high correlation between the accuracy of the most likely effort and the PI width. In other words, narrow PI widths should accompany accurate estimates, and wider PI widths should accompany inaccurate estimates. Wild guessing would, theoretically, produce a low correlation between the PI width and the estimate accuracy. Equation 2.5 shows the formula for calculating the balanced relative error (BRE), which is used as the foundation for the width-accuracy correlation because BRE allows more realistic modeling of linear relationships, such as the width-accuracy correlation.

\[ \mathrm{BRE}_i = \frac{\text{actual}_i - \text{estimate}_i}{\min(\text{actual}_i, \text{estimate}_i)} \tag{2.5} \]

where min(actual_i, estimate_i) is the lesser of actual_i and estimate_i.

Lastly, a set of estimates can be used to determine whether estimation bias exists (i.e., a tendency toward the low or high ends of the PI). The Actual Effort Relative to PI (ARPI) value can be a good indicator of estimation bias. ARPI is a measure of the distance between the actual effort and the midpoint of the PI, normalized by the width of the PI. Equation 2.6 demonstrates the ARPI calculation.

\[ \mathrm{ARPI}_i = \frac{\text{actual}_i - \mathrm{PI\_midpoint}_i}{\text{max}_i - \text{min}_i} \tag{2.6} \]

where PI_midpoint_i = (max_i + min_i) / 2.
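Equations 2.5 and 2.6 are equally mechanical; the sketch below applies them to hypothetical values.

```python
# Equations 2.5 and 2.6 in code.
def balanced_relative_error(actual, estimate):
    # BRE normalizes by the smaller of the two values, so over- and
    # underestimates of the same size yield errors of equal magnitude.
    return (actual - estimate) / min(actual, estimate)

def arpi(low, high, actual):
    # Distance of the actual from the PI midpoint, normalized by PI width.
    midpoint = (low + high) / 2
    return (actual - midpoint) / (high - low)

print(balanced_relative_error(10, 8))   # 0.25 (underestimate)
print(balanced_relative_error(8, 10))   # -0.25 (overestimate, same magnitude)
print(arpi(90, 110, 105))               # 0.25 -> actual fell in the upper half
```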
In summary, the quality of an effort estimation model's output must be measured against a variety of factors. First, the model's accuracy (its ability to predict future effort within an acceptable margin of error) must be measured and evaluated (i.e., relative error and magnitude of relative error). Next, the model's output for a set of activities must be considered (i.e., mean absolute relative error). Lastly, the attributes of an estimate should be considered to address factors such as confidence, quality over time, and bias.

2.8 Estimation Techniques for the Individual

Despite the wide variety of existing research and products available to teams, individual estimators possess few realistic options for deriving estimates for single tasks. Many of the aforementioned techniques and approaches require a significant investment of time and resources, which is not realistic for an individual. An individual interested in deriving personal estimates at the task level has a few limited options.

The most obvious and least time-consuming option for deriving estimates is expert opinion, or guesswork. Naturally, the quality of estimates produced in this manner varies widely with the experience of the estimator, the type of work, and the environment. In fact, research suggests that subjective estimates are most accurate when derived from groups of estimators, as opposed to lone individuals [16].

In opposition to guesswork are the established models based on many of the aforementioned techniques and approaches. These established models require a significant time investment in the form of organizational historical data, carefully derived multipliers, and/or mathematical models. The required investment makes these models impractical for an individual estimator to implement, leaving a handful of models constructed specifically for the individual. PSP's PROBE and the PCSE estimation approach are two such models designed specifically for individuals and small task-set estimation.

The PROBE tool is characterized by a highly structured approach to software development, involving careful recordkeeping and data analysis. The fourteen-step estimation method and the five-step estimation script (with an accompanying one-page worksheet) lead estimators through the process of planning, preparing, and implementing a proxy-based estimate [17]. Although this highly structured approach makes it a good fit for PSP advocates, some practitioners see the "crushing clerical overhead" as too time consuming for certain environments [20].

The PCSE process is currently under development at Auburn University. One goal of the PCSE software process is to remain as lightweight as possible while providing the most benefit. The use of a proxy-based estimation technique in PCSE, which requires a moderate level of recordkeeping and a relatively simple regression formula, fits nicely between guesswork and PROBE.

The agile development movement tends to avoid complex and highly structured software processes and relies on simpler techniques, such as expert judgment or expert opinion. For example, Scrum endorses an agile, expert opinion-based approach to estimating effort, such as Planning Poker, which involves the entire team [35]. The key feature of Scrum estimates is the use of abstract values, such as Fibonacci numbers or story points, to represent the relative size of a task, rather than actual size in hours, days, or weeks.

Considering the lightest-weight approaches to effort estimation in terms of time spent preparing to derive an estimate, team-oriented models such as COCOMO and SLIM are impractical for an individual to set up, tune, and operate for single tasks. Guesswork/expert opinion requires the least commitment, as there is no mandatory recordkeeping requirement. PROBE and PCSE require the practitioner to maintain a detailed record of work activities; however, no additional analysis, such as deriving cost drivers, is required to begin preparation of the estimate. In terms of pre-estimation time investment, a gap exists between guesswork and PROBE/PCSE.

With regard to time spent deriving an estimate, the alternatives (guesswork, PROBE, and PCSE) represent varying investments. Guesswork, arguably, requires the least time investment, as the estimate is derived from careful consideration on the part of the estimator; no formal standards exist as to what steps should be employed to derive a value. PROBE and PCSE require a series of forms to be completed along with a set of non-trivial calculations. Arguably, PCSE's forms and calculations are less in-depth and complex than PROBE's; however, both require larger time investments than guesswork. Again, a gap exists between guesswork and PROBE/PCSE, this time in terms of time spent calculating an estimate.

As previously mentioned, a study of PSP's PROBE model demonstrated a MARE value of 14%. On the other hand, studies of expert opinion-based estimation demonstrate MARE values ranging from 31% to 59%. Although the scope of the PROBE study was extremely limited, it is logical to assume that a structured estimation model would, on average, produce more accurate results than unstructured guesswork. This implies a third aspect to the gap between guesswork and established estimation models: accuracy.

Arguably, these three aspects (preparation time, execution time, and accuracy) define an exploitable space between the simplest estimation approach (guesswork) and the lightest-weight established models (PSP/PROBE and PCSE).
2.9 Summary and Conclusions

Effort estimation plays an important role in software development. A broad spectrum of approaches, tools, and techniques exists to support the estimation process. The quality of the models built upon these techniques may be measured by their accuracy, consistency, and specificity. In fact, most models present organizations with an opportunity for improved project and risk management, regardless of their specific ability to predict the future.

The vast majority of these tools and techniques focus on teams or projects; few approaches address the needs of the individual software engineer or microISV. The exceptions are PSP and PCSE, the only two processes specifically designed for individuals. However, the time investment in both PSP and PCSE is non-trivial. The space between guesswork and these two estimating approaches is wide enough to accommodate a new model, targeted at individuals interested in an estimation approach that outperforms guessing without the overhead of the estimation models currently available.

Ideally, an estimating approach tailored to the individual, such as a consultant, team member, or microISV owner, would feature simplicity, speed, and relative accuracy. Such attributes would naturally increase the likelihood of adoption and continued usage. While PSP represents a model designed specifically for the individual and PCSE represents a highly streamlined model, neither fully bridges the gap between established models and guessing.

Chapter 3
SISE: A Novel Approach to Individual Effort Estimation

3.1 Abstract

Individuals rely on their personal processes to develop software in a systematic and structured manner. Time management, which relies on relatively accurate effort estimation techniques, has been shown to be a key component in planning and executing software development activities. Despite a plethora of research into team-based effort estimation models, few models are suitable for use by individual software engineers. Models tailored to the individual include guesswork, an approach commonly used in industry; the PCSE model, under development at Auburn University; and the PROBE model, the only peer-reviewed model devoted to lone software engineers. This spectrum of choices features a gap between guesswork and more formal models, which could be filled with a lightweight, agile, and reasonably accurate alternative. The SISE model combines expert judgment, in the form of relative sizing decisions, with empirical, historical data to create such an alternative. The SISE model is based on a four-step process in which historical tasks are sorted by actual effort values; a future activity's effort is forecasted relative to the historical tasks' requirements; and a prediction interval is constructed for the future activity.

3.2 Introduction

In recent years, dramatic changes to the software industry have brought individual developers to the forefront of software engineering practices. In addition, the rise of the software micropreneur in markets such as mobile app development and web applications has reinforced the need for lightweight, agile software engineering practices. For example, recent surveys of the microISV industry have shown that time management and related issues topped their founders' list of "pain points" [39].
Historically, however, many of the software process tools available to software engineers have been team-oriented, making it impractical for the individual to benefit from their use [38]. In response, researchers at Auburn University have been focusing their efforts on constructing tools targeted directly at the individual software engineer. One such tool is the SISE effort estimation model.

SISE is a lightweight, agile model designed to construct estimates based on expert knowledge and empirical evidence. In this respect, SISE outperforms simple guesswork while incurring a much lower overhead than traditional, established models, which rely on complex software, algorithms, or mathematical calculations.

The SISE Model

"SISE" is an acronym for the model's four-step process. The four steps, in order, are Sort, Identify, Size, and Evaluate. The first step, Sort, involves ordering historical data by the actual effort required to complete each activity. The second step, Identify, involves choosing two tasks from the historical data set: one confidently known to be smaller, one confidently known to be larger, and both relatively close in size to the future work. Using this pair of tasks, the estimator begins the third step, Size, by producing a rough prediction interval of the future activity's size using the actual effort values for the two completed tasks. The final step, Evaluate, involves shifting or resizing the prediction interval to account for any historical bias. This last step is optional and is applied only if the estimator is dissatisfied with the precision, accuracy, or confidence level of his or her estimate.

The design of the SISE model focuses specifically on the individual software engineer. Its estimates are based solidly on empirical data gathered by the software engineer and are applicable only to that person. Personal skills and experiences are too numerous to list, quantify, and apply to every estimation scenario. Therefore, the SISE model seeks to join empirical data to the process of expert judgment. This results in a model that must be individually calibrated by each software engineer using his or her own personal data.

3.3 The SISE Steps

3.3.1 Step 1: Sort

The Sort step involves ordering historical data by the actual effort required to complete each activity. The simplest approach is to maintain an electronic record of historical data, such as a spreadsheet or database. The data is sorted by actual effort, from smallest to largest. Next, the numeric values associated with each historical data point (estimated effort, actual effort, etc.) are hidden, leaving only the text descriptions of the completed tasks; this prevents the software engineer from selecting tasks based on a desired numeric outcome, such as "eight hours."

3.3.2 Step 2: Identify

Next, the description of the future activity is compared to the descriptions of the historical tasks. Two historical tasks must be located: one confidently smaller and one confidently larger than the future activity. The smaller task should be one which the estimator is confident is smaller than the future activity, but as close as possible to it in size. The larger task should be the inverse: larger in size, but still relatively close. Since the historical data set is already sorted, a very efficient way of locating these two tasks is through a binary search.
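A minimal sketch of the Sort and Identify steps follows. The sorting is mechanical; the Identify step is a human judgment, modeled here as callbacks that stand in for the estimator's answers about the task descriptions. All task data is hypothetical.

```python
# Sketch of SISE steps 1 and 2 with hypothetical task data.
history = [  # (description, actual effort in hours)
    ("Design content model", 11),
    ("Design FAQ data model", 2),
    ("Design user model", 8),
    ("Create security classes", 3),
]

def identify_bounds(history, smaller_than_future, larger_than_future):
    # Step 1: Sort by actual effort, smallest to largest.
    ordered = sorted(history, key=lambda task: task[1])
    low, high = 0, float("inf")  # defaults when no bracketing task exists
    # Step 2: scan for the closest confidently-smaller and confidently-
    # larger tasks (a binary search over the ordered list works as well).
    for description, actual in ordered:
        if smaller_than_future(description):
            low = actual              # largest task still judged smaller
        elif larger_than_future(description):
            high = min(high, actual)  # smallest task judged larger
    return low, high

# The estimator's judgments for a future activity "Design database tables":
smaller = lambda d: d in ("Design FAQ data model", "Create security classes")
larger = lambda d: d == "Design content model"
print(identify_bounds(history, smaller, larger))  # (3, 11)
```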
3.3.3 Step 3: Size

Once the practitioner has chosen a pair of tasks, the third step, Size, produces a rough prediction interval of the future activity's size. The size of the future activity is inferred from the actual effort values of the two historical tasks. For example, assume the historical record contains twenty completed tasks and the estimator has selected tasks 9 and 14 as the two tasks confidently believed to be smaller and larger, respectively, than the future activity. The rough size of the future activity can, therefore, be inferred to fall between the actual sizes of tasks 9 and 14.

Prediction intervals are expressed using a low estimate and a high estimate, with the actual value expected to fall somewhere in between, and are written using the notation "[low, high]." For example, the prediction interval [5, 7] means we expect the actual value to fall somewhere between five and seven hours, inclusive [23]. The actual effort values associated with the bracketing tasks represent the low bound and high bound of a prediction interval. However, this interval is a rough estimate of the expected effort and may need to be refined.

3.3.4 Step 4: Evaluate

The final step, Evaluate, is optional and may be applied in the event the estimator is dissatisfied with the precision, accuracy, or confidence level of the estimate. The estimator may choose to shift the prediction interval based on an analysis of his or her historical bias. This involves analyzing the practitioner's track record with SISE and implies a prerequisite: the practitioner has been using SISE or some other prediction interval-based estimation approach and has an idea of his or her historical accuracy. This historical bias is then used to modify the rough estimate to produce a specific estimate. For more details on how to shift a prediction interval based on historical bias, refer to the sidebars Removing Shift Bias and Removing Width Bias.

It should be noted that, within the SISE model, estimation bias is not an indication or measure of error committed by an estimator. Rather, it is a measure of how the best efforts of the estimator translate through the SISE model to create an estimate that mirrors actual effort.

Removing Shift Bias

Shift bias involves a prediction interval that is too low or too high and may be corrected by shifting the interval. Shift bias exists only if the historical actuals fall predominately below or above the associated prediction intervals; estimation error that is spread equally between overestimates and underestimates is a width bias and must be corrected in a different manner.

To determine if a shift bias exists, a form of simulation must be conducted. The simulation involves (1) compiling a list of the historical estimation error values, (2) shifting all the historical prediction intervals by each error value, then (3) checking the change in overall hit rate with each shift. Consider, for example, the following historical data:

Activity    Prediction Interval    Actual      Error
Task 1      10-15 hours            16 hours    1
Task 2      12-16 hours            18 hours    2
Task 3      2-5 hours              5 hours     0
Task 4      1-3 hours              3 hours     0

The hit rate for the unmodified data set is 50%. All the prediction intervals could be shifted by 1 hour, which would cause Task 1 to become a successful prediction. Additionally, all the prediction intervals could be shifted by 2 hours, which would cause Task 2 to become successful. But how would these shifts affect the other predictions? If all the prediction intervals are shifted by 1 hour, the hit rate rises to 75%; Task 1's prediction interval now contains the actual effort, and Tasks 3 and 4 are still successful. If the intervals are shifted by 2 hours, the hit rate rises to 100%. So, given this limited data set, shifting future estimates' prediction intervals by 2 hours may produce more accurate results.
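The simulation in the sidebar above is straightforward to mechanize. The sketch below tries each observed error value as a candidate shift and reports the resulting hit rates, using the sidebar's data.

```python
# Sketch of the shift-bias simulation: shift every historical prediction
# interval by each observed error value and report the resulting hit rate.
history = [  # (low, high, actual) in hours, matching the sidebar example
    (10, 15, 16),
    (12, 16, 18),
    (2, 5, 5),
    (1, 3, 3),
]

def hit_rate(records, shift=0):
    hits = sum(1 for low, high, actual in records
               if low + shift <= actual <= high + shift)
    return hits / len(records)

def errors(records):
    # Signed distance from the actual to the nearest interval bound (0 = hit).
    out = []
    for low, high, actual in records:
        if actual > high:
            out.append(actual - high)
        elif actual < low:
            out.append(actual - low)
        else:
            out.append(0)
    return out

for shift in sorted(set(errors(history))):
    print(f"shift {shift:+d} h -> hit rate {hit_rate(history, shift):.0%}")
# shift +0 h -> 50%, shift +1 h -> 75%, shift +2 h -> 100%
```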
Removing Width Bias

Once shift bias has been accounted for, the estimator may wish to improve either precision or confidence level. This action involves a trade-off, since increasing one reduces the other. For example, if the estimator wishes to increase the confidence level, the prediction intervals must be widened, making the estimates less precise. If the estimator wishes to increase the precision of the estimates by reducing the size of the prediction interval, the confidence level in the estimate will be proportionally reduced.

Improving the confidence level is accomplished by symmetrically widening all past prediction intervals by whatever amount is necessary to reach a hit rate equal to the desired confidence level. For example, if the historical record demonstrates a hit rate of 50% and the estimator would like to reach a confidence level of 80%, then all the past estimates' prediction intervals are widened until 80% of the actuals fall within the associated prediction intervals.

The inverse operation may be performed to improve the precision of the estimates. Past prediction intervals may be symmetrically reduced in size until the desired prediction interval width is reached. The new (and reduced) confidence level may then be calculated by checking the hit rate for the entire historical record.

Here is an example of how widening the prediction intervals allows for an increase in the hit rate from 60% to 80%.

Activity    Original Prediction Interval    Actual      New Prediction Interval
Task 1      10-15 hours                     10 hours    9-16 hours
Task 2      12-16 hours                     16 hours    11-17 hours
Task 3      5-7 hours                       6 hours     4-8 hours
Task 4      9-11 hours                      12 hours    8-12 hours
Task 5      13-15 hours                     11 hours    12-16 hours

Shifting the prediction intervals would not have improved the hit rate; however, if all the prediction intervals are widened by two hours (-1 to the low and +1 to the high), the hit rate moves from 60% to 80%.
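The widening procedure can be mechanized in the same way. The sketch below searches for the smallest symmetric widening (in whole hours, an assumption made for simplicity) that reaches a target confidence level, using the sidebar's data.

```python
# Sketch of the width-bias adjustment: symmetrically widen every
# historical interval until the hit rate reaches the desired confidence.
history = [  # (low, high, actual) in hours, matching the sidebar example
    (10, 15, 10),
    (12, 16, 16),
    (5, 7, 6),
    (9, 11, 12),
    (13, 15, 11),
]

def hit_rate(records, widen=0):
    hits = sum(1 for low, high, actual in records
               if low - widen <= actual <= high + widen)
    return hits / len(records)

def widening_for_confidence(records, target):
    widen = 0
    while hit_rate(records, widen) < target:
        widen += 1  # widen by one hour on each side per step (assumed unit)
    return widen

w = widening_for_confidence(history, 0.8)
print(w, hit_rate(history, w))  # 1 hour each side lifts 60% to 80%
```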
3.4 Getting Started with SISE

Introducing the SISE model into an individual's software process is simple. As with all regression-based approaches, the first step is to begin tracking the effort expended to complete work activities. As each new task is completed, it is recorded in the historical record with its description, estimated effort, actual effort, etc. This historical record will be the basis for all future estimates. If an estimator has already been tracking his or her time, then this information may be used, as long as it matches the granularity of the future activities to be estimated.

The software engineer produces a SISE estimate by reviewing his or her historical record. The historical record is sorted from smallest to largest by actual effort, and the numeric values are hidden from view (step 1). The engineer reviews the list looking for a task that he or she is confident is smaller than the future activity. If a task is located (step 2), the actual effort is revealed and that value is recorded as the low end of the future task's prediction interval (step 3). If the estimator is not confident that any historical task is smaller than the future activity, then a value of zero is recorded as the low end of the future task's prediction interval. Next, the estimator reviews the list a second time to locate a confidently larger task, again using only the descriptions of the future and historical tasks. If one is located, the actual effort value is revealed and recorded as the high bound of the future task's prediction interval. If a larger task cannot be confidently identified, then the upper bound of the future activity's prediction interval is recorded as "unknown" using the sign for infinity (∞).

With the prediction interval for the future activity established, the software engineer proceeds with work on the activity. Once the activity is completed, the actual effort is recorded in the historical record and the process repeats.

Measuring Accuracy

The accuracy of a single-value estimate is determined by the magnitude of the estimate's error, relative to the actual effort. For example, if an activity is estimated to take 4 hours but actually takes 5, the magnitude of relative error (MRE) is 0.2 (or 20%). Here is the formula:

MRE = |actual - estimate| / actual

When using prediction intervals to describe an effort estimate, the practitioner's accuracy is determined by the number of activities with actual effort values that fall within the predicted interval. Here is the formula:

Hit Rate = No. hits / No. estimates

For example, consider the following list of work activities.

Activity    Prediction Interval    Actual
Task 1      10-15 hours            12 hours
Task 2      12-16 hours            15 hours
Task 3      2-5 hours              5 hours
Task 4      1-2 hours              3 hours
Task 5      16-22 hours            15 hours
Task 6      9-13 hours             12 hours
Task 7      4-6 hours              5 hours
Task 8      6-8 hours              8 hours
Task 9      3-4 hours              4 hours
Task 10     6-10 hours             9 hours

Eight of the ten activities were completed within the time frame defined by the prediction interval; Tasks 4 and 5 took more and less time, respectively, than predicted. Therefore, the hit rate for this sample is 0.8, or 80%.
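Putting the pieces together, a minimal sketch of the record-keeping loop might look like the following. It reuses the tasks from the worked example in Section 3.6 below, and the running hit rate anticipates the confidence-level discussion in the next section.

```python
# A minimal sketch of the SISE record-keeping loop: each completed
# activity is appended to the history, and the running hit rate doubles
# as the confidence level for the next estimate.
history = []  # (description, low, high, actual), all efforts in hours

def record(description, low, high, actual):
    history.append((description, low, high, actual))

def confidence_level():
    if not history:
        return None  # no track record yet
    hits = sum(1 for _, low, high, actual in history if low <= actual <= high)
    return hits / len(history)

record("Design security model", 0, float("inf"), 10)
record("Design user model", 0, 10, 8)
record("Design content model", 8, 10, 11)  # a miss: 11 falls outside [8, 10]
print(f"{confidence_level():.0%}")         # 67%
```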
3.5 Accuracy, Precision, and Confidence Level

By using a prediction interval as the basis for estimates, the SISE model presents the software engineer with a competing set of factors: accuracy, precision, and confidence. The accuracy of an estimate is measured in different ways depending on the type of estimate. Many project managers and project management applications expect an effort estimate to be phrased as a single value. Single-value estimates are easy to understand, simple to aggregate, and expected to be wrong. After all, what is the probability that an activity estimated at 10 hours will take exactly 600.00 minutes? Therefore, the accuracy of a single-value estimate is measured in terms of its error (see sidebar titled Measuring Accuracy).

The accuracy of a prediction interval, on the other hand, is measured by how often the actual effort falls within the interval. The overall percentage of actual effort values falling within their prediction intervals is known as the hit rate. Several logical observations can be made about the use of a hit rate. First, wider prediction intervals are less precise and will typically produce higher hit rates; conversely, narrower prediction intervals are more precise and will typically produce lower hit rates. In other words, precision and accuracy are inversely proportional, generally tasking the estimator with balancing the two.

For ease of use, the SISE model deliberately takes a statistically simplistic approach to assigning confidence levels; the model assumes the software engineer will repeatedly employ the same method for determining relative size and creating estimates. Based on this assumption, the estimator's past performance can be used as a predictor of future performance. For example, if an estimator's hit rate is 50%, it can be said that half of the activities he or she has estimated have had actual effort values that fell within the prediction interval. Therefore, all things being equal, a new estimate has a 50% probability of being correct. Put another way, the estimator has a 50% confidence level in his or her next estimate.

Note that confidence level should not be confused with an estimator's logical or emotional confidence in his or her abilities and estimates. It can be assumed that when an estimator produces an estimate, he or she does so to the best of his or her ability; the estimator is confident the estimate is correct. Confidence level, on the other hand, is a measure of the probability that the estimate will be correct, and allows the estimator to make statements such as: "In the past, my estimates have been correct 90% of the time. Therefore, I have a 90% confidence level in my next estimate, which I feel confident I have done my best in constructing."

Beginning with the first estimate, the SISE model assigns each new estimate a confidence level based on the estimator's current hit rate. As noted in the fourth step of SISE, however, the estimator may take steps to adjust this confidence level by compensating for historical bias (see sidebars Removing Shift Bias and Removing Width Bias). Note that shift and width biases are not to be viewed as errors on the part of the estimator; rather, they are the manner in which the SISE model adapts to an individual software engineer's perspective of past and future work.

3.6 SISE Example

Assume a software engineer who has never engaged in time tracking has decided to begin using the SISE model for his web development project. The developer has been assigned a new work activity: "Design security model." Given that the software engineer's historical record is empty, he has no data points for an estimate; no smaller task or larger task can be identified to use as the basis for a prediction interval. Therefore, following the SISE model, the prediction interval for the first activity is [0, ∞]. Once the first activity is completed and the actual value is recorded, the hit rate is calculated to be 100% (see Table 3.1).

Task (smallest to largest)    Low Est. (hours)    High Est. (hours)    Actual (hours)
Design security model         0                   ∞                    10

Hit rate = 100%
Table 3.1: One completed activity.

The next activity assigned to the software engineer is to "Design the user model." Since only one item exists in the historical record, the first SISE step (sorting) is complete by default. Our software engineer hides all but the first column and compares the future activity's description to the task description in the historical record. He decides that designing a user model is easier than designing a security model; we have a larger historical task, but no smaller one. The estimate, therefore, is a prediction interval of [0, 10]. Our confidence in the estimate is equal to the hit rate, which is currently 100%.
Work proceeds and the activity is completed in eight hours. The estimate and actual are recorded and the new hit rate is calculated to be 100% (see Table 3.2). For convenience, the historical data in these examples will be kept sorted from smallest to largest task.

Task (smallest to largest)    Low Est. (hours)    High Est. (hours)    Actual (hours)
Design user model             0                   10                   8
Design security model         0                   ∞                    10

Hit rate = 100%
Table 3.2: Two completed activities.

The third activity is assigned to the software engineer: "Design the content model." Our software engineer scans the historical record, after hiding the numeric values, and decides that "designing a user model" is smaller and "designing a security model" is larger. Therefore, the prediction interval for the future activity is set at [8, 10]. The work is completed with an actual effort of 11 hours, giving a new hit rate of 67%, with two of the three completed tasks falling within their prediction intervals (see Table 3.3).

Task (smallest to largest)    Low Est. (hours)    High Est. (hours)    Actual (hours)
Design user model             0                   10                   8
Design security model         0                   ∞                    10
Design content model          8                   10                   11

Hit rate = 67%
Table 3.3: Three completed activities.

A fourth activity is assigned to the software engineer: "Design database tables." By scanning the historical record's task descriptions, the software engineer decides the confidently larger task is "design content model," but is unable to designate a smaller task. The prediction interval, therefore, is set as [0, 11]. The confidence level is assumed to be 67%, based on the historical hit rate. After referring to the sidebars on adjusting for bias, the software engineer considers making a shift adjustment. A one-hour upward shift of all the historical prediction intervals would move the hit rate from 67% to 100%. This leaves the estimator with two choices: the estimate's prediction interval could be shifted one hour upward to account for a possible historical bias, or the estimate could be left alone. In short, the estimator now has two options to choose from: [0, 11] with a 67% confidence level or [1, 12] with a confidence level of 100%. Assume the estimator chooses not to shift the estimate due to the small data set size; the work is performed and recorded (see Table 3.4).

Task (smallest to largest)    Low Est. (hours)    High Est. (hours)    Actual (hours)
Design database tables        0                   11                   6
Design user model             0                   10                   8
Design security model         0                   ∞                    10
Design content model          8                   10                   11

Hit rate = 75%
Table 3.4: Four completed activities.

Assuming the software engineer proceeds in this fashion, he will accumulate a sizable historical record. With each hit or miss within the prediction interval, the hit rate will rise and fall. The software engineer may, at some point, choose to adjust a future estimate for width bias in order to increase his confidence level in a new estimate. Here is a simple example, assuming ten completed tasks, with no verifiable shift bias to correct. As Table 3.5 indicates, the hit rate is 70%, with three of the ten tasks falling outside their prediction intervals. A future activity, "Create Contact Us page," has been assigned a prediction interval of [2, 8] and the confidence level is assumed to be 70%. In this case, however, the manager has requested a higher confidence level. To accomplish this, the software engineer adjusts for width bias.

Task (smallest to largest)          Low Est. (hours)    High Est. (hours)    Actual (hours)    Missed PI?
Design FAQ data model               0                   4                    2
Create FAQ classes                  2                   6                    2
Create security classes             2                   8                    3
Create user classes                 5                   8                    4                 Yes
Create database tables in MySQL     0                   6                    5
Design database tables              0                   11                   6
Design user model                   0                   10                   8
Design security model               0                   ∞                    10
Design content model                8                   10                   11                Yes
Create data connector classes       0                   11                   14                Yes

Hit rate = 70%
Table 3.5: Ten completed activities.
The margins of error for the three missed tasks are one hour, one hour, and three hours, respectively. If the prediction intervals for all historical tasks were widened by one hour in each direction, the hit rate would rise to 90% (see Table 3.6).

Task (smallest to largest)          Low Est.    Adj. Low    High Est.    Adj. High    Actual    Missed PI?
Design FAQ data model               0           0           4            5            2
Create FAQ classes                  2           1           6            7            2
Create security classes             2           1           8            9            3
Create user classes                 5           4           8            9            4
Create database tables in MySQL     0           0           6            7            5
Design database tables              0           0           11           12           6
Design user model                   0           0           10           11           8
Design security model               0           0           ∞            ∞            10
Design content model                8           7           10           11           11
Create data connector classes       0           0           11           12           14        Yes

Hit rate = 90%
Table 3.6: Adjusting for width bias (all estimates in hours).

Therefore, the prediction interval for the future activity "Create Contact Us page" must also be adjusted using a one-hour expansion, making it [1, 9] with a confidence level of 90%. In summary, the software engineer has a choice between two fact-based estimates: [2, 8] with a 70% confidence level or [1, 9] with a 90% confidence level.

Each subsequent iteration through the SISE model follows a pattern similar to those reviewed above. The software engineer is assigned a new activity to complete. The activity is compared to previously completed tasks to identify a smaller and a larger task, which leads to a prediction interval. The prediction interval is adjusted, if necessary and possible, to achieve a desired confidence level or prediction interval.

3.7 Validation of SISE

The SISE model has been validated through a multi-step process. First, over 100 software engineering students participated in a relative sizing activity, where they were asked to identify the larger of two tasks based solely on the task descriptions. The results demonstrated that a majority of students were able to identify the larger task two-thirds of the time. Equally important, the results indicated that students, on average, were unlikely to incorrectly identify a task's size; instead, they tended to identify the tasks as similar in size.

The next step in validating SISE involved sizing estimates using classroom programming assignments. Each student constructed a SISE-style estimate, as well as an estimate based on a proxy-based model derived from PSP's PROBE model. Overall, the SISE model's predictions proved no more or less accurate than the proxy-based approach. In addition, the students indicated that SISE, in their opinion, took less time and was based on a less complex model.

3.8 Conclusion

The SISE model represents an empirically based approach to effort estimation that relies less on complex mathematical models and more on intuitive expert judgment, without sacrificing the quality of the final product. Software engineers willing to take the first tentative steps toward adopting a personal process now have access to a truly lightweight, agile estimation model. The SISE model does not burden the practitioner with any more work than the absolute minimum necessary to produce a reasonably accurate, fact-based effort estimate. In addition, the model is the first of its kind suitable for use by a single software engineer.

Further development and improvements to the model are currently underway at Auburn University's microISV Research Lab. We are formalizing ways in which the SISE model may be integrated into team-based software processes, as well as developing tool support. For more information, visit our website at http://microisvresearch.org.
Chapter 4
Validation of the SISE Model

4.1 Abstract

Personal software processes rely on individuals approaching software development activities in a systematic and structured manner. Time management has been shown to be a key component in meeting obligations, but relies on relatively accurate effort estimation techniques. Despite a plethora of research into team-based effort estimation models, few models are suitable for use by individual software engineers. Models tailored to the individual include guesswork, an approach commonly used in industry; the PCSE model, under development at Auburn University; and the PROBE model, the only peer-reviewed model devoted to lone software engineers. This spectrum of choices features a gap between guesswork and more formal models, which could be filled with a lightweight, agile, and reasonably accurate alternative. The SISE model combines expert judgment, in the form of relative sizing decisions, with empirical, historical data to create such an alternative. Four key features of the model have undergone validation through a series of surveys and experiments: relative sizing by software engineers, model accuracy, perceived time investment, and perceived value. This research has demonstrated that software engineers are generally capable of sizing development tasks relative to each other based solely on the tasks' descriptions, which is a key feature of SISE. Additionally, this research has demonstrated that the SISE model's accuracy is not significantly different from that of PROBE, its nearest validated competitor. This research has also indirectly demonstrated that software engineers perceive the SISE model to require a smaller time investment than the PROBE model, by relative comparison to the PCSE model, which represents a subset of the PROBE model. Lastly, this research has demonstrated that software engineers perceive the value of the SISE model's results to be higher than that of guesswork.

4.2 Introduction

Software engineering as a discipline seeks to create a systematic and structured approach to the development of software. In order to effectively manage software development efforts, monitoring and controlling activities must be applied, especially at the team level. However, such management activities should also be self-imposed by individual software engineers as they manage themselves and their individual efforts. Time management is one of the most critical of these elements. A software engineer must be willing and able to properly allocate his or her time and effort as necessary to either complete tasks on time or provide project management resources with the information necessary to properly manage risks.

A major component of time management is the forecasting of future effort, known as effort estimation or, less formally, sizing. The ability of a software engineer or engineering team to predict future effort directly impacts the ability to allocate resources, schedule development cycles, and meet client needs in a timely fashion. As a result, a great deal of time and effort has been spent developing estimation models for teams. These models encompass the full spectrum, from lightweight, agile models such as Planning Poker to complex, all-encompassing models such as COCOMO.

The vast majority of effort estimation models rely on a team of individuals to set up and implement. Planning Poker, for example, relies on the members of a team to discuss requirements, share their thoughts, and iteratively refine a series of estimates.
COCOMO, on the other hand, requires weeks, months, or years of data collection and planning to properly tune the core metrics that drive the underlying estimation model, which is focused on large-scale efforts too large for a single developer to attempt. As a result, individual software engineers do not have access to a simple, agile estimation tool to use for their own benefit when posed the all-too-common question "When can you have this done?"

Two estimation models exist that were specifically designed for the individual software engineer. Watts S. Humphrey introduced the first model, PROBE, in A Discipline for Software Engineering [17]. PROBE relies on careful record-keeping and the use of proxies to estimate future effort. The underlying model is built upon a series of complex mathematical formulas and workflows to derive an estimate. The second model, PCSE, is currently under development at Auburn University; it also relies on the use of proxies to build estimates, but uses only a subset of PROBE's features.

The final estimation approach frequently used in industry is expert judgment, commonly known as guesswork [21]. Expert judgment relies on an individual software engineer's ability to assign a numeric value to the estimate based on his or her knowledge, experiences, and intuition. Expert judgment is not a prescriptive model, and estimators may use a variety of conscious or subconscious approaches to construct the estimate, such as analogy or work breakdown [42] [26].

Unlike the complete spectrum of effort estimation options available to teams, the individual software engineer's array of estimation tools has a gap. The gap falls directly between expert knowledge and the two aforementioned models (PROBE and PCSE). Any viable solution for filling this gap would be best described as lightweight, agile, and reasonably accurate.

4.3 SISE: An Agile Estimation Model

The SISE model is designed to provide individual software engineers with a lightweight, agile, and reasonably accurate effort estimation model. The SISE model is built upon a few common estimation concepts. First, SISE relies on regression, the principle that future effort may be predicted based on past actual effort. Second, effort predictions should always be formulated as a prediction interval: a range in which the actual effort is expected to fall. Third, a confidence level must be assigned to any estimate to indicate the likelihood of the actual effort falling within the prediction interval. Lastly, the confidence level for an estimate is directly related to the practitioner's historical accuracy in constructing estimates.

Building on these four concepts, the SISE model assumes the following: a past activity which is perceived to be smaller than a future activity may be used as the low bound of the future task's prediction interval; the reverse is also true for a larger activity used as the high bound of the prediction interval. In addition, an analysis of an estimator's historical accuracy (or error) in predicting future effort can be used to establish an optimal size for the prediction interval as it correlates to a desired confidence level.

"SISE" is an acronym for the four-step process underlying the model: Sort, Identify, Size, Evaluate. The model begins by having the estimator sort his or her historical work activities by actual effort from smallest to largest. Next, a future task's description is compared to the descriptions of the past activities; comparisons of numbers (e.g., actual effort, estimated effort) should be avoided to prevent the introduction of biases based on unconscious guesswork by the estimator.
The estimator must identify two tasks, one perceived to be smaller than the future activity and one perceived to be larger, both as close to the future activity as possible. The actual effort values of these two tasks constitute the rough prediction interval, or size, of the future task. The last step involves evaluating the rough estimate and, if the estimator is dissatisfied with the precision, accuracy, or confidence level of the estimate, adjusting for historical bias.

4.4 Relative Sizing

The first step in validating the SISE model lies in an examination of its core underlying assumption: software engineers are generally capable of identifying the larger of two software development tasks. This principle of the SISE model was tested through a relative sizing survey.

4.4.1 Hypothesis

The SISE model relies on the ability of an estimator to predict the relative size of each future activity, as compared to past, completed tasks. To confirm that a software engineer is generally capable of relative sizing, a survey was constructed to test the following hypothesis:

Ha1: An estimator is, on average, capable of identifying the larger of a pair of tasks, in terms of required effort to complete (selection accuracy > 0.5).

with the null hypothesis as

H01: An estimator is, on average, unable to identify the larger of a pair of tasks, in terms of required effort to complete (selection accuracy ≤ 0.5).

4.4.2 Survey

Project data was gathered from Auburn University's Software Process course for the last 10 years; the data represented 4,060 individual programming projects for 772 students, with 53 unique programming assignments. Students completing these assignments were required to maintain a time log of their efforts. Table 4.1 lists the name, average effort, and sample size for nine distinct tasks. Appendix C (Relative Sizing Survey Questions) contains the ten survey questions with the stated requirements, as provided to the students completing the work and those responding to the survey.

Assignment       Avg. Effort (min.)    Sample Size
CriticalPath     408.9                 114
T-Dist           292.5                 279
T-Dist2          285.7                 29
ComponentInfo    265.2                 27
5-Slot           215.8                 29
Text             202.5                 38
CalcCorr         200.6                 28
A-M-S            194.0                 84
M-P-P-S          143.1                 49

Table 4.1: Average construction effort (in minutes).

Data Distribution

An inspection of the actual effort values demonstrated that the data points do not follow a normal distribution. Figure 4.1 shows the entire dataset, and Figure 4.2 shows the less extreme data points between 1 and 1,000 minutes. Due to the non-normal distribution of the data, parametric tests, such as a t-test, were ruled inappropriate for data analysis; therefore, non-parametric tests were utilized, such as the Mann-Whitney U test.

Figure 4.1: Distribution of actual construction times (all data).
Figure 4.2: Distribution of actual construction times (values < 1,000 minutes).
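The pairwise comparisons summarized in Table 4.2 below were produced with one-sided Mann-Whitney U tests in R. An equivalent check in Python with SciPy might look like the following, where the per-student effort arrays are hypothetical stand-ins for the course time logs.

```python
from scipy import stats

# Hypothetical per-student construction times (minutes) for two assignments.
t_dist_minutes = [310, 250, 420, 290, 515, 180, 330]
mpps_minutes = [150, 90, 200, 120, 160, 110]

# One-sided test: is the first assignment stochastically larger?
u_stat, p_value = stats.mannwhitneyu(t_dist_minutes, mpps_minutes,
                                     alternative="greater")
print(f"U = {u_stat}, one-sided p = {p_value:.4f}")
# A small p-value supports the claim that the first task is larger.
```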
No.   Assignment Pair             Diff.    Pseudomedian    95% CI         P-Value
1     T-Dist vs. M-P-P-S          173      105.0           75.0-Inf       1.007e-09
2     CriticalPath vs. T-Dist     91       67.0            37.0-Inf       0.0001667
3     T-Dist vs. 5-Slot           90       62.0            15.0-Inf       0.01516
4     Text vs. M-P-P-S            57       47.4            16.0-Inf       0.004951
5     T-Dist2 vs. CalcCorr        95       46.0            -5.0-Inf       0.06837
6     5-Slot vs. M-P-P-S          83       43.8            -5.4-Inf       0.05171
7     ComponentInfo vs. 5-Slot    362      38.0            -34.0-Inf      0.1399
8     Text vs. A-M-S              6        22.0            -6.0-Inf       0.1012
9     Text vs. CalcCorr           2        21.0            -20.0-Inf      0.1689
10    A-M-S vs. M-P-P-S           51       27.0            -3.0-Inf       0.06977

Table 4.2: Average construction time comparison for assignment pairs.

Task Pair Selection

Based on the student time logs, four diverse task pairs, numbered 1 through 4, were identified which have a statistically significant difference in size (95% confidence) based on actual construction effort. Six uniform task pairs, numbered 5 through 10, were identified that had no statistically provable difference in size. The statistical difference was determined via a Mann-Whitney U test using R. Output Listings 1-10 show the results from R; Table 4.2 summarizes the task pairs, the difference in average construction times, an estimate of the pseudomedian, the 95% nonparametric confidence interval for the difference, and the associated p-value for each comparison.

4.4.3 Metrics

Estimators presented with two tasks that are significantly different in size (i.e., a correct answer exists) should demonstrate a significant tendency to choose the larger of the two. Given that random chance in an A/B scenario would result in 50% accuracy, a group of estimators who demonstrate selection accuracy significantly greater than 50% is making selections in a non-random manner. To determine if the group's selection accuracy is significantly greater than 50%, a 1-sample proportion test was employed using R (accuracy > 0.5).

Conversely, estimators presented with two tasks that are not significantly different in size (i.e., there is no correct answer) should approximate random chance in their selections. In this case, a group of estimators who demonstrate selection accuracy significantly different from 50% is making selections in a non-random manner. To determine if the group's selection accuracy is significantly different from 50%, a 1-sample proportion test was employed using R (accuracy ≠ 0.5).

An additional metric was employed to determine if a significant proportion of the respondents were able to select a simple majority of correct answers; since only task pairs 1-4 had a statistically correct answer, the respondents' answers to task pairs 5-10 were excluded from consideration. To determine if this proportion was significant, the number of respondents who correctly identified more than half the larger tasks was compared to the total number of respondents, via a 1-sample proportion test using R (proportion > 0.5). Output Listing 21 lists the results.

Finally, a linguistic analysis was performed on the rationales provided by each respondent to determine what general themes motivated his or her selections. The analysis involved a subjective assignment of a theme category based on word and phrase choice, such as "complex," "familiar with," or "number of methods." The number of responses including each theme was tabulated and compared.
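The proportion tests were run in R. An exact binomial test in Python answers the same question and might look like the following; the counts are hypothetical.

```python
from scipy import stats

# Is the group's selection accuracy significantly greater than the 50%
# expected from random A/B choices? (Hypothetical counts.)
correct, total = 95, 113
result = stats.binomtest(correct, total, p=0.5, alternative="greater")
print(f"accuracy = {correct / total:.2f}, p = {result.pvalue:.3g}")
```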
4.4.4 Participants

Survey respondents were selected from Auburn University's Department of Computer Science and Software Engineering classes, including Software Process (COMP 5700) and Data Structures (COMP 2210), in Fall 2012. The Software Process course is designed to provide insight into process-oriented software development, exposure to common engineering processes, and experience with a software process [12]. The Data Structures course is a continuation of the introductory programming course, with emphasis on data structures such as lists, trees, graphs, and hash tables [6]. A total of 113 software engineering students responded.

4.4.5 Questions and Presentation

The survey was presented as a ten-page questionnaire, with each page detailing two task descriptions side by side. Appendix D (Attitudinal Survey Questions) contains the complete descriptions of each of the nine tasks used in the task pairs. The following instructions were provided with each task pair:

Read the descriptions for the two tasks below. Based on your knowledge and experience, select which of the two tasks you believe will require the most effort to complete; this is the "larger" task. In the box at the bottom of the page, write the number of the larger task. In the space provided, write a one-sentence justification for selecting that task as the larger task.

Respondents were not allowed to specify that a task pair was equal in size and had to choose a larger task in all instances. In addition to the relative sizing questions, each respondent was asked to identify his or her enrollment (undergraduate, masters, doctoral), program of study, and completed courses.

4.4.6 Results

Table 4.3 shows the respondents' accuracy for task pairs 1-4 and the respondents' selection of the first vs. second task for task pairs 5-10; the table also shows the corresponding 95% confidence interval for the proportion of correct answers. Figure 4.3 also shows the results of the survey, including the accuracy of the responses and the corresponding 95% confidence interval for the proportion. The possibility that a particular answer was selected at random is 50%, as demonstrated by the red line. Response ranges that do not include the 50% mark are considered to be non-random.

Diverse task pairs:
Task Pair    Correct    95% Prop. CI
1            84%        77-100%
2            71%        63-100%
3            50%        41-100%
4            63%        53-100%

Uniform task pairs:
Task Pair    First Task    95% Prop. CI
5            46%           36-55%
6            81%           72-87%
7            56%           47-66%
8            67%           58-77%
9            68%           59-76%
10           95%           88-98%

Table 4.3: Relative sizing survey results.

Figure 4.3: Relative sizing survey results.

Diverse Pairs

For task pairs 1, 2, and 4, the proportion of correct responses was significantly greater than 50%, indicating non-random selection. The proportion of correct responses for task pair 3 was not significantly different from 50% and failed to demonstrate non-random selection. The linguistic analyses of themes in the respondents' rationales are listed in Tables 4.4-4.7.

For task pair 1, a significant majority of the respondents correctly selected the T-Dist assignment as larger. The dominant theme was the simplicity of the M-P-P-S assignment or the complexity of the T-Dist assignment, indicating a majority of the respondents intuitively selected the larger task.

Theme           M-P-P-S (Count / Pct.)    T-Dist (Count / Pct.)    Total (Count / Pct.)
Data            3 / 17%                   0 / 0%                   3 / 3%
Familiarity     3 / 17                    14 / 15                  17 / 15
Math            3 / 17                    15 / 16                  18 / 16
Methods         5 / 28                    5 / 5                    10 / 9
Planning        0 / 0                     3 / 3                    3 / 3
Problem         1 / 6                     7 / 7                    8 / 7
Reuse           0 / 0                     4 / 4                    4 / 4
Simplicity      2 / 11                    44 / 46                  46 / 41
Testing         0 / 0                     2 / 2                    2 / 2
Unspecified     1 / 6                     1 / 1                    2 / 2
Total           18 / 16                   95 / 84                  113

Table 4.4: Task pair 1 distribution of themes in respondents' rationales.

For task pair 2, a significant majority of the respondents correctly selected the CriticalPath assignment as larger. Although the dominant theme was, again, simplicity or complexity, the respondents for this theme were roughly split between the two assignments. However, a significant number of respondents voting in favor of the CriticalPath assignment as larger cited their familiarity with the assignment as the rationale, breaking the tie.
In addition, a large number of respondents cited the data handling requirements and other problem-specific features of the CriticalPath assignment as reason for choosing it as the larger task. In summary, this indicates that a roughly equal number of students made an intuitive decision about the assignments' relative size, some correctly and some incorrectly. However, a significant number of students made their choices based on the larger assignment's complexity by evaluating the problem details, resulting in a correct decision, on average.

              CriticalPath   T-Dist         Total
Theme         Count   Pct.   Count   Pct.   Count   Pct.
Data          10      13     2       6      12      11
Familiarity   15      19     1       3      16      14
Math          1       1      2       6      3       3
Methods       6       8      4       12     10      9
Planning      7       9      2       6      9       8
Problem       11      14     4       12     15      13
Reuse         1       1      0       0      1       1
Simplicity    22      28     18      55     40      35
Testing       4       5      0       0      4       4
Unspecified   3       4      0       0      3       3
Total         80      71     33      29     113

Table 4.5: Task pair 2 distribution of themes in respondents' rationales.

For task pair 3, the respondents were split 56/57 between the T-Dist and 5-Slot assignments, despite the T-Dist assignment being historically larger. The provided rationales indicated that a majority (37 vs. 12) of respondents intuitively (and incorrectly) viewed the 5-Slot assignment as more complex or the T-Dist assignment as simpler. On the other hand, a larger majority (22 vs. 2) of respondents (correctly) cited the T-Dist assignment as larger due to the size of the assignment in terms of lines of code, methods, or functions; in fact, an optimal implementation of the 5-Slot assignment will primarily involve a single method, called by the remaining methods with different parameter values. In summary, a significant portion of the respondents failed to recognize certain attributes of the assignments, such as code reuse, leading to an incorrect relative sizing.

              T-Dist         5-Slot         Total
Theme         Count   Pct.   Count   Pct.   Count   Pct.
Data          4       7      0       0      4       4
Familiarity   6       11     6       11     12      11
Math          2       4      0       0      2       2
Methods       22      39     2       4      24      21
Problem       2       4      2       4      4       4
Reuse         1       2      4       7      5       4
Simplicity    12      21     37      65     49      43
Testing       0       0      2       4      2       2
Unspecified   7       13     4       7      10      9
Total         56      50     57      50     113

Table 4.6: Task pair 3 distribution of themes in respondents' rationales.

For task pair 4, a significant majority of the respondents correctly selected the Text assignment as larger. Although the dominant theme was simplicity or complexity, the respondents citing this theme were roughly split between the two assignments. However, a significant number of respondents voting in favor of the Text assignment as larger cited the complexities of file processing or other problem-specific attributes of the assignment, breaking the tie. In summary, this indicates that a roughly equal number of students made an intuitive decision about the assignments' relative size, some correctly and some incorrectly. However, a significant number of students made their choices based on the larger assignment's complexity by evaluating the problem details, resulting in a correct decision, on average.

In summary, when presented with two tasks of distinctly different size, a majority of software engineering students were able to successfully identify the larger task in three-quarters of the instances.

Uniform Pairs

An analysis of the responses to the uniform task pairs revealed varying results. In a situation where there is no discernible difference in size for two tasks, one would expect the survey responses to approach a random distribution in choosing between one task and the other (i.e., the responses should be split 50-50).
However, this only occurred for two of the six uniform task pairs. For the other four, a majority of respondents selected one of the tasks as larger.

              M-P-P-S        Text           Total
Theme         Count   Pct.   Count   Pct.   Count   Pct.
Data          1       2      1       1      2       2
Familiarity   5       12     4       6      9       8
File          2       5      12      17     14      12
Math          4       10     5       7      9       8
Methods       5       12     3       4      8       7
Planning      1       2      6       8      7       6
Problem       8       19     11      15     19      17
Reuse         2       5      2       3      4       4
Simplicity    14      33     17      24     31      27
Testing       0       0      6       8      6       5
Unspecified   0       0      4       6      4       4
Total         42      37     71      63     113

Table 4.7: Task pair 4 distribution of themes in respondents' rationales.

A linguistic analysis was performed on the respondents' rationales for choosing a particular task. The analysis indicated that respondents were influenced either by specific aspects of the problem definition or by the format, word count, and word choice of the requirements.

For task pair 5, the respondents were split 59/51, which is consistent with the fact that neither assignment is historically larger than the other. The respondents cited a variety of themes (familiarity, length of code, problem details, simplicity) with no single theme demonstrating a significant majority. In summary, the students were unable to cite a single, consistent reason for sizing one task larger than the other.

For task pair 6, respondents selected the 5-Slot assignment as larger than the M-P-P-S assignment by more than a 4-to-1 margin. The two most commonly cited themes were the relative simplicities/complexities of the assignments and the length of the required code. In summary, a majority of the respondents incorrectly viewed the 5-Slot assignment as larger than the M-P-P-S assignment, similar to the results of task pair 3.

For task pair 7, the respondents were split 57/55, which is consistent with the fact that neither assignment is historically larger than the other. The respondents cited a variety of themes (length of code, reuse, simplicity) with no single theme demonstrating a significant majority. In summary, the students were unable to cite a single, consistent reason for sizing one task larger than the other.

              CalcCorr       T-Dist2        Total
Theme         Count   Pct.   Count   Pct.   Count   Pct.
Data          1       2      0       0      1       1
Familiarity   9       15     4       8      13      12
File          1       2      0       0      1       1
Math          3       5      3       6      6       5
Methods       7       12     18      35     25      23
Planning      1       2      0       0      1       1
Problem       3       5      7       14     10      9
Reuse         1       2      0       0      1       1
Simplicity    27      46     15      29     42      38
Testing       0       0      1       2      1       1
Unspecified   6       10     3       6      9       8
Total         59      54     51      46     110

Table 4.8: Task pair 5 distribution of themes in respondents' rationales.

              M-P-P-S        5-Slot         Total
Theme         Count   Pct.   Count   Pct.   Count   Pct.
Data          0       0      1       1      1       1
Familiarity   0       0      2       2      2       2
File          1       5      0       0      1       1
Math          2       9      6       7      8       7
Methods       6       27     51      56     57      50
Problem       7       32     8       9      15      13
Reuse         3       14     2       2      5       4
Simplicity    2       9      17      19     19      17
Testing       0       0      2       2      2       2
Unspecified   1       5      2       2      3       3
Total         22      19     91      81     113

Table 4.9: Task pair 6 distribution of themes in respondents' rationales.

              ComponentInfo  5-Slot         Total
Theme         Count   Pct.   Count   Pct.   Count   Pct.
Data          3       5      1       2      4       4
Familiarity   2       4      5       9      7       6
File          1       2      3       5      4       4
Math          4       7      5       9      9       8
Methods       9       16     11      20     20      18
Planning      3       5      3       5      6       5
Problem       4       7      1       2      5       4
Reuse         8       14     6       11     14      13
Simplicity    13      23     16      29     29      26
Testing       2       4      1       2      3       3
Text          5       9      2       4      7       6
Unspecified   3       5      1       2      4       4
Total         57      51     55      49     112

Table 4.10: Task pair 7 distribution of themes in respondents' rationales.

For task pair 8, respondents selected the Text assignment as larger than the A-M-S assignment by a 2-to-1 margin.
Two of the three most commonly cited themes were the relative simplicities/complexities of the assignments and the mathematics involved; references to these themes were roughly split between the two assignments. The third commonly referenced theme was the challenge of implementing code with file handling; respondents in favor of the Text assignment as larger referenced this theme more often by a 10-to-1 margin. In summary, a majority of the respondents incorrectly viewed the Text assignment as larger than the A-M-S assignment due to the perceived complexities of the required file operations.

              Text           A-M-S          Total
Theme         Count   Pct.   Count   Pct.   Count   Pct.
Data          1       1      0       0      1       1
Familiarity   7       9      2       5      9       8
File          20      26     2       5      22      19
Math          10      13     14      38     24      21
Methods       4       5      4       11     8       7
Planning      4       5      0       0      4       4
Problem       1       1      1       3      2       2
Reuse         1       1      0       0      1       1
Simplicity    18      24     7       19     25      22
Testing       2       3      0       0      2       2
Text          1       1      0       0      1       1
Unspecified   7       9      7       19     14      12
Total         76      67     37      33     113

Table 4.11: Task pair 8 distribution of themes in respondents' rationales.

For task pair 9, respondents selected the Text assignment as larger than the CalcCorr assignment by more than a 2-to-1 margin; it should be noted that the CalcCorr assignment was accompanied by a six-page handout describing, in detail, the process for completing the required mathematical calculations. The most commonly cited themes were the students' familiarity with the requirements, the relative simplicities/complexities of the assignments, the requirement to read files, the requirement to process text, and the mathematics involved. Of the respondents selecting the CalcCorr assignment as larger, a significant portion referenced the file and text processing themes; of the respondents selecting the Text assignment as larger, a significant portion referenced their familiarity, the mathematics, and an intuitive judgment of relative simplicity/complexity. In summary, a majority of the respondents incorrectly viewed the Text assignment as larger than the CalcCorr assignment due to a general attitude that unstructured text processing presents more inherent difficulties than a highly structured, well-documented, complex math formula.

              CalcCorr       Text           Total
Theme         Count   Pct.   Count   Pct.   Count   Pct.
Familiarity   2       6      11      14     13      12
File          7       19     2       3      9       8
Math          5       14     33      43     38      34
Planning      4       11     6       8      10      9
Simplicity    5       14     21      27     26      23
Testing       1       3      1       1      2       2
Text          12      33     0       0      12      11
Unspecified   0       0      3       4      3       3
Total         36      32     77      68     113

Table 4.12: Task pair 9 distribution of themes in respondents' rationales.

For task pair 10, 95% of the respondents selected the M-P-P-S assignment as larger than the A-M-S assignment, despite the fact that, based on the historical data, the A-M-S assignment took an average of 51 minutes longer to complete (though this difference is not statistically significant, given the small dataset). A review of the requirements and student comments revealed the A-M-S assignment to be a subset of the M-P-P-S assignment; in other words, to complete the M-P-P-S assignment, a software engineer would need to first complete all the requirements of A-M-S and then perform additional work to complete M-P-P-S. Interestingly, 95% of the respondents recognized this relationship and identified the M-P-P-S assignment as larger.
In summary, when presented with two tasks in which neither is significantly larger than the other, software engineering students do not rely on a single, consistent method or process; their decision-making involves a variety of influences such as personal experience, problem-specific attributes, word choice, and/or description length.

4.4.7 Individual Respondents

An analysis was performed to determine if a significant proportion of the respondents were able to select a simple majority of correct answers. Since task pairs 1-4 were the only pairs with a demonstrably correct answer, task pairs 5-10 were excluded. Of the 113 respondents, 72 correctly identified at least three of the four larger tasks. Figure 4.4 shows the distribution of correct answers among individual respondents, and Output Listing 21 shows the 1-sample proportion test conducted using R (accuracy > 0.5).

In summary, the 95% confidence interval for the proportion of respondents with a simple majority of correct answers is 55.6-100% with a p-value < 0.0001, making the results significant.

Figure 4.4: Distribution of percentage of correct answers (task pairs 1-4).

4.4.8 Conclusions

Overall, the survey results indicate that software engineers are generally capable of identifying the larger of two tasks when the size difference is statistically significant. When the two tasks are not significantly different in size, software engineers tend to choose based on their perceptions of task complexity, as influenced by the problem description. Lastly, individual responses indicate that a significant proportion of software engineers are capable of selecting a simple majority of correct answers.

Decision: Reject the null hypothesis H01 in favor of the alternate Ha1, concluding that software engineers are generally capable of identifying the larger of two tasks.

4.5 Accuracy

Direct validation of the accuracy of the SISE model began in the Software Process classroom in Fall 2012 and continued through Spring 2013. Prior to beginning their last assignment of the semester, the students were asked to review its requirements and size it relative to their past assignments; this relative sizing formed the basis for a SISE estimate. The accuracy of these estimates was then compared to the accuracy of PROBE estimates collected from the Software Process course over the period from 2001 to 2008.

4.5.1 Hypothesis

The accuracy of the SISE model should, at a minimum, equal the accuracy of existing, validated models tailored to the individual. To confirm the relative accuracy of SISE, an experiment was constructed to test the following hypothesis:

Ha2: The SISE model produces estimates that are equally or more accurate than PROBE (accuracySISE ≥ accuracyPROBE).

with the null hypothesis as

H02: The SISE model produces estimates that are less accurate than PROBE (accuracySISE < accuracyPROBE).

4.5.2 Experiment

Students enrolled in the Software Process course were taught and required to follow a personal software process. Each coding assignment was accompanied by a spreadsheet, completed by the student, which detailed his or her adherence to an individual software process and activities as they related to the completion of the assignment. The components of the spreadsheet that are relevant to this research include the description of the assignment's requirements, a detailed time log, and various estimates. Each student was instructed to keep a careful record of time spent during each aspect of construction and project management.
From 2001 through 2008, the students in the Software Process course completed a portion of their assignments by providing a PROBE effort estimate. A total of 34 distinct assignments that included PROBE estimates were completed by students across twelve semesters. Within that sample, 406 assignments produced valid estimates utilizing a prediction interval. These estimates formed the basis for the accuracy measurement of the PROBE model.

In the Fall 2012 and Spring 2013 classes, students compiled a historical record of their actual effort values by completing a series of six to seven coding assignments. Prior to beginning work on their last assignment, the students completed a brief survey on the relative sizing of the future assignment as compared to the completed ones; the results were used to construct a SISE-style estimate.

From 2001 through 2012, for all assignments that did not include a PROBE or PCSE estimate, the students were required to construct an estimate prior to beginning development work based on their expert judgment. This estimate was included in their spreadsheets and turned in with their completed work. No mechanism was imposed to prevent these estimates from being constructed after development had begun or after it had been completed. Due to this fact, the estimates based on expert judgment were not considered reliable enough to use for accuracy comparisons.

The accuracy values for the PROBE and SISE models were compared to determine if a significant difference in their respective accuracies existed.

4.5.3 Metrics

Both the SISE and PROBE models produced an estimate containing a prediction interval. In addition, the PROBE model produced a single, planned effort value. The SISE model, however, does not employ a prescriptive method for translating its prediction interval into a single-value estimate. Therefore, to construct a valid comparison between the two models, the metric employed must rely on a comparison of prediction intervals, as opposed to single-value estimates.

Two methods are commonly used to evaluate estimation model accuracy. The first method involves the calculation of the mean absolute relative error (MARE) for a set of estimates [11]. The MARE value (see Equation 4.1) is an average of the magnitude of all estimate errors and requires single-value estimates to calculate:

MARE = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{actual_i - estimate_i}{actual_i} \right|    (4.1)

where n is the number of individual tasks in the set.

The second method of comparison involves a calculation of the estimation model's hit rate and requires that the original estimates be phrased in terms of a prediction interval. A prediction interval (PI) represents the low and high bounds of the estimate. For example, an estimate of "90 to 110 person hours" for an activity has a prediction interval of [90, 110]. The prediction interval width (PI width) is the difference between the high and low bounds. The hit rate is defined as the percentage of time the actual effort falls within the prediction interval [23]; Equation 4.2 shows the formula for calculating the hit rate. Logically, the hit rate may be greatly affected by the PI width, which, in turn, relates to the confidence level.

HitRate = \frac{1}{n} \sum_{i=1}^{n} h_i, \qquad h_i = \begin{cases} 1, & min_i \le actual_i \le max_i \\ 0, & actual_i > max_i \lor actual_i < min_i \end{cases}    (4.2)

where min_i and max_i are the minimum and maximum values, respectively, of the PI for the estimate of task i; actual_i is the actual effort of task i; and n is the number of estimated tasks.
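As a concrete illustration of Equations 4.1 and 4.2, the following R sketch implements both metrics. It is not part of the original analysis, and the effort values shown are hypothetical.

> # Mean absolute relative error (Equation 4.1); requires single-value estimates.
> mare <- function(actual, estimate) mean(abs((actual - estimate) / actual))
>
> # Hit rate (Equation 4.2): the fraction of tasks whose actual effort
> # falls within the prediction interval [lo, hi].
> hit.rate <- function(actual, lo, hi) mean(actual >= lo & actual <= hi)
>
> # Hypothetical effort values, in minutes:
> actual <- c(120, 300, 250)
> mare(actual, estimate = c(100, 330, 250))
> hit.rate(actual, lo = c(90, 250, 300), hi = c(150, 400, 350))   # 2 of 3 hits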
Because SISE produces estimates only in terms of a prediction interval, the hit rate was chosen as the method for comparison to PROBE. In addition, the PI widths were compared to give perspective to the relative hit rates.

4.5.4 Participants

The students involved in this experiment were, at the time, enrolled in Auburn University's Software Process course (COMP 5700). The Software Process course is designed to provide: insight into process-oriented software development; exposure to common engineering processes; and experience with a software process [12].

Figure 4.5: Sample PROBE calculation using the assignment spreadsheet.

4.5.5 Questions and Presentation

Students preparing a PROBE estimate utilized a custom spreadsheet, which guided and facilitated the model's calculations. Figure 4.5 shows an example.

In Fall 2012 and Spring 2013, the Software Process course had moved beyond the PROBE estimation model and was employing PCSE estimates. In these semesters, prior to the last assignment, each student was provided a summary of his or her actual effort values from past assignments (see Table 4.13). The students were asked to review the summary data and answer the following questions before beginning work on the assignment (approximately a week prior).

Assignment   Actual Effort   Description
CA06         272 min.        Calculate effort based on historical data.
CA04         298 min.        Extract design components from a file containing Python source code.
CA03         433 min.        Determine the size of a software component relative to a list of components.
CA02         765 min.        Analyze a time log.

Table 4.13: Sample assignment summary.

Place a check next to each past assignment that you are confident is smaller than CA07 in terms of required effort to complete.

[ ] CA02   [ ] CA03   [ ] CA04   [ ] CA06

Place a check next to each past assignment that you are confident is larger than CA07 in terms of required effort to complete.

[ ] CA02   [ ] CA03   [ ] CA04   [ ] CA06

Based on the results of these questions, the researchers constructed a SISE-style estimate for each student. The accuracy of the SISE estimates was calculated and compared to the historical accuracy of PROBE estimates.

4.5.6 Results

A total of 406 estimates were constructed using the PROBE model. Of that number, 176 assignments were completed within the estimate's prediction interval and 230 fell outside. The average PI width was 758 minutes.

A total of 77 estimates were constructed using the SISE model. Of that number, 26 assignments were completed within the estimate's prediction interval and 51 fell outside. The average PI width was 292 minutes.

The hit rates were compared using a 2-sample test for equality of proportions with continuity correction in R. Output Listing 22 shows the results. The p-value of 0.07535 does not support the conclusion that the SISE hit rate is lower than the PROBE hit rate. Therefore, it cannot be concluded that the accuracy of the PROBE model out-performs the SISE model.

The PI width values for each model were analyzed and determined to follow a non-normal distribution. Therefore, the comparison of the PI widths was conducted using a Mann-Whitney U test (see Output Listing 23). At a 95% confidence level, the p-value of 0.008988 indicated that the PI width values from the two data sets were significantly different. A second Mann-Whitney U test was conducted (see Output Listing 24) to determine if the PROBE PI widths were significantly larger than the SISE PI widths; the p-value of 0.004494 confirmed they were.
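For clarity, the following sketch shows one way a SISE-style prediction interval can be formed from a student's relative-sizing answers, using the hypothetical data of Table 4.13. Bounding the interval by the largest "confidently smaller" and smallest "confidently larger" actual efforts is an illustration of the model's Identify and Size steps, not necessarily the researchers' exact procedure.

> # Actual efforts (minutes) for completed assignments, as in Table 4.13.
> effort <- c(CA06 = 272, CA04 = 298, CA03 = 433, CA02 = 765)
>
> # Hypothetical answers for the upcoming assignment CA07:
> smaller <- c("CA06", "CA04")   # checked as confidently smaller than CA07
> larger  <- c("CA02")           # checked as confidently larger than CA07
>
> # SISE-style prediction interval, bounded by the nearest bracketing tasks:
> c(low = max(effort[smaller]), high = min(effort[larger]))   # 298 to 765 minutes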
4.5.7 Conclusions

Overall, the analysis revealed that no provable statistical difference existed between the accuracy of the PROBE and SISE models' hit rates. In addition, the SISE model demonstrated significantly smaller PI widths than those of the PROBE model, which may (subjectively) be interpreted as more useful for project planning purposes.

Decision: Reject the null hypothesis H02 in favor of the alternate Ha2, concluding that the SISE model's estimates are equally or more accurate than PROBE's.

4.6 Time Investment and Perceived Value

The next steps in validation of the SISE model involved demonstrating the model's required time investment and perceived value. To accomplish this, an attitudinal survey was conducted in the Spring 2013 Software Process class, which had constructed estimates using both expert judgment and PCSE in project assignments. The PCSE model is based on a subset of the activities composing the PROBE model; it follows that the complexity and time investment required to complete a PCSE estimate is equal to or less than that of the PROBE model. Therefore, a comparison of the SISE model to the PCSE model was determined valid in demonstrating the SISE model's complexity and required time investment as (indirectly) compared to the PROBE model.

4.6.1 Hypothesis

The amount of time spent constructing an estimate directly impacts a practitioner's perception of the model's value and may influence the decision to use the model. The direct comparison of SISE time investments to PCSE, which uses a subset of the PROBE model, indirectly demonstrates SISE's relationship to PROBE. To confirm that the relative time investment in a SISE estimate is less than that of a PCSE estimate, a survey was constructed to test the following hypothesis:

Ha3: An estimator using the SISE model will invest less time in producing an estimate than the time required using PCSE (timeSISE < timePCSE).

with the null hypothesis as

H03: An estimator using the SISE model will invest as much or more time in producing an estimate as the time required to use PCSE (timeSISE ≥ timePCSE).

In addition, the perceived value of an estimate may influence a practitioner's usage of a particular model or approach. A comparison of the value of a SISE estimate to guesswork, a commonly used approach, demonstrates a significant factor in adoption. To confirm that a software engineer's perception of value for a SISE estimate is greater than that of guesswork, a survey was constructed to test the following hypothesis:

Ha4: An estimator introduced to the SISE model and underlying approach will perceive the output of the model as equally or more useful than guesswork (valueSISE ≥ valueGuess).

with the null hypothesis as

H04: An estimator introduced to the SISE model and underlying approach will perceive the output of the model as less useful than guesswork (valueSISE < valueGuess).

4.6.2 Survey

A survey was conducted of software engineering students to determine their attitudes and opinions regarding the relationships between SISE, PCSE, and guesswork with respect to their relative time investment and expected value. The survey presented twenty-six questions covering PCSE and SISE comprehension; model usage as compared to risk; and time investment, complexity, and value comparisons.

4.6.3 Metrics

The answers to the survey questions followed either a Likert scale or categorical model. The scaled responses allowed the student to choose along a rating scale such as "much more," "somewhat more," "same," "somewhat less,"
and "much less." The categorical responses typically followed a pattern of "choice 1," "choice 2," or "neither."

Once the survey responses were tabulated, a proportion was assigned to each response. The responses were analyzed to determine if the number of people choosing a particular response was significantly greater than the others by calculating a 1-sample proportion test with continuity correction in R. In addition, the scaled responses were categorized into more general "agree," "disagree," or "neither" values to determine if a particular category demonstrated a significantly greater proportion than the others.

4.6.4 Participants

The students involved in this experiment were, at the time, enrolled in Auburn University's Software Process course (COMP 5700). The Software Process course is designed to provide: insight into process-oriented software development; exposure to common engineering processes; and experience with a software process [12].

4.6.5 Questions and Presentation

Appendix D (Attitudinal Survey Questions) contains the complete list of survey questions. The questions relevant to this research include: contrasting the time investment required for SISE to that of PCSE, and contrasting the perceived value of a SISE estimate to that of expert judgment.

21. The PCSE and SISE effort estimation models take two distinct approaches to constructing an estimate. (Descriptions of models omitted for brevity.) Based on the descriptions of each model, select the statement below that best describes your impression of the PCSE model as compared to the SISE model in terms of time investment.

[ ] I believe the PCSE model would require a larger time investment than the SISE model.
[ ] I believe neither model is more time-consuming to utilize than the other.
[ ] I believe the SISE model would require a larger time investment than the PCSE model.

26. Based on the description of the SISE model and your experience with expert judgement (i.e., guesswork), select the statement below that best describes your impression of the SISE model as compared to expert judgement in terms of the value of the estimate produced.

[ ] I believe the SISE model will produce much more valuable estimates as compared to expert judgement.
[ ] I believe the SISE model will produce somewhat more valuable estimates as compared to expert judgement.
[ ] I believe the SISE model will produce estimates of equal value as compared to expert judgement.
[ ] I believe the SISE model will produce somewhat less valuable estimates as compared to expert judgement.
[ ] I believe the SISE model will produce much less valuable estimates as compared to expert judgement.

In addition, students were asked to provide their opinion of the SISE and PCSE models' relative complexities.

22. Based on the descriptions of each model, select the statement below that best describes your impression of the PCSE model as compared to the SISE model in terms of complexity.

[ ] I believe the PCSE model is much more complex as compared to the SISE model.
[ ] I believe the PCSE model is somewhat more complex as compared to the SISE model.
[ ] I believe the PCSE model is the same complexity as the SISE model.
[ ] I believe the PCSE model is somewhat less complex as compared to the SISE model.
[ ] I believe the PCSE model is much less complex as compared to the SISE model.

The survey was made available to the students near the end of the semester and participation was voluntary.
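As an illustration of the metrics described in Section 4.6.3, the sketch below collapses hypothetical scaled answers into the coarser "agree"/"neither"/"disagree" categories and applies the 1-sample proportion test. It is not part of the original analysis, and the response data shown are invented for illustration.

> # Hypothetical scaled responses to one survey question (35 respondents).
> responses <- c(rep("much more", 20), rep("somewhat more", 13),
+                rep("somewhat less", 2))
>
> # Collapse the five-point scale into agree / neither / disagree.
> category <- ifelse(responses %in% c("much more", "somewhat more"), "agree",
+                    ifelse(responses == "same", "neither", "disagree"))
> table(category)
>
> # Is the "agree" proportion significantly greater than 50%?
> prop.test(sum(category == "agree"), length(category), alternative = "greater")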
Reason                                               Count   Prop.

Time Investment
PCSE requires a larger time investment than SISE     31      88.6%
Neither model is more time consuming                 3       8.6
PCSE requires a smaller time investment than SISE    1       2.9
Total                                                35

Perceived Value
SISE much more than expert judgment                  18      51.4%
SISE somewhat more than expert judgment              16      45.7
No difference                                        1       2.9
SISE somewhat less than expert judgment              0       0.0
SISE much less than expert judgment                  0       0.0
Total                                                35

Perceived Complexity
PCSE much more than SISE                             20      57.1%
PCSE somewhat more than SISE                         13      37.1
No difference                                        0       0.0
PCSE somewhat less than SISE                         2       5.7
PCSE much less than SISE                             0       0.0
Total                                                35

Table 4.14: Summary of survey results.

4.6.6 Results

A total of 35 responses to the survey were received. Table 4.14 lists the questions, the responses, the count of each response, and the relative proportion.

In terms of time investment, the responses demonstrate that 31 out of 35 students believe the PCSE model requires a larger time investment as compared to the SISE model, based on the provided descriptions. The 95% confidence interval for this proportion is 72.3-96.2% (see Output Listing 25).

In terms of perceived value, the responses demonstrate that 34 out of 35 students believe the SISE model provides greater value, either "much more" or "somewhat more," as compared to expert judgment, based on the provided descriptions. The 95% confidence interval for this proportion is 83.4-99.9% (see Output Listing 26).

In terms of complexity, the responses demonstrate that 33 out of 35 students believe the PCSE model is more complex, either "much more" or "somewhat more," as compared to the SISE model, based on the provided descriptions. The 95% confidence interval for this proportion is 79.5-99.0% (see Output Listing 27).

4.6.7 Conclusions

In summary, a significant proportion of the respondents indicated their belief that SISE requires a smaller time investment as compared to PCSE; that SISE provides a higher level of perceived value as compared to expert judgment; and that PCSE is a more complex model as compared to SISE.

Decision: Reject the null hypothesis H03 in favor of the alternate Ha3, concluding that an estimator using the SISE model will invest less time in producing an estimate than is required using PCSE.

Decision: Reject the null hypothesis H04 in favor of the alternate Ha4, concluding that an estimator introduced to the SISE model and underlying approach will perceive the output of the model as equally or more useful than guesswork.

4.7 Summary

This work addresses the viability of the SISE estimation model as a reasonable option for individuals wishing to construct effort estimates. Specifically, SISE attempts to fill the gap between expert judgment (guesswork) and other individual estimation models, such as PROBE.

It has been demonstrated that software engineers are generally capable of identifying the larger of two tasks. In terms of estimation accuracy, this research has demonstrated that the SISE model is no more or less useful than PROBE, its nearest validated competitor. In fact, this research has demonstrated that the SISE model, with no provable difference in accuracy, produces estimates with a narrower, and arguably more useful, prediction interval. This research has also demonstrated that the SISE model is perceived as less complex and less time consuming than the PCSE model and, by relative comparison, the PROBE model.
Lastly, it has been demonstrated that software engineers view the output of the SISE model as more valuable than that of expert judgment, a factor that may influence adoption and continued usage.

Obviously, further research is required to determine the extent to which these results apply to an industrial environment. Furthermore, additional research into the benefits and effectiveness of the SISE model should be conducted to determine how the model behaves when calibrated with larger, more extensive historical data sets. Lastly, the relative sizing survey may produce different and more interesting results if it is modified to allow the respondents to specify a third option of "unknown" for the relative size differences.

Chapter 5
Conclusions and Additional Research

5.1 Summary

The software engineering discipline is filled with many varied examples of software process methods and tools focused on the team or organization. In recent years, the agile approach to software engineering has increased the focus of software process on small teams and individuals; however, not all aspects of software process have been deeply or fully addressed.

The majority of effort estimation models, traditional and agile, focus on teams or groups of software engineers. The discipline is rife with examples of team-based models, including Wideband Delphi, Planning Poker, function point analysis, COCOMO, etc. The few examples of effort estimation models focused on the lone software engineer are limited to traditional mathematical models with (relatively) substantial complexity and required time investment. The discipline lacks a truly agile model based on a minimal combination of empirical data and expert judgment.

The SISE model under development at Auburn University's microISV Research Lab is a simple-to-understand, lightweight, and agile effort estimation model that specifically targets individual software engineers. SISE combines an individual's personal, empirical data with expert judgment and experiences to produce relatively accurate estimates with a minimal investment of training and time.

The SISE model rests on two foundational principles. First, software engineers are capable of identifying the larger of a pair of tasks based solely on their descriptions. Second, a software engineer who is presented with a future work activity is capable of identifying two historical tasks, one larger and one smaller, which may serve as a prediction of the future activity's size.

The name "SISE" is an acronym for the model's four basic steps: Sort, Identify, Size, and Evaluate. The first step, Sort, involves the ordering of historical data by the actual effort required to complete the activity. The second step, Identify, involves choosing two tasks from the historical data set: one confidently known to be smaller, one confidently known to be larger, and both relatively close in size to the future work. Once the practitioner has chosen a pair of tasks, the third step, Size, produces a rough prediction interval of the future activity's size using the actual effort values for the two completed tasks. The final step, Evaluate, involves shifting or resizing the prediction interval to account for any historical bias. This last step is optional and is only applied if the estimator is dissatisfied with the precision, accuracy, or confidence level of his or her estimate.

Validation of the SISE model included two major steps.
First, the foundational principle that relative task sizing by software engineers is suitably accurate was validated. The validation took the form of a survey, presented to over 100 software engineering students, in which respondents were shown a series of task pairs and asked to identify the larger of each. Some of the pairs had a known, verifiable size difference based on ten years of time logs provided by students in the Software Process course, while some of the pairs did not. The results indicated that, on average, a majority of software engineers were able to identify the larger task, while not typically misidentifying the smaller. When presented with tasks demonstrating no significant difference in size, the respondents were typically swayed by the wording, format, or word count.

The second phase of validation involved a series of Software Process students who were asked to identify where a future activity should be placed in the ordered list of their completed tasks. In addition, the students were asked to construct a PCSE estimate. The results indicated that SISE predictions were no more or less accurate than the PCSE model's estimates. In addition, the students indicated that SISE, in their opinion, took less time and was based on a less complex model. In summary, SISE appears capable of producing results of equal quality, in less time, and with less training.

5.2 Conclusions

Several conclusions may be drawn from the results of this research. First, it should be noted that a lightweight, agile effort estimation model, SISE, has been proven effective as a tool for individual software engineers. From this, several other conclusions may be drawn.

For example, this research has reinforced the notion that effort estimation, in general, does not need to be a heavyweight activity. Approaches such as Planning Poker and SISE provide valuable results with minimal cost. In fact, it is possible that many data-driven activities within the software engineering discipline may benefit from a lightweight version based on expert judgment and backed by empirical data, in much the same way SISE is built. In other words, new models may be constructed to build upon the intuitive knowledge and experience of software engineers while grounding the activities in solid, fact-based data.

Lastly, this research has demonstrated that individual software engineers possess skill sets that are unpredictable and likely very difficult to quantify. Nowhere has this been more apparent than in this research's attempts to identify "common" rank orderings of tasks by effort. Actual effort values demonstrated that one person may struggle on a simple task, whereas another person may finish quickly. In fact, the wide variety of responses from students on why they believed one task might be larger than another demonstrated a plethora of subtle experiences, skills, and talents, each capable of affecting personal productivity in significant ways.

5.3 Additional Research

Building upon the conclusions drawn from this research, the authors have identified several topics of future research.

Individual software developers, whether they are labeled as team members, consultants, or micropreneurs, represent fertile ground for future research activities. A wide variety of tools exist for facilitating team software development activities; however, the tools and techniques specific to the lone software engineer are few and far between.
For example, the study of effort estimation approaches and models has been underway for decades; however, the software engineering industry still lacks agile, reasonably accurate tools for individuals to size their own personal work efforts.

Another obvious area of remaining research is the method or methods by which SISE may be integrated into a team environment. Team integration should include a basic methodology for combining individual and team estimates, calculating and incorporating overhead costs, and creating synergy between team members.

Once the role and behaviors of the SISE model have been defined within the larger context of a team environment, a sample implementation plan must be formulated. The implementation plan will cover the basic steps of introducing SISE into a team environment, training the participants, equipping team members with appropriate tools, measuring the model's effectiveness, and adjusting for quality, as necessary.

A key factor in support of the SISE model will be the development of supporting tools. Although the use of tools is not mandatory for the successful use of the SISE model, several areas of the model may benefit from their creation. Such areas include data pruning, historical data management, and relative sizing. For example, a tool supporting the data pruning process, based on a variety of algorithms, would reduce the preparation time for each estimate. A data repository for historical activities (descriptions, dates, estimated effort, and actual effort values) would streamline the data pruning process. Lastly, a relative sizing tool, designed to present a list of activities and assist in the process of inserting a new, future task into the list, would be useful in both training new estimators and assisting experienced ones. As such tools are developed, methods must be developed for integrating them into a team environment (e.g., integration with time tracking tools). Such integration will encourage SISE adoption into team-based development environments.

Another area of research can be found in the development of pruning algorithms to support the SISE model. Such research would be based in the exploration of the factors involved in the pruning of historical data (e.g., age of activities, sizes of activities, data distribution types, etc.). Depending on the factors involved, algorithms may be developed to identify and remove extraneous historical data, while leaving sufficient data to establish a reliable estimate within the desired confidence levels.

In instances where the SISE model is utilized for long periods of time, research should be undertaken to determine if the model is self-correcting, or if a combination of pruning and output tuning is necessary. It is possible that the model will not require specific actions, other than historical pruning, to maintain an acceptable level of quality. However, it may also be possible that the practitioner will eventually need to tune the estimates based on historical performance. Additional research into such feedback mechanisms may benefit the overall quality of the model.

During the process of gathering and analyzing data related to relative task sizing, a moderate-to-strong correlation was noted between individual students and groups, such as a class. Additional research into such correlations may provide insight into task sizing in general. Questions that may be explored include: How strong are these correlations? How homogeneous must the groups be to maintain a strong correlation to the individuals'
sizings? Can relative sizing techniques be combined with rank correlation techniques to enhance existing estimation models?

Surprisingly, this research noted that early estimates, within the context of the software process course, tended to outperform formal estimation models, such as PCSE and SISE. While this may be due to a lack of data to calibrate the models, further research into the relationship between expert judgment, formal models, and data set sizes may reveal interesting trends that can be used to improve existing estimation approaches.

One last area of potential research, beyond the "software engineering" focused practices, is the underlying psychological factors involved in the use of the SISE model (and sizing in general). For example, research indicates that word choice, word count, and grammatical structure during requirements definition affect a reader's perception of complexity and required effort. In addition, the attitudinal survey revealed a tendency, on the part of software engineers, to equate higher complexity with more value, which may not always be the case. Techniques for identifying and addressing these psychological influences would positively affect both the implementation of the SISE model and other estimation models.

Bibliography

[1] AdMob Metrics. July 2009 metrics report. http://metrics.admob.com/2009/08/july-2009-metrics-report/, August 2011. Accessed August 14, 2011.

[2] Alan Albrecht. Function points: A new way of looking at tools. IBM, 1979.

[3] Android Market. Android market developer signup. https://market.android.com/publish/signup/, August 2011. Accessed August 14, 2011.

[4] Apple. Apple Developer Programs 2011. http://developer.apple.com/programs/, August 2011. Accessed August 14, 2011.

[5] Association of Software Professionals. ASP member forum. http://members.asp-software.org/newsgroups/showthread.php?t=25806, August 2011. Accessed August 14, 2011.

[6] Auburn University. COMP 2210 course description. http://www.eng.auburn.edu/files/acad_depts/csse/syllabi/comp2210.pdf, 2013. Accessed May 13, 2013.

[7] Barry W. Boehm. Software engineering economics. Software Engineering, IEEE Transactions on, SE-10(1):4-21, January 1984.

[8] Barry W. Boehm, Chris Abts, A. Winsor Brown, Sunita Chulani, Bradford K. Clark, Ellis Horowitz, Ray Madachy, Donald J. Reifer, and Bert Steece. Software Cost Estimation with Cocomo II with CD-ROM. Prentice Hall PTR, Upper Saddle River, NJ, USA, 1st edition, 2000.

[9] M. Cohn. Succeeding with Agile: Software Development Using Scrum. Addison-Wesley Signature Series. Addison-Wesley, 2009.

[10] Construx. Software development practices. http://www.construx.com/Page.aspx?nid=68, 2012. Accessed January 26, 2012.

[11] S. D. Conte, H. E. Dunsmore, and V. Y. Shen. Software engineering metrics and models. Benjamin-Cummings Publishing Co., Inc., Redwood City, CA, USA, 1986.

[12] David Umphress. COMP 5700/6700/6706 software process. http://www.eng.auburn.edu/users/umphress/comp6700/index.html, 2013. Accessed May 13, 2013.

[13] Iris Fabiana de Barcelos Tronto, José Demisio Simões da Silva, and Nilson Sant Anna. Comparison of artificial neural network and regression models in software effort estimation. In IJCNN, pages 771-776, Brazil, 2007. IEEE.

[14] Gartner Incorporated. Gartner says worldwide software as a service revenue is forecast to grow 21 percent in 2011. http://www.gartner.com/it/page.jsp?id=1739214, August 2011. Accessed August 14, 2011.

[15] Maurice H. Halstead.
Elements of Software Science (Operating and Programming Systems Series). Elsevier Science Ltd, Amsterdam, May 1977.

[16] M. Host and C. Wohlin. An experimental study of individual subjective effort estimations and combinations of the estimates. In Software Engineering, 1998. Proceedings of the 1998 International Conference on, pages 332-339, Sweden, April 1998. IEEE.

[17] Watts S. Humphrey. A Discipline for Software Engineering. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1995.

[18] W. S. Humphrey. Introduction to the Team Software Process(SM). SEI Series in Software Engineering. Addison-Wesley, 2000.

[19] D. R. Jeffery and G. Low. Calibrating estimation tools for software development. Software Engineering Journal, 5(4):215-221, July 1990.

[20] Philip M. Johnson and Anne M. Disney. The personal software process: A cautionary case study. IEEE Software, 15(6):85-88, November 1998.

[21] M. Jorgensen, B. Boehm, and S. Rifkin. Software development effort estimation: Formal models or expert judgment? Software, IEEE, 26(2):14-19, March-April 2009.

[22] M. Jorgensen and M. Shepperd. A systematic review of software development cost estimation studies. Software Engineering, IEEE Transactions on, 33(1):33-53, January 2007.

[23] M. Jorgensen, K. H. Teigen, and K. J. Molokken-Ostvold. Better sure than safe? Overconfidence in judgment based software development effort prediction intervals. Journal of Systems and Software, 70(1-2):79-93, 2004.

[24] Magne Jørgensen. A critique of how we measure and interpret the accuracy of software development effort estimation. In Jacky Keung, editor, 1st International Workshop on Software Productivity Analysis and Cost Estimation, pages 15-22, Tokyo, Japan, 2007. Information Processing Society of Japan.

[25] Chris F. Kemerer. An empirical validation of software cost estimation models. Communications of the ACM, 30(5):416-429, May 1987.

[26] Steve McConnell. Software Estimation: Demystifying the Black Art (Best Practices (Microsoft)). Microsoft Press, Redmond, WA, USA, Kindle edition, 2006.

[27] Merriam-Webster. Merriam-Webster dictionary. http://www.merriam-webster.com/, 2012. Accessed January 26, 2012.

[28] J. J. Moder, C. R. Phillips, and E. W. Davis. Project management with CPM, PERT, and precedence diagramming. Van Nostrand Reinhold, New York, NY, USA, 1983.

[29] Mountain Goat Software. Planning poker cards. http://store.mountaingoatsoftware.com, 2013. Accessed May 30, 2013.

[30] Tridas Mukhopadhyay, Steven S. Vicinanza, and Michael J. Prietula. Examining the feasibility of a case-based reasoning model for software effort estimation. MIS Q., 16:155-171, June 1992.

[31] L. H. Putnam. A general empirical solution to the macro software sizing and estimating problem. IEEE Trans. Softw. Eng., 4:345-361, July 1978.

[32] M. Ruhe, R. Jeffery, and I. Wieczorek. Using web objects for estimating software development effort for web applications. In Software Metrics Symposium, 2003. Proceedings. Ninth International, pages 30-37, September 2003.

[33] Melanie Ruhe, Ross Jeffery, and Isabella Wieczorek. Cost estimation for web applications. In Proceedings of the 25th International Conference on Software Engineering, ICSE '03, pages 285-294, Washington, DC, USA, 2003. IEEE Computer Society.

[34] R. Schoedel. PROxy Based Estimation (PROBE) for Structured Query Language (SQL). Technical note. Carnegie Mellon University, Software Engineering Institute, 2006.

[35] Scrum Methodology. Scrum effort estimation and story points.
http://scrummethodology.com/scrum-effort-estimation-and-story-points, 2008. Accessed January 30, 2012.

[36] Martin Shepperd, Chris Schofield, and Barbara Kitchenham. Effort estimation using analogy. In Proceedings of the 18th International Conference on Software Engineering, ICSE '96, pages 170-178, Washington, DC, USA, 1996. IEEE Computer Society.

[37] Standish Group. Standish chaos report 2009. https://secure.standishgroup.com/reports/reports.php, August 2011. Accessed August 14, 2011.

[38] Russell Thackston and David Umphress. Individual effort estimating: Not just for teams anymore. CrossTalk: The Journal of Defense Software Engineering, 25(3):4-7, May/June 2012.

[39] Russell Thackston and David Umphress. Micropreneurs: The rise of the microISV. IT Professional, 15(2):50-56, 2013.

[40] David Umphress. Principle-centered software engineering. http://swemac.cse.eng.auburn.edu/~umphrda/PCSE, October 2011. Accessed October 12, 2011.

[41] M. van Genuchten. Why is software late? An empirical study of reasons for delay in software development. Software Engineering, IEEE Transactions on, 17(6):582-590, June 1991.

[42] Fiona Walkerden and Ross Jeffery. An empirical study of analogy-based software effort estimation. Empirical Softw. Engg., 4:135-158, June 1999.

[43] Gerhard E. Wittig and Gavin R. Finnie. Using artificial neural networks and function points to estimate 4GL software development effort. Australasian J. of Inf. Systems, 1(2):87-94, 1994.

[44] S. Yenduri, S. Munagala, and L. A. Perkins. Estimation practices efficiencies: A case study. In Information Technology (ICIT 2007), 10th International Conference on, pages 185-189, December 2007.

Appendices

Appendix A
Output Listings

The following output listings were produced using R. Listings 1-10 compare the average construction times for assignment pairs used in the relative sizing survey. Listings 11-20 compare the proportion of correct answers against the probability of random chance (proportion = 0.5). Output Listing 21 compares the proportion of correct survey answers to the total number of answers. Output Listing 22 compares the hit rates of SISE and PROBE via the number of hits for each model versus the total number of assignments. Output Listings 23 and 24 compare the PROBE and SISE prediction interval widths. Output Listings 25-27 compare the respondents' perceptions of time investment, value, and complexity for SISE, PCSE, and expert judgment.
> wilcox.test(TDist[,1], MPPS[,1], conf.int="T", alternative="g")
Wilcoxon rank sum test with continuity correction
data: TDist[, 1] and MPPS[, 1]
W = 9923.5, p-value = 1.007e-09
alternative hypothesis: true location shift is greater than 0
95 percent confidence interval: 75.00005 Inf
sample estimates: difference in location 105.0001
(1)

> wilcox.test(CriticalPath[,1], TDist[,1], conf.int="T", alternative="g")
Wilcoxon rank sum test with continuity correction
data: CriticalPath[, 1] and TDist[, 1]
W = 19022.5, p-value = 0.0001667
alternative hypothesis: true location shift is greater than 0
95 percent confidence interval: 36.99997 Inf
sample estimates: difference in location 66.99999
(2)

> wilcox.test(TDist[,1], FiveSlot[,1], conf.int="T", alternative="g")
Wilcoxon rank sum test with continuity correction
data: TDist[, 1] and FiveSlot[, 1]
W = 4701, p-value = 0.01516
alternative hypothesis: true location shift is greater than 0
95 percent confidence interval: 14.99994 Inf
sample estimates: difference in location 62.00005
(3)

> wilcox.test(Text[,1], MPPS[,1], conf.int="T", alternative="g")
Wilcoxon rank sum test with continuity correction
data: Text[, 1] and MPPS[, 1]
W = 1133, p-value = 0.004951
alternative hypothesis: true location shift is greater than 0
95 percent confidence interval: 15.99999 Inf
sample estimates: difference in location 47.40739
(4)

> wilcox.test(TDist2[,1], CalcCorr[,1], conf.int="T", alternative="g")
Wilcoxon rank sum test with continuity correction
data: TDist2[, 1] and CalcCorr[, 1]
W = 451, p-value = 0.06837
alternative hypothesis: true location shift is greater than 0
95 percent confidence interval: -5.000015 Inf
sample estimates: difference in location 45.99993
(5)

> wilcox.test(FiveSlot[,1], MPPS[,1], conf.int="T", alternative="g")
Wilcoxon rank sum test with continuity correction
data: FiveSlot[, 1] and MPPS[, 1]
W = 764, p-value = 0.05171
alternative hypothesis: true location shift is greater than 0
95 percent confidence interval: -5.4093e-05 Inf
sample estimates: difference in location 43.78432
(6)

> wilcox.test(ComponentInfo[,1], FiveSlot[,1], conf.int="T", alternative="g")
Wilcoxon rank sum test with continuity correction
data: ComponentInfo[, 1] and FiveSlot[, 1]
W = 397, p-value = 0.1399
alternative hypothesis: true location shift is greater than 0
95 percent confidence interval: -33.99998 Inf
sample estimates: difference in location 37.99995
(7)

> wilcox.test(Text[,1], AMS[,1], conf.int="T", alternative="g")
Wilcoxon rank sum test with continuity correction
data: Text[, 1] and AMS[, 1]
W = 1739.5, p-value = 0.1012
alternative hypothesis: true location shift is greater than 0
95 percent confidence interval: -6.000062 Inf
sample estimates: difference in location 22.00008
(8)

> wilcox.test(Text[,1], CalcCorr[,1], conf.int="T", alternative="g")
Wilcoxon rank sum test with continuity correction
data: Text[, 1] and CalcCorr[, 1]
W = 570.5, p-value = 0.1689
alternative hypothesis: true location shift is greater than 0
95 percent confidence interval: -20.00005 Inf
sample estimates: difference in location 21.00005
(9)

> wilcox.test(AMS[,1], MPPS[,1], conf.int="T", alternative="g")
Wilcoxon rank sum test with continuity correction
data: AMS[, 1] and MPPS[, 1]
W = 2184, p-value = 0.06977
alternative hypothesis: true location shift is greater than 0
95 percent confidence interval: -2.999972 Inf
sample estimates: difference in location 26.99997
(10)
M-P-P-S vs. T-Dist
> prop.test(95,113, alternative="g")
1-sample proportions test with continuity correction
data: 95 out of 113, null probability 0.5
X-squared = 51.115, df = 1, p-value = 4.355e-13
alternative hypothesis: true p is greater than 0.5
95 percent confidence interval: 0.7712947 1.0000000
sample estimates: p 0.840708
(11)

T-Dist vs. CriticalPath
> prop.test(80,113, alternative="g")
1-sample proportions test with continuity correction
data: 80 out of 113, null probability 0.5
X-squared = 18.7257, df = 1, p-value = 7.547e-06
alternative hypothesis: true p is greater than 0.5
95 percent confidence interval: 0.6287827 1.0000000
sample estimates: p 0.7079646
(12)

5-Slot vs. T-Dist
> prop.test(56,113, alternative="g")
1-sample proportions test with continuity correction
data: 56 out of 113, null probability 0.5
X-squared = 0, df = 1, p-value = 0.5
alternative hypothesis: true p is greater than 0.5
95 percent confidence interval: 0.4149116 1.0000000
sample estimates: p 0.4955752
(13)

M-P-P-S vs. Text
> prop.test(71,113, alternative="g")
1-sample proportions test with continuity correction
data: 71 out of 113, null probability 0.5
X-squared = 6.9381, df = 1, p-value = 0.004219
alternative hypothesis: true p is greater than 0.5
95 percent confidence interval: 0.546867 1.000000
sample estimates: p 0.6283186
(14)

CalcCorr vs. T-Dist2
> prop.test(51,113)
1-sample proportions test with continuity correction
data: 51 out of 113, null probability 0.5
X-squared = 0.885, df = 1, p-value = 0.3468
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval: 0.3584833 0.5475238
sample estimates: p 0.4513274
(15)

M-P-P-S vs. 5-Slot
> prop.test(91,113)
1-sample proportions test with continuity correction
data: 91 out of 113, null probability 0.5
X-squared = 40.9204, df = 1, p-value = 1.586e-10
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval: 0.7179120 0.8714467
sample estimates: p 0.8053097
(16)

ComponentInfo vs. 5-Slot
> prop.test(63,112)
1-sample proportions test with continuity correction
data: 63 out of 112, null probability 0.5
X-squared = 1.5089, df = 1, p-value = 0.2193
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval: 0.4656548 0.6550020
sample estimates: p 0.5625
(17)

Text vs. A-M-S
> prop.test(76,113)
1-sample proportions test with continuity correction
data: 76 out of 113, null probability 0.5
X-squared = 12.7788, df = 1, p-value = 0.0003506
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval: 0.5770539 0.7561623
sample estimates: p 0.6725664
(18)

CalcCorr vs. Text
> prop.test(77,113)
1-sample proportions test with continuity correction
data: 77 out of 113, null probability 0.5
X-squared = 14.1593, df = 1, p-value = 0.000168
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval: 0.5861820 0.7641181
sample estimates: p 0.6814159
(19)
> hitMiss <- matrix(c(26,176,51,230), ncol=2)
> colnames(hitMiss) <- c("Hit","Miss")
> rownames(hitMiss) <- c("SISE","PROBE")
> hitMiss
      Hit Miss
SISE   26   51
PROBE 176  230
> prop.test(hitMiss, alternative="l")

        2-sample test for equality of proportions with continuity correction

data:  hitMiss
X-squared = 2.0652, df = 1, p-value = 0.07535
alternative hypothesis: less
95 percent confidence interval:
 -1.000000000  0.009330862
sample estimates:
   prop 1    prop 2
0.3376623 0.4334975
(22)
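The hit/miss counts in result (22) record whether an actual effort value fell inside the corresponding prediction interval. A hypothetical sketch of how such a tally could be computed, assuming a data frame with low, high, and actual columns (names invented for illustration):

# Hypothetical scoring function: a "hit" means the actual effort fell
# inside the estimate's prediction interval [low, high].
scoreHits <- function(est) {
  hits <- sum(est$actual >= est$low & est$actual <= est$high)
  c(Hit = hits, Miss = nrow(est) - hits)
}

# hitMiss <- rbind(SISE = scoreHits(siseEstimates), PROBE = scoreHits(probeEstimates))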
> wilcox.test(probePI[,1], sisePI[,1], conf.int=TRUE)

        Wilcoxon rank sum test with continuity correction

data:  probePI[, 1] and sisePI[, 1]
W = 14635, p-value = 0.008988
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
  18.00009 162.00005
sample estimates:
difference in location
              78.00007
(23)

> wilcox.test(probePI[,1], sisePI[,1], conf.int=TRUE, alternative="g")

        Wilcoxon rank sum test with continuity correction

data:  probePI[, 1] and sisePI[, 1]
W = 14635, p-value = 0.004494
alternative hypothesis: true location shift is greater than 0
95 percent confidence interval:
 26.00004      Inf
sample estimates:
difference in location
              78.00007
(24)

> prop.test(31,35)

        1-sample proportions test with continuity correction

data:  31 out of 35, null probability 0.5
X-squared = 19.3143, df = 1, p-value = 1.109e-05
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
 0.7232023 0.9627436
sample estimates:
        p
0.8857143
(25)

> prop.test(34,35)

        1-sample proportions test with continuity correction

data:  34 out of 35, null probability 0.5
X-squared = 29.2571, df = 1, p-value = 6.338e-08
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
 0.8338216 0.9985068
sample estimates:
        p
0.9714286
(26)

> prop.test(33,35)

        1-sample proportions test with continuity correction

data:  33 out of 35, null probability 0.5
X-squared = 25.7143, df = 1, p-value = 3.959e-07
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
 0.7947722 0.9900403
sample estimates:
        p
0.9428571
(27)

Appendix B
Themes in Relative Sizing Rationale

In performing a linguistic analysis of each respondent's rationale for selecting a particular task as larger, the following common themes were detected.

Data – Referenced specific data structures, specific data types, or problems inherent in dealing with a particular data set. "Again, task number two only needs to have values dumped into an array while task number one will need to process an indeterminate amount of data."

Familiarity – Referenced prior experience with the problem space. "I have done a project using this task and it seems more involved that basically an array with basic math applied to it. (Task) 2 would require less time to start/understand."

File – Referenced the ease or difficulty involved with file I/O, processing file contents, etc. "Sometimes unexpected errors can occur when scanning text files, not to mention task number one is very simple."

Math – Referenced either the ease or difficulty of creating math-based software or algorithms. "From my experience with math a t-distribution is more complicated to calculate."

Methods – Referenced the number of methods/functions, the amount of code or lines of code required to complete the assignment, the number of operations to be performed by the software, the steps involved in the algorithms, etc. "Requires more equations to implement which means more functions, more logic to code."

Planning – Referenced the amount of time that would be spent in planning and/or design. "It requires more of analysis and design work before coding."

Problem – Directly referenced actual aspects of the problem space as defined by the assignment. "Both require degrees of freedom and probability / x, but task number two also requires number of tails functionality to be implemented."

Reuse – Referenced the ability to reuse code the respondent had already written, to reuse code within the assignment, or to use standard libraries available in the programming language. "Dependent on which language you wrote in, but task 1 is a accomplished with built-in packages- which makes it quicker to finish."

Simplicity – Referenced either the simplicity of the smaller assignment or the complexity of the larger assignment. Based on the general tone of the responses, this theme appears most closely linked with a non-empirical, unstructured, and intuitive guess. "Task 2 is more complex and will require more effort to implement."

Testing – Referenced the ease or difficulty of testing the assignment. "Writing test code that would cover this code would be quite extensive."

Text – Referenced the ease or difficulty inherent in dealing with text-based problems. "In my experience, parsing of English/text documents and has always led to more work, task 2 includes parsing, but it is less complex and mathematics usually are too hard to do."

Appendix C
Relative Sizing Survey Questions

[The relative sizing survey instrument is reproduced as full-page images in the original document.]

Appendix D
Attitudinal Survey Questions

[The attitudinal survey instrument is reproduced as full-page images in the original document.]