THE DEVELOPMENT AND VALIDATION OF THE AUBURN PSYCHOLOGY TERM TEST (APTT)

Except where reference is made to the work of others, the work described in this thesis is my own or was done in collaboration with my advisory committee. This thesis does not include proprietary or classified information.

________________________________________
Dale L. Smith

Certificate of Approval:

_________________________                _________________________
Martha Escobar                           Lewis Barker, Chair
Assistant Professor                      Professor
Psychology                               Psychology

_________________________                _________________________
Adrian Thomas                            Stephen L. McFarland
Associate Professor                      Acting Dean
Psychology                               Graduate School

DEVELOPMENT AND VALIDATION OF THE AUBURN PSYCHOLOGY TERM TEST (APTT)

Dale L. Smith

A Thesis Submitted to the Graduate Faculty of Auburn University in Partial Fulfillment of the Requirements for the Degree of Master of Science

Auburn, Alabama
August 7, 2006

DEVELOPMENT AND VALIDATION OF THE AUBURN PSYCHOLOGY TERM TEST (APTT)

Dale L. Smith

Permission is granted to Auburn University to make copies of this thesis at its discretion, upon request of individuals or institutions and at their expense. The author reserves all publication rights.

______________________________
Signature of Author

______________________________
Date of Graduation

THESIS ABSTRACT

DEVELOPMENT AND VALIDATION OF THE AUBURN PSYCHOLOGY TERM TEST (APTT)

Dale L. Smith

Master of Science, August 7, 2006
(B.S., Olivet Nazarene University, 2001)

94 Typed Pages

Directed by Lewis Barker

The construction and investigation of the psychometric properties of the Auburn Psychology Term Test (APTT), a yes-no test designed to measure psychology knowledge, is described in this paper. The relationships between this instrument and more typical indicators of student performance, including students' ability to identify and define psychology vocabulary items and students' introductory psychology course grade, were significant.
Strong alternate-form reliability with a second version of the test was found. A signal detection analysis of test scores showed that students who performed well on the test showed more conservative responding strategies, in that they made slightly more hits and substantially fewer false alarms. The internal properties of this test were also assessed through item analyses and an exploratory factor analysis, which demonstrated that some variance exists in the effectiveness of APTT items, and suggested that the dimensionality of the APTT may be difficult to determine.

ACKNOWLEDGEMENTS

Above all, I thank God, for providing numerous opportunities for success, and my parents, whose encouragement and support were the foundation for my decision to continue my education. I would also like to acknowledge my committee members Dr. Martha Escobar and Dr. Adrian Thomas for their time and integral contributions to this project. Finally, I would like to thank Dr. Lewis Barker, whose insight and guidance have made this project not only possible, but enjoyable.

Style manual used: Publication Manual of the American Psychological Association (5th edition)

Computer software used: Microsoft Word 2000, Microsoft Excel 2000, SPSS 11.5

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
Chapter I. INTRODUCTION
Chapter II. EXPERIMENT 1
    Method
        Participants
        Materials
        Procedure
    Results
Chapter III. EXPERIMENT 2
    Method
        Participants
        Materials
        Procedure
    Results
Chapter IV. DISCUSSION
REFERENCES
APPENDICES
    APPENDIX A
    APPENDIX B
    APPENDIX C
    APPENDIX D
    APPENDIX E
    APPENDIX F
    APPENDIX G
    APPENDIX H
    APPENDIX I

LIST OF TABLES

Table 1. Items grouping into the factor for foils
Table 2. Items grouping into the three factors for key terms

LIST OF FIGURES

Figure 1. Responding to yes/no recognition tests
Figure 2. A hit rate of 0.5 on a unit square
Figure 3. Using signal detection theory to describe data on a unit square
Figure 4. Nonparametric analyses using the unit square
Figure 5. APTT performance and introductory psychology course grade
Figure 6. APTT performance as a function of key term and foil performance
Figure 7. Response bias and APTT performance
Figure 8. APTT performance and written recall performance on part two
Figure 9. Performance comparison on versions one and two across students

Chapter I. INTRODUCTION

A taskforce of the American Psychological Association recently addressed the need for assessment of Psychology major achievement (Halonen et al., 2002). This taskforce established numerous outcomes defining this achievement, ranging from technological literacy to sociocultural awareness. The research reported here will focus on the first stated assessment outcome, developing a knowledge base of the basic ideas, perspectives, and concepts in psychology. While the taskforce concluded that a number of the methods they reviewed showed strong potential for in-class assessment, they warned against concentrating solely on classroom indices.
Addressing the need for assessment outside of the classroom, they suggested that only the use of an assessment center and locally developed tests showed strong potential for this purpose. A number of obstacles exist in the development and administration of such locally developed tests. Problems include determining what actually constitutes student ability, differences in course selection among majors, and the expense and time involved in developing a tool that adequately and objectively assesses what students have achieved. In addressing the development of such a test, the task force lists a number of student achievement goals, the first of which involves demonstrating "familiarity with the major concepts, theoretical perspectives, empirical findings, and historical trends in psychology" (Halonen et al., 2002, para. 1). Such a test should also assess a student's ability to use psychology's "concepts, language, and major theories" adequately (Halonen et al., 2002, para. 1). Other goals involve the ability to apply psychological knowledge, think critically about psychology, and adhere to psychology's core values. Because of the wealth of relevant student outcomes, the APA warns against the use of only one or two measures in assessing majors. The research reported here focuses on the development of a test called the Auburn Psychology Term Test (APTT). This test assesses a student's knowledge of psychology vocabulary, including key terms, people, theories, and perspectives. The test is based on the premise that the ability to recognize, identify, and use the language of psychology underlies the development of more complex thinking and application skills within the discipline. In brief, students taking the APTT are presented with a list of 100 terms, 50 key terms and 50 foils, and asked to specify which terms belong in each category.
Given that the domain of psychology comprises numerous key terms, people, theories, and perspectives, and that some are more important than others, the first task was to identify key terms comprising the core of relevant psychology knowledge. Using several introductory psychology textbooks, 50 key terms were selected, representing fifteen content areas common to introductory textbooks, such as learning and personality (see Appendix A for a complete list). Griggs, Bujak-Johnson, and Proctor (2004) have recently addressed discrepancies between key terms across introductory textbooks, finding that 455 terms appear in over half of the 44 current introductory textbooks, and 155 in 80 percent or more. The researchers argued that because 6269 glossary terms exist across all introductory textbooks, with 74 percent of these terms appearing in three or fewer textbooks, little similarity can be found in this domain. In many cases the discrepancies between key terms amount to changes in the way a similar concept is phrased, such as the presence of the concept bell curve in one textbook and normal curve in another. While the researchers attempted to account for clear synonyms, a thorough analysis of all possible similarities would likely be a rather large undertaking. In the midst of such discrepancies, the prevalence of 155 to 455 terms across textbooks points to a core of terms that are relatively common to introductory psychology. A number of APTT terms are amongst these core terms. Approximately 60 percent of each version's key terms can be found in at least half of all available introductory psychology textbooks.
In an attempt to eliminate the potential confound of participants in the present study having been exposed to different terms during their tenure as psychology students, all participants in experiment one were enrolled in an introductory psychology course taught by the same professor in the same semester, and all terms used in the creation of the test were present in the required textbook. Based on material from the same content areas, 50 foils (pseudo-psychological terms) were created, designed to resemble true psychology terms. Foils differed along several dimensions from their key term counterparts. The most common differences were semantic. Such foils were created by modifying existing psychological concepts, clearly changing or reversing their meaning. While some of these changes may have been the result of altering only a few letters, such as "gestation psychology," the resulting meaning significantly differs from any known concept in psychology. In some cases these changes were morphological, and involved adding prefixes or suffixes to existing psychology terms that clearly altered their meaning, such as "unnatural selection." Other foils sound like potential psychology terms but are not found in any psychological literature, thus students could not have been previously exposed to them, including "animalism" and "terminal stasis." Finally, a few changes were phonetic, in which several letters of a key term were changed, resulting in a term with significant phonetic differences from the original term. An example of such a change is the term "tetragen," derived from the developmental term "teratogen." Such changes were applied following the commonplace recommendation that at least two letters be changed when creating such foils (Beeckmans, Eyckmans, Janssens, Dufranne, & Van de Velde, 2001).
Both for the sake of simplicity and because this is the way the concepts are generally used in the memory literature, items that subjects have previously been exposed to (key psychology terms in the APTT) will often be referred to as "old," and items to which subjects could not have had previous exposure (foils) as "new." Traditionally, the gold standard for student assessment in higher education has consisted of asking students to recall information in written form, thus demonstrating exactly what, and how much, they know about the topic. Those in higher education realize that a number of limitations exist in using this type of assessment procedure, including the time commitment involved in creating and grading such tests, potential biases involved in grading the multiple possible interpretations of a concept, and the difficulty in sampling from the wealth of material that may have been covered in a course (or courses). Though memory researchers have not wholly agreed on the nature of the relationship between recall and recognition, tests of recognition may be a more efficient method of accessing the knowledge base of a student. The existence of lively debates amongst memory theorists for several decades has resulted in the formulation of a number of models of recognition and memory (e.g., Mandler, 1980; Hintzman, 1988; Gillund & Shiffrin, 1984). Although the models of early recognition theorists often postulated a single memory process accounting for both recognition and recall (Neath & Surprenant, 2003), most theorists generally contend that separate or additional processes or steps are involved (e.g., Gillund & Shiffrin, 1984). While the theoretical frameworks behind these models are beyond the scope of this paper, most of these models are based on studies testing differences between subjects' ability to recognize presented terms as being part of a list that they were previously exposed to, and their ability to recall, or generate, such terms.
In a typical study, words are initially presented for very short periods of time, typically measured in milliseconds or seconds, and the time from initial exposure to testing is also quite short, most often measured in seconds or minutes. Several such studies have demonstrated similarities in performance between recognition measures and recall measures in a number of different types of memory tasks (Challis, Velichkovsky, & Craik, 1996), with study trials manipulating level of processing (Asthana & Nigrani, 1984) and with varying study times (Ratcliff & Murdock, 1976). The type of tasks used in these studies, however, differs from the present research in a number of important ways. While levels of processing may be manipulated, and in some cases participants may even be asked to read and manipulate a brief passage (e.g., Shaughnessy & Dinnell, 1999), the level of understanding of the terms is not on par with what happens in a classroom setting. The second distinction between traditional recognition versus recall research and the present study is the amount of time over which exposure to key terms has occurred. Rather than encountering a term once or twice over a period of seconds, students in a classroom setting may be intermittently exposed to a term over the course of several days, weeks, or months. Few, if any, studies by memory theorists have attempted to measure the relationship between yes/no recognition and recall ability of material whose meaning has been highly emphasized, and which has been presented to subjects over an extended period of time. The nearest equivalent to this use of yes/no recognition tests has been their use in educational testing fields, research conducted by Stanovich and colleagues, and second language research. The methodology of the APTT was originally modeled after Stanovich's "Print Exposure Checklist" (Stanovich & West, 1989).
Stanovich presented participants with lists of real and fabricated authors and publications and tested their ability to discriminate between them. An assessment of the reliability and validity of these tests found them to exceed many traditional literacy measures (Stanovich, 2000). Stanovich found strong relationships between performance on these tests and a number of cognitive abilities related to literacy, such as spelling ability and verbal fluency (Stanovich & Cunningham, 1992), orthographic and phonological processing skill (Stanovich & West, 1989), vocabulary size and reading ability (Cunningham & Stanovich, 1997), cultural knowledge (West & Stanovich, 1991), declarative knowledge (Stanovich, West, & Harrison, 1995), and real-world reading activity (West, Stanovich, & Mitchell, 1993). Even when variability due to general cognitive ability, age, and education was statistically factored out, these strong relationships still existed. While Stanovich's Print Exposure Checklist inspired the creation of the APTT, yes/no recognition tests have been used to assess student outcomes in the field of language testing for over twenty years (Beeckmans et al., 2001). Use of this methodology evolved from first language testing beginning in the late 1920s, in which students identified the words for which they knew the meaning from a checklist of terms. Foils were later added as a defense against possible overestimation (Anderson & Freebody, 1983). Meara and Buxton (1987) adapted this framework to second language testing. The primary focus of such tests was to determine vocabulary size in second language learners. Second language testing researchers have espoused numerous benefits of the yes/no recognition methodology, the most salient of which concerns the ease and speed of testing. Kojic-Sabo and Lightbown (1999) praise the format for its ability to test "a large number of words … within a very short period of time" (p. 180).
Even critics of term-based vocabulary tests concede that, despite their simplistic structure, such tests can be a better indicator of a participant's vocabulary than an in-depth analysis of only a few items (Reed, 2000). This is particularly beneficial in the context of educational testing outside the classroom, where time and resources may be limited, or when the amount of material covered in a class necessitates the use of a more superficial and comprehensive format. Kojic-Sabo and Lightbown (1999) further laud the tests' efficiency as a measure of vocabulary size in light of several studies finding strong relationships between performance on the test and several other indices of first and second language proficiency. One such study was Anderson and Freebody's (1983) initial research with the use of foils, which found that the yes/no format correlated more strongly with actual word knowledge (r = .85), measured through an interview process, than did multiple choice tests over the same words (r = .45). Loring (1995) compared yes/no vocabulary tests consisting of academic words to the Michigan Test of English Language Proficiency and its vocabulary subtest, finding strong correlations (rs = .70 and .68). Meara and Buxton (1987) also studied the relationship between yes/no recognition tests and several established measures of second language proficiency, such as the Cambridge First Certificate Examination, and also found strong correlations between these measures (e.g., r = .70). In addition, several studies have suggested that students prefer this testing methodology to other types of language tests (e.g., Cameron, 2002; Kojic-Sabo & Lightbown, 1999). Use of the format in second language testing has not been without its critics. Beeckmans et al. (2001) outline several criticisms of the format, most of which relate to analyzing results of yes/no recognition tests.
Due to its unique testing format, there exists a wealth of possible scoring methods and corresponding theoretical frameworks with which to analyze the results of a yes/no recognition test. They contend that an adequate method of scoring must be found to further the analysis of the yes/no testing design. The reason for the difficulty in scoring yes/no recognition tests is that such tests yield four possible outcomes (Green & Swets, 1966), outlined in Figure 1. If an item is a key term and the subject identifies it as such, a "hit" is recorded. If the subject fails to identify the item as a true psychology term, this is considered a "miss." If the item is a foil and is correctly identified as a foil, a "correct rejection" is scored, while if the subject incorrectly identifies a foil as a key term, the student has made a "false alarm."

                          Response
                    Yes              No
    Key term        Hit              Miss
    Foil            False alarm      Correct rejection

Figure 1. Responding to yes/no recognition tests.

A full analysis of test performance using this methodology involves more than simply tallying correct responses. Two components must be taken into consideration. The first is sensitivity, which is a participant's actual accuracy or ability to discriminate between old and new items. The second is a participant's response bias, or criterion. An unbiased criterion means that a participant "always selects the alternative with the larger likelihood" (Pastore, Crawley, Berens, & Skelly, 2003, p. 558). A liberal or conservative bias leads a subject to be more likely to answer yes or no, respectively. Those with more liberal biases are more likely to score a hit, but also more likely to make a false alarm, while those with more conservative biases are more likely to correctly reject a foil but also more likely to miss a key term.
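The four outcomes in Figure 1 lend themselves to straightforward scoring in code. The following Python sketch tallies the outcomes from a list of yes/no responses; the items and answers shown are invented for illustration and are not actual APTT content.

```python
# Classify yes/no recognition responses into the four outcomes of Figure 1.
def score_responses(items, responses):
    """items: list of (term, is_key_term) pairs; responses: list of 'yes'/'no'."""
    counts = {"hit": 0, "miss": 0, "false_alarm": 0, "correct_rejection": 0}
    for (term, is_key), answer in zip(items, responses):
        if is_key:
            # Key term: 'yes' is a hit, 'no' is a miss.
            counts["hit" if answer == "yes" else "miss"] += 1
        else:
            # Foil: 'yes' is a false alarm, 'no' is a correct rejection.
            counts["false_alarm" if answer == "yes" else "correct_rejection"] += 1
    return counts

# Hypothetical four-item test (two key terms, two foils).
items = [("classical conditioning", True), ("gestation psychology", False),
         ("teratogen", True), ("terminal stasis", False)]
responses = ["yes", "yes", "no", "no"]
print(score_responses(items, responses))
```

Dividing the hit count by the number of key terms, and the false alarm count by the number of foils, yields the hit and false alarm rates used by the measures discussed below.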
Feenan and Snodgrass (1990) caution against simply treating bias as a nuisance variable, stating that it is important to understand this part of the recognition memory process and how it is manifested in test performance. This contention has been supported in memory research that looked at the effects of different study times, and thus varying levels of familiarity with terms, on a yes/no recognition test (Ruiz, Soler, & Dasi, 2004). This study indicated that while hit performance increased with study time, correct rejection performance increased even more significantly with study time. Other studies have found significant differences for both hits and correct rejections as familiarity with target material increases (e.g., Ratcliff, Clark, & Shiffrin, 1990). This phenomenon also speaks to the importance of response bias considerations, which will be addressed shortly. In developing an assessment tool, researchers must determine how much of the student's raw score is due to actual sensitivity and how much is a byproduct of responding criterion. The process of finding the most accurate way to measure how much the participant actually knows has led to a number of different models and formulas attempting to effectively measure sensitivity and bias. The following is not intended as a comprehensive analysis of methods of analyzing yes/no recognition tasks, but merely as a brief overview and explanation for the proposed use of several different measures to analyze APTT data. One such method of analysis involves the use of "thresholds." Among the earliest threshold models was Blackwell's (1953) high-threshold model, which was derived from early psychophysics (Luce, 1963). This model implies the existence of a threshold for stimulus detection, and suggests that the researcher's primary task is to determine where this threshold lies. The key difference between threshold models and most other models lies in threshold models'
rejection of the idea that a continuum exists on which different memory strengths lie. Threshold-based models assume that either an item is encoded at study or it is not. Items that are not encoded should therefore be completely unavailable at test (Snodgrass, Volvovitz, & Walfish, 1972). The implications this has for computing scores will be discussed below. Two common threshold models exist. The simplest is generally known as the one high threshold model, which assumes only two possible memory states: recognition and nonrecognition (Snodgrass & Corwin, 1988). According to this model, if an old item exceeds the subject's memory threshold it will be correctly identified; failure to exceed the threshold will result in a "miss." While differing levels of encoding success easily account for differences in hit and miss rates, it would initially seem that threshold theories are unable to account for false alarms. If a participant has never been exposed to an item, it could never have been initially encoded, and it should be incapable of exceeding the memory threshold. Threshold models generally justify the presence of false alarms by stating that when participants do not recognize an item, a level of response bias often leads them to guess. As previously mentioned, different response criteria will result in participants being more or less likely to guess when they do not recognize the item, leading to differing numbers of false alarms. While the one high threshold model's inability to adequately explain qualities of actual data has led to its disuse (see Bayen, Murnane, & Erdfelder, 1996; Murdock, 1974; Snodgrass & Corwin, 1988), another threshold model, known as the two high threshold model, has received some degree of support (Corwin, 1994; Feenan & Snodgrass, 1990).
The two high threshold model assumes that two thresholds exist, one separating a state of uncertainty from a state of certainty that the item is a target, and the other separating the uncertain state from a state of certainty that the item is a foil (Corwin, 1994). These two thresholds are assumed to be equal, an assumption supported by research on recognition mirror effects (Snodgrass & Corwin, 1988). The hit rate will include a number of guesses, determined by the participant's response criterion, while the false alarms will consist solely of guesses from the uncertain state. Threshold theories typically use participants' number of false alarms to determine their true probability of getting a hit. Statistics attempting to measure sensitivity based on threshold theories include Pr, the probability of new or old items exceeding the threshold, which simply subtracts the probability of a false alarm from the probability of a hit:

Pr = P(h) - P(f).     (1)

P*(h), an estimation of the true hit rate, is calculated by dividing Pr by one minus the probability of a false alarm, or:

P*(h) = [P(h) - P(f)] / [1 - P(f)].     (2)

Most formulas that attempt to correct for guessing also utilize the framework of threshold theories, assuming that either the participant knows the correct response or simply guesses at random (Huibregtse, Admiraal, & Meara, 2002). Measuring bias using a two high threshold framework involves calculating Br, which Huibregtse et al. (2002) define as "the probability of saying yes to an item when in the uncertain state." Br is calculated by dividing the false alarm probability by one minus Pr, or:

Br = P(f) / [1 - (P(h) - P(f))].     (3)

If Br is equal to 0.5 the participant is said to have a neutral bias; anything above or below 0.5 is considered to reflect liberal or conservative bias, respectively.
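Equations 1 through 3 are simple to compute from raw counts. The Python sketch below illustrates them with a hypothetical student's scores on the 100-item test; the counts are invented for illustration.

```python
def two_high_threshold(hits, misses, false_alarms, correct_rejections):
    """Compute Pr (Eq. 1), P*(h) (Eq. 2), and Br (Eq. 3) from raw counts."""
    p_h = hits / (hits + misses)                              # hit rate, P(h)
    p_f = false_alarms / (false_alarms + correct_rejections)  # false alarm rate, P(f)
    p_r = p_h - p_f                                           # Equation 1: sensitivity
    p_star_h = (p_h - p_f) / (1 - p_f)                        # Equation 2: corrected hit rate
    b_r = p_f / (1 - p_r)                                     # Equation 3: bias
    return p_r, p_star_h, b_r

# Hypothetical student: 40 hits, 10 misses, 5 false alarms, 45 correct rejections.
p_r, p_star_h, b_r = two_high_threshold(40, 10, 5, 45)
print(round(p_r, 2), round(p_star_h, 2), round(b_r, 2))
```

For this student, Br works out to about .33, below the neutral value of 0.5, which the model would interpret as a conservative bias.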
The primary criticism of threshold theories is that recognition data often suggest a continuum of memory strength, as all items are not generally assumed to be equally familiar or unfamiliar (Murdock, 1974). For this reason, strength theories, the most prominent of which is signal detection theory, have abandoned the idea of thresholds. Signal detection theory's theoretical origins can be traced back to Fechner and Thurstone's era of psychophysics and their attempts to determine how adept subjects are at distinguishing between stimulus situations containing a signal and noise and those containing only noise (Green & Swets, 1966; Luce, 1963). In applying signal detection theory to recognition memory experiments, the signal is widely considered to be "strength of evidence" (Pastore, Crawley, Berens, & Skelly, 2003, p. 560), though what actually constitutes the "noise" component of a recognition task involving memory has recently come under question. Numerous articles since the introduction of signal detection theory have defined noise in terms of cognitive processes or neural activity interfering with retrieval (Levine & Schefner, 1991). Pastore et al. (2003) criticize this description of noise, stating that referring to "noise" as literal cognitive processes misses the original purpose of the concept of noise and negates the basic ideas behind signal detection theory. According to these theorists, noise refers solely to variability in statistical processes, and signal detection theory, rather than being concerned with sensory or cognitive processing, is a "general model of decision processing of evidence" (p. 560). Regardless of the phenomenological bases behind the use of some of its concepts, the basic premise of signal detection theory as it applies to recognition is that two normal overlapping distributions exist along a continuum of familiarity.
One distribution consists of new items and the other consists of old items, and the amount of overlap that exists between these two distributions determines how well a participant is able to distinguish between items in each distribution. The measure used to determine the difference between the two distributions is D′, calculated by subtracting the standardized false alarm rate from the standardized hit rate:

D′ = Z(h) - Z(f).     (4)

Two measures of bias have seen widespread use in signal detection analysis. The earliest, β, is computed as the height of the distribution of hits divided by the height of the distribution of false alarms, or:

β = φ(Z(h)) / φ(Z(f)).     (5)

The use of β in measuring bias has been widely criticized for two key reasons. The first is that the very use of β in some situations, particularly those involving stimuli that are heterogeneously memorable, assumes that a participant is able to accurately classify the stimulus as belonging to either the distribution of new or old items, which is exactly what most memory studies are trying to test (Snodgrass & Corwin, 1988). Another problem is that while measures of bias and sensitivity may show a statistical relationship in some data sets, due either to factors acting on both measures or to changes in sensitivity affecting bias, they should be computationally independent, a condition β consistently fails to meet (Snodgrass & Corwin, 1988). For these reasons β has largely been replaced with C, another measure of bias. Rather than focusing on the heights of the two distributions, C is measured as the distance from the intersection of the two distributions. C can be computed as the average of the standard scores for hits and false alarms, or:

C = (Z(h) + Z(f)) / 2.     (6)

According to signal detection theory, for each participant a point will exist where the two distributions overlap, marking the point where new and old items are equally familiar.
If this point also marks the participant's criterion for responding, a neutral bias is said to exist, and C will be equal to 0. In order for the preceding calculations to be valid, the primary assumption underlying signal detection theory, that both distributions are normal, must be met. Pollack and Norman (1964) were among the first to call this and other statistical assumptions of signal detection theory into question, as well as to offer a distribution-free, or nonparametric, method of analyzing the results of yes/no recognition tasks. Because of the difficulty in determining equal variances, particularly if receiver operating characteristic curves cannot be calculated due to testing participants only a small number of times, the assumption of normality may in some cases be unwarranted. To best illustrate how nonparametric measures are calculated, data can be plotted in a unit square with hit rate on the x axis and false alarm rate on the y axis. Figure 2 uses this format to show the data point (E) of a subject with a hit rate of 0.7 and a false alarm rate of 0.1. Signal detection theory assumes that because both old and new items are normally distributed, a curve can be created (see Figure 3) on which data point P falls that describes performance based on this one point. Nonparametric analyses instead attempt to determine the average area under a calculated curve denoting performance in an initial trial. Figure 4 illustrates that a curve based on a data point for a subject with a hit rate of .75 and a false alarm rate of .25 could be expected to pass through areas A1 and A2.

Figure 2. A hit rate of 0.5 on a unit square. Note. Modified from I. Huibregtse, W. Admiraal, & P. Meara, 2002, Language Testing, 19, 227-245.

Figure 3. Using signal detection theory to describe data on a unit square. Note. From J. Snodgrass & J. Corwin, 1988, Journal of Experimental Psychology, 117, 34-50.

Figure 4. Nonparametric analyses using the unit square. Note. Modified from I.
Huibregtse, W. Admiraal, & P. Meara, 2002, Language Testing, 19, 227-245.

Several researchers have demonstrated that the area under the average curve created using areas A1 and A2 is a good indicator of memory performance (Pollack & Norman, 1964; Green & Moses, 1966). According to these researchers, such an index makes no assumption of normality or of other statistical properties of the participants' distributions (Hodos, 1970). A′ is this sensitivity measure for nonparametric tests and can be expressed in terms of Figure 3 as:

A′ = B + (A1 + A2) / 2. (7)

Two computational formulas exist for A′ because scores can lie above or below the chance diagonal (line AC in Figure 3). If the number of hits exceeds the number of false alarms:

A′ = .5 + [(P(h) − P(f)) * (1 + P(h) − P(f))] / [4 * P(h) * (1 − P(f))]. (8)

If the number of false alarms exceeds the number of hits, the preceding formula can be modified by simply replacing each occurrence of hits with false alarms, and vice versa. If the number of hits equals the number of false alarms, A′ = .5. Several computations exist for bias in a nonparametric model. Grier (1971) proposed the use of B″, which can be expressed in terms of Figure 3 as B″ = (A1 − A2) / (A1 + A2), and can be computed as:

B″ = [P(h) * (1 − P(h)) − P(f) * (1 − P(f))] / [P(h) * (1 − P(h)) + P(f) * (1 − P(f))] (9)

when the number of hits is greater than or equal to the number of false alarms; when false alarms exceed hits, the formula is reversed by switching all occurrences of hits and false alarms. Hodos (1970) also proposed a bias index, referred to as B′H, which can be expressed in terms of Figure 3 as B′H = (A1 − A2) / A1, and is calculated, when hits exceed false alarms, as:

B′H = 1 − [P(f) * (1 − P(f))] / [P(h) * (1 − P(h))]. (10)

When false alarms exceed hits, B′H can again be computed by reversing all occurrences of hits and false alarms in the ratio and subtracting one from the result.
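The nonparametric indices above can likewise be sketched in code. The illustrative Python below (function names are my own, not from the thesis) implements Eq. 8 for A′ and Eq. 9 for Grier's B″, using the swap convention described in the text when false alarms exceed hits:

```python
def a_prime(h, f):
    """Nonparametric sensitivity A' (Eq. 8). When false alarms exceed
    hits, hits and false alarms simply swap roles, as described above;
    when they are equal, A' = .5."""
    if f > h:
        h, f = f, h  # below the chance diagonal: swap roles
    if h == f:
        return 0.5
    return 0.5 + ((h - f) * (1 + h - f)) / (4 * h * (1 - f))

def b_double_prime(h, f):
    """Grier's (1971) nonparametric bias B'' (Eq. 9); hits and false
    alarms are swapped when false alarms exceed hits."""
    if f > h:
        h, f = f, h
    num = h * (1 - h) - f * (1 - f)
    den = h * (1 - h) + f * (1 - f)
    return num / den if den else 0.0
```

For the Figure 4 example (hit rate .75, false alarm rate .25), A′ = .5 + (.5 × 1.5)/(4 × .75 × .75) ≈ .833, and B″ = 0, a neutral bias, since h(1 − h) equals f(1 − f) at that point.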
Both equations for bias indicate neutral bias when the measure equals 0, liberal bias when positive, and conservative bias when negative (Snodgrass & Corwin, 1988). For the past 35 years, a number of recognition memory researchers have espoused the use of the nonparametric A′ because of its supposed lack of assumptions about underlying distributions (Hodos, 1970; Donaldson, 1992; Rhodes, Parkin, & Tremewan, 1993; Pastore et al., 2003). Recently, however, Pastore et al. (2003) called into question the rejection of signal detection theory on the basis of its underlying assumptions, criticizing those who laud nonparametric measures as a distribution-free alternative. Pastore et al. first note that the assumption that A′ measures the area under a theoretical average ROC curve falls apart at high levels of bias, underestimating sensitivity. Snodgrass and Corwin (1988) had previously made similar comments, and showed through several experiments that the fundamental assumption of independence between measures of bias and sensitivity does not hold for the nonparametric A′ and B″ measures. Pastore et al. also demonstrate that A′ does indeed imply underlying distributions, suggesting that it is actually parametric. Problems such as these have led a number of other researchers to reject the use of A′ and B″ as well as the use of β (Snodgrass & Corwin, 1988; Pastore et al., 2003; Huibregtse, Admiraal, & Meara, 2002), and have prompted others to suggest that all data be supported by several indexes. These authors particularly laud the independence of the P_r and D′ measures from their corresponding measures of bias, B_r and C, and suggest using both sets of indexes in analyzing recognition data (Snodgrass & Corwin, 1988; Corwin, 1994; Feenan & Snodgrass, 1990). This is particularly true in light of Feenan and Snodgrass's (1990) study, which showed significant effects of context on recognition of pictures and words that were observable through the use of some of the above measures, but not others.
This is not a recent proposition; as early as 1970, Lockhart and Murdock warned against the assumption that there was only one "correct" or "neutral" way to analyze recognition memory data. The aforementioned paradigms have also served as the basis for several new indexes. Meara (1992) developed an index that is a transformation of A′, estimating the hit rate that a participant would have scored had he or she not made any false alarms, calculated as:

Δm = [(P(h) − P(f)) * (1 + P(h) − P(f))] / [P(h) * (1 − P(f))] − 1. (11)

This formula is simply the transformation 4A′ − 3 and thus suffers from the same problems at high levels of bias. If a researcher does not wish to analyze bias separately, it may be factored out using equations such as I_SDT, which is presented as being based on a signal detection model, though it shares more similarity with the nonparametric A′. I_SDT was designed by Huibregtse et al. (2002) for use in analyzing tests of vocabulary, and can be computed as:

I_SDT = 1 − [4 * P(h) * (1 − P(f)) − 2 * (P(h) − P(f)) * (1 + P(h) − P(f))] / [4 * P(h) * (1 − P(f)) − (P(h) − P(f)) * (1 + P(h) − P(f))]. (12)

Huibregtse et al. (2002) attempt to correct for bias by basing their measure on the nonparametric calculation of A′ and determining the point at which the average ROC curve for a participant would intersect the BD diagonal (see Figure 3). Any point on the BD diagonal is assumed to be free from bias, and Huibregtse et al. cite Grier's (1971) bias measure as the basis for determining where the ROC curve intersects this diagonal. How effectively Huibregtse et al.'s index incorporates bias correction into a nonparametric analysis has yet to be determined, though it initially appears that the problems A′ encountered at extremely high levels of bias have been eliminated. In keeping with Snodgrass and Corwin's (1988) aforementioned recommendations, analysis of APTT data will be conducted using several indices.
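Both of the derived indices above reduce to short expressions in the two rates. The sketch below is illustrative only (the function names are mine), and assumes the published forms of Meara's Δm and Huibregtse et al.'s I_SDT:

```python
def delta_m(h, f):
    """Meara's index, Eq. 11: the estimated hit rate had the participant
    made no false alarms (algebraically equal to 4A' - 3)."""
    return ((h - f) * (1 + h - f)) / (h * (1 - f)) - 1

def i_sdt(h, f):
    """Huibregtse et al.'s (2002) bias-corrected index, Eq. 12."""
    base = 4 * h * (1 - f)        # 4 P(h) (1 - P(f))
    gain = (h - f) * (1 + h - f)  # (P(h) - P(f)) (1 + P(h) - P(f))
    return 1 - (base - 2 * gain) / (base - gain)
```

I_SDT runs from 0 at chance (hit rate equal to false alarm rate) to 1 for perfect discrimination, while Δm, being 4A′ − 3, runs from −1 at chance to 1 for perfect discrimination.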
Sensitivity analyses will be conducted using D′, P_r, and I_SDT, and bias will be assessed through the use of C and B_r. Conducting analyses without subscribing to a specific model is an attempt to obtain a well-rounded picture of the available data. Aside from addressing the problem of scoring yes/no recognition tests, Beeckmans et al. (2001) outlined a number of other methodological concerns regarding their use in assessing student outcomes. Their analysis of second language yes/no recognition tests showed negative correlations between student performance on key terms and foils. They suggest that such an inverse relationship, which is likely a product of response bias clouding the results, calls into question the tests' discriminant validity. They also suggest an analysis of any differences in distribution variance between key terms and foils, to attempt to establish whether similar processes and distributions exist for the different types of items. While some of Beeckmans et al.'s criticisms are the basis of procedures for assessing the APTT, their concerns may not apply to the APTT for several reasons. First, Beeckmans et al.'s examination was conducted using tests with unequal numbers of foils and key terms, a practice common in second language applications of the yes/no format. Differing numbers of key terms and foils necessitate several adjustments to the resulting scores, and may confound some of the basic theoretical assumptions about key term and foil distributions inherent in some formulas. Beeckmans et al. also used a correction for guessing for part of their analysis that does not appear to meet the aforementioned requirements concerning the independence of sensitivity and response bias.
In an effort to answer some preliminary questions about the format, and to assess some basic psychometric properties of the APTT, eight hypotheses were tested, each corresponding to an addressed concern over yes/no recognition tests, the validity and reliability of such tests, and test bias:

1. A significant relationship will exist between student scores on the APTT and other performance measures in introductory psychology courses.
2. The correlation between hits and total performance will be equal to the correlation between correct rejections and total performance.
3. Students who perform better on the APTT will show more conservative response biases.
4. The APTT will show adequate psychometric properties in each of the following analyses:
a. Item and scale means and standard deviations
b. Item total correlations and item scale correlations, using total performance as well as hit and correct rejection performance
c. Item characteristic curve analysis to determine how well each item discriminates at all levels of performance
d. Split half reliability between key terms and foils, as well as alpha
e. An exploratory factor analysis to determine the dimensionality of the APTT
5. Some gender differences will exist in APTT performance.
6. The gender differences mentioned in hypothesis 5 will disappear once class performance is taken into account.
7. A significant relationship will exist between performance on the APTT and the ability to recall information about key psychology terms.
8. Administration of an alternate form of the APTT, created using the same methodology, will yield similar scores and strong alternate form reliability.

Chapter II. EXPERIMENT 1

Method

Participants

Participants were 259 Auburn University students over the age of 19, enrolled in an introductory psychology course. The instruments were administered at the end of the semester, during the week of final exams.
Materials

Each student received one of two versions of the Auburn Psychology Term Test (APTT), each consisting of 50 key terms in psychology and 50 foils (see Appendices A and B). In part two of the task, students were given another form consisting of 20 randomly selected items from the alternate version. Between 12 and 15 of these items were key terms; the remainder were foils. Students were asked to determine which of these terms were correct and which were foils in the same manner as on the APTT. On the back of this form, students chose 10 of the 20 terms that they had identified as key terms in psychology and were asked to "describe, define, or identify" the terms, giving as much information as they could recall in the space provided. Two versions of part two were created for each version of the APTT, totaling four distinct forms (see Appendix C).

Procedure

Introductory psychology students were given informed consent forms and administered the APTT, recording their responses on a Scantron form. After completion of the APTT, students were given part two. The relationship between scores on the APTT (using raw scores, I_SDT, and D′) and course grades was assessed (hypothesis 1), as well as information concerning the relationship between overall APTT performance and hit and correct rejection performance (hypothesis 2) and bias (hypothesis 3). Several validity and reliability measures were assessed as outlined in hypothesis 4. Results were analyzed to assess any performance differences based on gender (hypothesis 5). After such differences were determined to exist, statistical analyses were conducted to determine whether these differences could be accounted for by classroom performance (hypothesis 6). The relationship between a student's APTT performance and his or her ability to recall information demonstrating a working knowledge of key psychology terms, as addressed in hypothesis 7, was also assessed.
Responses in this section were graded on a five-point Likert-type scale, with scores denoting that a student: (a) demonstrates an ability to recall a significant amount of correct information about the concept; (b) demonstrates adequate recall of the concept, consisting of correct statements or ideas that suggest a working knowledge of the item; (c) demonstrates some knowledge of the concept, recalling information that, while incomplete or only partially correct, suggests some knowledge of the core idea; (d) does not demonstrate an adequate level of recall, but does seem to have some idea of the subject matter involved; or (e) demonstrates no recall of the term. Two raters independently assigned a score to each response. When the scores for an item were within one point of each other, the response was scored as the mean of the two. When scores were two or more points apart, the two raters discussed the item and agreed upon a score.

Results

Data were initially analyzed using raw scores as well as the indices D′, I_SDT, P_r, and A′. Because results for the following analyses were virtually identical across all of the above measures, only raw scores are reported here. A significant relationship was found between introductory psychology course grade and performance on the APTT: the correlation between course grade and APTT score was r(257) = .63, p < .01. This finding supports hypothesis 1, that the APTT would show significant relationships with other established measures of student performance. Figure 5 shows this relationship, with APTT performance represented as six levels, each containing approximately the same proportion of participants.

Figure 5. APTT performance and introductory psychology course grade.
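The two-rater scoring rule described above (average when ratings fall within one point of each other; discuss and agree when they differ by two or more) can be sketched as a small helper. This is illustrative code only, not part of the study materials:

```python
def combine_ratings(rater1, rater2):
    """Apply the two-rater rule for the five-point written-response scale.

    Returns the mean of the two ratings when they are within one point
    of each other; returns None to flag the response for discussion
    when they differ by two or more points.
    """
    if abs(rater1 - rater2) <= 1:
        return (rater1 + rater2) / 2
    return None  # raters must discuss and agree on a single score
```

For example, combine_ratings(3, 4) yields 3.5, while combine_ratings(2, 5) returns None and would be resolved by discussion.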
The correlation between total APTT score and hits, r(257) = .51, p < .01, while significant, was significantly lower than the correlation between total APTT score and correct rejections, r(257) = .87, p < .01. Figure 6 illustrates these relationships. A Fisher Z test of the difference between the correlations yielded Z = 8.73, p < .01. The second hypothesis predicted equality of the two correlations; this hypothesis was therefore rejected.

Figure 6. APTT performance as a function of key term and foil performance.

An analysis of the relationship between total score and response bias, measured using the bias index C, showed a strong correlation, r(257) = .49, p < .01. This finding supports the assertion made in hypothesis 3, that a significant relationship would exist between participants' overall APTT performance and response bias. Figure 7 illustrates that as APTT score increases, C increases; an increase in C represents a more conservative responding strategy.

Figure 7. Response bias (C) and APTT performance.

The item analysis demonstrated several notable aspects of the test. The item total correlations of all items, shown in Appendix D, and the item characteristic curves, shown in Appendices E and F, suggest significant differences in the effectiveness of items in discriminating good and poor performers. Overall, as demonstrated in Figure 6, foils were far better discriminators of performance than key terms, with 48 of 50 item total correlations reaching significance at p < .05. Only 15 of 50 item total correlations for key terms reached significance at this level.
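The Fisher Z comparison of the hit and correct rejection correlations above can be reproduced with the standard r-to-z transformation. The sketch below (function name mine) uses the simple independent-samples form of the test, which matches the reported value:

```python
import math

def fisher_z_diff(r1, r2, n1, n2):
    """Fisher r-to-z test for the difference between two correlations.

    Each r is transformed with atanh (Fisher's z); the difference is
    divided by the pooled standard error sqrt(1/(n1-3) + 1/(n2-3)).
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (z1 - z2) / se
```

With r = .87 and r = .51 at n = 259, this gives Z of roughly 8.7, in line with the Z = 8.73 reported above.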
Correlations reached significance more often when performance on items was compared with overall performance on the same class of items: performance on 33 of 50 key terms correlated significantly with key term performance at p < .05, and all 50 foils correlated significantly with foil performance at p < .05. Scale correlations can be found in Appendices G and H. Cronbach's alpha was .81 for the test, and a split half reliability analysis between key terms and foils yielded a non-significant result, r(257) = .02, p = .732. Means and standard deviations for individual items can be found in Appendix I. A principal components analysis was conducted to determine the number of factors among APTT items. Using the parallel analysis criterion outlined by Lautenschlager (1989), one factor was determined to exist for foils. We then ran a one factor solution using maximum likelihood extraction with Oblimin rotation. This produced a one factor model that accounted for 14.6 percent of the variance. Items grouping into this factor (with a criterion of .35) are listed in Table 1. Using the same parallel analysis criterion, three factors were determined to exist for key terms. We then ran a three factor solution using maximum likelihood extraction with Oblimin rotation, which produced three factors for key terms. The three factor model for key terms accounted for 13.19 percent of the variance. Items grouping into these three factors (using the same criterion of .35) are listed in Table 2.

Table 1. Items grouping into the factor for foils.
Factor 1
somatic transmission
post-modern structuralism
conditional restriction
intersubjective validity
spontaneous salivation
unconscious neuroticism
schema taking score
unsystematic sensitization
interdependent variable
toddler directed speech
proto-operational stage
neutral correlation
biological watch
California-Binet test
retrograde amnesia
phobic malingering
instinctual deprivation
functional flexibility
threshold of non-relativity
multiple deviation

Table 2. Items grouping into the three factors for key terms.

Factor 1 | Factor 2 | Factor 3
bell-curve | inductive reasoning | unconditioned response
bell-curve | unconditioned response | fundamental attribution error
fundamental attribution error | fixed action pattern | cognitive dissonance
just noticeable difference | chunking | episodic memory

Sixteen students did not report gender on their response sheets and were dropped from this analysis. The correlation between gender and APTT score was significant, r(241) = .20, p < .01. This finding supports hypothesis 5, which stated that gender differences would exist in APTT performance; females performed significantly better than males. When controlling for introductory psychology course grade, this correlation was reduced to r(241) = .15, p = .021, remaining significant at the .05 level. A Fisher Z test of the difference between these two correlations was not significant, Z = .58, p = .56. Hypothesis 6, which stated that gender differences would be accounted for by differences in introductory psychology class performance, was therefore rejected. A strong relationship was found between the ability to recall information about psychology key terms and APTT performance, r(229) = .60, p < .01. Twenty-eight students did not complete the written section and were subsequently dropped from this analysis. These results support hypothesis 7, which stated that a significant relationship would exist between APTT performance and recall ability. Figure 8 illustrates this relationship.
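The control analysis described above, re-assessing the gender-APTT correlation while holding course grade constant, is equivalent to a first-order partial correlation. A sketch (illustrative, not the study's own code), using correlations reported in this study:

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Gender vs. APTT (r = .20), controlling for course grade, with a
# gender-grade correlation of .134 and an APTT-grade correlation of .63
# (all values as reported in this study):
# partial_r(0.20, 0.134, 0.63) is roughly .15
```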
Each of the ten written items was worth between one and five points, for a total of 50 possible points.

Figure 8. APTT performance and written recall performance on part two.

Chapter III. EXPERIMENT 2

A second version of the APTT, consisting of 50 different key terms and 50 different foils, was created using the same procedure outlined in Experiment 1 in order to determine alternate form reliability between the two instruments.

Method

Participants

Participants were students enrolled in a research methods course at Auburn University, who participated for course credit. All participants had previously completed an introductory psychology course, though neither the time elapsed since completion of the course nor the introductory course professor or content was controlled.

Materials

Both versions of the APTT were used (see Appendices A and B).

Procedure

Students (n = 40) enrolled in a research methods course in which no pre-testing had occurred were administered both versions of the APTT in random order. Data were analyzed to assess alternate form reliability.

Results

Individual scores on the alternate form of the APTT correlated strongly with APTT performance, r(38) = .81, p < .01, which was significant despite the smaller sample size. This supported hypothesis 8, which stated that administration of an alternate form of the APTT would yield adequate alternate form reliability. Student scores on the two versions are illustrated in Figure 9.

Figure 9. Performance comparison on versions one and two across students.

Chapter IV. DISCUSSION

Preliminary analyses of the Auburn Psychology Term Test (APTT) suggest that it has strong potential for use in assessing psychology vocabulary knowledge.
The significant relationship between classroom performance and APTT score suggests that the APTT is testing basic psychology knowledge. While classroom performance may not be a perfect indicator of student knowledge of the subject matter, the value placed on classroom performance in the educational system suggests that it must be considered among the indexes with the strongest potential for assessing the material covered. In pilot studies, performance on the APTT prior to the start of an introductory psychology class was at chance levels, suggesting that learning psychology in a classroom setting leads to better performance on the APTT. The results of this study suggest that performance on the APTT may depend on the amount, and the depth of understanding, of the material learned.

Beeckmans et al.'s (2001) criticism concerning the unequal contributions of foils and key terms to overall scores in yes/no recognition tests of learning may apply to the APTT. While the relationship between hit performance and total performance was strong, it was significantly lower than the relationship between foil performance and total performance, suggesting that performance on foils contributed more to a student's total score than hit performance did. A ceiling effect may be present for many of the key terms, as 22 of 50 key terms were correctly identified by 90% or more of the students taking the test, and the mean percentage correct for key terms was 77%. In comparison, only 5 of 50 foils were correctly identified by 90% or more of students, and the mean percentage correct was 72%. In light of memory research on yes/no recognition tests, however, this result could be expected (e.g., Ruiz, Soler, & Dasi, 2004).
Whether this imbalance is an expected artifact of the testing methodology or cause for concern may be open to debate, though the analyses outlined in hypothesis 4, which will be discussed shortly, do further our understanding of how each item contributes to overall performance. The third hypothesis sought to confirm findings in the recognition memory literature (e.g., Ruiz, Soler, & Dasi, 2004) concerning the relationship between study time and response bias on yes/no recognition tasks, as well as to provide additional evidence for the relationship between familiarity with psychology vocabulary and performance on the APTT. Because Ruiz et al. demonstrated that as study time increased, a subject's propensity to reject terms that he or she was unsure about also increased, we expected to find a relationship between response bias and overall performance. Results indeed showed a strong relationship between the two measures (r = .49); hence, Ruiz et al.'s finding concerning the relationship between the amount of time spent with the material and performance on yes/no recognition tests may generalize to classroom settings, and thus to the APTT. This also demonstrates that response bias on the APTT is not simply a random artifact of the test, but can be useful along with test performance in assessing student knowledge. Future research on this relationship may help researchers better understand student test taking strategies on yes/no recognition tests and how these strategies relate to actual vocabulary familiarity and knowledge. An analysis of the psychometric properties of the APTT showed several notable points. Cronbach's alpha for the test was .81, well above the .70 standard that Nunnally (1978) deemed an acceptable reliability coefficient, which indicates high internal consistency. However, the most salient result of this analysis was the previously mentioned discrepancy between student performance on key terms and foils.
While performance on most foils was significantly correlated with overall test performance (96%), performance on far fewer key terms (30%) showed significant correlations. This imbalance can also be seen in the split half reliability between key terms and foils, which was not significant, r(257) = .02, p = .732. In light of the previously mentioned finding of a significantly higher correlation with overall test performance for foils than for key terms, these findings are hardly surprising. Again, it is possible that we are observing a ceiling effect on key term performance, as 22 key terms were answered correctly by 90% or more of participants. Also of note was the variability across items in item total correlations. Because of the nature of the learning environment, this effect on key terms could have been caused either by heterogeneously memorable items or by differences in the amount of emphasis placed on concepts during the semester. However, as mentioned previously, all items used in this study were covered during the introductory psychology course, and all were contained in the required textbook. Variability in foil performance is more difficult to assess. Phonetic changes were far less common than semantic changes on the APTT administered to the participants, making analysis of differences along this dimension difficult. Differences in word length or number of syllables do not appear to be a factor (see foils in Appendices B and C). Since participants should have had no previous exposure to foils, their relationship to existing vocabulary items would be difficult to determine. The exploratory factor analysis suggested that the nature of the testing methodology did not lend itself to a salient grouping of items into identifiable factors. A low percentage of items grouped into factors during the analysis of key terms and foils (see Tables 1 and 2), and no discernible relationship could be established among those that did group into factors.
While the dimensionality of the APTT could not be easily determined, the significance of this finding is unclear. Because participants are required to make yes/no decisions, as opposed to responding in a Likert or multiple choice testing format, and perhaps due in part to the presence of foils whose precise relationship to items in a participant's existing vocabulary cannot be determined, assessing the dimensionality of the test may not be possible at the present time through any available analyses. Gender differences in performance were initially found on the APTT, with gender correlating with APTT performance at r(241) = .20, p < .01, and with classroom performance at r(241) = .134, p < .05. Gender differences were then analyzed by assessing the relationship between gender and APTT performance while controlling for classroom performance. When classroom performance was held constant, the relationship between gender and APTT performance did decrease from r = .20 to r = .15, though this reduction was not significant. While the APTT may contain some gender differences, these could potentially be the result of other factors, such as differing study habits. The nature of the relationship between recognition and recall may be particularly relevant in determining the effectiveness of recognition tests for assessing student knowledge. How effectively students were able to demonstrate general psychology vocabulary knowledge in an essay-type recall task was assessed by giving students random blocks of terms from the alternate version and asking them to provide "as much information as they know" about each term. While testing each student's recall ability using the same terms he or she received on his or her version of the APTT would certainly have provided useful results, it was our intention to separate the assessment of students' overall psychology vocabulary recall ability from their knowledge of the particular terms in the version of the APTT that each student received.
The inclusion of foils, which made up 5 to 8 of the 20 items, also could have allowed some artifacts of the APTT's testing methodology to cloud the results. However, the strong correlations between written performance on psychology vocabulary items and APTT performance are unlikely to be wholly the result of these methodological issues. Several items were found to be poor predictors of student knowledge on both the essay and APTT portions of the study, and eliminating these items from the analysis resulted in correlations higher than those reported. The strong relationship between the recall and recognition portions of the test suggests similar processes or abilities at work in recognition on the APTT and in recall of psychology vocabulary items, and may suggest a blurring of the distinction between the underlying processes. As in most educational tests, both consist of decontextualized vocabulary items, and may be testing the same basic ability. If so, the convenience of the yes/no format and the sophistication of signal detection analyses provide further support for the usefulness of the test. Administration of both an alternate form of the APTT and the original version to a group of students established a strong relationship between the two versions (r = .81). Aside from demonstrating strong alternate form reliability between the two versions, as well as establishing a viable second version of the test, the relationship between the two tests may speak to the stability of this testing methodology. Despite the fact that both versions contained entirely different key terms and foils, and that (unlike in Experiment 1) exposure to particular key terms was not controlled by participants having been enrolled in the same introductory psychology class at the time, performance was fairly reliable across versions.
This may suggest that the testing methodology used is more important than the particular terms contained in the test, though, as found in Experiment 1, some terms were better than others at discriminating strong and weak performers. Any analysis of the APTT as a test of psychology vocabulary knowledge should take into consideration certain theoretical differences inherent in educational and language testing discourse. Chapelle (1998) describes the division between trait and interactionalist approaches to second language acquisition research, a division that underlies some of the criticisms of yes/no recognition tests in that field. Trait theorists generally contend that test performance reflects relatively stable "underlying processes or structures" (Messick, 1989, p. 15). Such theorists view language performance along four dimensions of use: vocabulary size, knowledge of word features and characteristics, organization in the mental lexicon, and use of fundamental semantic, phonological, and morphological vocabulary processes (Chapelle, 1998). Performance along these dimensions of general knowledge and cognitive processes is considered to represent a stable, measurable ability to use the target language. Interactionalist theorists' most consistent criticism of trait theories, and thus of vocabulary tests as measures of language ability, stems from what they perceive as a disregard of context (Chapelle, 1998). Such theorists assert that tests of language should take the pragmatic and contextual features of the word into consideration. Several researchers suggest that subjects' ability to recognize words, or even their comprehension of these words, may not demonstrate an ability to use them in context (Laufer & Paribakht, 1998; Reed, 2000). However, other researchers have expressed concerns over the use of context, suggesting that some tests of language proficiency may measure inferencing skills as much as actual word knowledge (e.g., Laufer, 2004).
While recognizing that not every term a participant recognizes is fully understood, it is unlikely that participants will be capable of using in context, or recalling information about, terms that they are unable to recognize. Some of the most frequent criticisms of the use of yes/no recognition tests in language testing do not necessarily apply to the present research. Many of these concerns involve factors such as phonotactic probability differences between languages (Beeckmans et al., 2001). Cameron (2002) laments the possibility that students, having encountered unfamiliar words throughout the educational process, may become accustomed to such encounters and have more difficulty distinguishing words from non-words. Read (1997) expressed similar sentiments, arguing against the use of foils because low-level learners have more difficulty with non-words. However, these concerns may in fact demonstrate the strength of the format rather than its weaknesses. Those who may be considered "low-level learners" should perform more poorly on this test, and those who are more familiar with the psychology terms and concepts involved should have a better knowledge of what they do not know as well as of what they know, leading to better performance on foils. This contention is supported by Ruiz et al.'s (2004) studies on study times and response bias discussed earlier, as well as by the results of this study. Another criticism of the yes/no format involves the instructions given to test takers in language research (Beeckmans et al., 2001; Laufer & Paribakht, 1998). Typically, these tests ask participants to identify words for which they know the meaning, a standard that may have different implications for different test takers. By giving instructions in this manner, those in the language field are separating what many memory researchers contend are two types of recognition memory judgments (e.g., Atkinson & Juola, 1974; Mandler, 1980; Wixted & Stretch, 2004).
These theorists suggest that one recognition process simply involves a sense of familiarity, and another involves a "conscious recollection" or identification of the information involved (Neath & Surprenant, 2003, p. 210). In an attempt to avoid this dichotomy, the instructions of the APTT simply ask the participant to discriminate actual psychology terms from foils. Either process, if such a distinction truly exists, could thus lead to the participant's response. The APA task force on student assessment encouraged the use of locally developed tests to supplement in-class indices of student performance. The instrument designed and tested in this study, the Auburn Psychology Term Test (APTT), has been demonstrated to be reliable and valid, as well as economical in terms of time and resources. This study examined the relationship between this instrument and several indicators of student performance, including introductory course grade and the ability to identify and define psychology vocabulary items, and found strong relationships among these variables. The internal properties of the test were also assessed through item analyses and an exploratory factor analysis, which demonstrated that some variance exists in the effectiveness of APTT items and suggested that the dimensionality of the APTT may be difficult to determine. An alternate form was also created, and the two tests showed strong alternate-form reliability, indicating the format's consistency. Other researchers using similar tests have found them to be good measures of a number of student characteristics, most notably vocabulary knowledge. Such research has also found that students like the format in comparison with other testing formats. Additionally, the signal detection analysis encourages integration with an extensive literature on recognition memory. For these and other reasons, it is hoped that other educators and researchers will find the APTT useful.

REFERENCES

Anderson,
R. C., & Freebody, P. (1983). Reading comprehension and the assessment and acquisition of word knowledge. In B. Hutson (Ed.), Advances in reading/language research: A research annual (pp. 231-256). Greenwich, CT: JAI Press.

Asthana, B., & Nagrani, S. (1984). Recall and recognition as a function of levels of processing. Psycho-Lingua, 14, 85-94.

Atkinson, R. C., & Juola, J. F. (1974). Search and decision processes in recognition memory. In D. H. Krantz, R. C. Luce, & P. Suppes (Eds.), Contemporary developments in mathematical psychology (Vol. 1). San Francisco: Freeman.

Banks, W. P. (1970). Signal detection theory and human memory. Psychological Bulletin, 74, 81-99.

Bayen, U. J., Murnane, K., & Erdfelder, E. (1996). Source discrimination, item detection, and multinomial models of source monitoring. Journal of Experimental Psychology: Learning, Memory, and Cognition, 22, 197-215.

Beeckmans, R., Eyckmans, J., Janssens, V., Dufranne, M., & Van de Velde, H. (2001). Examining the yes/no vocabulary test: Some methodological issues in theory and practice. Language Testing, 18, 235-274.

Cameron, L. (2002). Measuring vocabulary size in English as an additional language. Language Teaching Research, 6, 145-173.

Challis, B. H., Velichkovsky, B. M., & Craik, F. (1996). Levels-of-processing effects on a variety of memory tasks: New findings and theoretical implications. Consciousness and Cognition, 5, 142-164.

Chapelle, C. (1998). Construct definition and validity inquiry in SLA research. In L. Bachman & A. Cohen (Eds.), Interfaces between second language acquisition and language testing research. Cambridge: Cambridge University Press.

Corwin, J. (1994). On measuring discrimination and bias: Unequal numbers of targets and distractors and two classes of distractors. Neuropsychology, 8, 110-117.

Cunningham, A., & Stanovich, K. (1997). Early reading acquisition and its relation to reading experience and ability 10 years later. Developmental Psychology, 33, 934-945.

Donaldson, W. (1992).
Measuring recognition memory. Journal of Experimental Psychology: General, 121(3), 275-277.

Feenan, K., & Snodgrass, J. (1990). The effect of context on discrimination and bias in recognition memory for pictures and words. Memory & Cognition, 18(5), 515-527.

Gillund, G., & Shiffrin, R. (1984). A retrieval model for both recognition and recall. Psychological Review, 91, 1-67.

Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. New York: Wiley.

Green, D. M., & Moses, F. L. (1966). On the equivalence of two recognition measures of short term memory. Psychological Bulletin, 66, 228-234.

Griggs, R., Bujak-Johnson, A., & Proctor, D. (2004). Using common core vocabulary in text selection and teaching the introductory course. Teaching of Psychology, 31, 265-269.

Grier, J. B. (1971). Nonparametric indexes for sensitivity and bias: Computing formulas. Psychological Bulletin, 75(6), 424-429.

Halonen et al. (2002). The Assessment CyberGuide for Learning Goals and Outcomes in the Undergraduate Psychology Major. Retrieved from http://www.apa.org/ed/guide_outline.html

Hintzman, D. L. (1988). Judgments of frequency and recognition memory in a multiple-trace memory model. Psychological Review, 95, 528-551.

Hodos, W. (1970). Nonparametric index of response bias for use in detection and recognition experiments. Psychological Bulletin, 74, 351-354.

Huibregtse, I., Admiraal, W., & Meara, P. (2002). Scores on a yes-no vocabulary test: Correction for guessing and response style. Language Testing, 19, 227-245.

Kojic-Sabo, I., & Lightbown, P. (1999). Students' approaches to vocabulary knowledge and their relationship to success. The Modern Language Journal, 83, 176-192.

Laufer, B. (2004). Size and strength: Do we need both to measure vocabulary knowledge? Language Testing, 21, 202-226.

Laufer, B., & Paribakht, T. S. (1998). The relationship between passive and active vocabularies: Effects of learning context. Language Learning, 48, 365-391.

Lautenschlager, G. (1989).
A comparison of alternatives to conducting Monte Carlo analyses for determining parallel analysis criteria. Multivariate Behavioral Research, 24, 365-396.

Levine, M. W., & Shefner, J. M. (1991). Fundamentals of sensation and perception (2nd ed.). Pacific Grove, CA: Brooks/Cole.

Lockhart, R., & Murdock, B. (1970). Memory and the theory of signal detection. Psychological Bulletin, 74, 100-109.

Loring, T. (1995). A yes/no vocabulary test in a university placement setting. Unpublished master's thesis.

Luce, R. D. (1959). Individual choice behavior. New York: John Wiley & Sons.

Luce, R. D., Bush, R., & Galanter, E. (1963). Handbook of mathematical psychology (Vol. 1). New York: John Wiley & Sons.

Mandler, G. (1980). Recognizing: The judgment of previous occurrence. Psychological Review, 87, 252-271.

Meara, P. M., & Buxton, B. (1987). An alternative to multiple choice vocabulary tests. Language Testing, 4, 142-154.

Meara, P. M. (1992). EFL Vocabulary Test. Swansea, UK: Centre for Applied Language Studies.

Murdock, B. B. (1974). Human memory: Theory and data. Potomac, MD: Lawrence Erlbaum Associates.

Neath, I., & Surprenant, A. (2003). Human memory: An introduction to research, data, and theory. Belmont, CA: Thomson/Wadsworth.

Nunnally, J. (1978). Psychometric theory. New York: McGraw-Hill.

Pastore, R., Crawley, E., Berens, M., & Skelly, M. (2003). "Nonparametric" A′ and other modern misconceptions about signal detection theory. Psychonomic Bulletin & Review, 10, 556-569.

Pollack, I., & Norman, D. A. (1964). A non-parametric analysis of recognition experiments. Psychonomic Science, 1, 125-126.

Ratcliff, R., & Murdock, B. B. (1976). Retrieval processes in recognition memory. Psychological Review, 83, 190-214.

Read, J. (2000). Assessing vocabulary. Cambridge: Cambridge University Press.

Rhodes, G., Parkin, A. J., & Tremewan, T. (1993). Semantic priming and sensitivity in lexical decision. Journal of Experimental Psychology: Human Perception and Performance, 15, 154-165.
Ruiz, J. C., Soler, M. J., & Dasi, C. (2004). Study time effects in recognition memory. Perceptual and Motor Skills, 98, 638-642.

Snodgrass, J., & Corwin, J. (1988). Pragmatics of measuring recognition memory: Applications to dementia and amnesia. Journal of Experimental Psychology: General, 117, 34-50.

Snodgrass, J., Levy-Berger, G., & Haydon, M. (1985). Human experimental psychology. New York: Oxford University Press.

Snodgrass, J., Volvovitz, R., & Walfish, E. R. (1972). Recognition memory for words, pictures, and words + pictures. Psychonomic Science, 27, 345-347.

Stanovich, K. E. (2000). Progress in understanding reading: Scientific foundations and new frontiers. New York: Guilford.

Stanovich, K. E., & Cunningham, A. E. (1992). Studying the consequences of literacy within a literate society: The cognitive correlates of print exposure. Memory & Cognition, 20, 51-68.

Stanovich, K. E., & West, R. F. (1989). Exposure to print and orthographic processing. Reading Research Quarterly, 24, 402-433.

Stanovich, K. E., West, R. F., & Harrison, M. (1995). Knowledge growth and maintenance across the life span: The role of print exposure. Developmental Psychology, 31, 811-826.

Tulving, E., & Thomson, D. M. (1971). Retrieval processes in recognition memory: Effects of associative context. Journal of Experimental Psychology, 87, 116-124.

West, R., & Stanovich, K. (1991). The incidental acquisition of information from reading. Psychological Science, 2, 325-330.

West, R. F., Stanovich, K. E., & Mitchell, H. R. (1993). Reading in the real world and its correlates. Reading Research Quarterly, 28, 34-50.

Wixted, J. T., & Stretch, V. (2004). In defense of the signal detection interpretation of remember/know judgments. Psychonomic Bulletin & Review, 11(4), 616-641.

APPENDICES

Appendix A
Auburn Psychology Term Test, Version 1
(*Bold items are key terms)

Below, 100 terms are listed. Some of them are key psychological terms that you encountered in lectures and reading the textbook.
Others will be unfamiliar to you, because they are bogus, fabricated terms that sound like psychological terms, but are not "real" psychology terms. Your task is to identify which of the terms are real and which are fabricated. For example, terms such as "memory" and "Ivan Pavlov" are both associated with psychology, so you would mark "A" on the scantron. Likewise, "intestinal myopia" and "terminal distress" are not part of psychology, so for these terms you would mark "B." Please look at each item, then bubble in "A" if you recognize it as a real term, and "B" if you think the term is bogus.

1. adolescent amnesia  2. transduction  3. action potential  4. comfort touch  5. schema taking score (STS)
6. sexual identity  7. secondary reinforcer  8. James Farber  9. cognitive dissonance  10. critical period
11. token economy  12. chunking  13. alpha-wave effect  14. ghost limb  15. empiricism
16. gestation psychology  17. standard deviation  18. Jean Piaget  19. language acquisition device  20. dendritic hypo-potential
21. longitudinal study  22. negative feedback  23. libido  24. superstitious relaxation  25. bell curve
26. antisocial facilitation  27. animalism  28. functional flexibility  29. neurostasis  30. fixation
31. dendrite  32. motivational intelligence  33. attachment  34. big 5 personality factors  35. hapless motivation
36. sleep activation  37. multiple deviation  38. Shaping  39. general intelligence (g)  40. proto-operational stage
41. James-Lange theory  42. neutral correlation  43. retrograde memory  44. species-typical behavior  45. Wernicke's area
46. latitudinal study  47. somatic transmission  48. Synapse  49. Psychotransference  50. biological watch
51. inductive reasoning  52. instinctual deprivation  53. indifferent schizophrenia  54. unconscious neuroticism  55. null hypothesis
56. successful approximation  57. psychogenic amnesia  58. reaction range  59. toddler-directed speech (TDS)  60. obsessive compulsive disorder (OCD)
61. proactive interference  62. terminal stasis  63. distance IQ  64. Bronski's area  65. test-retest reliability
66. Temperament  67. objective well-being  68. law of effect  69. unconditioned response  70. dark adaptation
71. unsystematic sensitization  72. operational definition  73. threshold of non-relativity  74. bystander apathy effect (BAE)  75. insensitive period
76. circadian rhythm  77. paradoxical sleep  78. spontaneous salivation  79. fundamental attribution error  80. unipolar disorder
81. Festinger-Maslow effect  82. just noticeable difference (JND)  83. William James  84. California-Binet test  85. interdependent variable
86. sensorimotor stage  87. introspection  88. duozygotic twins  89. phobic malingering  90. ego complex
91. episodic memory  92. cognitive-behavioral therapy  93. conditional restriction  94. activation-synthesis hypothesis  95. intersubjective validity
96. operant encoding  97. systematic desensitization  98. post-modern structuralism  99. latent gratification  100. fixed action pattern (FAP)

Appendix B
Auburn Psychology Term Test, Version 2
(*Bold items are key terms)

Below, 100 terms are listed. Some of them are key psychological terms that you encountered in lectures and reading the textbook. Others will be unfamiliar to you, because they are bogus, fabricated terms that sound like psychological terms, but are not "real" psychology terms. Your task is to identify which of the terms are real and which are fabricated. For example, terms such as "memory" and "Ivan Pavlov" are both associated with psychology, so you would mark "A" on the scantron. Likewise, "intestinal myopia" and "terminal distress" are not part of psychology, so for these terms you would mark "B." Please look at each item, then bubble in "A" if you recognize it as a real term, and "B" if you think the term is bogus.
1. blindsight  2. id therapy  3. anterograde amnesia  4. aphagia  5. homeostasis
6. tri-delta waves  7. physiological clock  8. discontinuous reinforcement  9. dissociation  10. "Big Ten" Personality Factors
11. adaptation  12. phenotype  13. work memory  14. convergence  15. indiscriminate learning
16. replicated repetition  17. retinal disparity  18. conservation of volume  19. observational validity  20. Intellectual Quotient (IQ)
21. neurosis  22. involutional study  23. liquid intelligence  24. psychosexual stages  25. linguistic relativity hypothesis
26. semantic loop  27. kin selection  28. inheritability  29. need-for-improvement theory  30. Edward Dubranski
31. hedonism  32. Cannon-Bard theory  33. experimenter bias  34. ecological validity  35. Genomotypic
36. Thalamus  37. Structuralism  38. stimulus generalization  39. group-actualization theory  40. unnatural selection
41. Fractionalism  42. confounding variable  43. activation-synthesis hypothesis  44. invalidation therapy  45. polar cells
46. Assimilation  47. Maslow's Hierarchy of Emotion  48. split-cell research (SCR)  49. RPM Sleep  50. myelin sheath
51. narcissistic schizophrenia  52. set point theory  53. somatosensory cortex  54. variable ration schedule  55. serial position effect
56. sensitization cycle  57. unconditional negative regard  58. Schema  59. bottom-down processes  60. general activation syndrome (GAS)
61. Biofeedback  62. attribution theory  63. type C Personality  64. monozygotic twin  65. learned helplessness
66. collective conscience  67. delay theory  68. Stanford-WAIS  69. bystander effect  70. Wilhelm Wundt
71. zeitgeiber  72. transference  73. language imprinting device (LID)  74. transdifferentation  75. social loafing
76. arm-in-the-door technique  77. frustration-repression hypothesis  78. self-actualization  79. psychosomatic disorder  80. synaptic contusion
81. parallel amnesia  82. person esteem  83. factor analysis  84. DSM-IV  85. crystalized intelligence
86. Flynn defect  87. telegram speech  88. phenome  89. inprinting  90. mental set
91. group mind  92. retroactive interference  93. somalization  94. hypochondriasis  95. free association
96. tetrogen  97. algorerhythm  98. conversion disorder  99. Stroop defect  100. spontaneous recovery

Appendix C
Materials for the written recall portion of Experiment 1

On this concluding portion of the study, pick any ten of the terms that you marked "real" on the reverse side, and briefly identify them. Write each term in the space provided, and describe/define/identify it in one or two sentences.

Version 1 terms, Form A: replicated repetition; retinal disparity; conservation of volume; intellectual quotient (IQ); neurosis; involutional study; liquid intelligence; psychosexual stages; linguistic relativity hypothesis; kin selection; semantic loop; myelin sheath; bottom down process; factor analysis; conversion disorder; RPM sleep; Cannon-Bard theory; sensitization cycle; serial position effect; natural selection

Version 1 terms, Form B: blindsight; id therapy; anterograde amnesia; aphagia; homeostasis; tri-delta waves; structuralism; physiological clock; learned helplessness; dissociation; attribution theory; adaptation; phenotype; indiscriminate learning; work memory; convergence; Thalamus; schema; spontaneous recovery; DSM-IV

Version 2 terms, Form A: transduction; adolescent amnesia; action potential; sexual identity; cognitive dissonance; comfort touch; critical period; token economy; chunking; alpha-wave effect; empiricism; standard deviation; Jean Piaget; longitudinal study; bell curve; superstitious relaxation; dendrite; attachment; shaping; hapless motivation

Version 2 terms, Form B: Big 5 personality factors; sleep activation; general intelligence (g); James-Lange theory; Wernicke's area; attitudinal study; synapse; biological watch; inductive reasoning; proactive interference; reaction range; terminal stasis; test-retest reliability; objective well-being; dark adaptation; operational definition; bystander-apathy effect (BAE); circadian rhythm; bipolar disorder; independent variable

Appendix D
Item-Total Correlations
Pearson correlation of each item with the total score (N = 133 for all items).

Item    r        p (2-tailed)
Q2      .008     .923
Q3      .282**   .001
Q4      -.157    .071
Q6      -.140    .109
Q9      .127     .144
Q10     .163     .061
Q11     .063     .474
Q12     .256**   .003
Q15     .135     .122
Q17     .150     .084
Q18     .296**   .001
Q19     .159     .068
Q21     -.045    .610
Q22     -.008    .930
Q23     -.165    .057
Q25     .157     .071
Q30     -.164    .059
Q31     -.015    .867
Q33     .267**   .002
Q34     .103     .239
Q38     .005     .952
Q39     .114     .191
Q41     .351**   .000
Q45     .132     .130
Q48     .030     .728
Q51     .078     .373
Q55     .211*    .015
Q58     .136     .119
Q60     .112     .200
Q61     -.015    .866
Q65     .219*    .011
Q66     -.017    .848
Q68     .270**   .002
Q69     .121     .166
Q70     .070     .424
Q72     .116     .182
Q76     .258**   .003
Q77     -.056    .524
Q79     .245**   .005
Q80     .134     .124
Q82     .521**   .000
Q83     .204*    .018
Q86     .375**   .000
Q87     .133     .127
Q91     .248**   .004
Q92     .004     .962
Q94     -.053    .541
Q97     -.089    .310
Q100    .174*    .045
Q1      .294**   .001
Q5      .344**   .000
Q7      .017     .842
Q8      .199*    .022
Q13     .190*    .028
Q14     .207*    .017
Q16     .175*    .044
Q20     .160     .066
Q24     .257**   .003
Q26     .337**   .000
Q27     .364**   .000
Q28     .330**   .000
Q29     .476**   .000
Q32     .299**   .000
Q35     .260**   .003
Q36     .318**   .000
Q37     .356**   .000
Q40     .384**   .000
Q42     .393**   .000
Q43     .379**   .000
Q44     .473**   .000
Q46     .271**   .002
Q47     .438**   .000
Q49     .241**   .005
Q50     .349**   .000
Q52     .292**   .001
Q53     .411**   .000
Q54     .401**   .000
Q56     .256**   .003
Q57     .512**   .000
Q59     .287**   .001
Q62     .468**   .000
Q63     .231**   .007
Q64     .441**   .000
Q67     .477**   .000
Q71     .369**   .000
Q73     .272**   .002
Q74     .457**   .000
Q75     .312**   .000
Q78     .396**   .000
Q81     .216*    .012
Q84     .387**   .000
Q85     .420**   .000
Q88     .542**   .000
Q89     .394**   .000
Q90     .166     .056
Q93     .411**   .000
Q95     .410**   .000
Q96     .483**   .000
Q98     .432**   .000
Q99     .219*    .011

** Correlation is significant at the 0.01 level (2-tailed). * Correlation is significant at the 0.05 level (2-tailed).
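Item-total correlations of the kind tabled above can be reproduced from a raw 0/1 response matrix. The following is a minimal Python sketch with simulated data (the thesis analyses were run in SPSS); the correlation here is uncorrected, i.e., each item is included in the total with which it is correlated.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Simulated responses: 133 students x 10 items, 1 = correct, 0 = incorrect.
random.seed(1)
responses = [[random.randint(0, 1) for _ in range(10)] for _ in range(133)]

totals = [sum(row) for row in responses]
item_total_r = [pearson([row[j] for row in responses], totals) for j in range(10)]
```

A corrected item-total analysis would exclude each item from its own total before correlating; the uncorrected version above inflates each coefficient slightly.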
Appendix E
Item Characteristic Curves for Key Terms
[Figures omitted: for each key term, the mean probability of a "real" response is plotted against ability level (1.00-6.00).]

Appendix F
Item Characteristic Curves for Foils
[Figures omitted: for each foil, the mean response is plotted against ability level (1.00-6.00).]

Appendix G
Scale Correlations for Key Terms
Pearson correlation of each key term with total hits (HITTOTAL); N = 133 for all items.

Item    r        p (2-tailed)
Q2      .135     .123
Q3      .233**   .007
Q4      .168     .053
Q6      .154     .077
Q7      .039     .655
Q9      .213*    .014
Q10     .110     .207
Q11     .141     .106
Q12     .199*    .022
Q15     .326**   .000
Q17     .309**   .000
Q18     .104     .234
Q19     .230**   .008
Q21     .291**   .001
Q22     .202*    .020
Q23     .053     .546
Q25     .208*    .016
Q30     .184*    .034
Q31     .109     .210
Q33     .186*    .032
Q34     .365**   .000
Q38     .301**   .000
Q39     .192*    .027
Q41     .386**   .000
Q45     .005     .953
Q48     -.102    .243
Q51     .120     .167
Q55     .342**   .000
Q58     .332**   .000
Q60     .027     .756
Q61     .279**   .001
Q65     .222*    .010
Q66     .233**   .007
Q68     .339**   .000
Q69     .062     .482
Q70     .277**   .001
Q72     .331**   .000
Q76     .160     .065
Q77     .139     .111
Q79     .245**   .005
Q80     .205*    .018
Q82     .185*    .033
Q83     .219*    .011
Q86     .187*    .031
Q87     .194*    .025
Q91     .198*    .022
Q92     .279**   .001
Q94     .323**   .000
Q97     .170     .051
Q100    .173*    .046

** Correlation is significant at the 0.01 level (2-tailed). * Correlation is significant at the 0.05 level (2-tailed).

Appendix H
Scale Correlations for Foils
Pearson correlation of each foil with the foil total (FOILTOTAL); N = 133 for all items.

Item    r        p (2-tailed)
Q1      .356**   .000
Q5      .404**   .000
Q8      .298**   .000
Q13     .274**   .001
Q14     .227**   .009
Q16     .182*    .036
Q20     .212*    .014
Q24     .256**   .003
Q26     .402**   .000
Q27     .376**   .000
Q28     .392**   .000
Q29     .493**   .000
Q32     .351**   .000
Q35     .371**   .000
Q36     .276**   .001
Q37     .380**   .000
Q40     .392**   .000
Q42     .378**   .000
Q43     .422**   .000
Q44     .479**   .000
Q46     .321**   .000
Q47     .471**   .000
Q49     .236**   .006
Q50     .383**   .000
Q52     .349**   .000
Q53     .445**   .000
Q54     .434**   .000
Q56     .270**   .002
Q57     .525**   .000
Q59     .407**   .000
Q62     .489**   .000
Q63     .254**   .003
Q64     .464**   .000
Q67     .503**   .000
Pearson Correlation .397(**) Sig.
(2-tailed) .000 Q71 N 133 Pearson Correlation .376(**) Sig. (2-tailed) .000 Q73 N 133 Pearson Correlation .467(**) Sig. (2-tailed) .000 Q74 N 133 Pearson Correlation .346(**) Sig. (2-tailed) .000 Q75 N 133 Pearson Correlation .426(**) Sig. (2-tailed) .000 Q78 N 133 Pearson Correlation .334(**) Sig. (2-tailed) .000 Q81 N 133 Pearson Correlation .355(**) Sig. (2-tailed) .000 Q84 N 133 Pearson Correlation .371(**) Sig. (2-tailed) .000 Q85 N 133 Pearson Correlation .532(**) Sig. (2-tailed) .000 Q88 N 133 Pearson Correlation .397(**) Sig. (2-tailed) .000 Q89 N 133 Pearson Correlation .227(**) Sig. (2-tailed) .009 Q90 N 133 Pearson Correlation .420(**) Sig. (2-tailed) .000 Q93 N 133 Pearson Correlation .419(**) Q95 Sig. (2-tailed) .000 81 N 133 Pearson Correlation .517(**) Sig. (2-tailed) .000 Q96 N 133 Pearson Correlation .465(**) Sig. (2-tailed) .000 Q98 N 133 Pearson Correlation .249(**) Sig. (2-tailed) .004 Q99 N 133 ** Correlation is significant at the 0.01 level (2- tailed). * Correlation is significant at the 0.05 level (2- tailed). 82 Appendix I Descriptive Statistics N Minimum Maximum Mean Std. 
Deviation Q2 133 .00 1.00 .6165 .48807 Q3 133 .00 1.00 .7519 .43355 Q4 133 .00 1.00 .4135 .49433 Q6 133 .00 1.00 .8346 .37296 Q9 133 .00 1.00 .9925 .08671 Q10 133 .00 1.00 .9248 .26469 Q11 133 .00 1.00 .0677 .25213 Q12 133 .00 1.00 .9023 .29809 Q15 133 .00 1.00 .9023 .29809 Q17 133 .00 1.00 .9173 .27648 Q18 133 .00 1.00 .9774 .14905 Q19 133 .00 1.00 .8872 .31752 Q21 133 .00 1.00 .3910 .48981 Q22 133 .00 1.00 .9474 .22414 Q23 133 .00 1.00 .6165 .48807 Q25 133 .00 1.00 .9624 .19093 Q30 133 .00 1.00 .7519 .43355 Q31 133 .00 1.00 .9624 .19093 Q33 133 .00 1.00 .9624 .19093 Q34 133 .00 1.00 .7218 .44980 Q38 133 .00 1.00 .6692 .47229 Q39 133 .00 1.00 .8496 .35879 Q41 133 .00 1.00 .6241 .48620 Q45 133 .00 1.00 .9774 .14905 Q48 133 .00 1.00 .9925 .08671 Q51 133 .00 1.00 .9624 .19093 Q55 133 .00 1.00 .9323 .25213 Q58 133 .00 1.00 .6165 .48807 Q60 133 .00 1.00 .9850 .12216 Q61 133 .00 1.00 .3609 .48208 Q65 133 .00 1.00 .8722 .33515 Q66 133 .00 1.00 .9173 .27648 Q68 133 .00 1.00 .7669 .42439 Q69 133 .00 1.00 .9774 .14905 Q70 133 .00 1.00 .4662 .50074 Q72 133 .00 1.00 .5789 .49559 Q76 133 .00 1.00 .8647 .34338 Q77 133 .00 1.00 .5940 .49294 Q79 133 .00 1.00 .9774 .14905 Q80 133 .00 1.00 .2105 .40922 Q82 133 .00 1.00 .9023 .29809 83 Q83 133 .00 1.00 .6917 .46352 Q86 133 .00 1.00 .9173 .27648 Q87 133 .00 1.00 .8271 .37962 Q91 133 .00 1.00 .9699 .17144 Q92 133 .00 1.00 .7068 .45697 Q94 133 .00 1.00 .3158 .46659 Q97 133 .00 1.00 .7895 .40922 Q100 133 .00 1.00 .9699 .17144 Q1 133 .00 1.00 .6316 .48420 Q5 133 .00 1.00 .8647 .34338 Q7 133 .00 1.00 .6992 .46032 Q8 133 .00 1.00 .6090 .48981 Q13 133 .00 1.00 .6842 .46659 Q14 133 .00 1.00 .5489 .49949 Q16 133 .00 1.00 .6541 .47745 Q20 133 .00 1.00 .9248 .26469 Q24 133 .00 1.00 .9323 .25213 Q26 133 .00 1.00 .5789 .49559 Q27 133 .00 1.00 .7218 .44980 Q28 133 .00 1.00 .7970 .40376 Q29 133 .00 1.00 .4662 .50074 Q32 133 .00 1.00 .5940 .49294 Q35 133 .00 1.00 .8571 .35125 Q36 133 .00 1.00 .7895 .40922 Q37 133 .00 1.00 .6466 .47983 Q40 133 .00 
1.00 .7368 .44201 Q42 133 .00 1.00 .7970 .40376 Q43 133 .00 1.00 .3008 .46032 Q44 133 .00 1.00 .6391 .48208 Q46 133 .00 1.00 .6842 .46659 Q47 133 .00 1.00 .5714 .49674 Q49 133 .00 1.00 .8120 .39217 Q50 133 .00 1.00 .6617 .47494 Q52 133 .00 1.00 .8947 .30805 Q53 133 .00 1.00 .7293 .44599 Q54 133 .00 1.00 .7744 .41953 Q56 133 .00 1.00 .9098 .28759 Q57 133 .00 1.00 .6391 .48208 Q59 133 .00 1.00 .6917 .46352 Q62 133 .00 1.00 .7820 .41448 Q63 133 .00 1.00 .8647 .34338 Q64 133 .00 1.00 .6842 .46659 Q67 133 .00 1.00 .6992 .46032 Q71 133 .00 1.00 .8797 .32654 Q73 133 .00 1.00 .8722 .33515 Q74 133 .00 1.00 .6090 .48981 84 Q75 133 .00 1.00 .9098 .28759 Q78 133 .00 1.00 .4887 .50176 Q81 133 .00 1.00 .7594 .42906 Q84 133 .00 1.00 .8120 .39217 Q85 133 .00 1.00 .7368 .44201 Q88 133 .00 1.00 .8722 .33515 Q89 133 .00 1.00 .8797 .32654 Q90 133 .00 1.00 .4962 .50188 Q93 133 .00 1.00 .9023 .29809 Q95 133 .00 1.00 .8722 .33515 Q96 133 .00 1.00 .5714 .49674 Q98 133 .00 1.00 .7143 .45346 Q99 133 .00 1.00 .7218 .44980 Valid N (listwise) 133
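The item-total correlations reported in Appendices G and H (each item against HITTOTAL or FOILTOTAL) were produced with SPSS. For readers without SPSS, the computation can be sketched as follows; this is an illustrative reimplementation under the assumption of dichotomous (0/1) item scoring, not the thesis's own analysis script, and the function names are the author's of this sketch, not SPSS procedures.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def item_total_correlations(responses):
    """responses: one list per examinee of 0/1 item scores.

    Returns one Pearson r per item, correlating that item's scores
    across examinees with each examinee's scale total -- the same
    quantity tabled in Appendices G and H (uncorrected item-total r).
    """
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]          # scale total per examinee
    return [pearson([row[i] for row in responses], totals)
            for i in range(n_items)]
```

With dichotomous items, each value returned is equivalently the point-biserial correlation of the item with the scale total. Note that these are uncorrected correlations: each item contributes to the total it is correlated with, which inflates r somewhat for short scales.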