Long-term Effectiveness of the Reading Recovery Early Reading Intervention Program in a Rural School District

by

Adam Warren Vaughan

A dissertation submitted to the Graduate Faculty of Auburn University in partial fulfillment of the requirements for the Degree of Doctor of Philosophy

Auburn, Alabama
December 12, 2011

Keywords: Reading, Reading Recovery, Reading Programs, Interventions, End-of-Grade Tests

Copyright 2011 by Adam Warren Vaughan

Approved by

Joseph A. Buckhalt, Chair, Professor of Special Education, Rehabilitation and Counseling
Craig Darch, Professor of Special Education, Rehabilitation and Counseling
Bruce A. Murray, Associate Professor of Curriculum and Teaching

Abstract

There are many programs that specialize in teaching students the strategies necessary for reading, but which ones have the greatest impact and provide lasting skills to struggling students? The purpose of this study was to assess the effectiveness of the Reading Recovery early intervention program on the lowest performing first grade students in a rural North Carolina school district. This was accomplished by assessing their pre- and post-program performance using the Reading Recovery assessment, An Observation Survey of Early Literacy Achievement, specifically the Text Reading Level subtest (Clay, 2002), and by tracking their subsequent progress via the North Carolina End-of-Grade (EOG) test of reading. Students who participated in the program increased from a mean Text Reading Level of 3 at the start of the program to a mean Text Reading Level of 14 at the time of program completion. Long-term effectiveness of the program was less encouraging. A little less than half (44%) of the participants maintained performance at the level of proficiency on the EOG in third grade, and subsequent grades showed lower percentages. Once the participants had completed the Reading Recovery program in first grade, they were performing on average with their peers. However, in third and subsequent grades the students were not performing equivalent to the average performance of the school district.

Acknowledgments

The author would like to extend his gratitude to Dr. Joseph Buckhalt for his instruction through the years and his assistance and continued support during this extended dissertation process. He has been a pillar of the School Psychology program at Auburn University. The author would also like to thank the Granville County school system, and especially Dr. Gerri Martín, Assistant Superintendent for Curriculum and Instruction in Granville County schools, for her willingness to allow this important topic to be studied in her school system. I would especially like to thank Alan Lydiard and the technology staff in Granville County for their efforts in working to provide the appropriate data for this study. And finally, a debt of gratitude is owed to the author's wife and children, who have been with him since the beginning and in every sense of the word have "attended" graduate school together with him.

Table of Contents

Abstract
Acknowledgments
List of Tables
List of Figures
List of Abbreviations
Introduction
    Reading Instruction
    Reading Recovery
    Assessments
Literature Review
Hypothesis and Research Question
Methodology
Results
Conclusion
References
Appendix A: Data from Figure 4 presented in two separate scatterplots
Appendix B: Scatterplots for each of the year cohorts and their respective grades

List of Tables

Table 1. Five cohort years spanning the 2002-03 to 2008-09 school years, the years in which they participated in RR, and the subsequent years/grades for which EOG test data are available.
Table 2. EOG Scale Score ranges shifted for Achievement Levels prior to 2007-08 and then after the shift starting with the 2006-07 school year. Data retrieved from the NC Dept. of Public Instruction website (NCDPI-DAS/NCTP, 2007).
Table 3. Entry, Exit, and Year-End Mean Text Reading Levels Divided into Year Cohorts.
Table 4. Third (3rd) Grade Mean and Confidence Intervals for Year Cohorts Compared to School District Mean EOG Scores.
Table 5. Fourth (4th) Grade Mean and Confidence Intervals for Year Cohorts Compared to School District Mean EOG Scores.
Table 6. Fifth (5th) Grade Mean and Confidence Intervals for Year Cohorts Compared to School District Mean EOG Scores.
Table 7. Sixth (6th) Grade Mean and Confidence Intervals for Year Cohorts Compared to School District Mean EOG Scores.
Table 8. Seventh (7th) Grade Mean and Confidence Intervals for Year Cohorts Compared to School District Mean EOG Scores.
Table 9. Third Grade Through Seventh Grade EOG Results for RR Students (All Cohorts Combined): Percentage of Students Scoring At or Above Grade Level (Levels III and IV).
Table 10. 2002-03 Cohort Year Third Grade Through Seventh Grade EOG Results: Percentage of Students Scoring At or Above Grade Level (Levels III and IV).
Table 11. 2003-04 Cohort Year Third Grade Through Seventh Grade EOG Results: Percentage of Students Scoring At or Above Grade Level (Levels III and IV).
Table 12. 2004-05 Cohort Year Third Grade Through Seventh Grade EOG Results: Percentage of Students Scoring At or Above Grade Level (Levels III and IV).
Table 13. 2005-06 Cohort Year Third Grade Through Seventh Grade EOG Results: Percentage of Students Scoring At or Above Grade Level (Levels III and IV).
Table 14. 2006-07 Cohort Year Third Grade Through Seventh Grade EOG Results: Percentage of Students Scoring At or Above Grade Level (Levels III and IV).
Table 15. Third-Grade EOG Results: Percentage of Students Scoring At or Above Grade Level (Levels III and IV).

List of Figures

Figure 1. Mean Scores on the Text Reading Level Subtest of the Observation Survey at Entry into the RR Program, upon Exit, and at Year-End.
Figure 2. Third Grade EOG Results: Percentage of RR Students At or Above Grade Level (Levels III and IV).
Figure 3. Third Grade EOG Results District-Wide: Percentage of All Students Scoring Proficient (Levels III and IV).
Figure 4. Scatterplot of Text Reading Level and 3rd Grade EOG Scores for All Year Cohorts.
Figure 5. Scatterplot of Text Reading Level and 4th Grade EOG Scores, All Year Cohorts.
Figure 6. Scatterplot of Text Reading Level and 5th Grade EOG Scores, All Year Cohorts.
Figure 7. Scatterplot of Text Reading Level and 6th Grade EOG Scores, All Year Cohorts.
Figure 8. Scatterplot of Text Reading Level and 7th Grade EOG Scores, All Year Cohorts.
Figure 9. Scatterplot of Text Reading Level and 3rd Grade EOG Scores, 2002-03 Year Cohort.
Figure A1. Scatterplot of Text Reading Level and 3rd Grade EOG Scores, Pre-Re-Norm Cohorts.
Figure A2. Scatterplot of Text Reading Level and 3rd Grade EOG Scores, Post-Re-Norm Cohorts.
Figure B1. Scatterplot of Text Reading Level and 3rd Grade EOG Scores, 2002-03 Year Cohort.
Figure B2. Scatterplot of Text Reading Level and 4th Grade EOG Scores, 2002-03 Year Cohort.
Figure B3. Scatterplot of Text Reading Level and 5th Grade EOG Scores, 2002-03 Year Cohort.
Figure B4. Scatterplot of Text Reading Level and 6th Grade EOG Scores, 2002-03 Year Cohort.
Figure B5. Scatterplot of Text Reading Level and 7th Grade EOG Scores, 2002-03 Year Cohort.
Figure B6. Scatterplot of Text Reading Level and 3rd Grade EOG Scores, 2003-04 Year Cohort.
Figure B7. Scatterplot of Text Reading Level and 4th Grade EOG Scores, 2003-04 Year Cohort.
Figure B8. Scatterplot of Text Reading Level and 5th Grade EOG Scores, 2003-04 Year Cohort.
Figure B9. Scatterplot of Text Reading Level and 6th Grade EOG Scores, 2003-04 Year Cohort.
Figure B10. Scatterplot of Text Reading Level and 3rd Grade EOG Scores, 2004-05 Year Cohort.
Figure B11. Scatterplot of Text Reading Level and 4th Grade EOG Scores, 2004-05 Year Cohort.
Figure B12. Scatterplot of Text Reading Level and 5th Grade EOG Scores, 2004-05 Year Cohort.
Figure B13. Scatterplot of Text Reading Level and 3rd Grade EOG Scores, 2005-06 Year Cohort.
Figure B14. Scatterplot of Text Reading Level and 4th Grade EOG Scores, 2005-06 Year Cohort.
Figure B15. Scatterplot of Text Reading Level and 3rd Grade EOG Scores, 2005-06 Year Cohort.

List of Abbreviations

RR      Reading Recovery Program
TRL     Text Reading Level
EOG     End-of-Grade Test
CBM     Curriculum-Based Measure
DRA     Developmental Reading Assessment
DIBELS  Dynamic Indicators of Basic Early Literacy Skills
RTI     Response to Intervention or Instruction

Introduction

Learning to read is a process that all children must go through in order to acquire the skills necessary to be successful not only in school, but also in life. Learning to read contributes to our children's overall well-being. When a child fails to learn to read, "hope for a fulfilling productive life diminishes" (Lyon, 2001, p. 14). Reading is not, however, as natural as learning to speak. Research over 25 years has not supported the idea that reading development is a natural process; in other words, natural exposure to literature does not foster good readers the way that natural exposure to speech and the spoken word does (Lyon, 2001). In the end, it is apparent that we must explicitly teach our children to read.
In the course of teaching students to read, many do not automatically develop the strategies necessary to become good readers. These reading strategies must be taught, often in addition to what is taught in the regular curriculum. Interventions must teach what Pinnell (1989) describes as the goal of helping children develop "in the head" processes that most good readers develop and use naturally. At any given time there are numerous reading instruction programs running concurrently in the schools. Each has its own approach to remediating those students who are not developing the strategies that will help them become good readers. Numerous programs have been initiated with great fanfare over the years, often in response to a particular trend in education. They are sometimes maintained because the materials are paid for and the teachers have been hired or, as is more common, the programs fade into oblivion with much less fanfare than they were introduced with (Slavin, 1989). Reading programs are especially plentiful and varied. Teachers attend workshops, kits are purchased, and in the end there are more than a few ways to address the problem of learning to read, many of which are effective. The difficulty arises in selecting a program which will best meet the needs of the broadest group of students struggling with reading. Whether their difficulty lies in letter-sound knowledge, fluency, comprehension, or some combination of all three, which program, whether in or outside of the classroom, will most effectively address these problems, providing the most gain in the shortest amount of time? As educational budgets become tighter and programs are eliminated, it is increasingly important to determine which programs can address the broadest array of areas of difficulty and do so in the most effective and efficient way.
What is required is that each program be evaluated to determine what skills are being taught and how effectively students are learning them. What effect do these programs have on student reading ability when compared to students who remain in the regular classroom receiving instruction from their teacher? Do more intensive programs, in which the student is removed from the classroom, help those students who are the worst off? The purpose of this study is to assess the effectiveness over time of one reading intervention program, Reading Recovery, on the lowest performing first grade students in a rural school district in North Carolina. Reading Recovery is an intensive one-to-one tutoring program for beginning first-graders, which focuses on reading and writing instruction that addresses the problems of struggling readers early on, before these students become acutely poor readers. Lessons are taught by highly trained teachers over 15 to 20 weeks, with the goal of bringing low performing readers up to grade level (Pinnell, 1989). Specifically, this study will assess the students' overall reading achievement levels upon successfully completing the program, and then in later grades, to determine the sustained effects of this program as compared to the on-grade-level performance of their peers in the classroom. If Reading Recovery is effective in increasing low performing readers' reading levels and overall reading achievement, then one would also expect an increase in classroom grades, benchmark scores, and standardized test scores. Further, a majority of RR students would not need to be referred to special education for reading difficulties when compared to other low performing students in the classroom who were not enrolled in the RR program.
This study will track student progress via the Text Reading Level subtest of the Reading Recovery assessment, An Observation Survey of Early Literacy Achievement (Clay, 2002), and gather extant data collected by the school district in the form of standardized tests, namely the End-of-Grade (EOG) test of reading in third grade and each subsequent grade through seventh grade. The Reading Recovery program is a popular and widely used intervention program (Pinnell, 1988; Pinnell, 1989; Slavin, 1987; Slavin & Madden, 1987), and yet it has its critics (Center, Wheldall, & Freeman, 1992; Grosson & Coulter, 2002). This research is an attempt to observe the effect of the Reading Recovery program on low performing students using the program's own measures and data collected by the school district in subsequent grades. Specifically, the purpose is to look at the program's effect on student reading achievement over time as compared to peers' on-grade-level performance.

Reading Instruction

Reading is no more a natural process than riding a bike. We are not born to ride a machine, nor are we born to use a language that is both spoken and written, with difficult rules and spelling, and that associates specific sounds with arbitrary squiggly lines (phonetic as opposed to pictographic). Reading must be learned, and therefore it must be taught. There are any number of ways to teach reading (AFT, 1999), and most students learn to read regardless of which is used. However, when a student struggles, when they do not "get" what is being taught or the way it is being taught, that is when instruction must be intensified and expertise increased. Whether it is a teacher, a parent, or a school psychologist, the goal is to find out where the student is struggling. It is not enough to say that "the student has a problem reading" because they have dyslexia or because they have a learning disability (Joseph, 2008).
Rather, the goal is to find out what specifically the student is missing. Are they missing time spent on a particular task or skill? Are they missing a fundamentally key component that must be taught again, or for the first time? Or do they have difficulty with a particular concept, such as phonemic awareness, in which case a new and different way must be found to teach what many students were able to learn without explicit instruction? We do not all learn to swim the same way. Some are thrown in and learn to get by, enough to stay afloat. Others are taught the proper form and stroke from day one, and they learn later to truly swim, when their muscles develop. Still others never learn because of fear of the water. No child learns to read in the exact same way, but many learn to get by and keep afloat, often with great effort. The point is that there is no single solution to teaching all of our children how to read. There is no one way to move them from learning-to-read to reading-to-learn (Joseph, 2008), but there are many programs that can be adapted and modified to meet the needs of many students. Often our teachers follow a curriculum and infuse their own philosophies and ideals about literature, and many of their students learn to read, at least as well as their peers. For those who struggle, good teachers often have a trick or two up their sleeve, or an able-bodied assistant who can spend some extra time helping them in weak areas. If the student continues to struggle, the resources of the school are taken into consideration. In many cases there might be tutoring available, a small group with a reading teacher, or a referral may be made to the school-based team that comes together to address student needs when a student is struggling. The question is how to determine exactly what component the student is struggling with and how best to address that need.
Historically there have been several approaches to explaining and classifying student difficulties and the subsequent instructional approaches to address them. In the area of reading difficulties, McEneaney (2006) highlights three approaches through the years, including the most recent, RTI (Response to Intervention or Response to Instruction), which is continuing to evolve. The first approach grew out of early studies involving students with specific brain injuries resulting in reading difficulties; it was assumed that developmental problems likewise reflected an underlying dysfunction of the brain. Out of this mode of thought grew the process deficit models, which relied on the idea of information processing; reading was described as "the flow of information across cognitive processing systems" related to reading (e.g., visual perception, word recognition, and phonemic analysis) (McEneaney, 2006, p. 361). The idea behind process deficit models was that they allowed educators and researchers to distinguish between various types of disabilities and thereby design instruction based on the type of disability. The research, however, did not support this. Attempts to divide students into disabled readers and everyday poor readers on the basis of processing deficits were unsuccessful (Lovell, Gray, & Oliver, 1964), according to McEneaney (2006). In fact, McEneaney reports that despite three decades of research the process deficit model is a case of "beautiful theory and ugly facts" (McEneaney, 2006, p. 362). There is not enough evidence to support the idea that process deficits explain difficulties in reading and thereby provide instructional guidance for students with reading difficulties. The discrepancy model came after the process deficit model as a more statistically based approach. This approach views reading achievement as distributed normally (bell shaped) along a continuum.
The majority of students fall in the middle, and smaller numbers of readers are on the ends (tails) of the classic bell shape (Pearson, 1999; Snow et al., 1998). In an attempt to improve the identification of students who are learning disabled, the Department of Education issued a regulation stating that students who are learning disabled would be those who have a severe discrepancy between their achievement and ability (IQ) scores (U.S. Office of Ed., 1977). However, the many IQ and achievement tests available and the varying interpretations that states have given this regulation (difference between IQ and achievement, regression of IQ on achievement, amount of discrepancy, e.g., 1 SD vs. 2 SDs) have led to varying rates of identification between states (Dunn, 2007; Scruggs & Mastropieri, 2002). Again the research fails to support this model. There is little evidence of a difference even for children who have severe reading difficulties; in other words, they simply represent readers in the lower range of the normal distribution and are indistinguishable from students identified as having a learning disability (Stanovich, 1991). Additional studies also have not shown significant differences between readers diagnosed as low achieving and those diagnosed as learning disabled (Fletcher et al., 1994; Vellutino, Scanlon, & Lyon, 2000). Concerns, or at least notable limitations, of the discrepancy approach were present even when it was first widely adopted in the Education of All Handicapped Children Act of 1975, and they continue to be written about to the present (McEneaney, 2006; Stanovich, 2008). The discrepancy model, with its accompanying assessment and classification, is flawed because it does not provide a clear, consistently reliable method of identifying LD students. In order to address this issue, the President's Commission on Special Education was convened (U.S. Department of Education, Office of Special Education and Rehabilitation Services [OSERS], 2002).
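To make the mechanics of a discrepancy criterion concrete, the sketch below applies a simple cutoff to hypothetical standard scores (mean 100, SD 15). The 1.5 SD cutoff and the example scores are illustrative assumptions only; as noted above, actual formulas and cutoffs varied considerably from state to state.

```python
# Illustrative sketch of a simple IQ-achievement discrepancy check.
# Standard scores are assumed to have mean 100 and SD 15; the 1.5 SD
# cutoff is a hypothetical example, not any state's actual criterion.

SD = 15

def severe_discrepancy(iq_score: float, achievement_score: float,
                       cutoff_sds: float = 1.5) -> bool:
    """Return True if achievement falls cutoff_sds or more below IQ."""
    gap_in_sds = (iq_score - achievement_score) / SD
    return gap_in_sds >= cutoff_sds

# A student with IQ 100 and reading achievement 77 shows a 23-point
# gap, about 1.53 SDs, and would meet this hypothetical criterion.
print(severe_discrepancy(100, 77))   # True
print(severe_discrepancy(100, 85))   # a 1.0 SD gap: False
```

The same pair of scores can qualify a student under one cutoff and not another, which is one way the varying identification rates between states described above arise.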
The OSERS's focus was on the problem with IQ tests as assessment measures for special education eligibility and on the practice of wait-to-fail. Studies found no difference in the reading skills of students with reading disabilities who had high IQs versus low IQs (Tal & Siegal, 1996). The IQ tests did not help predict which students would gain from remediation (Kershner, 1990). This is further compounded by the Matthew Effect, which theorizes that reading difficulties may influence the development of language, knowledge, and vocabulary skills, thereby affecting performance on traditional IQ tests (Stanovich, 1988). The practice of waiting until third grade to see whether a student is grasping and retaining content material makes things more difficult for students in later grades (Dunn, 2007; Lyon, Fletcher, Shaywitz, Torgesen, Wood, et al., 2001). RTI (Response to Intervention, also sometimes referred to as Response to Instruction) is offered as an alternative to the discrepancy model for identifying students as having a learning disability. The 2004 update to the law, the Individuals With Disabilities Education Improvement Act of 2004 (IDEIA), introduced RTI as an alternative approach, but retained the process deficit model as its framework and discrepancy as an indicator of reading disabilities. Within the RTI model, a student who is learning disabled is considered dually discrepant; in other words, they are not only low achievers, but are also making little-to-no progress within the three-tiered intervention program. The three-tier programs typically consist of tier one, which is basically regular education: it is targeted at all students and is based on the school's core reading curriculum. At this first tier it is presumed that the instruction is research-based and that student progress is monitored via benchmarks, or at minimum at the beginning, middle, and end of the school year, for all students.
A student who is dually discrepant (low achieving and making little progress) is considered at risk and moved to tier two. Tier 2 consists of data collected on interventions conducted individually or in small groups, essentially more intense instruction in the form of targeted programs to address specific needs. Progress monitoring is more frequent in Tier 2 than in Tier 1. If progress is made, the student returns to the regular education setting and is no longer considered at risk. If the student continues to fail to make adequate progress, then he/she most likely has a true disability (intrinsic, not a lack of instruction) and thus needs to be moved to the third tier. The third tier represents special education, and additional evaluations are conducted to determine disability identification and placement (Fuchs, Mock, Morgan, & Young, 2003; Dunn, 2007; Joseph, 2008). Even within the RTI model there are distinctions between various approaches. RTI can broadly be defined as "any set of activities designed to evaluate the effect of instruction, or intervention, on student achievement" (Christ, Burns, & Ysseldyke, 2005, p. 2). Even this fairly succinct definition is evolving and has been divvied up by those who take slightly different approaches. In their article on conceptual confusion, Christ et al. (2005) expound on several vernacular distinctions first noted by Fuchs et al. (2003), which represent varying philosophical approaches to RTI. According to the authors, there are early interventionists advocating for a more standardized and validated treatment approach (Standard Protocol, RTI-SP), and there are those with a more behavioral orientation who equate RTI with a problem-solving approach (Problem Solving, RTI-PS). As the authors state, the confusion begins with the verbiage. The reality is that both fit within a problem-solving model, as do most RTI approaches.
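The dual-discrepancy decision described above can be sketched as a small function. The numeric thresholds below (a percentile cutoff for "low achieving" and a growth-rate cutoff for "little progress") are hypothetical placeholders, since actual criteria and progress measures vary by district and by RTI model.

```python
# Sketch of the dual-discrepancy idea behind RTI tier movement.
# Both thresholds are illustrative placeholders, not figures drawn
# from any particular district's RTI plan.

LOW_ACHIEVEMENT_PERCENTILE = 20   # hypothetical cutoff for "low achieving"
MIN_WEEKLY_GROWTH = 0.5           # hypothetical words-correct-per-week slope

def is_dually_discrepant(percentile: float, weekly_growth: float) -> bool:
    """A student is dually discrepant when both low achieving and
    making little-to-no progress under the current tier's instruction."""
    low_achieving = percentile < LOW_ACHIEVEMENT_PERCENTILE
    low_progress = weekly_growth < MIN_WEEKLY_GROWTH
    return low_achieving and low_progress

# Low achieving but progressing adequately: stays in the current tier.
print(is_dually_discrepant(15, 1.2))   # False
# Low achieving with a flat progress slope: candidate for a more
# intensive tier of intervention.
print(is_dually_discrepant(15, 0.1))   # True
```

The point of the two-part test is visible in the first call: low achievement alone does not move a student between tiers if progress monitoring shows adequate growth.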
RTI, although not entirely synonymous with problem-solving, does represent "processes which may converge" with it (Christ, Burns, & Ysseldyke, 2005, p. 1). The distinction between the two models lies in what occurs prior to the selection of the intervention and in the subsequent steps. The RTI-SP approach relies more on standard protocols or procedures, "empirically supported instructional approaches" that attempt to remediate and prevent problems with little analysis of the problem skill area (Christ, Burns, & Ysseldyke, 2005, p. 2). In this respect, RR is similar to this approach. Each RR lesson is standardized and includes a prescribed set of parts that are included in every session. Its goal, likewise, is to remediate and prevent a student's reading difficulties from becoming a more severe problem. The RTI-PS approach is more flexible and focuses on individualizing the intervention after analyzing the instructional environment and the target skill area. This process is systematized, and its goal is to isolate the "skill deficits and shape targeted interventions" (Christ, Burns, & Ysseldyke, 2005, p. 2). RR also works in this manner, by "roaming around the known": essentially getting to know where the student is in their reading development first, then modifying the lessons within the prescribed parameters to focus on areas of need, such as a particular weakness in phonemic awareness (Clay, 2005a, p. 33). Given that research indicates 80% of students with a learning disability are learning disabled in the area of reading (Roush, 1995), a reading intervention program like Reading Recovery fits within the RTI model and format. RTI and RR both entail one teacher and one student working together on activities for part of the school day over a specified period of time, during which progress is monitored, generating chartable data, the goal being to improve the student's academic performance enough for them to be able to return to the regular education classroom (Dunn, 2007).
RR offers schools a research-based intervention that may already be in use. RR assesses student book levels (A, B, 1-30); when students do not progress sufficiently through the individually tailored daily lessons over the 20 weeks, it is determined that the student has impaired reading skills and further special education services are needed (Dunn, 2007, p. 34). RR is also able to point out areas of strength and continued weakness based on the Observation Survey assessments. Research conducted by Dunn (2007) was a retrospective study involving 155 students from third to fifth grade who had previously been enrolled in RR in first grade. Students were identified as having a reading disability or not, and reading achievement data were analyzed against their RR scores from first grade (beginning text level, ending text level, and number of weeks' participation in RR) and free/reduced lunch status. The study looked at whether a connection existed between the scores obtained in the RR program and students being identified as having a reading disability later on, in third through fifth grade. Results indicated that ending text level was the most predictive: the higher the ending text level score, the less likely a student was to be later identified as having a learning disability. The inverse would also be true; students whose ending text level was low were more likely to be identified as having a reading disability. However, ending text level explained only 7 to 17% of the variance; it therefore failed to explain the concept of having a reading disability in its entirety, but it indicated the need for further evaluation (Dunn, 2007). The RR program is a good match for the goals of RTI's level 2: "prevent reading difficulty by delivering an intensive, and presumably effective, intervention that improves reading development" and "to assess the level of responsiveness to an instructional intensity from which most students' performance should improve"
(Dunn, 2007, p. 43). Although additional assessments would be needed, low RR ending Text Reading levels appear to be an indicator of a reading disability. In other words, when, despite a scientifically based intervention, the student continues to demonstrate a need in the area of reading, he or she should be evaluated to move on to tier three (special education). This also highlights the usefulness of schools incorporating RR ending Text Reading levels into their procedures for identifying learning disabilities. Doing so would allow students to be referred at the end of first grade instead of waiting for further difficulties in later grades.

Reading Recovery

Reading Recovery (RR) was developed by Dr. Marie M. Clay and is an early reading intervention for struggling readers. Targeted toward the first-grade students who have the lowest reading achievement as they enter first grade, RR provides intensive one-to-one tutoring by specially trained RR teachers for 30 minutes each day for 12-20 weeks. RR focuses on providing students with reading and writing skills in order to bring them up to the average level of their peers. RR has also begun implementing a Spanish-language version, Descubriendo la Lectura (DLL), with initial reading instruction in Spanish (Reading Recovery, 2008). Teachers who are involved in RR are specially trained: they must participate in university-level training for a year and then continue their training periodically with additional supports and in-service training. This is an important factor in the success of RR, as each school-based teacher is trained and supported, thereby providing each school with well-trained reading specialists. Reading Recovery is currently being implemented each year in over 10,000 U.S. schools (National Data Evaluation Center [NDEC], 2008).
Studies that have replicated the effects of RR are consistent with the RR data collected and reported each year in reports found on the RR data collection website (NDEC, 2008). The majority of students who successfully complete the program, or are "successfully exited," are able to reach the achievement levels of their average peers, and several studies have shown that these effects are sustained over time (Center, Wheldall, Freeman, Outhred, & McNaught, 1995; Iversen & Tunmer, 1993; Pinnell, 1989; Pinnell, Lyons, DeFord, Bryk, & Seltzer, 1994; Quay, Steele, Johnson, & Hortman, 2001; Schwartz, 2005). Three main components of RR are at the core of its effectiveness: lessons, teacher training, and assessment. First, the tutoring sessions or lessons are structured in order to maintain consistency and focus, while allowing the RR teacher to adapt to the student's individual areas of need. The RR intervention program is intended to serve the lowest 20% of readers in first grade (RR, 2008). Selection for RR is based in part on recommendations from school staff, using prior reading achievement performance, diagnostic testing (Clay's Observation Survey of Early Literacy Achievement), and teacher recommendations. RR teachers then compile a list of the lowest-performing 20% and begin working with a few at a time (Clay, 2005a, 2005b). The first ten one-on-one tutoring sessions act as screening and diagnostic tools; this period is called roaming around the known. The following components are addressed in each lesson and form a flexible framework for teachers to adapt on the fly. Each daily lesson begins with familiar rereading: a previously read book with which the student is already familiar is selected for the student to read. Next, a running record analysis is conducted, in which the student reads the previous day's new book and the teacher codes reading behaviors using a running record.
This reading is done by the student independently. The next component involves writing a message: the teacher assists the student in composing and then writing a message (1-2 sentences), which is written word for word. This step provides opportunities for the teacher to assist the student in constructing words, analyzing sounds, and representing them with letters. The message is read many times, thereby increasing use and knowledge of high-frequency words. This is followed by putting together a cut-up sentence: the teacher writes the message on a strip of paper and cuts it up, asking the child to reconstruct the message, which encourages rereading and searching for visual information. Finally, the lesson ends with reading a new book. A new, slightly more challenging book is selected, the pictures are reviewed, and the story is introduced. The focus is on meaning, but the student may be asked to locate some key words based on predicting the first letter. The teacher and child then read the story, which provides the basis for the familiar reading in the next day's lesson (Clay, 2005a, 2005b; Pinnell et al., 1988; Pinnell et al., 1990). Students progress through a maximum of 60 lessons. A student is said to have successfully completed RR (lessons are discontinued) when he or she is able to read at the average level of first-grade peers (based on local averages), which occurs after 12 to 20 weeks in RR. Students who make progress but are still not at average grade level after 20 weeks are referred for further evaluation (i.e., special education) (AFT, 1999). "Children who seem likely to fail, despite tutoring in RR – those not progressing at the desired pace after 10 lessons – may be referred to special education and removed from the program" (AFT, 1999). The next critical component in Reading Recovery is its extensive teacher training. Each teacher involved in RR must participate in university-level training for a year.
This training is key to the success of RR. Each school-based teacher is trained and supported by district- or site-level teacher leaders, who in turn have been trained by university trainers. Training occurs while teachers are continuing to work with children; one-way mirrors are used to observe lessons and discuss proper instruction with teachers. Teachers are taught to be sensitive to students' reading and writing behaviors in order to make moment-by-moment analyses that inform teaching decisions (AFT, 1990; Clay, 2005a; Clay, 2005b). Training continues after the initial year in the form of ongoing professional development called continuing contact. Observations of one another continue, as do discussions of intervention practices. This continuing contact provides teachers with opportunities to collaborate and continue honing their skills, as well as to receive support with especially difficult children and learn of new research (Smith-Burke, 1996). The extensive training requirement of the RR program is sometimes seen as a weakness of RR and an undue burden placed on schools that may already be poorly funded. However, it is actually one of its strengths. Two important points regarding this training were emphasized in a report from the National Research Council (1998):

First, the program demonstrates that, in order to approach reading instruction with deep and principled understanding of the reading process and its implications for instruction, the teachers need opportunities for sustained professional development. Second, it is nothing short of foolhardy to make enormous investments in remedial instruction and then return children to classroom instruction that will not serve to maintain the gains they made in the remedial program. (p. 258)

The third component of Reading Recovery is its assessment tool, the Observation Survey of Early Literacy Achievement (Clay, 2006).
The Observation Survey is used to assess the progress of students in the RR program and to determine appropriate discontinuation upon reaching the average reading level of their peers. The Observation Survey measures decoding, letter knowledge, and concepts about print. These tasks are designed to measure knowledge about reading and writing in relation to literacy learning. Information from many of these subtests provides guidance in adapting instruction to the student's strengths and needs.

Assessments

The RR program utilizes multiple sources of data. A running record is kept for each student to monitor the student's oral reading; it involves the student reading a passage of text while the teacher monitors his/her accuracy. Progress is monitored on text reading, daily lesson records, students' writing, and change over time in reading and writing vocabulary. The RR teacher receives specific training on how to take a running record and record miscues. The Observation Survey of Early Literacy Achievement measures six literacy tasks that describe each student's reading and writing progress: Letter Identification, Word Test (vocabulary), Concepts About Print, Writing Vocabulary, Hearing and Recording Sounds in Words (phonemic awareness, representing sounds in graphic form), and Text Reading. Letter Identification is assessed by having the student correctly identify both upper- and lowercase letters. The student responds to 54 letter forms (26 uppercase and 28 lowercase); the extra lowercase forms are alternate forms of a and g. The student may respond with a letter sound, a letter name, or a word that begins with that letter. The maximum possible score is 54; Cronbach's alpha was .95 (Clay, 2002). The Word Test assesses vocabulary knowledge and word recognition and is based on the Ohio Word Test.
The student must correctly identify 20 sight (Dolch) words from graded lists compiled from basic reading texts, and is scored on accuracy (i.e., number of words read correctly). The maximum possible score is 20; Cronbach's alpha was .92 (Clay, 2002). Students are assessed on Concepts About Print by demonstrating knowledge of 24 printed-language concepts and conventions (e.g., front of book, back of book, text direction, word concepts). This is done with booklets that the teacher reads while the child responds to questions or to requests to manipulate the book. The maximum possible score is 24; Cronbach's alpha was .78, and split-half reliability was .95 (Clay, 2002). Writing Vocabulary is assessed by having the student write as many words as he/she knows. Students are given 10 minutes to write as many words as they can on a blank sheet of paper; when needed, a standard set of prompts is used to encourage additional attempts to write. The activity is scored by counting the number of correctly spelled words. Test-retest reliabilities of .62 and .97 have been reported (Clay, 2002). The Hearing and Recording Sounds in Words (HRSW) task is a dictation task that assesses the student's phonemic awareness and ability to represent sounds in graphic form. The student must correctly write (encode) words from a dictated passage: the teacher reads one of five passages and then asks the student to write the words as the passage is read again. If a child does not know a word, he or she is prompted to say the word slowly, thinking about what is heard and how to write it. The score is based on the number of correctly written phonemes (the smallest units of sound in words). The maximum possible score is 37; Cronbach's alpha was .96 (Clay, 2002). Text Reading levels are obtained by having students read books leveled by difficulty and text characteristics (Peterson, 1991). Books are drawn from a basal reading series. Students are assessed on error rate and self-correction while a running record is kept.
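The error-rate and self-correction bookkeeping on a running record can be sketched as follows. These are the conventional running-record calculations rather than formulas quoted from Clay's manuals, so the exact conventions used here (percentage accuracy, a 1:n self-correction ratio) should be treated as an illustrative assumption.

```python
def accuracy_rate(running_words: int, errors: int) -> float:
    """Percentage of words read correctly on a running record."""
    return 100.0 * (running_words - errors) / running_words

def self_correction_ratio(errors: int, self_corrections: int) -> float:
    """Self-correction ratio, conventionally reported as 1:n, where
    n = (errors + self-corrections) / self-corrections."""
    return (errors + self_corrections) / self_corrections

# Example: 100 running words with 8 errors and 2 self-corrections.
print(accuracy_rate(100, 8))        # -> 92.0
print(self_correction_ratio(8, 2))  # -> 5.0, i.e., a ratio of 1:5
```

A record like this one (92% accuracy) would count as an acceptable reading under the 90% criterion described below for Text Reading levels.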
Leveled passages are read until the student's accuracy falls below 90%. This performance is converted into a numerical reading level (Level 1 to 30) and corresponding grade levels (Clay, 2006), and is used to compare the percentage of students scoring at the first-grade level or higher with those who score below first grade. The goal of RR is that the child achieve the average reading level of his/her peers; however, Schwartz (2005) outlines several sources for the average reading level for first grade. For example, the Ohio Stanines for text level indicated an average of Level 2 for the fall of first grade and an average of Level 9 to 12 in the spring (Clay, 2002), whereas Level 20 was the average indicated by the National Data Evaluation Center's random sample (Gómez-Bellengé & Thompson, 2005). This presents varying exit criteria depending on the average reading performance of a particular school. Text level decisions are based on the running record, which Clay (2002) reports to be reliable across two scorings by a trained recorder over a two-year interval (r = .98). Validity and reliability for all tasks of the Observation Survey have been documented (Clay, 2006; Denton, Ciancio, & Fletcher, 2006), and the Observation Survey correlates highly with the Iowa Test of Basic Skills (Rodgers, Gómez-Bellengé, Wang, & Schultz, 2005; Tang & Gómez-Bellengé, 2007). National norms have been developed to assist in interpreting scores (Gómez-Bellengé & Thompson, 2005). In this study, in addition to the Observation Survey that is part of the Reading Recovery program, data from the NC End-of-Grade (EOG) test in reading were included in the analyses. The school district routinely administers the EOG annually for both reading and math. The EOG reading test served as an objective measure of the students' reading achievement.
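The 90% accuracy rule for assigning a Text Reading level, described earlier in this section, can be made concrete with a short sketch. The function name, the input format, and the handling of a score of exactly 90% are assumptions for illustration, not part of the published procedure.

```python
def text_reading_level(records) -> int:
    """Return the highest passage level read at or above 90% accuracy.

    `records` is a sequence of (level, words_read, errors) tuples in
    order of increasing difficulty; testing stops once accuracy on a
    passage falls below 90%, as described in the text.
    """
    highest = 0
    for level, words_read, errors in records:
        accuracy = (words_read - errors) / words_read
        if accuracy >= 0.90:  # exactly 90% treated as passing (assumption)
            highest = level
        else:
            break
    return highest

# Example: accuracy stays at or above 90% through Level 3, then drops.
records = [(1, 50, 2), (2, 60, 4), (3, 70, 6), (4, 80, 12)]
print(text_reading_level(records))  # -> 3
```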
The EOG measures reading achievement and comprehension via a series of reading passages followed by multiple-choice questions on the content of the passages, and it is designed to align with the NC Standard Course of Study. In this school district, teachers use the End-of-Grade (EOG) assessment to evaluate and monitor student reading progress each year (NCDPI). Each student takes the EOG at the end of each school year, beginning in third grade, to determine his or her level of reading achievement and to measure progress made throughout the school year. The EOG produces two main scores and a percentile. The first is the Developmental Scale Score; the second is the Achievement Level. The scale score is derived from the raw score, which is the number of questions the student answered correctly. Scale score ranges vary from grade to grade and, depending on the year of the EOG, can vary from year to year (e.g., beginning with the 2007-08 school year, third grade: Level I, 330 and below; Level II, 331-337; Level III, 338-349; Level IV, 350 and above). The scale score represents growth in reading achievement from year to year, which allows parents and the school system to measure each child's growth in reading. The second score produced by the EOG is the Achievement Level. The Achievement Levels divide the range of scale scores on the EOG into four levels (Levels I, II, III, and IV); these levels are predetermined performance standards used to compare the performance of students to grade-level expectations (NCDPI-DAS/NCTP, 2007).

Literature Review

Since its inception, Reading Recovery has been the focus of many studies of varying quality. To bring order to the selection of studies reviewed for this research, two sets of selection criteria were used.
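The 2007-08 third-grade cut scores cited above can be expressed as a simple mapping. Because the published ranges leave the boundary scores 330 and 350 unstated, this sketch assigns 330 to Level I and 350 to Level IV; that boundary handling, like the function name itself, is an assumption for illustration.

```python
def eog_achievement_level(scale_score: int) -> str:
    """Map a third-grade EOG reading scale score (2007-08 cut scores)
    to its Achievement Level. Placing 330 in Level I and 350 in
    Level IV is an assumption about the unstated boundaries."""
    if scale_score <= 330:
        return "Level I"
    if scale_score <= 337:
        return "Level II"
    if scale_score <= 349:
        return "Level III"
    return "Level IV"

for score in (325, 335, 342, 355):
    print(score, eog_achievement_level(score))
```

Since the Achievement Levels are performance standards keyed to grade-level expectations, a mapping like this is all that is needed to classify each student's annual scale score for the kind of proficiency tracking used in this study.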
The first set of criteria was suggested by the United States Department of Education's Quality of Research Decision Tree, published in a 2002 report. The report suggested that states and local schools should weigh evidence for reading programs based on the following criteria: the theoretical base, evidence of effects, and evidence of replicability. These criteria were applied by the Reading Recovery Council of North America's North American Trainers Group Research Committee (RRCNA, 2004). In an article authored by the research committee, a group of studies involving RR that met the criteria set forth by the DOE was compiled and evaluated for effectiveness (RR Council, 2002). The RRCNA compiled a list of five research studies that conformed to these criteria (Center, Wheldall, Freeman, Outhred, & McNaught, 1995; Iversen & Tunmer, 1993; Pinnell, 1989; Pinnell, Lyons, DeFord, Bryk, & Seltzer, 1993/94; Quay, Steele, Johnson, & Hortman, 2001). The second set of criteria came from the What Works Clearinghouse's review of Reading Recovery, in which 78 studies were examined. The What Works Clearinghouse (WWC) is a branch of the United States Department of Education (DOE) and the Institute of Education Sciences (IES); it released a three-year independent review of the experimental research on Reading Recovery in March 2007. Studies were evaluated on four factors: quality of research design, statistical significance of results, size of the difference between participants in the intervention and comparison groups, and consistency of findings across studies (WWC, 2007). Applying these criteria to the approximately 78 studies left five studies that met the WWC's stringent evidence-based standards.
Four of the studies fully met WWC evidence standards (Baenen, Bernholc, Dulaney, & Banks, 1997; Pinnell, DeFord, & Lyons, 1988; Pinnell, Lyons, DeFord, Bryk, & Seltzer, 1993/94; Schwartz, 2005), and one study (Iversen & Tunmer, 1993) met WWC evidence standards with reservations. When the RRCNA and WWC lists were compared, two studies (Iversen & Tunmer, 1993; Pinnell, Lyons, DeFord, Bryk, & Seltzer, 1993/94) had been selected by both groups for review. Three studies were unique to the WWC group, and three were unique to the RRCNA group. In the following section, the two studies selected by both groups are reviewed along with comments from each of the reviewers (WWC and RRCNA); then the remaining six studies are reviewed with commentary from their respective groups. The Rhode Island school district study conducted by Iversen and Tunmer (1993) examined the progress of three matched groups of first graders at risk for reading difficulties, and it was reviewed by both the RRCNA and the WWC. The study used quasi-random assignment to three groups: RR, a modified RR group, and a standard intervention group (small-group Title I). There were a total of 96 students: 64 from 34 classrooms in 23 schools in the two RR groups, and 32 from seven schools in the standard small-group intervention. The students were administered a battery of tests at the beginning and end of the school year, and once at the midpoint, when RR students were being discontinued. In addition, average students from the same classrooms were tested at the discontinuation (midpoint) assessment. The outcomes of students who received RR (n = 32) were compared with those of non-Reading Recovery students who received the "standard small group, out-of-class support services" (n = 32). Students were matched based on pretest scores. The third group (n = 32) received a modified version of RR, which added explicit instruction in letter-phoneme patterns in lieu of RR's letter identification segment.
This modification highlighted for students the fact that words with common sounds share spelling patterns. This group was not included in the WWC review since it represented a modified version of RR; however, research featuring modifications to RR is often enlightening as to which components are essential to RR's success. In this particular case, the modification made little difference to RR's effectiveness. The researchers for this study included a former RR teacher; however, this person was no longer involved in ongoing professional development with RR. The second investigator was a university researcher conducting an independent, critical evaluation of RR. The affiliations of the researchers are important under these circumstances, given the potential bias of a researcher who is closely involved with RR prior to beginning research on the program. Iversen and Tunmer (1993) took advantage of the data-gathering instruments that RR itself utilizes, namely the six tasks of the Observation Survey. In addition, they used measures of phoneme segmentation, phoneme deletion, and phonological recoding. The study found all three groups to be equally low on the pretest measures. Once they had successfully discontinued from the program, both RR groups, standard and modified, scored significantly higher on the outcome measures. On the Text Reading Level subtest of the Observation Survey (one of the assessments with no ceiling), the differences were even larger (over eight standard deviations on Text Reading Level, and over two standard deviations on the Dolch Word Recognition Task). At this stage, students in both RR groups had profiles similar to those of average students in the classroom. The review article by the RRCNA also noted that the two RR groups scored similarly, even on the tasks that had been included in the study to measure the effectiveness of the modified RR intervention: the phonemic awareness measures.
The modified RR group actually did not do as well as the regular RR group on the phoneme deletion measure. The article further stated that the true benefit for the modified RR group was that its students were able to successfully discontinue in fewer lessons (modified RR = 41.75 lessons; standard RR = 57.31 lessons). RR's standard program has since been modified in a manner similar to the modified RR group, not as a result of this particular study, but rather in response to changes in the field of reading. At the end of the school year the standard and modified groups were similar, differing only slightly on the text reading measure (standard RR = 19.56; modified RR = 18.38) (Iversen & Tunmer, 1993). In the end, this study compared two versions of RR with small-group Title I instruction. The students in the two one-to-one RR tutoring groups showed an advantage from having participated, including earlier discontinuation for the modified RR group. Both sets of RR procedures fostered phonemic awareness learning and its application to text reading and writing. The next study, Pinnell et al. (1993/94), was also selected for review by both the WWC and the RRCNA. Dr. Pinnell, a leading researcher on RR, and her colleagues designed a randomized controlled study involving 324 students in 33 schools. The study involved four groups: RR students (individual tutoring), a Reading Recovery-like intervention (individual tutoring with teachers trained in an alternative, shortened program), an RR-like small group, and a basic-skills small group, plus a randomized comparison group at each school. Students were assessed at the beginning of the year, at midyear, at the end of the year, and at the beginning of the following year (Pinnell et al., 1993/94). Pinnell's study was funded by a grant from the John D. and Catherine T. MacArthur Foundation.
In addition, the research was supervised by a national advisory board that was actively involved in every phase of the research (RRCNA, 2004). This oversight provides good assurance that any potential conflict of interest arising from Pinnell's prior experience researching RR would be checked by the advisory board. Pinnell et al. (1993/94) measured intervention effectiveness with the Gates-MacGinitie Reading Test, the Woodcock Reading Mastery Test-Revised, and the Text Reading Level and Hearing and Recording Sounds in Words tasks of the RR Observation Survey. They found statistically significant positive effects for the RR group (individual tutoring with trained teachers) on the Gates-MacGinitie, the Dictation subtest of the Observation Survey (Hearing and Recording Sounds in Words), Text Reading Level, and the Woodcock Reading Mastery Test-Revised. The following fall, significant mean effects were found on Text Reading, with smaller effects on Hearing and Recording Sounds in Words; the RRCNA reviewers explained the latter as possible ceiling effects of the measures (RRCNA, 2004). This study was well designed: it was conducted over a year's time, with random assignment to treatment and comparison groups and large numbers of students. The end result was that "Reading Recovery emerged as the most powerful of the interventions tested from the beginning of Year 1 through the beginning of Year 2 of the study" (RRCNA, 2004). The studies reviewed by both the WWC and the RRCNA (Iversen & Tunmer, 1993; Pinnell et al., 1993/94) both used random assignment (although Iversen and Tunmer's design was quasi-random), which is very difficult to achieve in an educational setting such as the schools. Both studies showed positive effects of RR and used a variety of assessments to obtain these results. The next three studies met the WWC's stringent evidence standards but were not included in the RRCNA's review.
These three were among the final five selected from the original 78 reviewed by the WWC. The first (Pinnell, DeFord, & Lyons, 1988) was a randomized controlled study involving 187 first graders distributed across 14 urban Columbus, Ohio, schools. Students were randomly assigned either to a group that received regular classroom instruction plus RR (n = 38) or to a control group (n = 53) that received an alternate compensatory program. The study also included a third group that received RR and was also taught in the regular classroom by an RR-trained teacher (n = 96). Although this group was not included in the WWC ratings, it is notable as another attempt by researchers to find ways to improve the delivery and content of RR. The researchers utilized a writing assessment, the Reading Vocabulary and Reading Comprehension subtests of the Comprehensive Test of Basic Skills (CTBS), and five subtests of the Observation Survey (Letter Identification, Word Recognition, Concepts About Print, Writing Vocabulary, and Dictation). The study was reviewed by the WWC on how student outcomes were addressed in four domains: alphabetics, reading fluency, comprehension, and general reading achievement. Within the alphabetics domain, the Pinnell et al. (1988) study examined the effects of RR on print awareness and phonics; the researchers found statistically significant positive effects, though not on the Letter Identification subtest of the Observation Survey, which is also in the alphabetics domain. In the comprehension domain, the researchers found positive, statistically significant effects of RR on the Reading Comprehension subtest of the CTBS and, on the vocabulary construct, on the Reading Vocabulary subtest of the CTBS. Fluency was not addressed. In the general reading achievement domain,
Pinnell et al. (1988) found positive and statistically significant effects of RR on Hearing and Recording Sounds in Words (Dictation) and Writing Vocabulary, two subtests of the Observation Survey. The second study, by Baenen, Bernholc, Dulaney, and Banks (1997), was another randomized controlled study. It was conducted in Wake County, North Carolina, and involved a total of 772 first-grade students. The study spanned 1990 to 1994 and included four cohorts; students who qualified were randomly assigned to either an RR group or a comparison group and were evaluated at the end of first, second, and third grade. Only one cohort (1990-91) met the WWC criteria, because in the other cohorts the comparison group was made up of students who were no longer similar to the RR group with regard to achievement level. After attrition, the final sample included 147 first-grade students (RR = 72; non-RR = 75) in the 1990-91 cohort. All 147 students were followed into second grade, and 127 were included in the third-grade analysis. All students in the Baenen et al. (1997) study were assessed for eligibility using three subtests from the Observation Survey: Text Reading Level (running record), Dictation, and Writing Vocabulary. At the end of first and second grade, grade retention was also measured, and at the end of third grade the North Carolina End-of-Grade test in reading was used. The study also measured referrals to special education and Title I services, and gauged teacher perception of student achievement. Results for the single cohort (1990-91) reviewed by the WWC fell under the general reading achievement domain; however, no statistically significant effects were found on grade retention. The Schwartz (2005) study, also reviewed by the WWC, was a randomized controlled study.
The study included students from 14 states: 37 students were randomly assigned to receive RR in the first half of the year and were compared with another 37 who were randomly assigned to receive RR in the second half of the year. The groups were compared at midyear, before the second group had begun RR. Data were also collected on low-average and high-average students from the same classrooms as the at-risk students for comparison. The WWC excluded the low-average and high-average comparison groups, as its review focused only on students eligible for RR. However, these groups provide a useful reference point, since the goal of RR is to assist participants in attaining average levels of performance (RR, 2008). Students in the Schwartz (2005) study were assessed at the beginning of the year, at the transition from the first round of RR to the second, and at the end of the year. Measures included the six tasks of the Observation Survey; at the transition period and the end of the year, students were also assessed with the Yopp-Singer Phonemic Segmentation task (a sound deletion task), the Degrees of Reading Power test, and the Slosson Oral Reading Test. The Schwartz (2005) study found that the at-risk RR students performed significantly better at the end of their intervention period than students assigned to receive services later in the year. Large effect sizes were found, especially at the transition between first- and second-round intervention services, for Text Reading Level, the Ohio Word Test, Concepts About Print, Writing Vocabulary, Hearing and Recording Sounds in Words, and the Slosson Oral Reading Test-Revised. Also, when the RR students were compared with the low-average and high-average groups at the transition period, the RR students had closed the achievement gap with the average group, which is the goal of RR (Schwartz, 2005).
As noted previously, in reviewing early reading interventions the WWC developed a beginning reading protocol that examines four domains of student outcomes: alphabetics, reading fluency, comprehension, and general reading achievement. Each study reviewed by the WWC covers these domains and most of the constructs contained therein. The WWC's review found that in the alphabetics domain, two studies met evidence standards and showed statistically significant positive effects (Pinnell, DeFord, & Lyons, 1988; Schwartz, 2005); an additional study met WWC evidence standards with reservations and reported significantly positive effects (Iversen & Tunmer, 1993). In the fluency domain, one study demonstrated statistically significant positive effects (Schwartz, 2005). In the comprehension domain, one study showed statistically significant positive effects and another had an indeterminate effect (Pinnell, DeFord, & Lyons, 1988; Schwartz, 2005; WWC, 2007). Three studies in the general reading achievement domain had strong designs and statistically significant positive effects (Pinnell, DeFord, & Lyons, 1988; Pinnell et al., 1993/94; Schwartz, 2005); an additional study had indeterminate effects (Baenen et al., 1997), and another met WWC evidence standards with reservations while demonstrating statistically significant positive effects (Iversen & Tunmer, 1993; WWC, 2007). The RRCNA, which reviewed two of the same studies as the WWC (Iversen & Tunmer, 1993; Pinnell et al., 1993/94), also reviewed three additional RR studies based on criteria from the Department of Education's Quality of Research Decision Tree (Center et al., 1995; Pinnell, 1989; Quay et al., 2001). The first study reviewed by the RRCNA committee (Center et al., 1995) randomly assigned subjects to either an RR group (n = 31) or a no-intervention control group (n = 39) across 10 schools.
Researchers also followed a comparison group of low-progress students (n = 39) from five matched schools that did not have RR. Students were assessed at pre-test, after 15 weeks (post-test), after another 15 weeks (short-term maintenance), and again 12 months after post-test (medium-term maintenance). Researchers assessed the students using Clay's book level test (Observation Survey), the Burt Word Reading Test, the Neale Analysis of Reading Ability, a Passage Reading Test, the Waddington Diagnostic Spelling Test, a Phonemic Awareness Test, a Cloze Test, and a Word Attack Skills Test. After 15 weeks (post-test), researchers reported that RR students "significantly outperformed" students in the control group on "tests measuring words read in context and in isolation, but not on some tests of metalinguistic skills" (Center et al., 1995, p. 252). At the short-term maintenance stage (end of first grade), researchers reported that RR students continued to perform better than control group students on tests measuring word reading in context and on phonemic awareness tasks, but did not perform significantly better on phonological recoding or syntactic awareness. According to the authors, however, these areas were not specifically addressed by the RR program. One year after RR intervention (medium-term maintenance), RR students were still performing higher than both the comparison group and the control group. This study offers additional support for the effectiveness of RR: a high-quality, independent evaluation by researchers not connected with or affiliated with RR, showing significant, long-lasting effects (Center et al., 1995). The next study, Pinnell (1989), included six urban schools with a high number of low-income students.
Essentially the study consisted of a group of RR students (n = 55) who were the lowest performing in the RR-designated classrooms and were taught by a RR-trained teacher, and a comparison group (n = 55) who were the lowest performing students in the comparison classrooms and were taught by a non-RR-trained teacher. Assessment measures for the study included Text Reading Level, the Observation Survey, and the Stanford Achievement Test. Students were assessed in October, at mid-year, at the end of the year, and at the end of the year following treatment. Analyses were conducted on four groups: RR students in program classrooms, RR students in regular classrooms, comparison students, and a random sample of students (May assessments only). RR students from regular classrooms did better (p < .05) than comparison students on 7 of the 9 assessments. Two assessments (Letter Identification and the Word Test) resulted in ceiling effects. RR students in program classrooms did better (p < .05) than students in the comparison group on all assessments. RR students in program and regular classrooms did equally well; in other words, it made no difference whether or not they were instructed by a RR teacher in the classroom. Follow-up results a year later (end of year following treatment) showed RR students scoring significantly higher (p < .05) on all assessments than the comparison students. These promising results were obtained by a research team implementing RR in the U.S. in its first year; researchers from the Center for the Study of Reading at the University of Illinois independently audited the results reported to the Ohio Department of Education (Pinnell, 1989). Although this research was conducted some time ago, during RR's earliest introduction into the U.S., the results are similar to those of the previously mentioned, more recent studies, and only relatively minor modifications have been made to RR through the years.
The final study reviewed by the RRCNA (Quay et al., 2001) used a quasi-random assignment procedure in which students in 34 schools were assigned to two groups: RR and a control group. Each school contained one classroom randomly designated as the classroom from which RR students would come and another classroom designated as the control classroom. Both groups were given the Observation Survey and the Iowa Test of Basic Skills (ITBS) in the fall and spring; the Gates-MacGinitie Reading Test and the Classroom Teacher Assessment of Student Progress were administered in the spring as well (Quay et al., 2001). Analysis showed that the two groups had few differences on the ITBS scales in the fall, confirming the groups' initial equivalence on reading achievement (Quay et al., 2001). RR students remained in the regular classroom with the exception of 30 minutes for RR. The control group, not served in RR, was given access to other programs available at the school; within the control group, 66% participated in daily literacy groups conducted by the RR teachers. Researchers in this study relied on data from the RR Observation Survey, the ITBS, the Gates-MacGinitie Reading Test, and the Classroom Teacher Assessment of Student Progress, as well as retention rates. Researchers noted that the Classroom Teacher Assessment of Student Progress is "an instrument developed and used extensively in large-scale evaluations and demonstrating high test-retest reliability" (Quay et al., 2001, p. 14). At the end of the school year, RR students in the study significantly outperformed the control group students on four of the six subtests of the ITBS, all subtests of the Gates-MacGinitie, all tasks of the RR Observation Survey, and all nine measures of the Classroom Teacher Assessment of Student Progress.
A significantly higher percentage of RR students than control group students were also promoted at the end of first grade. Despite the relative shortcomings of the Quay study (lack of truly random assignment, disruptions during the study), the results not only showed that the RR children performed significantly better on standard measures than the control groups, but the researchers also made efforts to assure equivalence of the groups prior to beginning the intervention and, in their analyses, measured retention rates, which ultimately translate into economic savings for schools. In the end, the RRCNA committee, after applying the criteria set forth by the Department of Education (2002), concluded that all five studies showed positive effects for RR, were in line with evaluation data gathered and reported each year by the Reading Recovery National Data Evaluation Center (NDEC), and were published in peer-reviewed journals. It should also be noted that two of the studies were conducted by researchers who have been critical of Reading Recovery and three by researchers associated with Reading Recovery. The WWC has reviewed approximately 20 interventions to date, using the following categories to rate an intervention's effectiveness in a particular outcome domain: positive, potentially positive, mixed, no discernible effects, potentially negative, or negative. The WWC based these ratings on four factors: quality of research design, statistical significance of results, size of the difference between participants in the intervention and comparison groups, and consistency of findings across studies. Only one intervention, Reading Recovery, had qualifying research evidence in all four domains (alphabetics, reading fluency, comprehension, and general reading achievement). Reading Recovery received the highest ratings of any of the 20 programs, with two positive effects ratings, for alphabetics and general reading achievement, and two potentially positive effects
ratings for reading fluency and comprehension (WWC, 2007). This authoritative, independent assessment presented ample evidence that Reading Recovery is an effective intervention based on the current scientific evidence. This study's purpose was to compare data collected over time by RR and the school district on students who had been successfully discontinued from the RR program, in order to measure the sustained effects of the RR program on students with low reading achievement in subsequent grades. The previously reviewed research presents a strong case for the program's effectiveness; however, additional assessments beyond what is administered by the RR program are needed to validate its effectiveness in raising first grade students' reading achievement levels and to answer the question of whether those gains are maintained over time. In addition, if RR is a good fit for RTI (Lose, Schmitt, Gómez-Bellengé, Jones, Honchell, & Askew, 2007), then this study will help determine RR's effectiveness in raising poor readers' reading achievement levels in subsequent grades by way of an early reading intervention (i.e., Reading Recovery). This study adds to previous research in the following ways. First, few RR-related studies have focused on longitudinal effects and district-wide assessments (e.g., the EOG), that is, real-world, schoolwide measures of reading achievement and the question of sustained effects in later grades. Additional studies are needed to track students after they participate in reading intervention programs and to determine whether the gains made in the program are maintained over time: whether poor readers become better readers and remain better readers in later grades. Fuchs et al. (2003) lament the scarcity of research on intervention programs already in practice, including programs like Reading Recovery.
Finally, this study further elucidates how RR and RTI might fit together. As Lose et al. (2007) noted, RR research provides anecdotal evidence of the need that RR fills as a scientifically research-based intervention in the RTI process. This study shows the potential for Reading Recovery to be an effective intervention in the RTI process, one that could reduce future costs to school districts by reducing the need for additional services and referrals to special education in later grades. Hypothesis and Research Question Hypothesis: Participation in the Reading Recovery intervention program will have a statistically significant positive effect on the reading achievement of students performing in the bottom 20% in reading in first grade, resulting in a statistically significant increase on the Text Reading Level subtest of the Reading Recovery Observation Survey and in students successfully exiting the Reading Recovery program. Research Question: Are the improved reading achievement effects of the Reading Recovery program maintained in third grade and subsequent grades, as measured by performance on the North Carolina End-of-Grade test in Reading when compared to district-wide average performance on the EOG? Methodology The study is a retrospective longitudinal study of the sustained effects of the Reading Recovery early intervention program. The reading achievement of RR participants who successfully discontinued the RR program in their first-grade year was measured relative to their peers beginning in third grade. The purpose of the measurement was to assess the sustained effects of the RR intervention program on the students' reading performance. Former RR students who successfully completed the reading program had their reading performance measured using their scores on the Reading portion of the state End-of-Grade test (EOG).
The group consisted of five cohort years (school years 2002-03 to 2006-07); each cohort's data ranged from third grade up to seventh grade, spanning the 2004-05 to 2008-09 school years. Former RR students' EOG scores were compared to the average EOG scores for the school district for each respective year, which reflected the average performance of their peers. Participants The target population for the study was students who successfully exited the Reading Recovery program in first grade during the 2002-03 to 2006-07 school years. The students were either currently, or were at one time, enrolled in the rural North Carolina school district. Elementary and middle schools in the study comprised a diverse population from varying socioeconomic backgrounds. District data identifying gender, race, and ethnicity, collected along with the state-administered EOG information, were also included. Study participants were drawn from the available pool of students who successfully discontinued the RR program in first grade in the school district and who were either still enrolled, or were enrolled during the study period, in a school within the school district between the 2004-05 and 2008-09 school years. Measures The dependent measure in this research study was reading achievement, measured using two assessments: the Text Reading Level subtest of the Reading Recovery assessment, the Observation Survey of Early Literacy Achievement, and the North Carolina End-of-Grade test of reading administered during the 2004-05 to 2008-09 school years (Beaver, 2006; Clay, 2006; NCDPI-DAS/NCTP, 2007). During the students' participation in the RR program, they were administered the Observation Survey of Early Literacy Achievement, first developed by Clay (2002).
The Observation Survey comprises a series of six subtests that describe each student's reading and writing progress: Letter Identification, Word Test (vocabulary), Concepts About Print, Writing Vocabulary, Hearing and Recording Sounds in Words (phonemic awareness, representing sounds in graphic form), and Text Reading. This study focused primarily on the Text Reading Level subtest. On the Text Reading subtest, levels are obtained by having the students read books leveled by difficulty and characteristics of the text (Peterson, 1991). Teachers assess students on their rate of errors and use of self-correction, keeping a running record all the while to monitor progress. The student reads the leveled passages until his or her accuracy falls below 90%. The student's performance is then compared to that of students who scored at the first grade level or higher and those who performed below first grade. The accuracy score is then converted into a numerical reading level (levels 0 through 30) with corresponding grade levels (Clay, 2006). The teachers administering the Reading Recovery program use the Observation Survey to assess the progress of students in the program and to determine appropriate discontinuation upon a student reaching the average level of reading for his or her peers. A running record is kept of each student's progress on text reading, daily lesson records, the student's writing, and change over time in reading and writing vocabulary. Text Reading Level decisions in the RR program are based on the running record. Clay (2002) reports a reliability of .98 over a two-year interval across two scorings with a trained recorder. The Text Reading Levels present varying exit criteria depending on the average reading performance of a particular school.
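The stopping rule described above, in which a student reads successively harder leveled passages until accuracy on the running record falls below 90%, can be sketched as follows. This is an illustrative sketch only; the function and data names are hypothetical and are not part of the Observation Survey materials or its official scoring procedure.

```python
def text_reading_level(accuracy_by_level):
    """Return the highest passage level read with at least 90% accuracy.

    accuracy_by_level: dict mapping passage level (0-30) to the proportion
    of words read correctly, as tallied from a running record. Hypothetical
    sketch of the stopping rule, not the official scoring procedure.
    """
    highest = None
    for level in sorted(accuracy_by_level):
        if accuracy_by_level[level] >= 0.90:
            highest = level
        else:
            break  # testing stops once accuracy falls below 90%
    return highest

# Hypothetical running-record accuracies for one student
accuracies = {1: 0.99, 2: 0.97, 3: 0.95, 4: 0.92, 5: 0.88}
print(text_reading_level(accuracies))  # highest level at or above 90% accuracy
```

The sketch simply records the last level passed before the first failure, mirroring the way testing proceeds upward through the book gradient and halts.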
Validity and reliability for all tasks of the Observation Survey have been documented (Clay, 2006; Denton, Ciancio, & Fletcher, 2006), and the Observation Survey correlates highly with the Iowa Test of Basic Skills (Rodgers, Gómez-Bellengé, Wang, & Schultz, 2005; Tang & Gómez-Bellengé, 2007). National norms have been developed to assist in interpreting scores (Gómez-Bellengé & Thompson, 2005). For the purposes of this study, the Text Reading Level subtest was judged the most indicative of actual reading ability at the time of successfully completing the RR program, based on previous studies that have successfully used Text Reading Level as an outcome measure. Several studies have used a number of subtests from Clay's Observation Survey, and the Text Reading subtest is one of the more frequently used tools to measure the effectiveness of the RR program (Baenen, Berhole, Dulaney, & Banks, 1997; Iversen & Tunmer, 1993; Pinnell, 1989; Pinnell, Lyons, DeFord, Bryk, & Seltzer, 1993/94; Quay, Steele, Johnson, & Hortman, 2001; Schwartz, 2005). This measure represents a real-world measure of reading, as it involves the student actually reading a leveled passage of text in which he or she must combine the skills learned in the RR program. This skill, or set of skills, is what the RR program attempts to improve: helping the students become more proficient readers. In North Carolina, beginning at the end of third grade, students are required to take the state-designed End-of-Grade test (EOG). The North Carolina End-of-Grade tests are designed to measure student performance on the goals, objectives, and grade-level competencies specified in the North Carolina Standard Course of Study (NCSCS). There are two portions: Reading and Math. The reading portion measures the reading comprehension components of each grade's curriculum and the English/Language Arts NCSCS.
The test is made up of eight reading passages, each followed by six to nine multiple-choice questions, and was designed to measure reading, thinking, and comprehension skills. There are four literary selections (two fiction, one nonfiction, one poem), three informational selections (two content and one consumer), and one embedded experimental selection, which may be fiction, nonfiction, poetry, consumer, or content (Baenen, Dulaney, & Banks, 1997; NCSCS; NCDPI-DAS/NCTP, 2007). The EOG test is administered to each student during the last three weeks of the school year and contains 50 items (plus 8 experimental items). It is a measure of reading achievement as well as a measure of progress made throughout the school year. Two sets of scores, Developmental Scale Scores and Achievement Levels, are derived from the raw scores, which are composed of the number of questions the student answered correctly. Scale score ranges vary from grade to grade and may also vary from year to year depending on the year of the EOG administration (e.g., starting with the 2007-08 school year, third grade: Level I < 330, Level II 331-337, Level III 338-349, and Level IV > 350). Achievement Levels are essentially a method of dividing the range of scale scores on the EOG into four levels (i.e., Levels I, II, III, and IV) (NCDPI-DAS/NCTP, 2007). Both measures, Text Reading Level and the EOG in reading, attempt to measure the reading achievement of the student. However, they differ in several important ways. The Observation Survey is individually administered to each RR student prior to, during, and immediately upon completing the RR program; therefore, the assessment is given very close to the completion of instruction in the RR program. The EOG, on the other hand, is administered in a group format at the end of the school year following a year of classroom instruction.
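The mapping from a developmental scale score to an Achievement Level can be illustrated with the third-grade reading cutoffs quoted above for the 2007-08 school year and later. This is an illustrative sketch, not NCDPI's official scoring code; the treatment of a score of exactly 330 (left unassigned by the quoted ranges) is an assumption made here for completeness.

```python
# Third-grade reading cutoffs starting with the 2007-08 school year,
# as quoted in the text (illustrative only; not official NCDPI code).
GRADE3_READING_CUTOFFS = [
    (337, "Level II"),   # 331-337
    (349, "Level III"),  # 338-349
]

def achievement_level(scale_score):
    """Map a third-grade reading developmental scale score to its
    Achievement Level (I-IV) under the post-2007-08 ranges."""
    if scale_score <= 330:
        return "Level I"   # assumption: 330 itself treated as Level I
    for upper, level in GRADE3_READING_CUTOFFS:
        if scale_score <= upper:
            return level
    return "Level IV"  # 350 and above

print(achievement_level(345))  # falls in the 338-349 range
```

Because the cutoffs differ by grade and were rescaled in 2007-08, a full implementation would key this lookup on both grade and administration year, which is exactly why cross-cohort comparison of raw scale scores is complex.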
Much of the material on the EOG may have been taught several months earlier, with only minimal review prior to the assessment. Also, around the time the End-of-Grade test in reading is administered, the EOG test in math (and, in later grades, science) is administered just days before or soon thereafter. Finally, the Observation Survey was designed to complement and direct RR program instruction, and its purpose is to measure the progress of individual students during their participation in the RR program. Its function is to inform instruction rather than to measure group change, giving RR instructors the information necessary to adapt instruction to meet the needs of the student (Clay, 2006; Denton et al., 2006). The EOG is not directly tied to the instruction in the classroom; it is an attempt to measure the reading comprehension components of a particular grade's English/Language Arts NC Standard Course of Study (NCSCS). Procedures Data collection consisted of obtaining permission from the school district to gain access to the RR data for this study, without identifying information. Archival data from RR's National Data Evaluation Center in Columbus, Ohio, were used by the school district to identify Reading Recovery students who had successfully completed the RR program during the 2002-03 to 2006-07 school years. The school district compared this list to school records to identify students who were still attending schools in the district at the end of the 2008-09 school year. The treatment group consisted of students enrolled in first grade during the 2002-03 to 2006-07 school years who continued to attend an elementary school in the district through the 2008-09 school year.
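As a preview of the equivalence comparisons described in the Data Analysis section that follows, the core computation, a one-sample t test of a cohort's EOG mean against a fixed district-wide mean, together with a confidence interval for the cohort mean, can be sketched with SciPy. All scores and the district mean below are invented for illustration and are not study data.

```python
from scipy import stats

# Hypothetical RR-cohort EOG reading scores and a hypothetical
# district-wide mean for the same grade and year (not study data).
cohort_scores = [338, 341, 335, 344, 339, 342, 336, 340]
district_mean = 345.0

# One-sample t test: does the cohort mean differ from the district mean?
t_stat, p_value = stats.ttest_1samp(cohort_scores, district_mean)

# 95% confidence interval for the cohort mean. In an equivalence view,
# if this interval falls entirely inside a pre-set band around the
# district mean, the groups are treated as practically equivalent.
n = len(cohort_scores)
mean = sum(cohort_scores) / n
sem = stats.sem(cohort_scores)
ci_low, ci_high = stats.t.interval(0.95, n - 1, loc=mean, scale=sem)

print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
print(f"95% CI for cohort mean: ({ci_low:.1f}, {ci_high:.1f})")
```

The study itself ran these analyses in Microsoft Excel and IBM SPSS Statistics; this sketch only shows the shape of the comparison, repeated for each cohort, grade, and year.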
Data Analysis Analysis of the research data involved a series of comparisons of mean scores and performances, correlations, t tests for independent means, and equivalence testing using one-sample t tests to compare the means and confidence intervals of test scores with district-wide yearly averages. These analyses were used to compare the Reading Recovery students' performance on the Observation Survey Text Reading Level subtest and their EOG score performance for subsequent grades, beginning in third grade. Data collected were compiled into a database and statistically analyzed by the researcher using Microsoft Excel and IBM SPSS Statistics (Statistical Package for the Social Sciences). Descriptive statistics (e.g., frequencies), correlations, t tests, and equivalence testing were applied to the data to determine whether any observed differences in means between the RR students' pre- and post-program assessments were statistically significant (i.e., not due to chance) and whether student mean End-of-Grade scores in third grade and beyond were equivalent to the school district mean EOG scores. Limitations This study was limited by its scope of students. Only RR students who successfully completed the RR program were included in the dataset. This means that, potentially, students who began the program but were not making adequate progress may have been prematurely removed from the program. The assumption, of course, is that these students are removed from the RR program with the intent of providing a different, better-fitting intervention to meet their needs. This has been a common criticism of RR and of much of the research involving RR: students who are not successful in the program are removed before the program's effectiveness is analyzed based on the success of those who complete it.
However, it should be clear that for those lower performing students who do remain in the program, the program has an impact on their reading achievement. It was the goal of this research to determine the extent of that impact. The second limitation was the lack of a formal comparison or control group. Because the data for this study were extant data, not collected for the sole purpose of this research study, a control group of either low-performing peers or average-performing but otherwise equivalent peers was not included in this research. The data used are routinely collected by the school district, but other, more customized data were not available. In addition to the nature of the data collection, the multiple school years and various years of matriculation in the RR program in first grade and subsequent grades among the participants would have made for an overly complex and convoluted experimental design, in addition to the difficulty of matching control groups to this study population using only the data available from the school district. As it stands, the data were limited in scope between the cohorts due to attrition and a lack of sufficient data on every initial RR student. Summary This research provided a measure of the effectiveness of the Reading Recovery program with low-performing first grade students in a rural school district; more specifically, the level of the program's effectiveness and its sustained effects over time, through performance on the End-of-Grade test of reading achievement in third grade and beyond. In addition, the study provided a comparison between the internal measures used by the Reading Recovery program to measure progress and ultimately successful completion (discontinuation) of the program (e.g., Text Reading Level on the Observation Survey) and End-of-Grade test reading performance.
This component of the research study helped address one critique of the Reading Recovery program: that although Reading Recovery instruments show progress and an improvement in reading ability, independent standardized measures often do not show the same gains (Center, Wheldall, & Freeman, 1992). Another critique is that many of the gains students make upon completing RR soon dissipate and are not sustained over time (Hiebert, 1994; Shanahan & Barr, 1995). This study tracked the reading achievement of students who successfully exited RR, measuring their performance on the End-of-Grade test in third grade and beyond. Results The intent of this research was to examine the effectiveness of the RR early intervention reading program, a program implemented only in first grade as a preventative measure for the lowest performing 20 percent of students in reading. The aim was to examine the students who had successfully completed the RR program in order to determine the short-term effectiveness of the program (i.e., the reading performance of the students upon completing the program as measured by the Observation Survey, specifically the Text Reading Level subtest) and then to track the reading achievement of the RR students in subsequent grades (third through seventh) using the standardized state assessment, the NC End-of-Grade test in reading. The students in this study consisted of 177 school-age students from 19 schools in a rural North Carolina school district who were enrolled in and successfully discontinued (i.e., successfully completed) the Reading Recovery program in first grade between the 2002-03 and 2006-07 school years. Subsequent North Carolina End-of-Grade (EOG) reading scores were also collected for the students during grades three through seven in the 2004-05 to 2008-09 school years.
The students were studied in five cohorts, according to the school year in which they were enrolled in first grade between the 2002-03 and 2006-07 school years, and then followed longitudinally from third grade through seventh grade (e.g., the 2002-03 cohort received Reading Recovery instruction during first grade in that school year, and End-of-Grade test scores were collected for subsequent grades; third grade for this cohort occurred in the 2004-05 school year). Table 1 demonstrates the stratification of the cohorts visually, so that the reader might better understand the data collection schedule and how the year in which a student was enrolled in first grade and participated in the RR program affects the year in which that student was first administered the End-of-Grade test in third and subsequent grades. Table 1 Five cohort years spanning the 2002-03 to 2008-09 school years, the years in which the cohorts participated in RR, and the subsequent years/grades for which EOG test data are available.
Cohort (1st grade, RR year)   3rd Grade   4th Grade   5th Grade   6th Grade   7th Grade
2002-03                       2004-05     2005-06     2006-07     2007-08*    2008-09*
2003-04                       2005-06     2006-07     2007-08*    2008-09*    --
2004-05                       2006-07     2007-08*    2008-09*    --          --
2005-06                       2007-08*    2008-09*    --          --          --
2006-07                       2008-09*    --          --          --          --
Note. No EOG data exist for second grade. *EOG scores were re-normed starting with the 2007-08 school year.

The analysis sample ranged in size from 177 down to 22 depending on the analysis conducted or the cohort year, due to missing or unavailable data and the fact that some student cohorts had only recently completed third, fourth, fifth, sixth, or seventh grade at the time of this study, so subsequent grade information did not exist. The entire sample was 39.5% female and 60.5% male; 46.9% African American, 31.1% Caucasian, 20.3% Hispanic, and 1.7% multi-racial. The school district's racial composition is 57.4% Caucasian, 35.2% African American, 5.0% Hispanic, and 0.9% multi-racial (NCDPI, n.d.). It should be noted that the NC State Department of Public Instruction allows districts to administer the EOG to students more than once in the same grade based on their initial performance, in an effort to improve a student's test performance. For example, if a student scores a Level II, then the school might decide to remediate that student and have him or her retake the EOG a few days later.
This often results in a higher EOG score than the original performance, although a few student scores decreased slightly. In this sample of students this was not the case in every school (some students who scored a Level II were not retested), but many of the students who initially scored a Level II on the EOG were retested, resulting in two, sometimes three, separate EOG scores for a particular year/grade. Because it was important in this study to capture the students' best performance once they had exited the RR program, it was decided to use their highest score on the EOG for each grade, whether that was their last score for a given year after re-testing or, in rare cases, an earlier, higher score for that grade. Before moving on to the results, it should also be noted that this particular set of data encompasses an anomaly in the End-of-Grade test scores and Achievement Levels. The NC Department of Public Instruction rescaled the EOG scoring prior to the 2007-08 school year (NCDPI-DAS/NCTP, 2007). Score ranges were rescaled for the four Achievement Levels (I, II, III, and IV). As a result, the calculation from the number of correct items renders a different score, changing the score range for a Level III in third grade, for example, from 240-249 to 338-349 (see Table 2). Table 2 presents the range of scores and their corresponding Achievement Levels and grades prior to the score change (prior to 2007-08) and after the score change (starting in the 2007-08 school year). This is important to note since, depending on the grade a student is in and the year in which he or she was in that grade, the student's developmental scale score on the EOG may correspond to a Level II or a Level III. This not only makes comparison of the EOG scores from year to year difficult, but also makes analyses across year cohorts more complex than simply comparing scores.
Table 2 EOG Scale Score ranges for Achievement Levels prior to 2007-08 and after the shift starting with the 2007-08 school year. Data retrieved from the NC Dept. of Public Instruction website (NCDPI-DAS/NCTP, 2007).

Reading, prior to the 2007-08 school year
Grade   Level I   Level II   Level III   Level IV
3       < 229     230-239    240-249     > 250
4       < 235     236-243    244-254     > 255
5       < 238     239-246    247-258     > 259
6       < 241     242-251    252-263     > 264
7       < 242     243-251    252-263     > 264
8       < 243     244-253    254-265     > 266

Reading, starting with the 2007-08 school year
Grade   Level I   Level II   Level III   Level IV
3       < 330     331-337    338-349     > 350
4       < 334     335-342    343-353     > 354
5       < 340     341-348    349-360     > 361
6       < 344     345-350    351-361     > 362
7       < 347     348-355    356-362     > 363
8       < 349     350-357    358-369     > 370

Program Effectiveness The first research question is whether low-performing students improved in their overall reading achievement after participating in the Reading Recovery program. To answer this question we first looked at the assessment tool used in the Reading Recovery program, the Observation Survey of Early Literacy Achievement, which measures student progress while in the program. This study specifically focused on the Text Reading Level subtest, one of several indicators used by researchers to measure the efficacy of the RR program. Scores on the Text Reading Level subtest range from 0 to 30. Mean Text Reading Levels for the group of RR students in this study are presented in Figure 1. Figure 1. Mean Scores on the Text Reading Level Subtest of the Observation Survey at Entry into the RR Program, upon Exit, and at Year-End. The mean Text Reading Level for the students as a whole in this study at the time of entry into the RR program was Level 3 (M = 3.58, SD = 2.78). Individual student scores at the time of entry varied from 0 to 14, with most students (81.1%) receiving a level of 5 or lower. Students'
Text Reading levels were monitored throughout participation in the program. Upon successfully exiting the RR program, the group mean Text Reading level had jumped to a level 14 (M = 14.78, SD = 3.88), an increase of 11 levels over the 20 weeks or less that the participants were in the RR program. Levels 10, 16, and 18 were the most common exit points, reached by 16.0%, 28.2%, and 24.9% of the students, respectively. The students were evaluated again at the end of the school year, after exiting the program. The mean Text Reading level for the group was then a level 17 (M = 17.73, SD = 3.46), an average increase of three additional levels from the time of exiting the program until the end of the year, and an average increase of 14 levels since first entering the RR program. At year's end, 75.9% of the students had scores at level 16 or level 18. This finding suggests that, in the short term, the RR students continued to improve their reading achievement by applying the skills they learned in the RR program. Similar results emerged when the individual year cohorts were examined. Table 3 summarizes the results for the individual cohorts spanning the 2002-03 school year through the 2006-07 school year.

Table 3
Entry, Exit, and Year-End Mean Text Reading Levels by Year Cohort.

2002-03 Cohort                Mean    Std. Deviation   Minimum   Maximum   N
Entry Text Reading Level      3.00    2.612            0         7         35
Exit Text Reading Level       13.77   3.647            8         24        35
Year-End Text Reading Level   17.37   2.157            14        24        35

2003-04 Cohort                Mean    Std. Deviation   Minimum   Maximum   N
Entry Text Reading Level      4.03    3.640            0         12        29
Exit Text Reading Level       14.76   4.050            8         24        29
Year-End Text Reading Level   17.52   2.487            12        24        29

2004-05 Cohort                Mean    Std. Deviation   Minimum   Maximum   N
Entry Text Reading Level      3.19    3.167            0         14        32
Exit Text Reading Level       14.75   4.355            8         26        32
Year-End Text Reading Level   17.84   4.978            9         30        32

2005-06 Cohort                Mean    Std. Deviation   Minimum   Maximum   N
Entry Text Reading Level      3.78    1.874            0         7         49
Exit Text Reading Level       15.53   3.641            9         24        49
Year-End Text Reading Level   18.29   3.596            14        30        48

2006-07 Cohort                Mean    Std. Deviation   Minimum   Maximum   N
Entry Text Reading Level      3.90    2.820            0         12        30
Exit Text Reading Level       14.77   3.775            9         22        30
Year-End Text Reading Level   17.33   3.417            8         30        30

Each cohort increased on average 10 to 14 Text Reading levels from its initial reading level while in the Reading Recovery program, in addition to improvement in other areas of the Observation Survey not examined here. This consistency across cohort years indicates that, once implemented in this school district, the program's effectiveness on this subtest remained stable from year to year, at least over the years of this study. This first set of results addresses one of the research questions for this study. The hypothesis states that participation in the RR program will have a statistically significant positive effect on the reading achievement of the lower performing students who participated, resulting in a statistically significant increase in their Text Reading level. The analysis of these data yielded a statistically significant effect for the RR program, t(174) = 54.21, p < .05, with the students significantly improving their Text Reading Level after successfully completing the early intervention program.

Long-term Effectiveness

The next research question is whether the improved reading achievement effects of the RR program are maintained in subsequent grades, as measured by students'
performance on a district-wide standardized assessment, in this case the North Carolina End-of-Grade test in Reading, when compared to district-wide average performance on the EOG. The initial analysis begins with the end of third grade, which is the first time the NC EOG test is administered. Had these former RR students maintained the gains they made in the RR program? To answer this research question, the analysis focused on the former RR students' mean Developmental Scale Scores on the End-of-Grade test in reading, compared to the school district's mean EOG performance at the time of each administration. The mean EOG score, confidence interval, and standard deviation were calculated for each of the five cohorts. The analyses are divided into individual cohorts, since each cohort was enrolled in third grade (and subsequent grades) in a unique year and therefore was compared to the district's mean EOG score for that specific grade and year. Table 4 presents the third grade mean EOG scores for each cohort, reflecting the first EOG administration after the RR program, along with the 95% confidence intervals and standard deviations. Equivalence testing was performed for the third grade EOG with an interval of ± 5 points around the district mean EOG score for that grade as the zone of clinical indifference (a predefined range of equivalence). This interval around the district mean serves as the standard against which we compare the confidence intervals (CI) of each of the RR cohorts. In this situation, when comparing a sample with a "standard comparator," it is essential to show that the sample is sufficiently similar to the standard to be "clinically indistinguishable" (Cleophas, Zwinderman, & Cleophas, 2006, p. 63).
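The interval comparison just described can be sketched in a few lines. This is an illustrative sketch, not the study's actual code: a normal-approximation 95% CI is assumed, so the computed interval differs slightly from the t-based values reported in Table 4.

```python
# Illustrative sketch of the equivalence check: does the cohort's 95% CI
# fall entirely inside the district mean +/- 5 "zone of clinical
# indifference"? Assumed detail: z = 1.96 (normal approximation).
import math

def equivalence(sample_mean, sd, n, district_mean, margin=5.0, z=1.96):
    """Compare a cohort's approximate 95% CI with the zone of indifference."""
    half_width = z * sd / math.sqrt(n)
    ci = (sample_mean - half_width, sample_mean + half_width)
    zone = (district_mean - margin, district_mean + margin)
    if zone[0] <= ci[0] and ci[1] <= zone[1]:
        verdict = "equivalent"        # CI entirely inside the zone
    elif ci[1] < zone[0] or ci[0] > zone[1]:
        verdict = "not equivalent"    # CI entirely outside the zone
    else:
        verdict = "indeterminate"     # CI crosses the zone boundary
    return ci, zone, verdict

# 2002-03 cohort, third grade (Table 4): M = 240.78, SD = 5.96, n = 36;
# zone of indifference 242.30 to 252.30 (district mean 247.30).
ci, zone, verdict = equivalence(240.78, 5.96, 36, 247.30)
print(ci, zone, verdict)
```

Run on the 2002-03 cohort's third-grade values, the interval overlaps the zone only partially, so equivalence can be neither demonstrated nor ruled out, which matches the pattern reported for the cohorts below.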
If the 95% CIs of the sample fall completely within the zone of indifference, it can be concluded that equivalence is demonstrated, and therefore that the RR students maintained the reading achievement levels from first grade, when they were achieving on average with their peers upon exiting the RR program. Confidence intervals completely outside the zone of indifference are considered not equivalent to the standard. If a confidence interval crosses into the zone of indifference (i.e., part in, part out), equivalence cannot be determined. When the results are reviewed in Table 4, it is clear that the 95% confidence intervals of the cohorts in third grade do not fall completely within the zone of indifference around each district mean EOG score, the predefined standard. For example, for the 2002-03 Year Cohort the zone of clinical indifference was established as 242.30 to 252.30. The 95% CI [238.76, 242.79] for the 2002-03 cohort does not fall completely within the zone of indifference, and therefore equivalence cannot be established (although there is some overlap).

Table 4
Third (3rd) Grade Means and Confidence Intervals for Year Cohorts Compared to School District Mean EOG Scores.

Cohort    Grade  Year     Mean EOG  SD    95% CI            Zone of Indifference  N
2002-03   3rd    2004-05  240.78    5.96  238.76, 242.79    242.30-252.30         36
2003-04   3rd    2005-06  242.17    4.69  240.42, 243.92    242.70-252.70         30
2004-05   3rd    2006-07  238.97    7.40  236.30, 241.64    242.20-252.20         32
2005-06   3rd    2007-08  331.16    9.53  328.43, 333.90    332.80-342.80         49
2006-07   3rd    2008-09  326.76    7.95  323.74, 329.78    332.00-342.00         29

Note: EOG = End-of-Grade; SD = Standard Deviation.

Moving forward into the fourth grade of each cohort, a similar pattern emerges: close approximation to the district mean, but little or no overlap with the ± 5 point zone of clinical indifference.

Table 5
Fourth (4th) Grade Means and Confidence Intervals for Year Cohorts Compared to School District Mean EOG Scores.
Cohort    Grade  Year     Mean EOG  SD    95% CI            Zone of Indifference  N
2002-03   4th    2005-06  243.43    6.65  241.14, 245.71    246.10-256.10         35
2003-04   4th    2006-07  243.89    7.05  241.16, 246.63    247.10-257.10         28
2004-05   4th    2007-08  337.09    8.23  334.13, 340.06    338.80-348.80         32
2005-06   4th    2008-09  337.72    8.10  335.09, 340.34    338.70-348.70         39
2006-07   -      -        -         -     -                 -                     -

Note: EOG = End-of-Grade; SD = Standard Deviation. No 4th grade data were obtained for the 2006-07 Cohort, as it had not matriculated in this grade at the time of this study.

Table 6
Fifth (5th) Grade Means and Confidence Intervals for Year Cohorts Compared to School District Mean EOG Scores.

Cohort    Grade  Year     Mean EOG  SD    95% CI            Zone of Indifference  N
2002-03   5th    2006-07  247.32    7.56  244.69, 249.96    251.30-261.20         34
2003-04   5th    2007-08  341.25    6.88  338.58, 343.92    343.50-353.50         28
2004-05   5th    2008-09  343.68    7.64  340.29, 347.07    344.00-354.00         22
2005-06   -      -        -         -     -                 -                     -
2006-07   -      -        -         -     -                 -                     -

Note: EOG = End-of-Grade; SD = Standard Deviation. No 5th grade data were obtained for the 2005-06 and 2006-07 Cohorts, as they had not matriculated in this grade at the time of this study.

Table 7
Sixth (6th) Grade Means and Confidence Intervals for Year Cohorts Compared to School District Mean EOG Scores.

Cohort    Grade  Year     Mean EOG  SD    95% CI            Zone of Indifference  N
2002-03   6th    2007-08  342.00    7.79  339.32, 344.68    346.50-356.50         35
2003-04   6th    2008-09  345.08    6.71  342.37, 347.79    346.10-356.10         26
2004-05   -      -        -         -     -                 -                     -
2005-06   -      -        -         -     -                 -                     -
2006-07   -      -        -         -     -                 -                     -

Note: EOG = End-of-Grade; SD = Standard Deviation. No 6th grade data were obtained for the 2004-05, 2005-06, and 2006-07 Cohorts, as they had not matriculated in this grade at the time of this study.

Table 8
Seventh (7th) Grade Means and Confidence Intervals for Year Cohorts Compared to School District Mean EOG Scores.
Cohort    Grade  Year     Mean EOG  SD    95% CI            Zone of Indifference  N
2002-03   7th    2008-09  349.56    6.45  347.00, 352.11    349.30-359.30         27
2003-04   -      -        -         -     -                 -                     -
2004-05   -      -        -         -     -                 -                     -
2005-06   -      -        -         -     -                 -                     -
2006-07   -      -        -         -     -                 -                     -

Note: EOG = End-of-Grade; SD = Standard Deviation. No 7th grade data were obtained for the 2003-04 through 2006-07 Cohorts, as they had not matriculated in this grade at the time of this study.

Achievement Levels Comparison

In light of these last analyses, it should be emphasized that, although equivalence was not established for any cohort, the purpose of the analysis was to determine whether the RR students had maintained the gains made during the RR program in first grade. The metric by which the school district and the state of North Carolina measure student proficiency is the number of students whose scores fall at Achievement Level III or IV. On the EOG there are four possible Achievement Levels (I, II, III, and IV), each representing a range of scores that depends on the year in which the EOG was administered and the grade the student is in (see Table 2). Students who obtain a Level III or IV are considered to be performing at proficiency per the NC Dept. of Public Instruction (NCDPI). Table 9 shows the percentage of RR students who achieved a Level III or IV, from third grade through seventh grade. In third grade, 44% of RR students scored a III or IV on the EOG. In fourth grade, 40% of RR students scored a III or IV. In subsequent grades the percentage dropped to 35% in fifth grade, then to 20% in sixth and 26% in seventh grade. It should also be noted that the number of participants who had completed these later grades was smaller.

Table 9
Third Grade Through Seventh Grade EOG Results for RR Students (All Cohorts Combined): Percentage of Students Scoring At or Above Grade Level (Levels III and IV).
Grade       N    Level I  Level II  Level III  Level IV  Levels III & IV
3rd Grade   176  29.0     27.3      39.2       4.5       43.7%
4th Grade   134  25.4     34.3      37.3       3.0       40.3%
5th Grade   84   32.1     33.3      33.3       1.2       34.5%
6th Grade   61   52.5     27.9      19.7       -         19.7%
7th Grade   27   37.0     37.0      25.9       -         25.9%

Note: EOG = End-of-Grade; RR = Reading Recovery. No Level IVs were obtained in sixth and seventh grade.

The following tables show the percentage of students scoring at or above grade level in grades three through seven, broken down into year cohorts spanning the 2002-03 school year to the 2006-07 school year. Table 10 shows the 2002-03 cohort. In third grade, 67% of the RR students achieved a Level III or IV. In fourth grade the percentage dropped to 52%, rose slightly to 56% in fifth grade, dropped sharply to only 14% in sixth grade, and then rose to 26% in seventh grade.

Table 10
2002-03 Cohort: Third Grade Through Seventh Grade EOG Results: Percentage of Students Scoring At or Above Grade Level (Levels III and IV).

Grade       N   Level I  Level II  Level III  Level IV  Levels III & IV
3rd Grade   36  5.6      27.8      58.3       8.3       66.6
4th Grade   35  14.3     34.3      48.6       2.9       51.5
5th Grade   34  11.8     32.4      52.9       2.9       55.8
6th Grade   35  51.4     34.3      14.3       -         14.3
7th Grade   27  37.0     37.0      25.9       -         25.9

Note: EOG = End-of-Grade; RR = Reading Recovery. No Level IVs were obtained in sixth and seventh grade.

Table 11 shows the 2003-04 cohort. In third grade, 70% of the RR students achieved a Level III or IV. In fourth grade the percentage decreased to 54%, fell sharply to only 14% in fifth grade, and then rose to 27% in sixth grade; no seventh grade data were available for this cohort.

Table 11
2003-04 Cohort: Third Grade Through Seventh Grade EOG Results: Percentage of Students Scoring At or Above Grade Level (Levels III and IV).
Grade       N   Level I  Level II  Level III  Level IV  Levels III & IV
3rd Grade   30  -        30.0      63.3       6.7       70.0
4th Grade   28  17.9     28.6      50.0       3.6       53.6
5th Grade   28  53.6     32.1      14.3       -         14.3
6th Grade   26  53.8     19.2      26.9       -         26.9
7th Grade   -   -        -         -          -         -

Note: EOG = End-of-Grade; RR = Reading Recovery. No Level IVs were obtained in fifth and sixth grade; no seventh grade data were available for this cohort.

Table 12 shows the 2004-05 cohort. In third grade, 59% of the RR students achieved a Level III or IV, less than the previous two cohorts. In fourth grade the percentage dropped to 31%, and then to 27% in fifth grade.

Table 12
2004-05 Cohort: Third Grade Through Seventh Grade EOG Results: Percentage of Students Scoring At or Above Grade Level (Levels III and IV).

Grade       N   Level I  Level II  Level III  Level IV  Levels III & IV
3rd Grade   32  12.5     28.1      56.3       3.1       59.4
4th Grade   32  34.4     34.4      31.3       -         31.3
5th Grade   22  36.4     36.4      27.3       -         27.3
6th Grade   -   -        -         -          -         -
7th Grade   -   -        -         -          -         -

Note: EOG = End-of-Grade; RR = Reading Recovery. No Level IVs were obtained in fourth and fifth grade; no data were available for sixth and seventh grade, as the cohort had not matriculated in those grades at the time of this study.

Table 13 shows the 2005-06 cohort. In third grade, 23% of the RR students achieved a Level III or IV, lower than any previous cohort. In fourth grade the percentage was slightly higher, at 28%.

Table 13
2005-06 Cohort: Third Grade Through Seventh Grade EOG Results: Percentage of Students Scoring At or Above Grade Level (Levels III and IV).

Grade       N   Level I  Level II  Level III  Level IV  Levels III & IV
3rd Grade   49  55.1     22.4      18.4       4.1       22.5
4th Grade   39  33.3     38.5      23.1       5.1       28.2
5th Grade   -   -        -         -          -         -
6th Grade   -   -        -         -          -         -
7th Grade   -   -        -         -          -         -

Note: EOG = End-of-Grade; RR = Reading Recovery. No data were available for fifth through seventh grade, as the cohort had not matriculated in those grades at the time of this study.
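The "Levels III & IV" column in these cohort tables is simply the share of students at or above proficiency. As a minimal sketch of that computation (the level counts below are reconstructed from the Table 10 percentages for the 2002-03 cohort, an inference rather than the study's raw data):

```python
# Minimal sketch of the proficiency metric used in these tables:
# the percentage of students whose Achievement Level is III or IV.

def percent_proficient(levels):
    """Share of students at Level III or IV, as a percentage."""
    proficient = sum(1 for lv in levels if lv in ("III", "IV"))
    return 100.0 * proficient / len(levels)

# Counts inferred from Table 10, 2002-03 cohort, third grade (n = 36):
cohort = ["I"] * 2 + ["II"] * 10 + ["III"] * 21 + ["IV"] * 3
print(round(percent_proficient(cohort), 1))  # 66.7, matching the ~67% reported
```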
Table 14 shows the final cohort, the 2006-07 cohort. In third grade, only 7% of the RR students in this cohort scored a Level III; there were no Level IVs in this group.

Table 14
2006-07 Cohort: Third Grade Through Seventh Grade EOG Results: Percentage of Students Scoring At or Above Grade Level (Levels III and IV).

Grade       N   Level I  Level II  Level III  Level IV  Levels III & IV
3rd Grade   29  62.1     31.0      6.9        -         6.9
4th Grade   -   -        -         -          -         -
5th Grade   -   -        -         -          -         -
6th Grade   -   -        -         -          -         -
7th Grade   -   -        -         -          -         -

Note: EOG = End-of-Grade; RR = Reading Recovery. No Level IVs were obtained in third grade; no data were available for fourth through seventh grade, as the cohort had not matriculated in those grades at the time of this study.

When the third grade EOG Achievement Level percentages (I, II, III, IV) are broken out by cohort in Table 15, a difference in year cohort performance on the EOG Reading can be seen. The earlier cohorts (2002-03, 2003-04, and 2004-05) have a higher percentage of students who scored either a III or a IV. The later cohorts (2005-06 and 2006-07) had fewer Level IIIs and IVs and a higher percentage of Level Is. The 2006-07 cohort had a slightly smaller sample size (n = 29), but the 2005-06 cohort had 49 students, more than any other cohort. This difference in the percentage of IIIs and IVs in the 2005-06 and 2006-07 cohorts was directly related to the district-wide (and statewide) shift, or re-norming, of the score ranges corresponding to the various Achievement Levels, mentioned previously. The 2005-06 cohort was enrolled in third grade during the first year after the re-norming, and the 2006-07 cohort was in third grade the following year.

Table 15
Third Grade EOG Results by Year Cohort: Percentage of Students Scoring At or Above Grade Level (Levels III and IV).
Year Cohort  N   Grade  Level I  Level II  Level III  Level IV  Levels III & IV
2002-03      36  Third  5.6      27.8      58.3       8.3       66.6
2003-04      30  Third  -        30.0      63.3       6.7       70.0
2004-05      32  Third  12.5     28.1      56.3       3.1       59.4
2005-06      49  Third  55.1     22.4      18.4       4.1       22.5
2006-07      29  Third  62.1     31.0      6.9        -         6.9

Note: EOG = End-of-Grade; RR = Reading Recovery. No Level Is were obtained by the 2003-04 cohort in third grade.

Figure 2. Third Grade EOG results: Percentage of RR students at or above grade level (Levels III and IV) by year cohort.

Upon reviewing the data made available by the NC Department of Public Instruction for those years, there is a clear difference in the overall percentage of students across the district scoring in the III or IV range, shown in Figure 3.

Figure 3. Third Grade EOG results district-wide: Percentage of all students scoring proficient (Levels III or IV): 82.4 (2004-05), 83.7 (2005-06), 81.7 (2006-07), 53.8 (2007-08), 50.2 (2008-09).

Correlations

In addition to the hypothesis and research question posed by this study, it would also be advantageous to know whether the Text Reading Level subtest could be used as an indicator of future reading achievement (e.g., on the EOG). Text Reading levels on the Observation Survey range from 0 to 30. Upon successfully exiting the RR program, the participants in this study were on levels ranging from 8 to 30. Given this range, was there a connection between Exit Text Reading levels and EOG scores in reading in later grades? In other words, did a higher Text Reading level increase the chances of higher EOG scores in reading?
Conversely, were students whose ending Text Reading levels were low also more likely to have low EOG scores in reading in later grades? This analysis first focuses on the correlation between Exit Text Reading levels and EOG scores with all of the participants combined. Figure 4 compares the Exit Text Reading Level of all of the RR participants with their third grade EOG scores. There is a distinct split between the data points, due to the re-norming of the EOG scores mentioned earlier. This comparison includes the third grade years of all the cohorts, which span the years prior to the re-norming (2004-05 through 2006-07) and the years following the change (2007-08 and 2008-09). Since the change essentially shifted the score ranges for Levels I, II, III, and IV upward by approximately 100 points, the scatterplot in Figure 4 divides into two clusters: one representing the cohorts whose third grade years preceded the 2007-08 scale change (2002-03, 2003-04, and 2004-05), and the other representing the cohorts whose third grade years fell in 2007-08 and 2008-09 (2005-06 and 2006-07), after the shift. When all of the cohorts are combined in this manner, a weak correlation of r(174) = .132, p > .05, is found between the Exit Text Reading level and the third grade EOG scores represented in Figure 4.

Figure 4. Scatterplot of Exit Text Reading Level and 3rd Grade EOG Scores for All Year Cohorts.

When a correlation was conducted with all of the cohorts in their fourth grade year, the same split occurred in the data, represented in Figure 5, which compares the Exit Text Reading Level of all of the RR participants with their fourth grade EOG scores. With all cohorts combined, a weak correlation of r(130) = .108, p > .05, was found between the Exit Text Reading level and the fourth grade EOG scores.
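The coefficients reported here are plain Pearson product-moment correlations between Exit Text Reading Level and a later EOG scale score; note that r(df) implies n = df + 2 pairs, so r(174) reflects 176 third-grade pairs. A self-contained sketch, using hypothetical paired values rather than the study's data:

```python
# Sketch of the correlation analysis: Pearson's r between Exit Text
# Reading Level and a later EOG scale score. The paired values below
# are hypothetical, for illustration only.
import math

def pearson_r(xs, ys):
    """Plain Pearson product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

exit_trl = [8, 10, 12, 14, 16, 18, 20, 22]      # hypothetical exit levels
eog3 = [236, 241, 238, 244, 240, 247, 243, 249]  # hypothetical 3rd grade EOG
print(round(pearson_r(exit_trl, eog3), 3))
```

Mixing pre- and post-re-norm EOG scores in one such computation is what produces the two-cluster scatterplots described above, which is one reason the combined-cohort coefficients are so weak.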
Figure 5. Scatterplot of Exit Text Reading Level and 4th Grade EOG Scores for All Year Cohorts.

In Figure 6, the correlation for all of the cohorts in their fifth grade year is represented, and the effects of the scale score shift were still evident. Figure 6 compares the Exit Text Reading Level of all of the RR students with their fifth grade EOG scores. A weak correlation of r(80) = .137, p > .05, can be seen between the Exit Text Reading level and the fifth grade EOG scores.

Figure 6. Scatterplot of Exit Text Reading Level and 5th Grade EOG Scores for All Year Cohorts.

In Figure 7, Exit Text Reading levels and sixth grade EOG scores for all of the cohorts combined were compared, and the effects of the scale score shift are no longer seen. This is because the available sixth grade data occurred only after the scale score shift, in 2007-08 and 2008-09, and included only the two longest-running cohorts (2002-03 and 2003-04). Only a weak correlation of r(57) = .124, p > .05, between the Exit Text Reading level and the sixth grade EOG scores is observed.

Figure 7. Scatterplot of Exit Text Reading Level and 6th Grade EOG Scores for All Year Cohorts.

The final comparison was the seventh grade year; only the 2002-03 cohort had seventh grade data available, since they were the only cohort to have completed seventh grade at the time these data were collected. Figure 8 shows the Exit Text Reading Level of the RR students compared with their seventh grade EOG scores. The result was a weak, but somewhat higher, correlation of r(24) = .314, p > .05. This was most likely due to the fact that the analysis included only the one cohort, 2002-03, which had matriculated in that grade. As will be seen in later analyses, this particular cohort had a higher correlation between Exit Text Reading level and EOG scores than any other cohort.
Figure 8. Scatterplot of Exit Text Reading Level and 7th Grade EOG Scores for All Year Cohorts.

Overall, when the study data were considered in their entirety, there was little correlation between the Text Reading level upon exiting RR and the third through seventh grade EOG scores. The third grade EOG scores were the first standardized assessment available after the students completed the RR program. Because they were administered less than two years after the students' participation in the program, it would stand to reason that they would also be the most likely to show residual predictive effects of the RR program. Unfortunately, there was very little correlation among these variables. If the third grade EOG scores are divided into two groups, one composed of the former scale score range (the Pre-Re-Norm group) and the other made up of the newer, re-normed scale score range (the Post-Re-Norm group), the two groups separately continue to show weak correlations between the Text Reading level and the third grade EOG [r(94) = .092, p > .05, and r(76) = .105, p > .05, respectively] (see Appendix A). In an attempt to further determine the predictive power of the Text Reading level scores, the correlations by year cohort were examined, and some subtle distinctions could be seen. The first cohort was the 2002-03 Year Cohort. This was the earliest implementation of the RR program relative to the other year cohorts in this study, and it was the cohort for which there were the most data, covering the most school years, all the way through seventh grade. Figure 9 shows a weak correlation, r(33) = .246, p > .05, between the 2002-03 cohort's Exit Text Reading level and the third grade EOG scores. Scatterplots for the remaining grades for the 2002-03 year cohort and the subsequent cohorts (2003-04, 2004-05, 2005-06, and 2006-07) can be found in Appendix B.

Figure 9. Scatterplot of Exit Text Reading Level and 3rd Grade EOG Scores for the 2002-03 Year Cohort.
This study was composed of 177 students who participated in the RR program in first grade and whose subsequent performance on the NC EOG was tracked beginning in third grade. The aim of the study was essentially twofold: determine whether the RR program improved reading achievement in this study sample, and establish whether the RR students maintained those gains over time. The results indicated that the RR students' reading achievement improved after having been in the program (from a mean Text Reading Level of 3 at the start of the program to a mean level of 14, an increase of eleven levels) and continued to improve even after the program's completion (a mean increase of three additional levels by year's end). Based on equivalence testing, the study participants' EOG scores and the mean EOG scores for the district were not equivalent in third grade and beyond. In other words, the results indicated that the study participants were no longer performing at the average level of the school district population, despite having been at the average level of their peers upon completion of the RR program. Using the standard set by the school district (i.e., scores of III or IV indicating proficiency), 44% of the RR students were performing at proficiency in third grade, and subsequent grade levels showed increasingly lower percentages of students performing at proficiency. Individual cohorts fared differently, with some achieving as high as 70% proficiency in third grade and some as low as 7%. Finally, the correlation between the Text Reading Level and future EOG performance was investigated, but this relationship proved to be weak.

Conclusion

This concluding section will briefly review and summarize this research and its methodology, and discuss the findings and their implications. The results will be discussed along with their significance, ultimately indicating the direction that this researcher, and further research, should take.
The goal of this research was to determine how effective a particular early reading intervention, in this case the Reading Recovery early intervention program, was at raising poor readers' reading achievement, and to determine whether those acquired skills carried the child into third grade and beyond while still performing on average with their peers. The researcher was given a set of data comprising scores for RR students who successfully completed the program. Based on their performance on the RR program's own measure, the Observation Survey, the students were considered to be performing on average with their peers upon completion of the program. Using the Text Reading Level subtest of the Observation Survey, the students' pre- and post-program reading levels were compared to determine the amount of growth during the program. The data also included scores for each RR student on the district-wide assessment, the NC End-of-Grade (EOG) test in reading, beginning in third grade. The aim was to compare RR student performance on the EOG, beginning in third grade, with the school district average and determine whether the students maintained the on-average performance that resulted from having participated in the RR program in first grade.

Program Effectiveness

The results indicate that, prior to entering the RR program, the students involved in this study were clearly some of the lowest performers in reading achievement. According to the tenets of RR, these students were chosen because of their poor performance at the end of kindergarten and at the beginning of their first grade year. This lowest 20% scored on average at a level 3 on the Text Reading Level subtest, which ranges from 1 to 30 for readers; a level 0 is possible but essentially designates a non-reader.
Although it is true that some children are prematurely exited from the program for failure to make sufficient progress, those who do successfully complete the program are truly some of the lowest reading performers in their grade. In this study they increased on average 10 to 14 levels, and another 3 levels by the end of the school year, even after they were no longer receiving RR instruction. For these first graders the program was clearly effective. Participation in the RR program had a statistically significant positive effect on the students' reading achievement, as demonstrated by the statistically significant increase on the Text Reading Level subtest and the students' successful exit from the program. The RR program does what it sets out to do, which is to bring the lowest readers up to the average level of their peers. Administrators would do well to at least consider this program for their schools. It targets a very specific group of readers: kindergarteners and first graders on the cusp of learning to read who are performing well below their peers. The program intervenes before these emergent readers have a chance to fail, instead of simply addressing their needs once they have begun to fail and feel discouraged about the challenges of learning to read. In the short term, this intervention clearly has an impact on their reading achievement.

There are many considerations for school systems when choosing a reading program, such as the cost of training, which populations are targeted, and the number of students who can be served at once. Reading Recovery carries a large cost associated with training teachers through graduate-level instruction, it is specifically designed for struggling students in first grade only, and it is a one-on-one program, which limits the number of students who can participate during a year's time. All of these factors must be considered by a school district prior to implementing the program.
This final characteristic, the fact that RR is a one-on-one program, especially limits the volume of students who can be served and forces schools to select only the lowest-performing students. This can have the unfortunate effect of excluding students with less severe difficulties who might also benefit from participating in the RR program. On the other hand, it gives a school a tool that focuses on an especially vulnerable population, one that will be affected most severely by failing to learn to read at an early age. Nevertheless, if the goal of an administration is to address its schools' struggling emergent readers early on, while still in first grade, then the RR program appears to function in that role, at least in the short term.

Long-term Effectiveness

As we look beyond the short-term effectiveness of the RR program, the long-term impact of the program will tell administrators and other educators whether it is worth the time and investment. Are the improved reading achievement effects of the Reading Recovery program maintained in third grade and subsequent grades, as measured by performance on the North Carolina End-of-Grade test in Reading compared to district-wide average performance on the EOG? The results are mixed. The first analysis consisted of testing for equivalence between each cohort's mean EOG score, with its corresponding 95% CI, and the school district's mean EOG score with its corresponding interval of ± 5 points. The results did not allow equivalence to be demonstrated for any of the cohorts. Although the cohorts were certainly close, and clearly performing well above their original standing in first grade as the lowest 20% of readers, they were not similar enough to the school district population mean in later grades.
In real terms, this does not mean that they had not maintained the gains made during the RR program, only that the analysis could not demonstrate statistical equivalence between the study sample and the school population. The next analysis looked at a slightly different metric, one used by the school district and the state of North Carolina to gauge student proficiency. Instead of comparing the RR students' scaled scores with the district mean EOG scores, the percentage of RR students scoring at proficiency was examined, asking: what percentage of RR students scored an Achievement Level of III or IV? The data showed that 44% of the former RR students in third grade scored either a III or a IV, less than half of the former RR group. In fourth grade the figure was 40%, and in subsequent grades the percentage dropped to 35% in fifth grade, then to 20% in sixth and 26% in seventh grade. Looking at the group as a whole, the return on the investment from the administrators' standpoint is an immediate increase following initial program participation and the beginnings of a positive learning trajectory by the end of the year. Two years later, however, less than half of the group was performing at proficiency. The group of proficient performers shrinks with each subsequent grade, making for an early intervention investment with little long-term gain or impact on student performance. Admittedly, much can happen in those two years. The question for further research is why and when half of the former RR students began to struggle and get lost academically.

Some of the individual cohorts fared much better. The 2002-03 cohort retained 67% of its students in the proficiency category, scoring either a III or a IV. The 2003-04 cohort fared even better, with 70% of its students maintaining proficiency levels in third grade. The 2004-05 cohort did not fare as well, with only 59% of its students remaining at the proficient levels.
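The proficiency metric above is a simple tally, sketched here with invented data (the achievement levels and the 25-student roster are hypothetical; only the 44% figure comes from the study):

```python
def proficiency_rate(levels):
    """Share of students at Achievement Level III or IV (North Carolina's
    proficiency cut) among students with a score for that grade."""
    proficient = sum(1 for lv in levels if lv in ("III", "IV"))
    return proficient / len(levels)

# Hypothetical achievement levels for 25 former RR students in third grade
grade3 = ["III"] * 8 + ["IV"] * 3 + ["II"] * 10 + ["I"] * 4
rate = proficiency_rate(grade3)  # 11 of 25 -> 0.44, i.e. 44% proficient
```

Unlike the equivalence test on scaled scores, this metric is all-or-nothing per student, so a cohort can fail the equivalence test while still posting a respectable proficiency rate, and vice versa.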
Other research questions that need further analysis include: Why is there such a contrast in the percentages maintaining proficiency from cohort to cohort? Were there problems with fidelity in the administration of the program, procedural changes, or less support for the program in subsequent years? The remaining two cohorts, 2005-06 and 2006-07, had only 23% and 7% of their students scoring in the proficient range, respectively. The 2006-07 cohort had no students receiving a Level IV in third grade. One apparent explanation for the sharp contrast between these last two cohorts and the others is the shift, or re-norming, of EOG scoring that occurred just prior to the 2007-08 school year (highlighted earlier), the same year the 2005-06 cohort was in third grade; the 2006-07 cohort reached third grade the following year. Clearly there is a problem with using the EOG as a metric in these longitudinal studies, in this case because of the shifting of scores and the resulting difficulty in making comparisons. Future research will require measures that give researchers more control but that also cover a broader range of reading skills. The EOG in reading essentially measures comprehension via a reading passage and multiple-choice questions, and it is not administered until the end of third grade. On the EOG it is difficult to pinpoint an issue like fluency or letter-word identification because one simply cannot know why a student missed a particular question. Additionally, there is a period of two years after participation in the RR program during which little information is gathered on the students' reading achievement. There are many curriculum-based measures (CBM) utilized by school districts that can assess a variety of tasks and can be administered prior to, during, and immediately after an intervention program, as well as in the following year.
These measures make it possible to see the immediate effects of an intervention like RR and whether it is having the desired effect. Further examination of the relationship between the independent measure, the Text Reading Level of the RR Observation Survey, and the dependent variable, student performance on the EOG, was undertaken to see if the Text Reading Level (TRL) subtest might function as an early indicator of the students' later performance on the EOG. However, this was not the case. The highest correlation, r(33) = .246, p > .05, was between the Exit TRL of the 2002-03 cohort and their third grade EOG scores. This is considered a weak correlation. There does not seem to be a strong enough relationship between the TRL at the conclusion of the RR program and the first EOG in third grade almost two years later. There are simply too many unaccounted-for variables between first grade and third grade and beyond. There is also not sufficient variability in scores on the Exit Text Reading Level subtest to offer indicators of future performance: eighty percent of the scores were either a Level 16 or a Level 18 by year's end. Furthermore, the journey for the students from the end of first grade to the end of third grade is fraught with obstacles that interfere with their ongoing reading development. In the end, this may be the best explanation for the varied results of this study, at least with regard to the longitudinal effects of the RR program. There are simply too many factors that interfere with learning, both intrinsically within the learner and externally in the school environment. As stated before, closer monitoring with quicker, more efficient curriculum-based measures administered earlier will give researchers and schools the data to know how effective their programs are. The students who participate in the program show substantial gains and even retain those skills and continue to grow through the end of the first grade school year.
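The weak correlation reported above, and the range-restriction problem behind it, can be illustrated with a plain Pearson computation. The data and names below are hypothetical; the only study values are the clustering of exit TRLs at 16 and 18.

```python
import math

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: exit TRLs cluster at 16 and 18 (restricted range),
# while third grade EOG scores spread widely, which depresses r.
exit_trl = [16, 18, 16, 18, 18, 16, 18, 16, 14, 18]
eog_3rd = [244, 252, 249, 246, 255, 250, 248, 243, 251, 247]
r = pearson_r(exit_trl, eog_3rd)
```

With nearly all predictor values at two levels, even a real underlying relationship produces a small r, which is one reason the Exit TRL made a poor early-warning indicator here.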
According to the results, however, after a year or two the students had not maintained those gains. Future research should focus on tracking the students more closely with curriculum-based measures, which, unlike the Observation Survey, are designed to measure student progress against criterion measures. Studies in which the researcher is allowed to chart the students' growth more frequently, especially during those critical second and third grade years, will produce a clearer picture of what occurs to these students once the RR program and first grade are completed. Given that the End-of-Grade test measures only reading comprehension, and only after a year of classroom instruction, measures like the Developmental Reading Assessment (DRA) or the Dynamic Indicators of Basic Early Literacy Skills (DIBELS) offer not only more frequent assessments but also tap into other critical areas such as fluency. Perhaps the regression of the once-improved low performers is due to unaddressed fluency issues that, although improved after the RR instruction, were not maintained and later became a problem for some of these students. While their other skills continued to mature, weak fluency, and the resulting inability to read and simultaneously comprehend the EOG passages, may have made it difficult if not impossible to recall the information when presented with the comprehension questions. Continued research on this subject will also need to replicate the analysis with other components of the Observation Survey and examine those subtests' ability to predict future performance and highlight future needs. Although students exiting the RR program are considered to be on average with their peers, they are not yet proficient readers and therefore have weaknesses, some hidden, that may manifest themselves in the future as greater demand is placed on the student's reading.
The current analyses indicated that Text Reading Level was not a good indicator of future performance, but perhaps one of the other subtests of the Observation Survey, or a combination of two or more, can assist classroom teachers in continuing to monitor and address weak areas beyond the RR program. A study focused on the transmittal of information, the student's performance during the RR program and the results of the Observation Survey, from the RR program to the first grade teacher and then to the second and third grade teachers might yield markers or traits indicating areas in which the student needed additional instruction. These indicators could then be addressed in the classroom using other interventions or programs (Tier 2 of the RTI process). In an education landscape where schools are already accused of over-testing, what is required is more frequent assessment: not simply more "tests," but smarter, quicker, less-intrusive measures that give a truer picture of where a student has been and what their areas of need are. Such measures will assist any effective reading intervention program in tracking student progress, which will ultimately determine the long-term effects that the program has on student achievement.

References

American Federation of Teachers (AFT). (1999). Building on the best, learning from what works: Seven promising reading and English language arts programs. Washington, DC. Retrieved August 5, 2008 from http://www.aft.org/pubs-reports/downloads/teachers/remedial.pdf

Baenen, N., Bernhole, A., Dulaney, C., & Banks, K. (1997). Reading Recovery: Long-term progress after three cohorts. Journal of Education for Students Placed at Risk, 2(2), 161-181.

Beaver, J. M. (2006). Teacher guide: Developmental Reading Assessment, Grades K-3, Second Edition. Parsippany, NJ: Pearson Education, Inc.

Briggs, C. & Young, B. (2003). Does Reading Recovery work in Kansas? A retrospective longitudinal study of sustained effects.
Journal of Reading Recovery, 3(1), 59-64.

Center, Y., Wheldall, K., & Freeman, L. (1995). Evaluating the effectiveness of Reading Recovery: A critique. Educational Psychology, 12(3-4), 263-274.

Christ, T. J., Burns, M. K., & Ysseldyke, J. E. (2005). Conceptual confusion within response-to-intervention vernacular: Clarifying meaningful differences. Communiqué, 34(3), 1-8.

Clay, M. M. (1987). Learning to be learning disabled. New Zealand Journal of Educational Studies, 22, 155-173.

Clay, M. M. (2001). Change over time in children's literacy achievement. Portsmouth, NH: Heinemann.

Clay, M. M. (2002). Reading Recovery: A guidebook for teachers in training. Portsmouth, NH: Heinemann.

Clay, M. M. (2005a). Literacy lessons designed for individuals part one: Why? When? and how? Portsmouth, NH: Heinemann.

Clay, M. M. (2005b). Literacy lessons designed for individuals part two: Teaching. Portsmouth, NH: Heinemann.

Clay, M. M. (2006). An observation survey of early literacy achievement. Portsmouth, NH: Heinemann.

Cleophas, T. J., Zwinderman, A. H., & Cleophas, T. F. (2006). Equivalence testing. In Statistics Applied to Clinical Trials (pp. 59-65). Springer Netherlands.

Denton, C. A., Ciancio, D. H., & Fletcher, J. M. (2006). Validity, reliability, and utility of the Observation Survey of Early Literacy Achievement. Reading Research Quarterly, 41(1), 8-34.

Dunn, M. W. (2007). Diagnosing a reading disability: Reading Recovery as a component of a response-to-intervention assessment method. Learning Disabilities: A Contemporary Journal, 5(2), 31-47.

Fletcher, J. M., Shaywitz, S. E., Shankweiler, D. P., Katz, L., Liberman, I. Y., Steubing, K. K., Francis, D. J., et al. (1994). Cognitive profiles of reading disability: Comparisons of discrepancy and low achievement definitions. Journal of Educational Psychology, 86, 6-23.

Fuchs, D., Mock, D., Morgan, P., & Young, C. (2003). Responsiveness-to-instruction: Definitions, evidence, and implications for the learning disabilities construct.
Learning Disabilities Research & Practice, 18(3), 157-171.

Gómez-Bellengé, F. X. & Thompson, J. R. (2005). Twenty years of data evaluation: A brief history of the national data collection. The Journal of Reading Recovery, 4(2), 66-69.

Gómez-Bellengé, F. X. & Thompson, J. R. (2005). U.S. norms for tasks of an observation survey of early literacy achievement (Rep. No. NDEC 2005-02). Columbus: The Ohio State University, National Data Evaluation Center. http://www.ndec.us.

Hiebert, E. (1994). Reading Recovery in the United States: What difference does it make to an age cohort? Educational Researcher, 23(9), 15-29.

Iverson, S. & Tunmer, W. E. (1993). Phonological processing skills and the Reading Recovery program. Journal of Educational Psychology, 85(1), 112-126.

Joseph, L. M. (2008). Best practices on interventions for students with reading problems. In A. Thomas & J. Grimes (Eds.), Best practices in school psychology, Vol. 4 (pp. 1163-1180). Bethesda, MD: National Association of School Psychologists.

Kershner, J. R. (1990). Self-concept and IQ as predictors of remedial success in children with learning disabilities. Journal of Learning Disabilities, 23, 368-374.

Lose, M. K., Schmitt, M. E., Gómez-Bellengé, F. X., Jones, N. K., Honchell, B. A., & Askew, B. J. (2007). Reading Recovery and IDEA legislation: Early Intervening Services (EIS) and Response to Intervention (RTI). The Journal of Reading Recovery, 6(2), 44-49. Retrieved September 12, 2008, from http://www.readingrecovery.org/pdf/reading_recovery/SPED_Brief-07.pdf.

Lovell, K., Gray, E. A., & Oliver, D. E. (1964). A further study of some cognitive and other disabilities in backward readers of average non-verbal reasoning scores. British Journal of Educational Psychology, 34, 275-279.

Lyon, G. R., Fletcher, J. M., Shaywitz, S. E., Shaywitz, B. A., Torgesen, J. K., Wood, F. B., Schulte, A., & Olsen, R. (2001). Rethinking learning disabilities. In C. E. Finn, C. R. Hokanson, & A. J.
Rotherham (Eds.), Rethinking special education for a new century (pp. 259-287). Washington, DC: Thomas B. Fordham Foundation.

McEneaney, J. E. (2006). Agent-based literacy theory. Reading Research Quarterly, 41(3), 352-371.

National Data Evaluation Center (NDEC). (2008). Reading Recovery overview. Ohio State University. Retrieved on August 13, 2008 from http://www.ndec.us/AboutRR.asp.

North Carolina Department of Public Instruction, Division of Accountability Services/North Carolina Testing Program (NCDPI-DAS/NCTP) (2007). North Carolina End-of-Grade Tests, Technical Report. Raleigh, NC: Author. Retrieved August 12, 2008 from http://www.dpi.state.nc.us/accountability.

North Carolina Department of Public Instruction (NCDPI). (n.d.). North Carolina State Testing Results. Retrieved August 2, 2011, from http://report.ncsu.edu/ncpublicschools/AutoForward.do?forward=eog.pagedef.

National Research Council (1998). Preventing reading difficulties in young children. Washington, DC: National Academy Press.

Pearson, P. D. (1999). A historically based review of Preventing Reading Difficulties in Young Children. Reading Research Quarterly, 34, 231-246.

Peterson, B. (1991). Selecting books for beginning readers. In D. E. DeFord, C. A. Lyons, and G. S. Pinnell (Eds.), Bridges to literacy: Learning from Reading Recovery (pp. 119-147). Portsmouth, NH: Heinemann.

Pinnell, G. S. (1989). Reading Recovery: Helping at-risk children learn to read. The Elementary School Journal, 90, 161-183.

Pinnell, G. S., Lyons, C. A., DeFord, D. E., Bryk, A. S., & Seltzer, M. (1993/94). Comparing instructional models for the literacy education of high risk first graders. Reading Research Quarterly, 29, 8-39.

Pinnell, G. S., DeFord, D. E., & Lyons, C. A. (1988). Reading Recovery: Early intervention for at-risk first graders. Arlington, VA: Educational Research Service.

President's Commission on Excellence in Special Education (2002).
A new era: Revitalizing special education for children and their families. Washington, DC: Author.

Quay, L. C., Steele, D. C., Johnson, C. I., & Hortman, W. (2001). Children's achievement and personal and social development in a first-year Reading Recovery program with teachers in-training. Literacy Teaching and Learning: An International Journal of Early Reading and Writing, 5, 7-25.

Reading Recovery Council of North America (RR) (2008). Reading Recovery: Basic facts. Retrieved August 12, 2008 from http://www.readingrecovery.org/reading_recovery/facts/index.asp.

Reading Recovery Council of North America, North American Trainers Group Research Committee (2004). Five Reading Recovery studies: Meeting the criteria for scientifically based research. Retrieved on February 25, 2008, from http://www.readingrecovery.org/sections/research/index.asp.

Rodgers, E., Gómez-Bellengé, F., Wang, C., & Schultz, M. (2005). Examination of the validity of the Observation Survey with a comparison to ITBS. Paper presented at the Annual Meeting of the American Educational Research Association, Montreal, Quebec.

Roush, W. (1995). Arguing over why Johnny can't read. Science, 267, 1896-1898.

Schwartz, R. M. (2005). Literacy learning of at-risk first-grade students in the Reading Recovery early intervention. Journal of Educational Psychology, 97(2), 257-267.

Scruggs, T., & Mastropieri, M. (2002). On babies and bathwater: Addressing the problems of identification of learning disabilities. Learning Disability Quarterly, 25(3), 155.

Shanahan, T., & Barr, R. (1995). Reading Recovery: An independent evaluation of the effects of an early instructional intervention for at-risk learners. Reading Research Quarterly, 30, 958-996.

Slavin, R. E. (1987). Making Chapter 1 make a difference. Phi Delta Kappan, 69, 110-119.

Slavin, R. E. (1989). PET and the pendulum: Faddism in education and how to stop it. Phi Delta Kappan, 70, 752-758.

Smith-Burke, M. T. (1996).
Professional development for teacher leaders: Promoting program ownership and increased success. Network News, 1-4, 13, 15. Retrieved on May 23, 2008, from http://www.readingrecovery.org/development/archives/smith-burke.asp

Snow, C. E., Burns, M. S., & Griffin, P. (1998). Preventing reading difficulties in young children. Washington, DC: National Academy Press.

Stanovich, K. E. (1988). Explaining the differences between the dyslexic and the garden-variety poor reader: The phonological-core variable-difference model. Journal of Learning Disabilities, 21, 590-604.

Stanovich, K. E. (1991). Discrepancy definitions of reading disability: Has intelligence led us astray? Reading Research Quarterly, 26, 1-29.

Stanovich, K. E. (2005). The future of a mistake: Will discrepancy measurement continue to make the learning disabilities field a pseudoscience? Learning Disability Quarterly, 28, 103-106.

Tal, N. F. & Siegel, L. S. (1996). Pseudoword reading errors of poor, dyslexic and normally achieving readers on multisyllable pseudowords. Applied Psycholinguistics, 17, 215-232.

Tang, M. & Gómez-Bellengé, F. (2007). Dimensionality and concurrent validity of the Observation Survey of Early Literacy Achievement. Paper presented at the 2007 American Educational Research Association Conference, Chicago, IL.

United States Department of Education. (n.d.). Institute of Education Sciences, National Center for Education Statistics (NCES). Retrieved August 2, 2011, from http://nces.ed.gov/surveys/sdds/2010/sprofile.aspx?id1=97000US3701800.

United States Department of Education (2002). Scientifically based research and the Comprehensive School Reform (CSR) program (pp. 17-18). Washington, DC: Government Printing Office.

United States Office of Education. (1977, December 29). Assistance to states for education of handicapped children: Procedures for evaluating specific learning disabilities. Federal Register, 42(250), 65082-65085. Washington, DC: U.S. Government Printing Office.

Vellutino, F.
R., Scanlon, D., & Lyon, G. R. (2000). Differentiating between difficult-to-remediate and readily remediated poor readers: More evidence against the IQ-achievement discrepancy definition of reading disability. Journal of Learning Disabilities, 33, 223-238.

What Works Clearinghouse (2007). WWC intervention report: Reading Recovery. Washington, DC: U.S. Department of Education, Institute of Education Sciences.

Woodcock, R., McGrew, K., & Mather, N. (2001). Woodcock-Johnson Tests of Achievement, Third Edition. Itasca, IL: Riverside Publishing.

Appendix A

Data from Figure 4 presented in two separate scatterplots showing the relationship between Text Reading Level upon exiting RR and third grade EOG scores, divided into years prior to state re-norming of the EOG (Pre-Re-Norm) and years after the shifting of score ranges (Post-Re-Norm).

Figure A1. Scatterplot of Text Reading Level and 3rd Grade EOG Scores – Pre-Re-Norm Cohorts.

Figure A2. Scatterplot of Text Reading Level and 3rd Grade EOG Scores – Post-Re-Norm Cohorts.

Appendix B

Scatterplots for each of the year cohorts and their respective grades showing the relationship between Text Reading Level upon exiting RR and third through seventh grade EOG scores (when available).

Figure B1. Scatterplot of Text Reading Level and 3rd Grade EOG Scores – 2002-03 Year Cohorts.
Figure B2. Scatterplot of Text Reading Level and 4th Grade EOG Scores – 2002-03 Year Cohorts.
Figure B3. Scatterplot of Text Reading Level and 5th Grade EOG Scores – 2002-03 Year Cohorts.
Figure B4. Scatterplot of Text Reading Level and 6th Grade EOG Scores – 2002-03 Year Cohorts.
Figure B5. Scatterplot of Text Reading Level and 7th Grade EOG Scores – 2002-03 Year Cohorts.
Figure B6. Scatterplot of Text Reading Level and 3rd Grade EOG Scores – 2003-04 Year Cohorts.
Figure B7. Scatterplot of Text Reading Level and 4th Grade EOG Scores – 2003-04 Year Cohorts.
Figure B8. Scatterplot of Text Reading Level and 5th Grade EOG Scores –
2003-04 Year Cohorts.
Figure B9. Scatterplot of Text Reading Level and 6th Grade EOG Scores – 2003-04 Year Cohorts.
Figure B10. Scatterplot of Text Reading Level and 3rd Grade EOG Scores – 2004-05 Year Cohorts.
Figure B11. Scatterplot of Text Reading Level and 4th Grade EOG Scores – 2004-05 Year Cohorts.
Figure B12. Scatterplot of Text Reading Level and 5th Grade EOG Scores – 2004-05 Year Cohorts.
Figure B13. Scatterplot of Text Reading Level and 3rd Grade EOG Scores – 2005-06 Year Cohorts.
Figure B14. Scatterplot of Text Reading Level and 4th Grade EOG Scores – 2005-06 Year Cohorts.
Figure B15. Scatterplot of Text Reading Level and 3rd Grade EOG Scores – 2006-07 Year Cohorts.