The Impact of Authentic Pedagogy on Student Learning in Tenth Grade History Courses

by

Lamont E. Maddox

A dissertation submitted to the Graduate Faculty of Auburn University in partial fulfillment of the requirements for the Degree of Doctor of Philosophy

Auburn, Alabama
May 7, 2012

Keywords: Assessment, Inquiry, Authentic Intellectual Work

Copyright 2012 by Lamont E. Maddox

Approved by

John W. Saye, Jr., Chair, Professor of Curriculum & Teaching
Jada Kohlmeier, Associate Professor of Curriculum & Teaching
Kathryn H. Braund, Professor of History

Abstract

This mixed-methods study examined the impact of varying levels of authentic pedagogy on student learning in 9th and 10th grade history classrooms. The sample included four junior high teachers and four high school teachers. During the initial phase of the study, instructional artifacts (tasks) and classroom observational data were collected and analyzed to determine the level of authentic pedagogy students experienced in their classes. Participating teachers were assigned an authentic pedagogy score based on this analysis, which was used as the primary independent variable in subsequent statistical analyses designed to evaluate student learning outcomes. The findings suggest that authentic pedagogy has a small but positive impact on student performance on the Alabama High School Graduation Exam. Classroom-level comparisons suggest that students who received higher levels of authentic pedagogy were not put at a significant disadvantage on a test of lower order knowledge. The study also evaluated the impact of authentic pedagogy on higher order learning outcomes and on various subgroups of students (e.g., race and gender). Due to the small sample of teachers, results should be viewed as extremely tentative and limited to the setting where the study was conducted.

Acknowledgments

A study as ambitious in scope and duration as this one would never have been possible without the assistance of many people. I first would like to give thanks to God, through whom all things are possible. Next, I'd like to express my gratitude to the participants of this study, including the school system itself, for agreeing to support my research. The central office staff went above and beyond the call of duty in providing the necessary coded student data. The teachers were also very gracious in inviting me into their classrooms. The professionalism of everyone involved is to be commended. I'd also like to thank Dr. John W. Saye, Dr. Jada Kohlmeier, and Dr. Kathryn Braund for their work on my doctoral committee. Each of these professors has had a profound impact on my life in some way. This is especially true of Dr. Saye, who has been my mentor since 1992. I've been truly blessed to learn from someone of his caliber and intellect. I owe a special debt of gratitude to Dr. Shannon for his assistance with the statistical portions of this research. My family played an important role in helping me with this achievement. I would like to thank my parents (Jerry and Roberta Maddox), Ryan Maddox, Kathryn Maddox, and Jolie Maddox for their support. I'd also like to thank the Brenton family for allowing me the space and time to write during several holidays I spent with them in Indiana. I would be remiss if I didn't also recognize my peers in the doctoral program at Auburn. I'd like to thank Dr. Charles Farmer, Dr. Linda Mitchell, Dr.
Cory Callahan, Jay Howell, Colby Jones, and Blake Busbin for their thoughtful advice and assistance at different stages of my graduate career. I'm especially grateful to Jay and Colby for setting aside time from their busy schedules to evaluate student essays and participate in lengthy inter-rater reliability sessions. I could not have completed this study without their support. My growth as a professional was also highly influenced through interactions with numerous graduate and undergraduate students I had the privilege of meeting while at Auburn University. I'd like to thank this group collectively for the positive impact they've had on my life. Finally, I'd like to dedicate this work to my wife, Jennifer. Jennifer was daring enough to marry me while I was in the midst of this research. I'd like to thank her for tolerating my busy schedule and helping me to maintain my sanity during the most difficult stages of this study. Her support has been unwavering and it has made my success in this endeavor possible.

Style manual used is Publication Manual of the American Psychological Association, 6th Edition. Computer software used is Microsoft Word for Windows.

Table of Contents

Abstract ............................................................ ii
Acknowledgments .................................................... iii
List of Tables ...................................................... ix
List of Figures ..................................................... xi
List of Abbreviations .............................................. xii
CHAPTER ONE: INTRODUCTION ........................................... 1
Study Overview and Methodology ...................................... 5
Definitions ......................................................... 8
Study Limitations .................................................. 10
Keywords ........................................................... 11
CHAPTER TWO: LITERATURE REVIEW ..................................... 12
Theoretical Foundations ............................................ 16
Learning Theory .................................................... 16
Affordances of Disciplined Inquiry ................................. 19
Reservations ....................................................... 29
Authentic Intellectual Work ........................................ 31
Research ........................................................... 34
Overview ........................................................... 34
Harvard Social Studies Project ..................................... 42
Survey-Based Research .............................................. 45
Learning Outcomes: Authentic Intellectual Work (AIW) ............... 47
AIW and Lower Order Outcomes ....................................... 48
AIW and Higher Order Outcomes ...................................... 53
Gates Foundation Research .......................................... 60
International Research ............................................. 67
Adding to the Research base ........................................ 73
CHAPTER THREE: METHODOLOGY ......................................... 77
Study Design ....................................................... 79
Project setting and description of participants .................... 80
Instrumentation .................................................... 86
Study Phases ...................................................... 111
Data Analysis Procedures .......................................... 119
Conclusion ........................................................ 130
CHAPTER FOUR: TEACHER USE OF AUTHENTIC PEDAGOGY ................... 131
Minimal Authentic Pedagogy ........................................ 136
Limited Authentic Pedagogy ........................................ 146
Moderate Authentic Pedagogy ....................................... 161
Generalizations ................................................... 173
CHAPTER FIVE: STUDENT LEARNING OUTCOMES ........................... 180
Description of the Sample ......................................... 180
Results of Inferential Analyses ................................... 182
Summary ........................................................... 207
CHAPTER SIX: SUMMARY, LIMITATIONS, & IMPLICATIONS ................. 210
Summary ........................................................... 211
Discussion and Alternative Explanations ........................... 215
Limitations ....................................................... 221
Implications and Areas for Further Study .......................... 223
Conclusion ........................................................ 228
References ........................................................ 230
Appendix A: Teacher Interview Script .............................. 262
Appendix B: Teacher Recruitment Script ............................ 265
Appendix C: Scoring Criteria for Classroom Instruction ............ 267
Appendix D: Scoring Tips for Instruction Rubric ................... 269
Appendix E: Scoring Criteria for Tasks ............................ 271
Appendix F: Scoring Tips for Task Rubric .......................... 272
Appendix G: Email Correspondence Request for Tasks ................ 273
Appendix H: U.S. History Higher Order Assessment Resources ........ 274
Appendix I: U.S. History Higher Order Assessment Instructions ..... 276
Appendix J: Advanced Placement Higher Order Assessment Student Resources ... 278
Appendix K: Advanced Placement Higher Order Assessment Instructions ... 281
Appendix L: Proctor Instructions .................................. 283
Appendix M: Scoring Rubric for Advanced Placement Higher Order Editorial ... 284
Appendix N: Scoring Rubric for Manifest Destiny Higher Order Assignment ... 290
Appendix O: Authentic Pedagogy Scores ............................. 296
Appendix P: Manifest Destiny Painting ............................. 299
Appendix Q: WWII Political Cartoon ................................ 301
Appendix R: Moderate Authentic Pedagogy Task ...................... 302
Appendix S: Content Analysis Explanation and Examples ............. 304
Appendix T: Notes on the Student Sample ........................... 308
Appendix U: Technical Description of Multiple Regression Analysis ... 309
Appendix V: Technical Description of One-way ANOVA Procedures ..... 311
Appendix W: Technical Description of Factorial MANOVA Procedures ... 313
Appendix X: Higher Order Editorial Examples ....................... 316

List of Tables

Table 1: Summary of Results from Authentic Intellectual Work Studies ... 58
Table 2: Summary of AIW Studies with an Explicit Focus on Disadvantaged Students ... 59
Table 3: Gates Foundation Studies: Authentic Student Work and Performance on Standardized Tests ... 65
Table 4: Gates Foundation Studies: Relation between Authentic Tasks and Student Work ... 66
Table 5: Summary of International Studies ......................... 71
Table 6: Comparison of Tenth Grade Graduation Exam Passage Rates ... 82
Table 7: Descriptive Statistics for Teacher Sample ................ 84
Table 8: Student Participation by Course .......................... 85
Table 9: Summary of Inter-Rater Reliability Observations .......... 91
Table 10: Inter-Rater Agreement on Instruction and Assessment Tasks ... 91
Table 11: Inter-Rater Agreement on Higher-Order Editorial Tasks ... 103
Table 12: Summary of Inter-Rater Reliability Sessions ............. 118
Table 13: Summary of Research Questions and Data Analysis Methodology ... 120
Table 14: Overview of Independent Variables Used During Regression Analyses ... 124
Table 15: Teacher Profiles ........................................ 132
Table 16: Cut Scores .............................................. 133
Table 17: Overview of Roy's Authentic Pedagogy Scores ............. 137
Table 18: Scores for "Teach a Lesson" Task ........................ 140
Table 19: Overview of Amy's Authentic Pedagogy Scores ............. 147
Table 20: Scores for "Ideal Form of Government" Task .............. 150
Table 21: Phillip's Authentic Pedagogy Scores ..................... 159
Table 22: Ryan's Authentic Pedagogy Scores ........................ 164
Table 23: Scores for "WWII Political Cartoon Analysis" Task ....... 166
Table 24: Scores for "Truman Think Aloud" Task .................... 173
Table 25: Teacher Profiles ........................................ 181
Table 26: Descriptive Statistics for Student Sample ............... 181
Table 27: Impact of Authentic Pedagogy on Graduation Exam Results ... 183
Table 28: One-way ANOVA Comparing Graduation Exam Scores for Minimal & Limited Classes ... 184
Table 29: One-way ANOVA Comparing Graduation Exam Scores for Minimal & Moderate Classes ... 186
Table 30: Distribution of Manifest Destiny Editorial Scores ....... 189
Table 31: Comparison of Authentic Pedagogy Groups on Manifest Destiny Editorial ... 190
Table 32: Distribution of German Unification Editorial Scores ..... 194
Table 33: Comparison of AP Groups on German Unification Editorial ... 195
Table 34: Analysis of the Impact of Moderate Authentic Pedagogy on Graduation Exam Results ... 200
Table 35: Sequential Multiple Regression Analyses Predicting Impact of Repeated Exposure to Moderate Authentic Pedagogy on Graduation Exam Results ... 203
Table 36: Authentic Pedagogy and Achievement by Subgroups ......... 205
Table 37: Authentic Tasks and Achievement by Subgroups ............ 205
Table 38: Authentic Instruction and Achievement by Subgroups ...... 206

List of Figures

Figure 1. Process for determining Authentic Pedagogy Scores ....... 78
Figure 2. Summary of Research Phases .............................. 112
Figure 3. The "Teach a Lesson" Task ............................... 140
Figure 4. Reformers of the 1800s Task ............................. 145
Figure 5. Examples of Supporting Arguments for German Unification Editorial ... 198
Figure 6. Effect of repeated exposure to moderate authentic pedagogy ... 201
List of Abbreviations

AHSGE    Alabama High School Graduation Exam
AIW      Authentic Intellectual Work
AP       Advanced Placement or Authentic Pedagogy, depending on context
ARMT     Alabama Reading and Mathematics Test
CORS     Center on Restructuring Schools
IRB      Institutional Review Board
NAEP     National Assessment of Educational Progress
NCLB     No Child Left Behind
NCSS     National Council for the Social Studies
SAT-10   Stanford Achievement Test
SSIRC    Social Studies Inquiry Research Collaborative

CHAPTER ONE: INTRODUCTION

What should students learn and how should they learn it? This question is a difficult one to answer regardless of the field, but especially when it comes to history. Efforts to devise standards in U.S. history have often resulted in heated debate and controversy, whether at the state or national level (Cheney, 1994; Symcox, 2002). The debate over what students should learn and how they should learn it in history is complex (Evans, 2004). Many people agree that the history curriculum, as part of the social studies, is vital for preparing good citizens. However, people have differing conceptions of America's democracy and the role of the "good citizen" within this context. As a result, a variety of curriculums have developed over time to educate secondary history students with different civic outcomes in mind. The following paragraphs provide a brief survey of three commonly known instructional approaches as the basis for discussing the impact of high-stakes testing on student learning.

Traditional instruction represents the oldest and most commonly used approach for teaching history. Students are asked to remember important names, dates, and events from the past as highlighted by the teacher or the textbook. Emphasis is often placed on student mastery of one main narrative of the past. This narrative tends to be a celebratory one depicting the steady progress of America's democracy (Barton & Levstik, 2004). The main goal of traditional instruction as it pertains to citizenship is often to instill patriotism and cultural literacy. Instruction is mainly geared towards building foundational knowledge based on the belief that this is needed before significant higher order thinking can really take place (Hirsch, 1988; Newmann, Bryk, & Nagaoka, 2001, p. 11).

A second approach for teaching history has been especially popular since the 1960s. Advocates of disciplined inquiry believe students need to have the opportunity to "do history" using the techniques of historians in order to formulate more in-depth and nuanced understandings of the past (Seixas, 2001; Wineburg, 2001). In doing history, students might construct narratives of a particular historical event based on the analysis of primary sources. Engaging in historical interpretation has the potential to help students conceptualize the discipline in a manner more consistent with that of professional historians. Advocates of disciplined inquiry argue that this approach is more likely to help students develop the higher order thinking skills and dispositions needed for life in the 21st century. These learning outcomes are assumed to have civic value. For example, when students encounter a civic problem they should be able to apply their historical thinking skills to locate relevant information, evaluate its trustworthiness, analyze competing sources, and work through the problem to construct a supportable solution.
Finally, some social studies educators advocate problem-based historical inquiry (PBHI) directed towards the study of persistent issues affecting democracies (Saye & Brush, 2004). Advocates of PBHI believe that in order for the knowledge, skills, and dispositions acquired in history classes to transfer to life outside of school, inquiry needs to be situated in real world social problems. For example, a unit on the Mexican-American War might focus on whether the United States was justified in going to war with Mexico. In examining that historical problem, students would also consider the broader question of when one nation is justified in imposing its will on another. Criteria developed to address this broad question could be applied to the historical case of the Mexican-American War as well as other historical and contemporary conflicts. PBHI units are intended to help students see connections between the events they study in history and life in today's world. The focus on applying historical knowledge in realistic decision-making activities is designed to prepare students to be active citizens who can make decisions for the public good.

As can be seen through this brief overview, the social studies curriculum can be conceptualized in a variety of ways. Many social studies educators have been taught some version of inquiry-based instruction as a method to use with students to promote higher order thinking and other learning outcomes. It continues to be highly advocated through research publications and professional development initiatives. Despite attempts to influence the practice of teachers, inquiry-based instruction remains more highly regarded for its potential than for its actual widespread use in schools. Inquiry-based instruction is difficult to implement, and a variety of obstacles exist in school settings to limit its use (Rossi, 1998). This study focuses on one of the biggest disincentives to inquiry-based instruction: high-stakes testing.

In many states social studies teachers must prepare their students to pass high-stakes standardized tests that primarily measure students' acquisition of lower order content knowledge. The tests often seem to be aligned to standards that reflect the goals of the traditional history model of instruction. They focus on how well students can remember discrete facts from across the curriculum. The dilemma facing social studies teachers in this situation is an old one: depth vs. breadth. The high-stakes tests seem to demand rapid coverage of information in order to ensure students are exposed to all of the testable content during a course. However, teachers who adopt more ambitious instructional goals are likely to favor in-depth treatment of specific historical topics in order to promote higher order learning outcomes. The concern among these teachers, of course, is whether their students will be able to pass the high-stakes tests. Advocates of inquiry-based instruction have argued that students learn just as much lower order content knowledge while engaged in active, inquiry-based activities as students in more traditional classroom settings. There is evidence to suggest this is true in other subjects, but the social studies research is not as strong. This study is an attempt to better ascertain some of the learning outcomes that can be expected from inquiry-based instruction in history classrooms.
Hopefully the results of this study will offer some evidence to reassure teachers that inquiry-based instruction does no harm when it comes to student performance on the high-stakes tests that could determine their graduation status. In order to conceptualize instruction in this study as a variable, I've used Newmann's authentic pedagogy framework. Teachers who use authentic pedagogy engage students in activities that require construction of knowledge, using elaborated forms of communication to create products that have value beyond school (Newmann, King, & Carmichael, 2007). This is their dominant practice. However, they also utilize more traditional instructional strategies such as lecture and multiple-choice tests as needed. In using Newmann's authentic intellectual work rubrics to analyze instruction, I was able to classify the teachers in this study on a continuum. Teachers on the lower end of the authentic intellectual work continuum use a great deal of didactic instruction. As the scores increase on the continuum, they represent greater use of authentic pedagogy (in-depth analysis of topics, inquiry, etc.). Using this framework enabled me to overcome some of the problems that have historically plagued studies attempting to compare learning outcomes associated with traditional and inquiry-based instruction. Prior studies have compared inquiry classes with control classes. However, it was often hard for consumers of this research to determine the nature of the intellectual challenge that was really present in the inquiry-based classrooms. How different were they really from the traditional classes? In this study, all of the classes were assigned scores using the same task and instruction rubrics. This makes it easier to readily compare the degree of intellectual challenge experienced by students taught by one teacher as compared to another teacher in the study. It provides a better basis for comparing learning outcomes.

Study Overview and Methodology

This was a mixed-methods investigation of existing instruction at the study schools that involved collecting qualitative data and converting it to quantitative data for analysis. It also included the analysis of quantitative data in the form of test scores. I selected a junior high school and a high school in southeastern Alabama as the focus schools for this study. I recruited the entire 9th and 10th grade social studies faculty as study participants. These teachers were asked to provide three challenging tasks that offered the best evidence of students performing at the highest levels in their subject. I then established an observation schedule to coincide with the period when students would be engaged in work related to the tasks. My analysis of the tasks and instruction, using Newmann's authentic intellectual work rubrics, resulted in each teacher being assigned an authentic pedagogy score. Cut scores were developed to form descriptive categories representing different levels of authentic pedagogy (minimal, limited, moderate, substantial). These data were used as the basis for an analysis of the impact of this type of instruction on student learning. The student sample included four cohorts of tenth graders. I obtained achievement records, graduation exam results, and demographic data for these students during the 2007/08 and 2008/09 school years. All tenth grade students who took social studies courses during this time period were included in the study.
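To make the scoring-and-analysis sequence concrete, the sketch below illustrates one way such a procedure could be implemented: binning a continuous authentic pedagogy score into the four descriptive categories and then estimating the relationship between that score and graduation exam performance while controlling for prior achievement and demographic variables. This is a minimal sketch offered for illustration only; the cut points, column names, and data file are hypothetical assumptions, not the actual values, variables, or software used in this study.

```python
# Illustrative sketch only: cut points, column names, and the data file are
# hypothetical placeholders, not the actual values or data from this study.
import pandas as pd
import statsmodels.formula.api as smf

# Coded student records with each teacher's authentic pedagogy (AP) score
# merged onto the student rows (hypothetical file and columns).
students = pd.read_csv("coded_student_records.csv")

def ap_category(score):
    """Bin a continuous authentic pedagogy score into descriptive levels
    using assumed cut scores (the study's actual cut scores may differ)."""
    if score < 4:
        return "minimal"
    elif score < 6:
        return "limited"
    elif score < 8:
        return "moderate"
    return "substantial"

students["ap_level"] = students["ap_score"].apply(ap_category)

# Regress graduation exam scores on the authentic pedagogy score while
# controlling for prior achievement and demographics (all names assumed).
model = smf.ols(
    "ahsge_score ~ ap_score + prior_achievement + C(race) + C(gender) + C(free_lunch)",
    data=students,
).fit()
print(model.summary())
```

A sequential variant of the same idea, entering the control block before the authentic pedagogy variables, corresponds to the sequential multiple regression analyses summarized later in Table 35.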
The main instrument for measuring the retention of lower order social studies knowledge was the Alabama High School Graduation Exam (AHSGE). In order to measure higher order thinking, I created a writing assessment that measured the ability of students to analyze historical documents, formulate arguments, and make reasoned decisions. The higher order measure was administered to a smaller subset of the 10th grade student population. I used several types of statistical analyses to determine the impact of authentic pedagogy on student learning outcomes on the AHSGE and the higher order essay. In doing so, I controlled for demographic and prior achievement variables likely to influence student performance. Once these variables were controlled for, the importance of instructional experiences in promoting the desired learning outcomes became more apparent.

Research Questions. The focus of this study was to examine the learning outcomes associated with various levels of authentic pedagogy. The research was guided by the following research questions:

Question 1: To what extent do teachers utilize authentic pedagogy, and how much variation exists within the sample of teachers in this study?

Question 2: Do students who have been taught by teachers demonstrating higher levels of authentic pedagogy score higher on the Alabama High School Graduation Exam (AHSGE) than students taught by teachers with lower levels of authentic pedagogy?

Question 3: What is the impact of authentic pedagogy on student performance on an assessment that requires them to apply knowledge from a previous unit to a challenging new task?

Question 4: Does the ability to apply knowledge in these situations improve with repeated exposure (multiple courses) to classroom experiences that require students to perform challenging intellectual tasks?

Question 5: To what extent does authentic pedagogy bring different achievement benefits to students of different social and academic backgrounds?

Study Purpose. There is little information within the research literature describing the impact of authentic pedagogy on student learning in social studies. The purpose of this study is to better understand how the work students do in their social studies classes relates to their ability to apply what they learn on tests of lower and higher order knowledge. The study is timely and needed within the field. Today's high-stakes testing environment, which tends to focus on student acquisition of basic content knowledge, serves as a disincentive to teachers interested in using disciplined inquiry as part of authentic pedagogy. Teachers need to be able to turn to a body of research to support their use of authentic pedagogy under these circumstances. Information generated from this study could contribute to the national body of research suggesting that problem-based historical inquiry not only helps students improve their critical thinking abilities, but also results in the knowledge needed to perform well on standardized tests.

Definitions. A number of terms are used throughout this study that may be new or ambiguous to some readers. In this section I have operationalized several of the most commonly used terms.

Authentic Pedagogy. Authentic pedagogy includes any instructional practices designed to elicit authentic intellectual work from students. A teacher's pedagogy, according to Fred Newmann, is a combination of daily instruction and assessment tasks. In order for a teacher's pedagogy to be considered "authentic,"
it must adhere to certain standards. Authentic instruction is designed to promote higher order thinking, depth of knowledge, substantive communication, and a connection to life outside the classroom. Authentic tasks are designed to promote construction of knowledge, elaborated communication, and a connection to students' lives.

Authentic Intellectual Work. Fred Newmann juxtaposes the work students are traditionally asked to complete in school with work he considers to be "authentic." Whereas traditional school assignments are often used to simply certify success in school, authentic achievements have broader personal significance and meaning in the real world. As such, they closely mimic the thinking and effort required of significant intellectual accomplishments for adults. Students engage in authentic intellectual work when they "construct knowledge, through disciplined inquiry, to produce discourse, products, or a performance that has value beyond school" (Newmann, King, & Carmichael, 2007, p. 3).

Disciplined Inquiry. In this study, inquiry is considered disciplined when it adheres to the conventions and methods of reasoning associated with a particular field of study. In other words, when students engage in problem-solving in history they must produce defensible solutions that would be seen as valid among professional historians. Disciplined inquiry requires students to develop a knowledge base, strive for in-depth understanding, and communicate ideas using elaborated forms of communication (Newmann, King, & Carmichael, 2007).

Traditional Instruction. The primary purpose of traditional instruction is the delivery of content which students are asked to remember and recite (Lee, Smith, & Newmann, 2001, p. 10). Traditional instruction is teacher-centered and dominated by lecture and drill-and-practice exercises. This term is most often used in this study to describe instructional practices that generally do not follow the standards associated with the authentic pedagogy framework.

Standardized test-based reform. My conception of standardized test-based reform stems from Scott Thompson's use of the term. This is a system of reform where "academic progress is judged by a single indicator and when high stakes – such as whether a student is promoted from one grade to the next or is eligible for a diploma – are attached to that single indicator" (Thompson, 2001, p. 358).

Study Limitations. This study has several potential limitations. The first limitation relates to the format of the Alabama High School Graduation Exam. The exam covers U.S. history from exploration through World War II. Students in the focus grade of this study (grade 10), regardless of their instructional experiences, did not have the opportunity to learn all of the testable content. The tenth grade course only covers the first half of the U.S. history survey. The graduation exam is provided to tenth grade students mainly as a familiarization exercise, although those who pass do not have to take the test again. Ideally, the lower order content knowledge measure for this study would have more closely adhered to the curriculum students experienced. As such, the passage rates on this test could have been influenced, to a greater extent than usual, by non-instructional factors such as the education level of a student's parents.

Another graduation exam-related limitation stems from the difficulty I encountered in conducting a content analysis of the test.
A content analysis was needed to verify that the test items were predominantly focused on measuring lower order content knowledge. The state of Alabama does not provide public access to tests or test questions used in previous years. A bulletin with 84 sample test items was the only resource available from the state. I used this bulletin for my analysis despite the fact that there was no assurance that the sample items were comparable to those found on the actual graduation exam. This made it difficult to determine the challenge level of the test with absolute certainty.

Another study limitation relates to the collection of data. Ideally, I would use all of Newmann's authentic intellectual work rubrics (task, instruction, and student work) to determine the levels of authentic pedagogy provided by the sample of teachers. However, I simply did not have the resources available to collect and analyze the student work associated with the tasks assigned by the study teachers. An analysis of student work would have been useful to gain a better sense of the degree to which students were engaged in such standards as construction of knowledge and elaborated communication.

Finally, a potential limitation involves the ability to make generalizations from this study. This study includes a very limited sample of teachers and uses outcome measures not found in other states. However, it is a pilot study for a larger effort by the Social Studies Inquiry Research Collaborative (SSIRC) focused on essentially the same research questions. This association with the work of other researchers will hopefully allow the results to be more meaningful.

Keywords

Authentic Intellectual Work, Assessment, Social Studies Education Reform, Inquiry-based instruction

CHAPTER TWO: LITERATURE REVIEW

Many people believe public schools are not doing an adequate job of preparing students for life in the 21st century (Partnership for 21st Century Skills, 2007). In order to remedy this situation, a variety of reform initiatives have been suggested. Each of these is designed to influence the quality of instruction in some way. The No Child Left Behind (NCLB) test-based accountability model uses rewards or sanctions based on standardized test results to improve instruction. A similar reform initiative uses value-added statistical modeling to hold teachers accountable for how much students learn during a semester (Braun, 2005; Koedel & Betts, 2009; Rothstein, 2009; Stewart, 2006). A third reform model is based on the authentic pedagogy construct devised by Newmann (Newmann & Archbald, 1988; Newmann, King, & Carmichael, 2007; Newmann, Secada, & Wehlage, 1995). Supporters of authentic pedagogy seek to improve the capacity of teachers to provide intellectually challenging instruction according to standards of Authentic Intellectual Work (AIW). A wide range of other ideas for improvement have been proposed, including additional coursework requirements and calls for more active instruction (Smith & Niemi, 2001).

The mainstream reform model today is No Child Left Behind (No Child Left Behind Act of 2001, 2002). While NCLB focuses primarily on improving math and reading achievement, many states have adopted test-based reform as an accountability measure for social studies. Advocates of this system believe the high stakes attached to tests will improve student motivation, effort, and achievement (Stecher, 2002). The tests are also meant to apply pressure on schools and teachers in a variety of ways.
It is believed that if students fail, teachers will work harder to improve their instruction so that the standards are met. The standards and tests adopted by states send a message to teachers regarding the types of learning outcomes that are most valued. However, reports by a variety of think-tanks and policy organizations consistently criticize many of the graduation exams and high-stakes tests for their lack of rigor (Achieve Inc., 2004; Conley, 2003; Cronin, Dahlin, Adkins, & Kingsbury, 2007; Daugherty, 2004).

Many states use high-stakes history tests that emphasize lower order outcomes (Gaudelli, 2006; Grant & Horn, 2006). These multiple-choice tests are often designed to see if students know "the basics" (Newmann, Bryk, & Nagaoka, 2001). Tests of basic knowledge usually assess the ability of students to remember specific factual information (names, dates, events). In this type of environment, teachers feel pressured to adopt coverage-based instructional approaches to survey all the possible content students might encounter (Grant, 2005; Grant et al., 2002). If teachers are going to pursue the type of in-depth, inquiry-based instruction advocated by many researchers, they need evidence that it will not hurt their students on these tests (Grant et al., 2002).

The dilemma facing teachers in the present high-stakes environment highlights a longstanding controversy in the social studies field. Should social studies courses be survey-oriented or should they provide students with in-depth learning experiences (Newmann, Lopez, & Bryk, 1998; Parker, 1991; Rossi, 1995; Rothstein, 2004)? These two instructional approaches are based on very different assumptions regarding the purpose of social studies and what constitutes meaningful instruction. Traditional survey instruction is usually focused on transmitting factual knowledge to students, while in-depth instruction is more concerned with attaining higher order thinking objectives (Rossi, 1995). It therefore seems likely that the goals of a broad, coverage-oriented survey course would more closely align with the format of many high-stakes history tests. However, proponents of in-depth, inquiry-oriented instruction argue that this type of instruction is more effective in helping students achieve both lower and higher order outcomes. This chapter presents the theoretical argument for and against this statement as well as empirical research that has tested this claim.

I begin with a basic explanation of in-depth instruction. Rossi's operational definition of in-depth instruction provides a clearer picture of how this term has been conceptualized in social studies. The construct encompasses issues-centered, inquiry-based instruction and involves:

1. The use of knowledge that is complex, thick, and divergent about a single topic, concept, or event using sources that range beyond the textbook;
2. Essential and authentic issues or questions containing ambiguity, doubt, or controversy;
3. A spirit of inquiry that provides opportunities, support, and assessment mechanisms for students to manipulate ideas in ways that transform their meaning; and
4. Sustained time on a single topic, concept, or event. (Rossi, 1995, p. 89)

In-depth units can take a variety of forms under this broad definition. Implicit is the idea that instruction should foster understanding and the ability of students to think.
Most social educators believe in-depth instruction is necessary if students are to become effective citizens who are able to apply what they've learned to make decisions for the public good in a diverse society (National Council for the Social Studies, 1994). In-depth units represent a departure from traditional coverage-based approaches to instruction in a number of ways. In traditional instruction, the teacher is primarily concerned with the student producing the right answer. The teacher's role is to transmit factual information to the student. The student is then tested on his/her ability to reproduce this information (Lee, Smith, & Newmann, 2001). In-depth inquiry units, on the other hand, typically require construction of knowledge. The end goal of in-depth instruction, as described by Rossi and others, is not to determine how many facts students can remember (although facts are still considered important). It is to evaluate the quality of students' reasoning and understanding according to the intellectual standards of a particular discipline. How well can students marshal relevant facts to support an argument?

Having established the definition of in-depth instruction and its relation to traditional instruction, I now present some of the major reasons for its use in schools. Advocates support in-depth instruction because it is grounded in disciplinary standards, can be used to promote the citizenship mission of social studies, and is consistent with contemporary understandings of how people learn. Each of these points will be examined more closely in the following sections. Because this study focuses only on instruction in history classrooms, the terms in-depth instruction and historical inquiry are used interchangeably. However, Rossi's definition applies across the social science disciplines.

Theoretical Foundations

Learning Theory

It seems counter-intuitive that in-depth units would result in the type of learning necessary for students to excel on tests of basic factual knowledge. After all, an in-depth curriculum usually involves sacrificing some breadth. How can students pass tests that cover a broad range of content? Advocates believe constructivist theories of learning help to explain the effectiveness of in-depth instruction. Constructivists view students as active meaning-makers in the instructional process who interpret what happens in a classroom environment based on their prior knowledge and experiences (Bransford, 2000; Brooks & Brooks, 1993; Scheurman, 1998). They internalize information into mental maps or "schemas" (Bartlett, 1932; Piaget, 1952; Rumelhart, 1980). When new information is presented during class, students either add it to an existing schema, modify a schema to accommodate it, or create an entirely new schema if the information differs radically from what they have experienced previously (Cornbleth, 1985; Rumelhart, 1980). The development of complex schemata and deep knowledge is thought to be the key to long-term memory as well as higher-order problem-solving (Greeno, Collins, & Resnick, 1996). Much of the basis for this belief comes from studies that analyze the different ways experts and novices in particular fields solve novel problems. These studies indicate that expert knowledge is organized hierarchically around big ideas and concepts while the knowledge of novices tends to be fragmented.
Knowledge that is highly connected in this manner is more accessible for rapid recall and can be flexibly applied for problem-solving (Bransford, 2000; VanSickle & Hoge, 1991; Wineburg, 1991). In contrast, much of the knowledge of novices is inert (Whitehead, 1929). This simply means that novices are often unable to recognize when their knowledge might be applicable when confronted with problems they haven't experienced previously (Cognition and Technology Group at Vanderbilt, 1990). The research on expert/novice problem-solving supports the need to have deep, complex knowledge. The ability of students to form these complex connections is thought to depend on the nature of the instructional experience they encounter. The question therefore becomes: Which instructional approach is more likely to promote the understanding necessary for the development of complex schemata?

Advocates of traditional survey-oriented instruction tend to place the greatest emphasis on drill and repetition (Greeno, Collins, & Resnick, 1996). Students are taught factual information, and then subsequent courses add more information to the knowledge base. The intent is for instruction to have a cumulative effect over time such that students eventually form more complex understandings. Constructivists argue that this approach is based on a faulty belief about how humans learn. If knowledge is never successfully integrated into students' schemata, it is likely to result in disconnected rather than connected knowledge (Bransford, 2000, p. 30).

Advocates of in-depth instruction believe it promotes the development of complex schemata in a number of ways. A good central question or problem captures students' attention and promotes a "felt need" to resolve an issue (Dewey, 1938). Authentic questions, those that have relevance in contemporary life, provide students with a greater purpose for engaging in the learning process. The issue or problem in a lesson becomes the focal point around which students organize information; the mental peg which aids in memory and application of knowledge (Brooks & Brooks, 1993). As students investigate a problem, they are forced to actively work with information and reorganize it in new ways. The resulting process of knowledge construction (in solving problems) is believed to help students gain depth of knowledge and the ability to think at a higher level.

In-depth instruction provides skilled teachers with a greater opportunity to diagnose student misunderstandings and provide support for learning. Vygotsky believed students had a "zone of proximal development" (ZPD), which constituted the difference between what students could do on their own and what they could do with guidance (Vygotsky, 1978). Based on this idea, educators advocated the use of scaffolds to enhance student learning. The use of scaffolds in the classroom can be compared to spotters in weight-lifting. Spotters help athletes lift heavier weight and complete more repetitions than they ever could on their own. The good ones make the athlete do the bulk of the work, providing just enough support to complete the exercise. Eventually, the lifter progresses to the point where he/she can lift the weight without the support. The same idea applies to scaffolding in the classroom. The teacher locates the student's current developmental level and seeks to provide support (questions, sequenced activities, etc.) to help students stretch their intellectual abilities.
In-depth instruction allows teachers to intervene with scaffolding to optimize learning and help students develop rigorous (discipline-based) solutions to problems (Scheurman, 1998; Vygotsky, 1978). In summary, many social educators endorse in-depth, inquiry-based instruction because they believe it allows students to build more complex understandings of historical concepts than is possible in coverage-based environments. This belief is supported by the research of many cognitive scientists. Deep knowledge and understanding are thought to be the key not only to improved problem-solving capacity, but also to efficient recall of factual information. If standardized tests are mainly tests of reading comprehension, as some suggest (Newmann, Bryk, & Nagaoka, 2001), students who have developed complex schemata should possess a network of associations that can be used to more effectively make inferences when reading and answering multiple choice questions (Doyle, 1983, p. 167; Nuthall & Alton-Lee, 1995).

Affordances of Disciplined Inquiry

The close relationship between in-depth instruction and contemporary understandings of how people learn provides one justification for the use of inquiry-based instructional practices in social studies. Many researchers, educators, and policy makers also support inquiry because it is central to the practice of social scientists and therefore essential for helping students develop more accurate understandings of the discipline under study (Barton & Levstik, 2004; Wineburg, 2001; Seixas, 2001). Since my research dealt with instruction in history classrooms, examples in this section will focus on this field and on how inquiry is applied within research on issues-centered curriculums.

Wineburg's research has been very influential among educators seeking to help students learn to think historically. One of his most well-known studies compared high school students and professional historians as they reasoned with historical texts (Wineburg, 1991). This study revealed major differences in the way historians and students viewed historical knowledge. The students viewed historical accounts as definitive and truthful, and this limited their ability to recognize the need to look for underlying meanings and subtext in the documents they were provided during the study. The historians, meanwhile, viewed the same accounts as "human creations" requiring analysis and interpretation to fully understand (Wineburg, 1991, pp. 510-512). This led them to apply sourcing, contextualization, and corroboration skills to develop reasoned conclusions about the trustworthiness of the documents. Wineburg believed that the students' inability to reason deeply with the texts was primarily due to the textbook-driven history they likely encountered in school (see Baldi et al., 2001). Textbooks often portray history as a single meta-narrative. This creates the illusion that historical knowledge is static and relatively uncomplicated (Bain, 2000; Gabella, 1994; Wineburg, 1991). Students gain little sense of the scholarly debate surrounding many of the topics they study. Inquiry-based instruction has the potential to shift the epistemological stance of students from the perception of history as something to be memorized to an "uninterrupted negotiation about the character of the past" (Nash, 1995, p. A2).
In addition to developing a more discipline-based understanding of history, inquiry advocates believe this instructional approach provides students with a broad range of historical thinking and reasoning skills that have application in the real world. When students engage in historical inquiry they get to "do" history. They analyze sources of evidence to develop their own account of an historical event. There are some inherent dangers in this process. Without a proper understanding of the rules of evidence used by historians, students could form unwarranted conclusions or develop shallow interpretations of the past. Teachers also have to fight the tendency of students to become relativistic once they understand that the past can be viewed from multiple perspectives (Saye, 1999; Barton, 2008). The upside of allowing students to engage in inquiry is that, with proper guidance, students learn firsthand how historical knowledge is constructed. They also develop skills such as the ability to critically examine sources of evidence, detect bias, make logical inferences and generalizations, evaluate the trustworthiness of competing accounts, synthesize information, look at problems from multiple perspectives, and empathize with the perspectives of people from different times, places, and cultures (Barton, 2008; Kohlmeier, 2006; Saye & Brush, 2007; VanSledright, 2002). Students in traditional classroom settings are also exposed to many of these skills, but this is typically through worksheets or more limited classroom exercises that are sometimes isolated from the primary objectives of an instructional unit. Inquiry advocates believe historical thinking skills are difficult to learn out of context. In inquiry-based classes, teachers embed these skills within major instructional activities. When students engage in realistic inquiry activities, advocates believe they are more likely to then be able to apply these skills in the real world (Brown, Collins, & Duguid, 1989; Cognition and Technology Group at Vanderbilt, 1990). Studies suggest that students can be taught historical thinking skills and the ability to formulate reasoned decisions about contemporary social issues, even at a relatively early age (Barton, 1997; Foster & Yeager, 1999; Lee & Ashby, 2000; Saye & Brush, 2007; VanSledright, 2002). Researchers have documented improvements in these areas in a number of ways. Some observe classroom teachers to evaluate the effectiveness of various instructional practices, while others evaluate the impact of specific interventions.

The following paragraphs briefly describe the research basis for some of the more common claims made by inquiry advocates. The studies are organized into two categories: historical thinking & reasoning and decision-making. This division reflects the primary orientation of the studies (i.e., historical inquiry vs. issues-centered instruction). In both categories, students engage in activities that can build skills and dispositions needed for effective citizenship. However, the citizenship focus is generally more explicit when looking at the issues-centered studies. The main purpose here is to highlight some of the major research outcomes that are often cited as affordances of having students engage in inquiry in their social studies classes.

Historical Thinking & Reasoning. Research by Young & Leinhardt (1998), Monte Sano (2008), De La Paz (2005), Ferreti et al. (2002), and Kohlmeier (2005/06) supports the idea that inquiry-based instruction can develop students'
capacity to think and reason on tasks that require constructing evidence-based arguments. Young & Leinhardt (1998) and Monte Sano (2008) found a positive relationship between classroom environments that emphasized inquiry and historical interpretation and the ability of students to construct evidence-based essays. Young & Leinhardt examined the effect of the document-based questions (DBQs) commonly experienced by students in Advanced Placement courses on historical thinking. In their study, students became more adept at applying historical thinking skills on successive DBQs despite receiving little direct instruction on how to complete the task itself (Young & Leinhardt, 1998). The researchers attributed this improvement to the teacher's use of classroom activities that required students to construct arguments using a variety of different types of evidence. Monte Sano compared the instructional approach of two teachers and, in particular, their methods for teaching historical writing. Students in the class oriented around a more inquiry-based instructional model wrote essays that demonstrated significantly greater levels of historical argumentation and reasoning. Other studies have noted increases in the ability of students to engage in historical inquiry in classrooms with similar characteristics (Gabella, 1994; Grant, 2001a).

Research by De La Paz, Ferreti et al., and Kohlmeier featured more explicit interventions designed to elicit advanced historical thinking outcomes. De La Paz (2005) analyzed the ability of a diverse group of 8th grade students to apply historical inquiry skills after taking part in an integrated language arts/social studies unit. Students were broken into three groups for analysis: students with learning disabilities, average writers, and talented writers. After a relatively brief intervention of approximately two weeks, students in the experimental group constructed a document-based persuasive essay. The essays were evaluated in terms of their length, persuasiveness, number of arguments, and accuracy. Students in the experimental group achieved higher scores in each of these areas than students in the control group. Students from each of the groups also showed improvement in the areas being measured when compared to their pre-test essays. Ferreti, MacArthur, & Okolo (2001) found similar results in a study that included eighty-seven fifth grade students in an urban, inclusion classroom environment. Participants in this study experienced an eight-week project-based inquiry curriculum that concluded with students developing multi-media presentations describing the perspectives of particular groups involved in westward expansion. Students achieved statistically significant improvements over their pre-test scores in the areas of content knowledge and application of historical inquiry skills.

Kohlmeier (2005, 2006) used a three-step instructional approach to help 9th grade World History students effectively reason with documents to develop a deeper understanding of the experiences of historical women. Students received sets of primary documents, at different points in the semester, describing the perspectives of women living during the Renaissance, the Russian Revolution, and the Cultural Revolution in China. Each time they encountered the source documents, they completed a reading web, Socratic seminar, and essay task.
As the students gained experience interpreting history and constructing evidence-based essays over the course of a semester, they demonstrated a better understanding of the role of historians in creating historical narratives (Kohlmeier, 2005). Kohlmeier found that the documents and the three-step process were successful in getting students to empathize with the perspectives of women and "ordinary people" during the periods under study. The three-step instructional model improved the ability of students to critically analyze documents and write evidence-based essays (Kohlmeier, 2006). Perhaps most significantly, at least one of Kohlmeier's students mentioned applying skills from the course to his own reading of contemporary articles that were not assigned in class (Kohlmeier, 2005).

These studies, taken in total, suggest that students can be taught to apply historical inquiry skills to document sets to construct reasoned arguments. In addition, Kohlmeier's research shows the power of using carefully chosen documents in class to motivate students and cultivate historical empathy. The studies were not without their faults. In most cases they featured brief interventions and small sample sizes. In De La Paz's study (2005), students who did not master important aspects of the experimental curriculum were excluded from the final analysis. Barton (2008) notes many of these same weaknesses in his review of research on students' historical thinking, but argues that the consistency of findings among a diverse body of research is encouraging.

Decision-Making. Issues-centered curriculums seek to foster some of the same types of thinking skills described in the previous studies, but they connect disciplined inquiry to broader citizenship outcomes. Researchers are interested in whether students can apply these skills within the context of formulating reasoned decisions about contemporary societal issues (Engle, 1960). Since decision-making in a democracy doesn't occur in a vacuum, social inquiry in the classroom typically involves activities which require discussion and collaboration. When evaluating the effectiveness of interventions, researchers tend to use tasks (written and oral) that require persuasive argumentation (Newmann, 1990, 1991a; Parker, Mueller, & Wendling, 1989; Saye & Brush, 1999a). One such study by Newmann engaged students in a persuasive essay task based on a question involving the justification of a locker search. His research suggested that simple exposure to a classroom environment that exhibited general characteristics thought to promote higher order thinking was not enough to improve students' abilities to reason about an unfamiliar topic (Newmann, 1991c). However, later studies suggested that students can experience success on similar tasks with explicit scaffolding and support. Parker, Mueller, & Wendling's (1989) study demonstrated the ability of high school students to engage in dialectical reasoning when asked to write an essay on a civic issue. The use of a scaffolded essay design enabled the majority of the students to argue both sides of the issue and empathize with opposing views (Parker, Mueller, & Wendling, 1989). Research by Saye and Brush built on these findings while investigating the potential of technology to help students more effectively reason about social issues (Saye & Brush, 1999a, 2002).
These researchers conducted a design experiment where they studied the effects of successive implementations of a problem-based unit on the Civil Rights movement. Even though the problem-based unit was designed and executed by a teacher with little experience with inquiry-based instruction, the students in the experimental class wrote essays that were more persuasive and featured higher dialectical reasoning scores than their peers (Saye & Brush, 1999a). When additional scaffolding was introduced with a new class in the second iteration of the study, students performed better than students from the first iteration in their ability to construct persuasive multi-media presentations that effectively used evidence to argue a position (Saye & Brush, 2002).

A wide range of additional civic outcomes have been documented by researchers who study the effects of controversial issues discussions (Hahn & Tocci, 1990; Hess & Posselt, 2002; Kahne & Sporte, 2008; Larson, 2003; McDevitt & Kiousis, 2006; Torney-Purta, 2002). Hess & Posselt (2002) investigated how 10th grade students experienced controversial public issues (CPI) discussions over the course of a semester. Two teachers were observed as they implemented a curriculum that featured discussions related to five public issues. Students learned a variety of discussion skills such as how to ask probing questions, cite evidence to support an argument, make stipulations, and identify and explain value conflicts reflected in an issue. Through the analysis of a variety of data sources (i.e. interviews, scored discussions, class observations, questionnaires), Hess & Posselt concluded that CPI discussions improved the discussion skills of participants and that students generally liked engaging in this type of activity.

Another study conducted by McDevitt & Kiousis (2006) also found positive outcomes associated with controversial issue discussions. This study evaluated longitudinal outcomes associated with the Kids Voting USA curriculum, a curriculum that includes service learning, mock election voting, family outreach activities, and other activities designed to inculcate deliberative habits in students. The researchers found the use of frequent classroom discussions about election issues, in which students could express their opinions, to be among the most effective strategies for promoting long-term civic development (McDevitt & Kiousis, 2006). Survey and focus group discussions revealed several benefits of discussion, including "increased news attention, political conversations with parents, opinion formation, and motivation for voting" (McDevitt & Kiousis, 2006, p. 4). The effects of the curriculum were shown to persist for two years after it was initially introduced, resulting in "self-perpetuating" habits associated with deliberative democracy. Other researchers have also noted that the discussion of controversial issues in an open classroom environment can promote civic engagement and participatory attitudes (Torney-Purta, 2002; Kahne & Sporte, 2008; Hahn & Tocci, 1990).

Some researchers have analyzed inquiry curriculums designed to produce specific dispositions such as tolerance. Avery and her colleagues conducted a study with 274 9th grade students which evaluated an inquiry-oriented program designed to help students recognize the civil liberties of groups with whom they disagree (Avery, Bird, Johnstone, Sullivan, & Thalhammer, 1992).
Analysis of the effects of this curriculum indicated that the students in the experimental groups experienced statistically significant increases in tolerance above and beyond those in the control. Avery concluded that a "curriculum that helps students comprehend the consequences of intolerance can increase students' willingness to extend rights to disliked groups" (Avery et al., 1992, p. 410).

Summary. Social educators tend to view classrooms as "laboratories of democracy" where students work together to make sense of societal problems (Parker, 1996). The inquiry process involves a number of steps which are not necessarily linear. It begins with the selection of a meaningful question. Historical thinking and reasoning skills are used to gather relevant foundational knowledge and evidence. Student understanding is further enhanced by classroom deliberations that reveal different views and perspectives on the problem. The outcome of these activities is the construction of an individual or group decision about the question. The studies reviewed in this section suggest a wide variety of beneficial outcomes can result from the inquiry process, including increased tolerance, participation in the political process, attention to news events, and an enhanced ability to develop reasoned positions on important issues.

A broad range of studies and interests fit under the "inquiry" umbrella in social studies. Despite persistent appeals by inquiry advocates, significant evidence suggests that students rarely experience this type of instruction (Baldi, Perie, Skidmore, Greenberg, & Hahn, 2001; Goodlad, 1984; Kahne, Rodriguez, Smith, & Thiede, 2000; Levstik, 2008; Rogers & Freiburg, 1994; Sizer, 1984). The lack of an overall consensus regarding inquiry or its purposes in social studies certainly makes it difficult for practitioners to envision alternative teaching strategies. Inquiry is also very challenging and time consuming, making it less practical given institutional barriers that commonly exist in schools (Onosko, 1991). The next section explores some of the main reasons why some researchers argue for more traditional approaches to teaching social studies.

Reservations

Opponents of inquiry-oriented instruction criticize the research described in the previous section for many reasons. Some researchers argue that controversial public issues instruction requires the ability to reason at levels that exceed most students' capabilities (King & Kitchener, 1994; Leming, 2003). Others have voiced concerns about the ability of teachers and students to reason effectively about the past without resorting to "presentistic" interpretations (Stern, Chesson, Klee, & Spoehr, 2003, p. 10). Finally, there is a general belief that students don't have the content knowledge base needed to engage in critical thinking (Onosko, 1991; Ravitch & Finn, 1987). These arguments are applied even more strenuously when discussing the capabilities of disadvantaged students and those with disabilities (Rossi, 1998). Critics also question the wisdom of constructivism and the student-centered approach commonly associated with inquiry-based curriculums (Frazee & Ayers, 2003; Schug, 2003). Frazee & Ayers (2003) have argued that essential content gets shortchanged when teachers attempt to apply constructivist practices in their classroom.
The consequences of shortchanging content, according to Hirsch, are most severely felt by disadvantaged students who miss out because they don't have the same learning opportunities outside of school as their more affluent classmates (Hirsch, 2009-2010). Constructivist-oriented curriculums face an uphill battle in winning over the general public. Traditional beliefs about learning remain entrenched in the public psyche (Powell, 1985, p. 311). The traditional learning paradigm holds that most students don't find academic work to be very interesting or motivating and therefore they must be pushed to achieve, especially through external reward systems (Brooks & Brooks, 1993). Attempts to revise the curriculum to tap into the intrinsic desire of humans to learn are often viewed with skepticism. People question whether these reforms are as rigorous as the instruction they might have received while in school (Newmann, Marks, & Gamoran, 1996; Windschitl, 2002). Traditional notions about teaching are bolstered by researchers who claim that achievement can best be improved through direct instruction (Kirschner, Sweller, & Clark, 2006; Schug, 2003). Schug has argued that the poor performance of students on content knowledge tests like the U.S. history portion of the NAEP is a reflection of inadequate preparation of teachers. In his view, teachers jettison student-centered, constructivist pedagogy when they encounter the real world of the classroom, only to find they have no preparation in how to use direct instructional approaches that actually work (Schug, 2003, pp. 124, 127). The constructivist emphasis of teacher education is the culprit, rather than teacher-centered instruction, which he acknowledges as dominant in most schools.

Perhaps the most fundamental difference of opinion, when it comes to history instruction, centers on the basic purpose of including this subject in the curriculum. Many advocates of history in public schools want students to learn traditional interpretations of the past (Cheney, 1994; Newmann, 1991a, p. 391). They argue that students need increased knowledge of U.S. history for the purpose of cultural literacy (Bennett, 1992; Hess, 2008b; Hirsch, 1988) and national unity (Finn, 2003; Paul Gagnon and the Bradley Commission on History in Schools, 1989; Saxe, 2003). Critical pedagogy is generally opposed because it is believed to undermine the more patriotic narrative of national progress found in many textbooks. The debate over the national history standards and a Florida bill which defines American history for classroom instruction as factual and not constructed demonstrate the discomfort many Americans feel with postmodernism and curriculums designed to encourage historical analysis and interpretation (Laws of Florida, 2006; Symcox, 2002). The merits of inquiry-based instruction have therefore been hotly debated for many years. Reconciling these competing perspectives often seems like an intractable problem. The next section describes the authentic intellectual work model, a vision for a more rigorous form of inquiry-based instruction that has gained the attention of many education reformers.

Authentic Intellectual Work

The authentic pedagogy model proposed by Fred Newmann provides a framework that addresses at least some of the criticisms voiced by skeptics of inquiry-based instruction. Like NCLB, it is a product of the standards and accountability movement of the 1980s. However, rather than using high-stakes tests as a "lever"
for instructional reform (Grant, 2001b), this model is designed to improve student learning outcomes by focusing on the quality of instruction. The major problem with in-depth programs, as described by Newmann and his associates, is the implementation of constructivist teaching strategies without standards of quality (Newmann, Marks, & Gamoran, 1996, p. 280). Teachers implement constructivist strategies (i.e. projects, hands-on activities, etc.) without ensuring the work students are asked to complete is rigorous and grounded in disciplinary standards (Newmann, Marks, & Gamoran, 1996). The authentic pedagogy model provides a framework that helps educators engage students in the types of intellectual work they are likely to encounter in today's society. In this model, "authentic" refers to school tasks that are complex enough to be considered "socially or personally meaningful," on par with the types of intellectual accomplishments performed by adults (Newmann, King, & Carmichael, 2007, pp. 2-3). In developing this model, Newmann and his colleagues examined a wide variety of intellectual challenges encountered by people in their daily occupations to "define criteria for intellectual performance necessary for success in contemporary society" (Newmann, King, & Carmichael, 2007, p. 2). They found that adults routinely face problems that require them to construct or develop solutions by applying what they know. Newmann concluded from this analysis that meaningful classroom instruction must move beyond memorization of factual material to provide students with similar intellectual experiences. While students are not expected to be on the same level as adults, Newmann's vision is a curriculum where students are engaged in complex intellectual challenges that have importance beyond certifying success in school (Newmann, King, & Carmichael, 2007, p. 5).

The theoretical basis and major components of authentic intellectual work (AIW) are discussed in several important works (Newmann, 1991a; Newmann & Archbald, 1988; Newmann, Secada, & Wehlage, 1995; Nystrand & Gamoran, 1990; Resnick, 1987; Wiggins, 1989). AIW includes the following main components: construction of knowledge, disciplined inquiry, and value beyond school. Each of these components is described in further detail in specific standards.

The first component, construction of knowledge, requires students to move from being consumers of information to producers. They must use their prior knowledge and information they learn in class to construct new (for them) interpretations or solutions to problems. This clearly involves significant higher order thinking.

The process used in developing solutions to problems is called disciplined inquiry. Disciplined inquiry is advocated to ensure students develop rigorous interpretations or solutions. This means that students must use procedures and "rules of evidence" that are considered legitimate by professionals in the academic discipline under study (i.e. historians, economists, etc.). A disciplined approach to inquiry also requires students to convey their findings to others through elaborated forms of communication. This can include a variety of formats, including more traditional essays or projects. The goal is for students to provide deep and nuanced explanations of their work.

Finally, authentic intellectual work has value beyond school. Student work that has value beyond school is focused on a real world problem and is often designed to "have an impact on others"
(Newmann, King, & Carmichael, 2007, p. 5). In social studies, an example might be if students tried to influence public policy by writing a persuasive letter to a Congressman (and actually sent it) or if a class created an informative website describing opposing perspectives on the issues for an upcoming election. These types of activities are meaningful and significant because students are grappling with the same types of intellectual challenges as adults. Ideally, the authenticity of the task evokes an emotional and personal investment in students as they strive to meet or exceed real world standards. Students know their work will be evaluated (informally or formally) by a public audience that is familiar with standards of excellence associated with the task. As band performances and athletic competitions demonstrate, public scrutiny of this nature can motivate students to excel (Wiggins, 1993b).

Newmann believes that teachers who offer students opportunities to construct knowledge, engage in disciplined inquiry, and develop products that have value beyond school will have greater success in helping students obtain lower and higher-order learning outcomes that are authentic according to his definition. This reform model is attractive to advocates of 21st century skills and other education stakeholders anxious to see high school students obtain the type of education that will allow them to compete in a global, information-based economy (Kozma, 2008; Pink, 2008; Wallis & Steptoe, 2006). It also emphasizes disciplinary knowledge, making it more palatable to proponents of the "basics." The use of authentic pedagogy therefore offers a potential way to bridge the gap between proponents of instruction for higher order outcomes (the disciplined inquiry advocates described earlier) and those who place a greater emphasis on the learning of specific historical facts. The culture wars will likely continue to be an obstacle in implementing the history curriculum, but positive results on standardized tests, which tend to measure traditional content knowledge, might ease the minds of at least some critics. The next section moves beyond theoretical considerations to review research that analyzes learning outcomes associated with in-depth curriculums and authentic pedagogy.

Research

Overview

A wide range of studies have evaluated instructional programs and curriculums that correspond with Rossi's definition of active, in-depth instruction. This section provides an overview of some of the more prominent efforts to compare traditional curriculums with inquiry-based instructional approaches designed to foster critical thinking. The variables in these studies closely relate to those found in the authentic pedagogy model. This overview is followed by a more in-depth discussion of the authentic intellectual work studies. The purpose of this review is to consider the extent to which research suggests that AIW and disciplined inquiry enable students to achieve lower and higher order learning outcomes. I am also concerned with equity: does this type of instruction benefit certain students while leaving others behind? I argue that the research is inconclusive on these topics and that more authentic pedagogy studies are needed which specifically deal with social studies content.

Inquiry has been a central component of many curriculums designed to teach students to think critically in social studies.
Research towards this goal has evolved over time with the major trends documented by a number of researchers (Cornbleth, 1985; Dewey, 1910; Fenton, 1967; Gross & McDonald, 1958; Hahn, 1991; Massialas & Cox, 1966; Metcalf, 1963; Newmann, 1991b; Oliver & Shaver, 1966; Parker, 1991; VanSickle & Hoge, 1991; Wallen & Travers, 1963). This section is primarily organized according to the periodization emphasized in Parker's review of literature on the promotion of critical thinking in social studies (Parker, 1991).

In the early 20th century, two main approaches were used to teach critical thinking. The first approach involved breaking down the components of critical thinking into subskills to be taught directly (Parker, 1991). Studies of this nature during the interwar period focused on propaganda resistance. During WWII, these gave way to studies designed to test the efficacy of various teaching strategies designed to help students apply specific rules of logic (Chenoweth, 1953; Glaser, 1941; Henderson, 1958; Hyram, 1957; Rothstein, 1960). The other dominant approach was referred to as progressive education. Progressive educators believed critical thinking could be fostered in classroom settings that permitted "a greater degree of self-determination, flexibility of curriculum, and freedom of behavior" (Wallen & Travers, 1963, p. 484). Research projects focused on the effects of specific inquiry teaching methods such as the problems-approach (Bayles, 1956; Kight & Mickelson, 1949; Quillen & Hanna, 1948) or interventions designed to evaluate classes which were more student-centered (Barratt, 1964; Elias, 1958; Rehage, 1951). Major research initiatives such as the Eight Year Study evaluated the effectiveness of a variety of progressive reforms (Aikin, 1942; Dimond, 1948; Lipka et al., 1998; Peters, 1948).

The next major period of innovation took place during the 1960s and was influenced by the cognitive revolution in psychology. Research efforts centered on teaching students how to engage in disciplined inquiry (Bruner, 1960; Taba, 1966) or strategies for investigating value conflicts through issues-centered curriculums (Levin, Newmann, & Oliver, 1969; Massialas, 1963; Newmann & Oliver, 1970; Oliver & Shaver, 1966). The first category included large curriculum projects whose purpose was "to shape the mindset of a generation into rational structuralist and scientific ways of seeing, and away from moral questions, social issues and social problems" (Evans, 2004, p. 129). These projects tended to focus heavily on fielding new curriculums and not as much on comparing learning outcomes with traditional instruction. One exception was the work of Hilda Taba (1964/1966), who successfully developed and tested a curriculum designed to promote critical thinking in elementary social studies students. Taba found that a sequential curriculum which embedded instruction on critical thinking within disciplinary-based inquiry lessons was able to significantly improve student thinking as indicated by their performance on the Social Science Inference Test (Taba, Levine, & Elzey, 1964). In addition, their ability to learn traditional content knowledge was not compromised (Taba, Levine, & Elzey, 1964). Dissertations focused on this same "structure of the disciplines" inquiry model were often more evaluative (Armstrong, 1970; Dodge, 1966; Frankville, 1969; Hunkin, 1967; Madden, 1970; Rose, 1970; Williamson, 1966; Womack, 1969; Yost, 1972).
In nearly every study where an inquiry model was compared with a traditional instructional approach, students in the inquiry-oriented groups did as well or better on conventional achievement tests (Armstrong, 1970; Dodge, 1966; Frankville, 1969; Hunkin, 1967; Rose, 1970; Womack, 1969; Yost, 1972). The inquiry groups also showed the greatest improvement when critical thinking or problem-solving variables were measured (Armstrong, 1970; Dodge, 1966; Yost, 1972). Two studies from this period investigated the effects of in-depth curriculums compared with coverage-based instructional programs (Johnson, 1961; Williamson, 1966) and found the in-depth curriculums were as effective in preparing students for conventional tests. Studies in later decades in geography, economics, and U.S. history (Byungro, 1991; Harmon, 2006; Mackenzie & White, 1982) also supported the use of active, inquiry-based instruction. The only exception that I encountered was a study conducted by Williams (1981) which generally found the traditional curriculum to be superior. In this study, an experimental group of 51 students who received an inquiry-based curriculum was compared with a control group of 53 students. The control group demonstrated significantly greater achievement on the Cooperative Topical Tests (CTTAH) for U.S. History (Williams, 1982).

Issues-centered curriculums during the 1960s built off the problem-based research of the progressive era (Hahn, 1991). The issues-centered studies that I reviewed spanned several decades and suggested that inquiry-based instructional approaches do not harm the ability of students to learn factual content (Cousins, 1962; Cox, 1961; Elsmere, 1961; Gallagher & Stepien, 1996; Lambert, 1980; Lee, 1967; Massialas, 1961; Saye & Brush, 1999a). The ability of these programs to help students think critically and achieve higher order outcomes was mixed. In some cases, students in the experimental groups made significant improvements on standardized tests purported to measure critical thinking outcomes (Cousins, 1962; Lambert, 1980; Lee, 1967). Other studies failed to note noticeable differences between the control and experimental groups (Cox, 1961; Massialas, 1961). Some researchers concluded that significant advances in critical thinking, not captured on traditional tests, were still taking place based on qualitative analyses of classroom discussions (Elsmere, 1963; Massialas, 1961).

A further advance in the research on inquiry and the fostering of critical thinking in social studies took place during the 1980s as Fred Newmann worked to create a general framework for promoting higher order thinking that would be widely accepted by both researchers and teachers (Newmann, 1991a, 1991b). The design of his framework was grounded in a thorough review of research across subject areas and synthesized findings from the issues-centered and discipline-based inquiry traditions. Newmann conceived of higher order thinking as involving "a challenge that requires the person to go beyond the information given; that is to interpret, analyze, or manipulate information because a question or a problem to be solved cannot be resolved through the routine application of previously learned knowledge" (Newmann, 1991b, p. 385). Success with these "novel" challenges involved the integration of knowledge, skills, and dispositions (Newmann, 1991b, p. 385).
In order to promote these components of higher order thinking, he devised standards which eventually evolved into the authentic intellectual work model. Research associated with this model will be discussed later in this chapter.

This brief overview describes the evolution of curriculums designed to teach students to think critically in social studies courses. The degree of correspondence between these studies and the authentic intellectual work model varies. Some of the early work does not appear to have much in common with the AIW research. The narrow skills-based conception of critical thinking (i.e. Henderson, 1958) isn't very compatible with Newmann's definition of higher order thinking or the constructivist orientation of authentic pedagogy. The research on student-centered reforms evaluated during the progressive movement is also suspect in the sense that it is difficult to evaluate the actual intellectual demands that were placed on students. On the other hand, some studies, like Chenoweth's, had a stronger connection to Newmann's vision of authentic intellectual work. Chenoweth's study explicitly established a connection between a problem-based curriculum and contemporary issues while also requiring students to take action beyond the classroom (Chenoweth, 1953, p. 21). This type of curriculum would likely score high on the "connectedness to the real world" standard.

Generally speaking, the early period of experimentation (1920s-1950s) yielded some findings to suggest that a more student-centered, problem-based approach to instruction can improve student performance. Most studies indicated that instructional programs designed to encourage critical thinking did not harm students in their ability to achieve on conventional tests (Aikin, 1942; Barratt, 1964; Bayles, 1956; Elias, 1958; Kight & Mickelson, 1949; Peters, 1948; Rehage, 1951; Rothstein, 1960). However, the Stanford Social Education Project found that juniors in the experimental "problems" group learned less factual content about American history than those in the control who experienced a chronological curriculum (Quillen & Hanna, 1948, p. 174). Early research also indicated that it was possible to directly teach specific critical thinking skills and that students were not as likely to learn these skills through a traditional didactic curriculum (Chenoweth, 1953; Glaser, 1941; Henderson, 1958; Kight & Mickelson, 1949; Quillen & Hanna, 1948; Rothstein, 1960).

Most scholars who have reviewed the social studies literature previously described have noted that despite years of study, a coherent research base is lacking. This is due to a number of factors such as the use of diverse terminology (i.e. concept-generalization method, problems-approach, jurisprudential approach, reflective inquiry, etc.), studies grounded in differing assumptions of the nature of thinking, poor research design, and the general difficulty of implementing inquiry-based curriculums (Hahn, 1991; Metcalf, 1963; Newmann, 1991b; Onosko, 1991; Taba, 1966). Without the use of a consistent underlying theoretical framework it becomes difficult to establish significant findings among the diverse studies. Which inquiry approach is most likely to provide the type of learning outcomes (lower and higher) needed in today's high-stakes testing environment?

The biggest factor to consider when comparing the body of research in social studies with the authentic intellectual work research is the variable of instructional quality.
Instructional methods tend to be compared without considering the intellectual challenge they represent for students (Metcalf, 1963; Newmann, Bryk, & Nagaoka, 2001; Quillen & Hanna, 1948). An analysis of classroom dialogue and the demands placed on students is important because without this analysis it is difficult to compare the level of intellectual challenge students experienced in different inquiry-oriented environments. It is also possible that instruction might not have differed substantially between the experimental and control groups within some studies.

The research base also includes a disproportionate number of dissertations. These often included a number of limitations such as small sample sizes, relatively short interventions, and settings that were probably not very diverse. It was fairly common for the researcher to serve as the teacher for both the control and experimental classes (Barratt, 1964; Cousins, 1962; Massialas, 1961; Yost, 1972). While this mitigated concerns about teacher personality influencing outcomes, it presented new problems. To what extent was the teacher genuinely able to switch from one instructional approach to the other during the course of the day? Was the teacher unconsciously biased towards the experimental group? Some dissertations also evaluated student content knowledge based on a teacher-made unit test (Barratt, 1964; Elias, 1958; Lee, 1967). Today's graduation exams are more ambitious in their demands since they usually encompass an entire semester's worth of material. The consequences of sacrificing coverage for depth could be more severe for students when they are held accountable for material that was either omitted or rapidly covered between inquiry units. Schools are also held accountable for the performance of all of their students. While some studies evaluated the effectiveness of inquiry-based instructional strategies based on gender, most do not provide information regarding how other subgroups (ethnicities) performed on the outcome measures.

The strongest studies, including the Taba experiments and Indiana Experiments in Inquiry, tested a clear theoretical model, with a relatively large sample, over an extended period of time. Findings from these studies tend to favor an inquiry approach over traditional instruction in producing lower and higher order outcomes. More replication and longitudinal studies are needed to confirm their findings.

The next section provides a more in-depth look at several studies that evaluate instructional models with strong connections to authentic intellectual work. The first group of studies involves the jurisprudential model associated with the Harvard Social Studies project of the 1960s. Two additional survey-based studies provide evidence of the effects of various types of instruction on student performance. These studies are frequently cited by proponents of authentic intellectual work and are therefore most in need of examination.

Harvard Social Studies Project

The Harvard Social Studies project tested a jurisprudential inquiry model which had students consider recurring public policy issues that often have no easy solution (Oliver & Shaver, 1966). The model primarily used a discussion format to get students to clarify important facts, definitional issues, and ethical considerations associated with a persistent issue (Oliver & Shaver, 1966). Phase I was a four-year study conducted at the junior high (7th & 8th grade) level.
The experimental curriculum was implemented by four teachers who were also researchers from Harvard.

The assessment that best evaluated the experimental curriculum was the Social Issues Analysis Test (SIAT). The SIAT included an argument analysis test, argument description and rebuttal test, oral argument analysis test, and the analytic category system (ANCAS) test, which featured interview and student-led discussion components. The experimental classes outperformed the control groups on each of these tests. The results of these tests led the researchers to conclude that students could be taught to think abstractly using the jurisprudential model, especially when considering relatively simple cases (Oliver & Shaver, 1966, p. 272). Oliver and Shaver also measured student attainment of factual content knowledge and concluded that the experimental curriculum did not put students at a disadvantage when compared to students in a more conventional setting. In fact, students from the experimental group were better able to retain the factual information they learned.

The high school component of the project began in 1964. Two classes received instruction on the experimental curriculum from project staff for three years beginning in the tenth grade. The students in these classes were compared with three other groups during their senior year. The first comparison group included students at the same high school who received the experimental curriculum from regular (non-project) teachers. The other two groups, an honors group and a "standard" track group, came from an affluent and academically strong school in the neighborhood (Levin, Newmann, & Oliver, 1969, p. 115). During the final evaluation, the participating students took a variety of assessments. Among the written tests that featured lower order content were a standardized Problems of Democracy (POD) test and an open-ended American History Factual Recall Test. On the POD test, the honors control group scored the highest. The project group did as well as the other two control groups. On the open-ended factual recall test, the project groups scored significantly lower than both control groups (Levin, Newmann, & Oliver, 1969). The researchers suggested that in affluent schools, where students and parents highly value education and equate success with testing, students are more likely to be motivated and excel on conventional measures of achievement (p. 174). However, the intellectual challenge students actually experienced from the chronological curriculum at the affluent school was not investigated.

A strength associated with this research is the fact that the Harvard researchers compared the learning outcomes of the project students with students from a more academically oriented school. This lends significance to the results from the Problems of Democracy test, the closest thing to a traditional standardized assessment. However, the research still has several important weaknesses. First, the researchers did not implement pre-tests or conduct periodic assessments to measure learning outcomes until the very end of the program. There is no way to determine how much students learned as a result of the experimental curriculum. Second, a variety of curriculum innovations were implemented during the three years associated with this project. It is impossible to isolate which variables might have contributed to student success on the POD test and the other outcome measures (Levin, Newmann, & Oliver, 1969, p. 112).
Finally, the general setting of the Harvard Social Studies Project was a middle class suburb. Limited data is provided regarding the ethnic diversity of the schools. More information is needed regarding whether this curriculum is effective for different groups of students.

Two later studies also evaluated curriculums based on a similar public issues discussion model as the Harvard Social Studies Project. The first one addressed criticisms levied against inquiry-based curriculums which argued that they were too advanced for disadvantaged students or slow learners. Curtis and Shaver (1980) found that slow learners can effectively engage in complex reasoning about social issues when appropriate scaffolding of materials is provided. Another study featured research on the effectiveness of a Channel 1 television segment called You Decide (Johnston, Anderman, Milne, Klenk, & Harris, 1994). Students in the experimental groups performed better on a test of factual knowledge related to the news events they had watched.

Survey-Based Research

A fairly large study was conducted by Smith & Niemi (2001) which investigated a number of factors that might influence achievement in history. One area in particular was the impact of instructional methods on achievement. The study incorporated an ex post facto analysis of data from the 12th grade National Assessment of Educational Progress (NAEP) history exam. Data was derived from a sample of 4,465 students. A questionnaire associated with this test provided a description of the types of instruction students reported experiencing in their social studies classes. The researchers looked at the extent to which high school history courses emphasized writing complexity, reading complexity, use of alternative sources, and student discussion/debate. The outcome measure was student performance on the NAEP, a test featuring a mixture of lower order items and items that require more extended responses and some higher-order thinking. The students who reported experiencing higher degrees of active instruction that required complex reading, writing, and discussion scored higher on this assessment than their peers. The researchers concluded "if left with a choice of only one 'solution' to raise history scores, it is clear that instructional changes have the most powerful relationship to student performance" (Smith & Niemi, 2001, p. 38).

This finding is promising since the variables in this study closely fit the authentic pedagogy model. However, the self-reported nature of the data is a limitation. This limitation can best be seen when looking at how the researchers defined discussion. This variable mainly focused on the amount of discussion related to a specific context (i.e. whole class, small group, presentations). We have no way of determining whether the active discussion students reported experiencing was the type envisioned by Newmann, especially with indications that students and teachers seem to have very different views of quality discussions when compared with researchers (Hess, 2008a).

Another study conducted in 2001 by Lee, Smith, and Newmann analyzed how different types of instruction influence student learning outcomes on conventional tests. Their study focused on Chicago elementary schools with data from grade levels 2-8. The researchers used a 1997 survey conducted by the Consortium on Chicago Public School Research to determine the extent to which teachers used didactic or interactive instruction.
They also analyzed the amount of review teachers included in their curriculum. Instructional data was paired with student performance data from the Iowa Test of Basic Skills, which was a measure of reading and math proficiency. The study included data from 384 schools, over 5,000 teachers, and over 100,000 students. Three important conclusions were made by the researchers. First, students who received more interactive instruction performed better on the ITBS. They learned 5.1% more in math and 5.2% more in reading when compared to the city average. Students who frequently received didactic instruction tended to score below the city average. Second, interactive instruction was more frequent in the lower grades and became less frequent in the upper grades. Finally, this study noted trends to suggest that low income students and students in classes with low prior achievement levels received more didactic instruction. The drawback of this study is essentially the same as the previous one. The use of survey instruments (as opposed to direct observation) makes it difficult to understand exactly how the interactive instruction was implemented from class to class and whether it was rigorous. This study was part of a broader Annenberg research grant in Chicago that focused primarily on Authentic Intellectual Work. The AIW studies provide greater fidelity in examining the effects of intellectual challenge on achievement. It is to these studies that I now turn.

Learning Outcomes: Authentic Intellectual Work (AIW)

Reforms associated with the authentic pedagogy model have been enacted domestically (i.e. Iowa, Michigan, Washington, Minnesota, Illinois) and internationally in Australia, the Netherlands, and Singapore (Koh, Kim, & Luke, 2009; Koh et al., 2005; Roelofs & Terwel, 1999). A number of research projects have analyzed the impact of authentic intellectual work on student learning. Very few of these studies focused on the relationship between authentic pedagogy and lower order achievement outcomes. Most are concerned with the extent to which authentic tasks promote complex intellectual work by students. My review of this research begins with the studies conducted by Newmann and his associates under the auspices of the Center on Organization and Restructuring of Schools (CORS). Newmann's work primarily took place in Chicago's public schools during a series of studies beginning in the mid-1990s. Rather than present these studies chronologically, I separate them into two main categories: AIW's impact on lower order outcomes associated with standardized tests and its impact on higher-order rubric-based measures of authenticity. In each category I distinguish between social studies research and research that focuses on other subject areas. The final section describes the progression of this line of inquiry both domestically and internationally.

AIW and Lower Order Outcomes

Subjects other than Social Studies. The AIW studies in this section are useful even though they focus on other subject areas. In reviewing this research, the first question is whether authentic intellectual work impedes student performance on standardized tests. A study that dealt with this issue was conducted by Lee, Smith, and Croninger (1997). The goal of this research was to determine the factors that most contribute to successful school restructuring. An earlier study by the same authors (1995) indicated that smaller communal schools were more effective in promoting student learning than larger schools.
The 1997 follow-up study analyzed a variety of variables, including instruction, to learn more about the reasons for this finding. It focused on 9,631 seniors in 789 high schools. Data was derived from the National Educational Longitudinal Study (NELS; 1988-1992). The NELS tracked the academic progress of participating students while associating them with their respective high schools and teachers. The researchers had achievement scores and surveys at their disposal for seniors extending back to their eighth grade year.

Lee et al. utilized the NELS survey data from teachers and students to estimate the level and distribution of authentic instruction in schools. They then linked the results of this analysis with achievement outcomes in science and math on the NELS tests. The researchers concluded "...students attending schools that are instructionally rich and incorporate active learning and in which this type of instruction is shared widely gain more in science and mathematics achievement, both early and late in high school" (Lee, Smith, & Croninger, 1997, p. 141). They also noted that achievement gains were more equitably distributed when authentic instruction was pervasive in the school.

This study offers some interesting insights, but it also has at least one limitation: the use of survey data to estimate the authentic demands of instruction instead of actually observing instruction or collecting tasks and student work. The surveys allowed the researchers to characterize instruction in broad terms as being more or less active, but do not provide a clear indication of the intellectual challenge being offered to students. For example, the survey items asked how often students use computers, use hands-on materials or models, use books other than the math textbook, participate in student-led discussion, etc. These activities and others listed in the survey can, and often do, involve little challenge. Despite this limitation, the study still provides a rough measure of the type of instruction students encounter and its conclusions, when taken into account with Newmann's other research, are an important contribution to the field. Assuming that the NELS is primarily a test of lower order knowledge, this study suggests that student performance will not be harmed by the use of authentic instruction.

A later study also addressed the issue of authentic intellectual work and student performance on standardized achievement tests (Newmann, Bryk, & Nagaoka, 2001). This study included a mixture of grade levels to correspond with the grades in which the standardized tests were customarily administered. Researchers collected data for three years (1997-1999) from a sample that included 19 schools. The schools were a representative sample of the types of public schools found in Chicago, but were actually a little more disadvantaged than the norm (Newmann, Bryk, & Nagaoka, 2001). The most pertinent information in this study related to performance data collected on the 1,400 eighth grade participants. The research team collected two typical and two challenging math and writing assignments from each participating teacher. These tasks were evaluated by outside teachers trained in the use of the AIW scoring rubrics. Based on this analysis, the participating teachers were ranked according to the intellectual challenge represented by the tasks they submitted. The researchers compiled the scores of the students in the participating teachers'
classes on the Iowa Test of Basic Skills (ITBS) and the Illinois Goal Assessment Program (IGAP). The researchers found that students in classes that received high quality assignments scored 20% higher than the national average on the writing and math portions of the ITBS. In comparison, students who received assignments that were less demanding scored 25% less than the national average in reading and 22% less in math on average. The same basic trend was evident for IGAP scores. Students who received high quality assignments were likely to outperform their peers on the IGAP reading portion by 32 points, the math portion by 48 points, and the writing rubric by 2.3 points (Newmann, Bryk, & Nagaoka, 2001, p. 25). This study is significant because it most clearly demonstrates the relationship between authentic instruction and achievement on lower order conventional tests.

In a third study which featured content other than social studies and focused on standardized testing outcomes, D'Agostino sought to determine the impact of instruction on Title I programs seeking to improve the reading and math achievement levels of disadvantaged students. Researchers analyzed instruction in 53 third grade Title I classrooms in 29 schools in the Chicago public school system. These schools were in high poverty areas and the majority had student populations that were 90% African American (D'Agostino, 1996). Instruction was rated based on Newmann's authentic intellectual work principles. D'Agostino found that most classrooms did not heavily emphasize AIW. In general, the math lessons tended to score higher than reading. Disadvantaged students in these schools often did not engage in lessons that featured higher order thinking related to situations they were likely to encounter in their lives outside of school (D'Agostino, 1996). Student achievement in this study was based on specific reading and math subtests of the Iowa Test of Basic Skills that measured both higher order and lower order knowledge and skills. D'Agostino found that authentic instruction had no relation to vocabulary achievement in reading (D'Agostino, 1996). However, a moderate amount of authentic instruction was shown to improve achievement on the reading comprehension section of the test. The results for math instruction were more consistent. Students who received higher levels of authentic instruction demonstrated greater adjusted gains (pre vs. post) on both the higher order and basic skills portions of the ITBS than their peers in less authentic settings.

The studies by Lee et al., Newmann, and D'Agostino suggest that students who experience higher levels of authentic pedagogy are not likely to perform any worse on conventional standardized tests than students in less authentic settings. The findings are strongest for elementary students in math, reading, and writing. In general, authentic pedagogy was not very prevalent among the teachers in these studies. However, when teachers did provide authentic instruction, the benefits appeared to be equitably distributed. In Newmann's study in particular, gains on the ITBS were in some instances larger for students with lower levels of prior achievement than those of their higher achieving peers (Newmann, Bryk, & Nagaoka, 2001). D'Agostino's findings were less decisive, but still suggest that lower achieving students might benefit from AIW, especially in math.
The findings from these studies should be viewed tentatively, especially since two of the three focused exclusively on Chicago's public schools.

Social Studies. Only one AIW study measured the impact of authentic intellectual work on a lower order measure of student learning in social studies. Avery's study (1999) involved five U.S. history teachers in one urban high school in Minnesota. The teachers implemented a four-week unit on immigration where they used the same authentic essay task as the culminating activity. The students also completed a conventional 10-item multiple choice test. Avery controlled for factors that might influence student achievement, including sex, race/ethnicity, socioeconomic status, and student engagement. The main goal of this study was to determine how the level of authentic instruction would impact student performance on a common task that met Newmann's requirements for being authentic. Each teacher taught a similar lesson to prepare their students for the task (in terms of content), but the approach they used varied significantly. Raters evaluated the classroom instruction and assigned scores based on Newmann's instruction rubric. Avery found that authenticity of instruction accounted for 40% of the differences in student performance on the task (Avery, 1999). Students who received higher quality instruction performed better on the authentic task. Avery noted a small statistical link between the level of authenticity of instruction and student scores on the multiple choice test.

AIW and Higher Order Outcomes

Subjects other than Social Studies. Two of Newmann's studies preceded the 2001 study discussed in the previous section (Newmann, Bryk, & Nagaoka, 2001) and focused on higher order authentic outcomes. Newmann and his associates conducted a one-year study on AIW in restructured schools (Newmann & Associates, 1996; Newmann, Marks, & Gamoran, 1996) and another study in 1998 designed to collect baseline data for later research efforts (Newmann, Lopez, & Bryk, 1998). I will review these studies out of chronological order since the 1998 study did not include any social studies content.

Newmann, Lopez, & Bryk collected data from math and writing teachers in grades 3, 6, and 8. Twelve schools were included in this study. They were atypical of Chicago schools in general in that the students had lower test scores and were more disadvantaged than their peers. The purpose of the study was to collect information regarding the authenticity of assignments provided by teachers in the study schools and to analyze the link between these assignments and quality student learning outcomes. The researchers gathered data from two teachers in each grade, for each subject, in each participating school. They collected four tasks (two typical, two challenging) along with student work associated with the challenging tasks. They received tasks from 74 teachers and work from 700 students. In this study, classroom instruction was not rated. The degree of challenge represented by the assignments was broken down into four categories: extensive, moderate, minimal, or none. The researchers noted that in all three grades, the majority of the writing and math assignments fell into the lowest two categories (Newmann, Lopez, & Bryk, 1998). The challenging assignments did tend to rate higher than the typical assignments. The writing assignments were generally more demanding than the math assignments.
Students in the classrooms that offered more authentic assignments produced work that was on average 46 percentile points higher than peers in less authentic classes (Newmann, Lopez, & Bryk, 1998). This study supports the strong relationship between assignment quality and student work while noting that quality instruction is also needed to ensure student success. This study did not attempt to control for other factors that might contribute to quality student work. It only demonstrated that a relationship exists between quality tasks and quality student work. As a baseline study, it showed that at least some students in the study schools in Chicago had the opportunity to produce authentic work and they were often able to do it (Newmann, Lopez, & Bryk, 1998).

Social Studies. Other AIW studies focused more attention on the link between authentic instruction and authentic achievement. A study conducted by Newmann, Marks, and Gamoran in 1996 analyzed 24 significantly restructured public schools (Newmann, Marks, & Gamoran, 1996). The social studies sample from this study (grades 9 and 10) included 23 teachers with 348 students. Newmann and his associates analyzed instruction, tasks, and student work to determine the extent to which teachers offered authentic pedagogy, which students were most likely to experience it, and its impact on student performance. Several important outcomes were determined from this study. First, the upper levels of the AIW standards were difficult to achieve. Both the teacher authentic pedagogy scores and student work scores were, on average, below the midpoint of the range of possible scores (Newmann, Marks, & Gamoran, 1996). In testing the connection between authentic pedagogy and authentic student performance, the researchers found the level of authentic pedagogy to be the most significant predictor of quality student performance. Researchers concluded that "...an average student would increase from about the thirteenth percentile to about the sixtieth percentile as a result of experiencing high versus low authentic pedagogy" (Newmann & Associates, 1996, p. 58).

A final set of findings for this study relates to equity. Newmann's analysis found that student characteristics (gender, race, ethnicity, SES) did not play a significant role in determining whether they received authentic pedagogy in the restructured schools (Newmann, Marks, & Gamoran, 1996). In addition, the effect of authentic pedagogy also seemed to be positive for all students. The only area where the effect seemed to differ was in terms of student prior achievement levels. Students with high prior achievement levels benefited more than their peers when they experienced higher levels of authentic classroom instruction (Newmann, Marks, & Gamoran, 1996). Newmann also sought to determine whether the high-scoring student work indicated a bias towards any particular subgroup. According to his analysis, "Hispanics and low-SES students did not score significantly lower than whites or high-SES students, respectively" (Newmann, Marks, & Gamoran, 1996, pp. 303-304). Some achievement gaps were noted; blacks scored lower than whites and girls outperformed the boys. These achievement gaps were not significantly greater than the gaps found on the traditional NAEP assessment (Newmann, Marks, & Gamoran, 1996). A limitation in the study design was the fact that students who received inauthentic tasks were not really afforded the opportunity to produce quality student work (Newmann, Marks, & Gamoran, 1996).
Noel criticized the study for having low inter-rater reliability among researchers evaluating student work (.54), the lack of validity data on the measurement scale, and for failing to use the same assessment tests to evaluate student learning (Noel, 1996). Other researchers have also questioned the authentic pedagogy terminology and its validity (Cizek, 1991a, 1991b; Terwilliger, 1997, 1998). King, Schroeder, and Chawszczewski (2001) examined the impact of authentic instruction on students with disabilities in inclusion classrooms. Specifically, they looked at secondary schools with inclusionary practices and asked "to what extent are teacher-designed assessments authentic?" and "how do students with and without disabilities perform on these assessments?" (p. 1). The study included a variety of subjects (language, science, math, social studies) and grades (9/10, 11/12). The researchers collected a task and student work from teachers in two different data sets during the 1999-2000 school year. The first data set included 16 teachers from two schools. In this data set, the researchers collected and analyzed student work for the entire class pertaining to the submitted task. The second data set included 35 teachers from three schools representing the same subject areas and grades. The teachers submitted work for two students: one regular education student and one student with a disability. This data was used for comparison purposes. The tasks in both data sets were analyzed based on the writing and math AIW task rubrics. The student work was analyzed using subject specific AIW rubrics. The findings for data set one indicated that the majority of the tasks failed to offer problems that showed a connection to students' lives. Nevertheless, a significant relationship was noted between the level of authenticity of the task and the quality of student work as measured by the AIW student work rubrics. The most interesting finding shows that special education students who were assigned higher quality tasks produced work of higher quality than special education students who received less authentic assignments, although the difference was not statistically significant (King et al., 2001). In data set two, accommodations granted to special education students were factored into the analysis of task quality. A statistically significant proportion of the tasks scored lower in authenticity when accommodations were factored in, but the researchers noted that most tasks (85.7%) scored the same. When the work produced by the matched pairs (one student with a disability, one regular education student) was analyzed, King found that 62% of the work produced by the students with disabilities was of equal, or higher, quality than that of their non-disabled peers (King et al., 2001). Like data set one, there was a high correlation (r = .68) between the authenticity of a task and the authenticity of student work. King et al. concluded, "Teachers who use more authentic assessments elicit more authentic work from students with and without disabilities" (King et al., 2001, p. 12). Table 1 Summary of Results from Authentic Intellectual Work Studies Study Subject(s) Conventional Outcomes Authentic/Higher Order Outcomes Equity Lee, Smith, & Croninger (1995/1997) Science and Math (8-12) Students in schools with high levels and distributions of authentic instruction achieved larger gains on the NELS tests. (p. 141) N/A Learning is more equitably distributed when authentic instruction is "pervasive" in schools. (p.
141) Newmann, Marks, & Gamoran (1996) Math and Social Studies (all levels) N/A Authentic pedagogy in both subjects was the highest predictor of complex intellectual student work. Authentic instruction was beneficial for all students regardless of gender, ethnicity, race, or SES. Newmann, Lopez, & Bryk (1998) Math and Writing (3,6,8) N/A ?Students in the classrooms that offered more authentic assignments produced work that was on average 46 percentile points higher than peers in less authentic classes? (p. 39) Students in this sample were more disadvantaged than others in Chicago. Avery (1999) U.S. History Students not harmed by authentic instruction on conventional 10-item test. Students who experienced authentic instruction performed better than peers on authentic, higher-order essay measure Newmann, Bryk, & Nagaoka (2001) Math and Writing (3,6,8) Authentic instruction enabled students to perform at a higher level on the ITBS and IGAP. N/A Students in this sample were generally more disadvantaged than their peers in other Chicago schools. 59 Table 2 Summary of AIW Studies with an Explicit Focus on Disadvantaged Students Study Focus Subject(s) Conventional Outcomes Authentic/Higher Order Outcomes D?Agostino, 1996 Title I and Low SES Math and Reading (3) Higher levels of authentic instruction positively correlated with improved math scores1. Moderate use of authentic instruction appears to best promote reading comprehension. King, Schroeder, & Chawszczewski, 2001 Special Education Students Language, Math, Science, Social Studies (9-12) Special Education students who received high levels of authentic pedagogy achieved at higher levels than regular students who received low levels of authentic pedagogy. Amosa et al., 2007 Aboriginal and Torres Islander; Low SES (Australia) Not Indicated No conventional measure Indigenous students who received high quality tasks produced work that on average exceeded work produced by non- indigenous students who received low quality tasks. Low SES performed better than high SES when both given high scoring tasks. Note: Outcome measure is the ITBS. Amosa?s study used a construct unique to Australia which incorporated AIW 60 Gates Foundation Research The High School Grants Initiative supported by the Gates Foundation adopted the authentic intellectual work model as a way to evaluate the effectiveness of its initiative to redesign or build new high schools based on a small learning community model. In 2002/03, researchers collected and evaluated tasks and student work as a pilot study with teachers in Washington State (AIR/SRI, 2004). During the next year, the program was implemented nationwide and the AIW rubrics were used as one measure to compare redesigned schools with traditional ones (AIR/SRI, 2006). A final study completed in 2007 evaluated the performance of Foundation schools with the baseline data collected from the pilot study (AIR/SRI, 2007). In each of these studies, quality student work was defined by the criteria specified in Newmann?s AIW framework and therefore encompassed higher order outcomes such as construction of new knowledge in English/Language Arts and reasoning and problem-solving in mathematics (AIR/SRI, 2006). The 2006 study included limited analyses of the relationship between authentic pedagogy and learning outcomes; both standardized and authentic. 
After controlling for a number of factors, researchers in this study found a significant positive relationship between quality student work in English/Language Arts (ELA) and improved standardized test scores in reading. In mathematics, the relationship was positive, but not statistically significant (AIR/SRI, 2006, 2007). In analyzing the relationship between authentic assignments and higher order authentic outcomes, the researchers focused on elements of the AIW framework individually (assignment rigor and relevance) to more accurately pinpoint the elements of an authentic task that most influence student learning. This makes it difficult to make direct comparisons between the Gates Foundation studies and other studies conducted by Newmann and his associates. However, some general conclusions still apply. In both subjects, authentic assignments were positively associated with quality student work (AIR/SRI, 2006, 2007). ELA students responded to challenging tasks by producing work of better quality than students in the comparison schools (AIR/SRI, 2006). However, in math, where the assignments from the Foundation schools were slightly better than those from comparison schools, the student work did not exceed that of the traditional schools. The researchers noted that most of the math assignments did not score very well at either type of school and that teachers may be encountering difficulties in implementing constructivist assignments. The other Gates Foundation study with implications for my research focused on the redesigned schools in Washington State and evaluated instructional changes and learning outcomes compared to baseline data collected in the pilot study (AIR/SRI, 2007). In conducting their analysis, the researchers controlled for student demographics, prior achievement, teacher characteristics, and several other variables that potentially could influence achievement. Student performance was based on the Washington State 10th grade achievement tests (WASL), and in this instance researchers noted a positive relationship between quality student work and math scores (AIR/SRI, 2007). There was not a positive relationship for language arts. The researchers, noting the discrepancy of outcomes between this study and the previous one, hypothesized that student work quality might have "higher correlations with tests other than the WASL" (AIR/SRI, 2007, p. 21). This has implications for my study in that authentic pedagogy may succeed in enabling students to perform well on certain types of standardized tests, but not others. This study also analyzed the extent to which authentic tasks result in complex intellectual work in student products. As in the previous study, a high correlation was noted between authentic tasks and quality student work for ELA and math (AIR/SRI, 2007). Tables 3 and 4 summarize the learning outcomes associated with authentic pedagogy as established by the Gates Foundation research. Several important points should be made about these studies. First, an explicit goal of the Gates Foundation was to target disadvantaged student populations. In an analysis of whether the Foundation was meeting this objective, the AIR/SRI 2006 report indicated that "Two thirds of new schools and almost 80% of redesigned schools exceeded their district averages for enrollment of students eligible for free or reduced-price lunch and/or enrollment of students from minority backgrounds" (p. 16).
The outcomes described in these reports show that authentic pedagogy can achieve at least some success with students from disadvantaged backgrounds. Another important aspect of these studies is their unique methodology. As previously noted, assignment rigor and relevance were analyzed independently as variables instead of together as part of a composite authentic task variable. The relevance variable included elements of Newmann?s connection to students? lives standard while the rigor variable examined the extent to which the task required construction of knowledge. In analyzing the tasks provided by teachers, the researchers noted low levels 63 of relevance in general even though the restructured schools achieved better scores than the more traditional schools. Assignment relevance and rigor were strongly correlated. When examining the impact of the two variables on student work quality, the rigor variable appeared to have the more direct, positive impact on student performance. The researchers argued that future research should stick with their methodology to more precisely understand how assignment rigor and relevance influence student performance (AIR/SRI, 2007). These studies also demonstrate the difficulty of the AIW scales. Even after the Foundation schools were redesigned, most math assignments were rated as showing little to no rigor (AIR/SRI, 2007). This was also the case when math tasks from traditional schools were evaluated (AIR/SRI, 2006). ELA assignments rated a little better, but 71% still fell below the substantial rigor level (AIR/SRI, 2007). This reinforces the need for standards of intellectual quality to improve the way constructivist practices are implemented in the classroom. Finally, the importance of observational data is underscored by these studies. While observations were conducted to note qualitative trends, the AIW instruction rubrics were not used in any of the analyses. The researchers attributed lower than expected performance gains in math to teachers adopting more rigorous assignments without corresponding improvements in their instruction (AIR/SRI, 2007). A recommendation from this study is to link the analysis of classroom instruction with assignments in order to determine how they work together to influence student work quality (AIR/SRI, 2007). This was also a recommendation provided by Bruce King, a researcher affiliated with 64 Newmann?s studies (B. King, personal communication, Nov. 21, 2007). I incorporated this recommendation into the design of this dissertation study. 65 Table 3 Gates Foundation Studies: Authentic Student Work and Performance on Standardized Tests Study Purpose of Study Subject Focus Sample Data Collected Outcome Measures Results AIR/SRI 2006 Compare 12 new Foundation high schools in 8 regions across the U.S. with 8 traditional schools English/ Language Arts (ELA) and Math ELA: 113 students, 16 teachers, 8 schools Math: 92 students, 20 teachers, 8 schools Tasks, Student Work Scores from multiple state achievement tests converted to common metric based on norms from the CAT-6 and SAT-9 ELA work relates to 10th grade reading test scores ELA (+)* Math (+) AIR/SRI 2007 Gauge improvement for 12 Foundation Schools from pilot study Same as above ELA: 71 teachers, Math: 68 teachers Tasks, Student Work 10th grade Washington State achievement tests (WASL) ELA (-) Math (+)* Note. (+)* = Significant Positive Relationship; (+) = Positive Relationship, but not significant at .05 level; (-) = No relationship. 
66 Table 4 Gates Foundation Studies: Relation between Authentic Tasks and Student Work Study Purpose of Study Subject Focus Sample Data Collected Outcome Measure Results AIR/SRI 2006 Compare New Foundation Schools with Traditional Schools across country English/ Language Arts (ELA) and Math ELA: 89 teachers Math: 81 teachers ELA: 717 tasks & associated student work Math: 606 tasks & associated student work AIW rubrics Authentic tasks were closely associated with quality work in ELA (p. 38). Assignment rigor is more closely associated with quality work than relevance in math (p. 48). AIR/SRI 2007 Gauge improvement for 12 Foundation Schools before and after redesign Same as Above ELA: 71 teachers Math: 68 teachers ELA: 509 tasks, 966 student responses Math: 523 tasks, 1078 student responses AIW rubrics Strong positive correlation between high scoring authentic tasks and student work quality in both subjects (p. 18). 67 International Research In Australia, two states have implemented an elaborated version of the AIW framework to improve instruction in their schools. James Ladwig, who worked with Newmann in the 1990s to help develop and test the construct, is the common link between American and Australian research related to authentic pedagogy. The Australian studies are comparable with studies in the United States since the Australian models include all of the original components of Newmann?s work. In both settings (Queensland & New South Wales) researchers found authentic pedagogy to be rare in their schools (Gore, Ladwig, Lingard, & Luke, 2001; Ladwig, Smith, Gore, Amosa, & Griffiths, 2007). Studies in Singapore report a similar finding (Koh, Kim, & Luke, 2009; Koh et al., 2005). Queensland launched a three-year school reform program called the New Basics in 2000 in specific trial schools. Thirty-eight schools were selected from the 1,296 state schools in Queensland (Education Queensland, 2004, p. 23). Trial schools included students from less advantaged backgrounds than their peers at non-trial schools (Education Queensland, 2004, p. 11). The reform project focused primarily on comparing learning outcomes of trial students with those of non-trial students. The trial schools associated with the New Basics demonstrated a departure from regular schools in their documented use of rich authentic tasks and authentic instruction. Since the instructional program was focused on higher order outcomes, reformers did not anticipate it having much effect on basic skills development. The teachers in the trial schools administered their usual conventional tests to students. In schools that adopted the New Basics, external standardized tests (years 3, 5, 7) indicated no general decline in literacy and numeracy as compared to the rest of the state schools (Education 68 Queensland, 2004). The focus on authentic outcomes did not result in diminished returns on these traditional tests. A later study analyzed numeric and literacy scores for New Basics schools in 2004 and 2005. Once again, these schools showed no evidence of a decline in scores. In fact, ?there is evidence that the scores of lower achieving students in New Basics schools are rising? (Lake Corporate Consulting, 2006, p. 1). Students in both types of schools also completed two different types of higher order assessments. The first higher order assessment was the International Schools? Assessment which focused on reading and math. The second assessment was the World Class Test; an interdisciplinary problem-solving assessment. 
On both of these measures, no real difference was identified between trial students and non-trial students in higher order ability (Education Queensland, 2004, pp. 38,41). The researchers viewed this as significant since trial students generally were more disadvantaged than their peers at non- trial schools. A second Australian state enacted similar reforms. In New South Wales, authentic pedagogy was incorporated into a model known as Quality Teaching. The Systemic Implications of Pedagogy and Achievement in New South Wales Public Schools (SIPA) longitudinal study (2004-2007) provided data to document the effects of this program. The schools included in SIPA offer a representative sample of students from varying grade levels, school settings, and socioeconomic backgrounds (Ladwig, Smith, Gore, Amosa, & Griffiths, 2007). The first study based on this data was explicitly designed to replicate Newmann?s authentic pedagogy research in the Australian context (Ladwig, Smith, Gore, Amosa, & Griffiths, 2007). 69 In this study, Ladwig et al. evaluated instruction in grades 4 and 8. Tasks were collected primarily for Math and English. Other subjects such as Science, PDHPE (Health/PE), and an area similar to social studies called Human Society and Its Environment (HSIE) were also included in the analysis. However, only one task was from HSIE at the secondary level. These tasks (78 total) were analyzed from 26 SIPA schools. Student work came from 1,374 students. This study found a significant positive (p <.001) relationship between high scoring tasks and quality student work even when controlling for other factors that might influence achievement (prior achievement, gender, SES, etc.). A second study that utilized SIPA data took a closer look at learning outcomes for disadvantaged students (Amosa, Ladwig, Griffiths, & Gore, 2007). Amosa et al. studied the effects of authentic instruction on indigenous students and students from low socioeconomic backgrounds in New South Wales. As part of their study they collected 95 tasks from 121 teachers in 19 primary schools and 11 secondary schools (Amosa, Ladwig, Griffiths, & Gore, 2007). Out of the 1,912 students in the sample, 180 were Aboriginal or Torres Strait Islander. The sample was also divided according to the SES background of the students. Amosa found that as tasks became more authentic, achievement also became more authentic for indigenous students and non-indigenous students alike. The work of indigenous students remained below non-indigenous students when aggregate comparisons were made for students who received low quality tasks. The same was true when comparisons were made for students who received high quality tasks. However, the indigenous students who received high quality tasks produced work that on average 70 exceeded work produced by non-indigenous students who received low quality tasks (Amosa, Ladwig, Griffiths, & Gore, 2007). The most important finding in this research was the fact that when tasks of high intellectual quality were given to students from both low and high SES backgrounds, the students from a low socioeconomic background actually performed better than students from a high SES background (Amosa, Ladwig, Griffiths, & Gore, 2007, p. 6). This is the only AIW related study that has produced this finding. It is possible that the students from lower SES backgrounds were better prepared for the tasks, but this is unknown since classroom observations were not a part of the study (Amosa, Ladwig, Griffiths, & Gore, 2007). 
The authentic intellectual work studies, taken in sum to include those in the United States and elsewhere, include a number of promising findings. However, these findings should be viewed tentatively. The researchers are making judgments about the intellectual demands of teachers based on limited sets of data. A variety of circumstances could cause these judgments to be flawed. Much rests on the ability to collect quality teacher data. The cooperativeness of teachers was an issue in some of these studies (Wenzel, Nagaoka, Morris, Billings, & Fendt, 2002). It is possible that some teachers merely gave researchers tasks to get them to go away or otherwise changed their routine because they were being studied. This is not a unique problem in educational research, but it should still be considered when weighing the significance of these findings, especially when they hinge on categorizing teachers based on their AIW scores. 71 Table 5 Summary of International Studies Study Focus Method Data Collected Results New Basics Research Report, 2004 Queensland, Australia Based on data from Queensland School Reform Longitudinal Study (QSRLS) Compare achievement of students in 18 trial schools with students in 21 non-trial schools. Trial students received authentic instruction and completed authentic ?rich? tasks Grades 3,6, & 9 Comparison of student work samples (traditional folios vs. rich tasks) Instruction analyzed based on scored classroom observations Student learning measured through external standardized tests & rich tasks 26 traditional folios and 26 rich tasks. 256 observations over three years Conventional test results from the International Schools? Assessment (Reading & Math) & World Class Test (inter- disciplinary problem- solving) Student work on rich tasks perceived as rigorous by experts and community members No decline in general literacy and numeracy as compared to state schools No significant difference between trial students and regular students on two higher order assessments. Ladwig, Smith, Gore, Amosa, & Griffiths, 2007 New South Wales Based on data from Systemic Implications of Pedagogy and Achievement (SIPA) longitudinal study (2004- 07) Replicate Newmann?s authentic pedagogy research in the Australian context Grades 4 and 8 Hierarchical Linear Modeling Analysis of challenging tasks and student work 78 tasks from 26 SIPA schools (primarily English and Math) Student work from 1,374 students collected in 2005 Significant relationship (p<.001) between high scoring tasks and quality student work 72 Table 5 Summary of International Studies (Cont.) Study Focus Method Data Collected Results Amosa, Ladwig, Griffiths, & Gore, 2007 New South Wales Focused on disadvantaged students (indigenous and low SES students) Hierarchical Linear Modeling (HLM) Rated intellectual quality of tasks provided by teachers Student work analyzed using Newmann?s task rubric *No classroom observations 95 tasks from 121 teachers in 19 primary schools and 11 secondary schools 1,912 students (2,913 pieces of student work) Indigenous students who received high quality tasks produced work that on average exceeded work produced by non-indigenous students who received low quality tasks When tasks of high intellectual quality were given to students from both low and high SES backgrounds, the students from a low socioeconomic background actually performed better than students from a high SES background Koh et. 
al., 2005 Singapore Pre-Intervention Study 36 Singapore Schools (18 primary, 18 secondary) Subjects: English, Math, Science, and Social Studies Tasks collected and scored by experienced master teachers in respective subject areas using standards consistent with AIW Four high, medium, and low quality tasks from each teacher Tasks were generally of low intellectual quality with the exception of primary social studies. Students were generally not afforded the opportunity to produce work that would score high on Newmann?s AIW scale. 73 The limitations of using statistical analysis to represent an inherently complex event such as classroom instruction also should be considered. The researchers were diligent in their efforts to control for a variety of variables, but it is always possible that other factors contributed to the positive outcomes described in these studies. These could include teacher variables (personality, management style, etc.) or some of the variables included as part of the ?productive pedagogies? model Ladwig worked with in Queensland. Adding to the Research base In reviewing the major AIW studies, the need for replication in secondary social studies classrooms is evident. Few studies focused on the impact of authentic pedagogy on lower order learning outcomes. The studies that did address this relationship were not interested in social studies content. Newmann?s 2001 study provided the best evidence that authentic instruction helps students on tests of basic knowledge, but it was focused on math and writing. Several other studies provided similar results, but included little data on the nature of the conventional assessments (Lee, Smith, & Croninger, 1997; AIR/SRI, 2006/2007). It is difficult to determine whether they were as heavily weighted towards lower order knowledge as some history graduation exams. The present study is therefore important because it seeks to determine the impact of authentic pedagogy on learning using an assessment that is almost entirely fact based and has high-stakes attached to it. The addition of a higher order assessment to the study should enable me to tap a broader range of learning outcomes that may be enhanced through authentic pedagogy. 74 The AIW studies to date have also included a relatively limited sample of high school social studies teachers (i.e. 6 for Newmann?s 1996 study). Five studies included social studies content, and most of these involved multiple subjects at different grade levels. More studies are needed to gain a better appreciation of the intellectual demands placed specifically on students in secondary history classrooms. My study has the potential to add to the literature on authentic intellectual work because it incorporates some of the most important design modifications recommended from earlier research. I am able to more rigorously measure authentic pedagogy because the tasks supplied by participating teachers are linked to instruction. Less guess work is involved in determining the teacher?s intent. In addition, I?ve built off of Newmann and Avery?s work by providing students with a common higher order essay. This allows all students in the study to demonstrate their ability on authentic tasks which should provide a better indication of the role authentic instruction plays in promoting higher order outcomes. Since Avery?s study involved a small sample, it should be replicated in different contexts with more social studies teachers. I also believe it is important to look at learning outcomes that span a semester. 
Avery?s findings focused on one unit and suggested that authentic instruction positively impacts student performance on a higher order essay and a 10 item test. However, authentic instruction usually takes more time thus limiting the ability of teachers to cover all the necessary content on state developed standardized tests. A useful study for practitioners would seek to determine if students are able to excel on higher order tasks while also gaining the necessary content to pass a graduation exam of basic skills. 75 Finally, it is important to determine what works for different groups of students on the exam that arguably most determines their future academic success. Graduation exams in Alabama and elsewhere are the key accountability mechanism used by education stakeholders to gauge achievement. Students who pass the graduation exam early often have opportunities for more advanced study. Conversely, those who fail may be placed in remedial courses where they are more likely to receive drill and practice oriented instruction (Kornhaber, 2004; McNeil & Valenzuela, 2001; Oakes, 2005). Two important factors related to equity need further study. First, who has access to authentic instruction and to what extent does high stakes testing influence its distribution? Newmann?s study (1996) indicated that authentic instruction is equitably distributed, but this was in a best case scenario of restructured schools. Secondly, most prior studies indicate that all students benefit from authentic pedagogy. When dealing with high stakes social studies exams of lower order content, does this finding continue to be true? Do certain students need direct instruction to make up their deficit in content knowledge (Delpit, 1995)? These are important questions which this study attempts to investigate. The authentic pedagogy model challenges education stakeholders to create better accountability mechanisms that have meaning and relevance in the real world. As argued by Wiggins, authentic assessments should help teachers to ?improve performance, not just monitor it? and prepare students for the types of intellectual challenges they are likely to face as adults (Wiggins, 1993b, p. 5). Adoption of this model requires a commitment to establishing conditions in schools that enable educators to work 76 collaboratively to translate the AIW standards into effective classroom practice (Avery & Palmer, 1999; Stewart & Brendefur, 2005). Research by Stiggins (1992) and others suggests that teachers need a good deal of support to develop assessment literacy. Teachers would need significant training in order to create and evaluate authentic tasks. The demands and challenges associated with implementing constructivist teaching are also well documented (Onosko, 1991; Rossi, 1995; Saye & Brush, 2002). The teaching force would likely face a large learning curve in adopting reforms based on the authentic pedagogy model. A change in education policy in America would require a major investment in professional development resources. This study should provide policy makers with a better basis for deciding whether to support this investment. 77 CHAPTER THREE: METHODOLOGY This study investigated the manner in which teaching influenced student performance, particularly on the standardized high-stakes social studies tests deemed most important by many policy makers. Instruction was analyzed in terms of its authenticity. 
Newmann's authentic pedagogy model provided a way to measure the extent to which instruction engaged students in activities that required construction of knowledge to solve meaningful problems that have value beyond school. Prior studies in a variety of subject areas indicate that students learn more when their teachers routinely provide authentic intellectual challenges. This research was an effort to determine the effect of authentic pedagogy in history classes. My study used rubrics developed by Newmann and his associates that measure both instruction and assigned tasks on a numerical scale that runs from 7 to 30. Scores on the low end generally represented more teacher-centered, didactic classrooms, while higher scores were associated with inquiry-oriented classrooms that emphasized higher-order thinking, deep knowledge, substantive communication, and connectedness to the real world (Newmann, King, & Carmichael, 2007; Newmann, Secada, & Wehlage, 1995). I viewed this scale as a continuum reflecting the extent to which teachers provided intellectually challenging instruction. The teacher data, when paired with student scores on a lower and higher order assessment, provided the basis for determining the impact of authentic pedagogy on student performance. Figure 1 depicts this relationship. Figure 1. Process for determining Authentic Pedagogy Scores. In order to test this association, I initiated a study with a school system in Alabama beginning in January 2008. I collected data from the 9th grade history teachers at a junior high and the 10th grade history teachers at a high school. I was able to recruit all of the social studies teachers initially assigned to these grade levels (N = 8). In addition to teacher data, a variety of student data was collected by the school system. The student data specifically targeted tenth graders who took social studies during the 2007/08 and 2008/09 school years. The study had four main purposes: (1) to determine the extent to which social studies teachers at the study schools utilize authentic pedagogy; (2) to develop a clearer understanding of how exposure to authentic pedagogy in coursework influences the ability of all students to perform at high levels on assessments that require lower and higher order knowledge; (3) to determine if experiences in authentic pedagogy classrooms for multiple courses result in improved performance when compared to students who have such experiences in only one course; and (4) to determine the impact of authentic pedagogy on students from different socio-economic and ethnic backgrounds. These purposes were translated into five research questions, which are stated on page 119. This chapter describes how the study addressed each of these purposes. The first section is a description and justification of the study design. This is followed by an overview of the study setting and participants. Once the context of the study is established, I transition into an analysis of the various research instruments and a phase-by-phase description of the data collection process. This sets up a concluding discussion of the data analysis procedures and study limitations. Study Design This study was a mixed-methods investigation of existing instruction at the study schools (Feilzer, 2010; Greene, 2008; Johnson, Onwuegbuzie, & Turner, 2007; Patton, 1987). Quantitative data alone (i.e., standardized test scores) do not tell us much about the types of instructional experiences that are helpful in producing desired learning outcomes.
Descriptive information must also be included in the study design to determine what works for different types of students on tests like the Alabama High School Graduation Exam (AHSGE). A mixed-methods study provided greater fidelity in capturing the broad range of learning that takes place in a social studies classroom and was therefore a better approach for measuring the overall quality of instruction. The quantitative dimension of the study required the use of statistical measures to correlate student performance data with the authentic pedagogy score of students? 9th and 10th grade social studies teachers. Instead of using an experimental ?treatment?, teachers (and, by extension, their classrooms) were differentiated based on pedagogy through their 80 placement on the authentic pedagogy scale. In analyzing the data, I sought to determine whether higher teacher scores on the authentic pedagogy continuum translated into statistically significant student achievement gains when other factors that might influence achievement were controlled. The qualitative data, derived primarily from interviews (see Appendix A) and field notes, were converted to a numerical scale using the AIW rubric for instruction. I also conducted a content analysis of the tasks provided by teachers and converted this data in a similar fashion. This data was then subjected to inferential statistical analysis to determine achievement outcomes. The field notes provided a record of the features that distinguished higher scoring classes on the authentic pedagogy scale from lower scoring classrooms. The analysis of data from the field notes is described in chapter four. Project setting and description of participants Central High School and Central Junior High School (pseudonyms) were selected from a school system in Alabama to participate in this study. The selection of these schools represented purposeful sampling. I chose these schools primarily because previous work in these schools gave me confidence that there would be some teachers who might be expected to score relatively high on the AIW rubrics. These schools were also easier to visit on a routine basis than alternative schools given a limited budget. Finally, the school system was selected because it was willing to provide important student achievement data that probably would not have been as easily available in other areas of the state. The study schools are situated in a city of approximately 43,000 people. The area constitutes the most rapidly growing area of the state. Even though many people are 81 attracted to the area, poverty is still a problem. Fourteen percent of the families with children under 18 in the community live under the poverty line (U.S. Census Bureau, 2000a). The community is 78% white, 17% African American, and 3% Asian (U.S. Census Bureau, 2000b). These statistics do not include international students that live in the city while attending a university that is within commuting distance. They also do not accurately reflect the number of Korean families moving to the area as part of the growing automobile manufacturing industry. A recent accreditation report, conducted by the school system, noted that 42 different languages are spoken in the homes of students from the district. Overall, the area offers a relatively small town atmosphere with the economic and social benefits you might associate with a bigger city. 
The high school that took part in this study has been recognized as among the best in the country by Newsweek magazine (Kantrowitz & Wingert, 2006). This designation was based on a ratio: the number of students who take Advanced Placement or International Baccalaureate exams divided by the number of seniors who graduate. The high school?s enrollment in 2007 was 1,156 students while the junior high had 908 students (Central District Accreditation Guided Self Study, 2007). Despite being relatively large schools, the students at both schools enjoy some advantages that might be associated with smaller school settings. The system employs enough teachers to maintain small class sizes (average - 17 CHS, 19 CJHS). Teacher salaries rank among the top in the state enabling the schools to attract top applicants each year. The high school offers a number of challenging programs such as the International Baccalaureate Program, Advanced Placement courses, and dual enrollment. The system drop-out rate is substantially less than the state average. The vast majority of the students graduate and 82 pursue some form of higher education. The graduating class in recent years included numerous AP scholars, National Merit awardees, and students with GPAs over 4.0. All schools in the system have met adequate yearly progress standards for the past three years. The schools, while atypical in some regards, were still a good choice for this study. It is true that students at Central High School typically perform above the state average on the social studies graduation exam (see Table 6). However, the scores at Central High follow the same basic trend found in other schools across the state. In 2008 and 2009, Central High students in all eligible grades (10-12) scored lower on the social studies graduation exam than any other subject (Alabama Department of Education, 2009a). Since Central?s students are struggling with the same exam as students in other schools, perhaps some useful generalizations can be made from the results of this study. Table 6 Comparison of Tenth Grade Graduation Exam Passage Rates 2008 Passage Rate 2009 Passage Rate Central High 67% 79% State Average 52% 62% Note. Data derived from Alabama State Department of Education Accountability Reports for the 2008 and 2009 school years. Teachers. As mentioned previously, social studies teachers were recruited at the ninth and tenth grade levels for this study. Tenth grade teachers were selected to participate because this is the first year when students are eligible to take the Alabama High School Graduation Exam. The eleventh grade was not a viable option because many students in this system pass the exam on their first attempt. More student achievement data was available using this sample as opposed to other alternatives. I 83 included the ninth grade teachers to determine the nature of the social studies instruction students received while in junior high. Research suggests that multiple years of inquiry- based instruction may have a larger impact on student performance than more limited exposure (Klentschy, Garrison, & Amaral, 2001). The 9th/10th grade design allowed me to test this finding in history classes. I was able to examine the impact of instruction over the course of multiple years on the performance of students on the Alabama High School Graduation Exam. 
Once I narrowed the teacher sample to the 9th and 10th grade, I recruited all of the social studies teachers in order to maximize the potential for capturing the range of authentic pedagogy at the study schools (see Appendix B). As the study progressed, some teachers retired or had their teaching responsibilities adjusted by the administration. I did not add any new teachers to the study after the initial recruitment period. A descriptive summary of the teachers involved in the study is provided in Table 7. The teachers proved to be an interesting sample due to the diverse range of courses they taught. The graduation exam focuses exclusively on U.S. history content. However, some teachers in this study taught World History (primarily 9th grade teachers) and two of the high school teachers taught Advanced Placement (AP) European History. In addition, the AP class sections and some U.S. history sections ran for an entire school year, while most other classes were a semester in duration. Complicating things further, some students took 9th grade World History again if they didn?t pass it the previous year or perhaps if they transferred from another area and needed the credit. Data from this study enabled me to investigate whether access to authentic pedagogy was influenced by 84 the courses students? took and whether certain courses/course designs were more effective in promoting student achievement. Table 7 Descriptive Statistics for Teacher Sample Junior High (4) High School (4) Total (8) Percent Male 50% 100% 75% Percent Caucasian 100% 100% 100% Percent with advanced degree1 75% 50% 63% Age 26 to 35: 36 to 45: 46 to 55: 25% 25% 50% 50% 50% 37.5% 37.5% 25% Total Years Teaching 3 to 5: 6 to 10: 11 to 15: More than 16: 25% 25% 50% 25% 75% 12.5% 12.5% 50% 25% Note: Advanced degrees include master?s degree and Ph.D. Students. Students? data was gathered to determine individual learning outcomes associated with authentic pedagogy. It provided a window into whether authentic pedagogy benefited certain students more than others. Student data was also used as the basis for aggregating class level effects for statistical analysis. Data for all tenth graders at Central High School who took social studies classes during the Fall 07, Spring 08, Fall 08, and Spring 09 semesters were included in this study. This included both regular classes and AP courses. The AP European history course was open to any student willing to take the challenge although students who took this course were generally not given the option to drop if it proved to be too difficult. 85 An initial concern in trying to obtain a student sample for this study was the potential that the students most needed (disadvantaged, poor academic achievers) might opt out of the study. Since student performance data and demographic information was already routinely collected for analysis by the school system, the school system organized the data and coded it in order to maintain student anonymity when the dataset was sent to me. This strategy allowed me to include the data of all grade level students as part of the study (assuming the pertinent data was provided for each student). Organizing the study in this fashion maximized the potential relevance of results by including the widest possible range of students. This ensured a comparison could be made between the less advantaged students at the study schools and similar groups of students across the state. 
Table 8 provides a breakdown of the number of students for whom I collected data by semester and course. Students who had multiple social studies teachers during the 10th grade were excluded from the sample. This included nineteen students in 2008 and eleven students in 2009. Table 8 Student Participation by Course 2008 2009 Advanced Placement European History 99 (28.2%) Advanced Placement European History 104 (22.9%) U.S. History/Geography 10 (Sem.) 220 (62.7%) U.S. History/Geography 10 (Sem.) 179 (39.4%) U.S. History/Geography 10 Alt. (Year) 21 (6%) U.S. History/Geography 10 Alt. (Year) 155 (34.1%) U.S. History/Geography 10 Co-Teach (Inclusion) 11 (3.1%) U.S. History/Geography 10 Co-Teach (Inclusion) 16 (3.5%) Total 351 Total 454 86 A sample must include a certain number of students to have enough statistical power for the regression analysis. Statistical power analysis was completed as part of the planning process to determine whether a proper relationship existed between the sample size, significance criterion, population effect size, and power to prevent type I and II errors (Cohen, 1992). In my study, the desired power was .80 with a significance criterion of .05. The hypothesized effect size was medium according to the ES index for regression analysis. Determining the sample size thus required taking the number of predictor variables and multiplying by 10 (signifying ten students needed per independent variable) (Stevens, 2002). My sample of 805 students easily supports the number of independent variables needed for the study. Instrumentation The instruments in this study served three purposes: to classify the level of authentic pedagogy used by the teachers, to determine the prior academic ability of the students, and to measure academic achievement on lower and higher order tasks. The only instruments specifically developed for this study were the higher order essay assessments. The other instruments were either state assessments or rubrics created by Fred Newmann and his associates. Assessing Authentic Pedagogy. This study replicated Newmann?s previous research on authentic pedagogy. As a result, I used essentially the same AIW rubrics (see Appendices C-F) to allow comparisons to be made across studies. The AIW rubrics incorporated a complex set of research based criteria into a series of different instruments for instruction, tasks, and student work making them a) tightly focused on the AIW construct and b) more efficient than alternative instruments that measure single 87 dimensions (i.e. higher order thinking). The fact that they have been field tested in social studies classes with students at the ninth and tenth grade levels made them ideal for use in this study. The AIW rubrics are valid instruments based on their significant construct, face, content, and predictive validity. Construct validity is concerned with how well a researcher operationalizes theoretical ideas. The AIW framework, its associated rubrics, and other theories that form the basis of this construct are explained in detail in a number of articles and studies (Berlak et al., 1992; Newmann & Archbald, 1988; Newmann & Associates, 1996; Newmann, Secada, & Wehlage, 1995; Resnick, 1987; Wiggins, 1993a). The rubrics have been field tested extensively for over 12 years. As the rubrics have been applied in studies associated with a diverse range of academic subjects, they have been steadily revised and sharpened through dialogue with disciplinary subject matter experts and education professionals. 
In the process, certain stand alone dimensions of authentic intellectual work have been combined or removed altogether to help researchers make clearer distinctions between the standards being measured as part of the AIW framework. For example, the original task rubric included standards for organization of information and consideration of alternatives. Later versions of the rubric incorporated the language of these standards into a single standard called construction of knowledge. This streamlining and clarification of language over time enhanced construct validity by enabling researchers to more precisely describe tasks and instruction that meet the criteria of being authentic. Face validity involves making a determination of whether an instrument appears reasonable ?on its face.? Typically face validity is determined by experts familiar with 88 the constructs being measured. If their widespread use is any indication, the AIW rubrics meet the approval of a diverse group of experts. The rubrics have been used as professional development tools for teachers in school systems in Minnesota (Avery, Kouneski, & Odendahl, 2001; Avery & Palmer, 1999). The Gates Foundation used the rubrics to evaluate the performance of reforming high schools (AIR/SRI, 2006, 2007). The Iowa State Department of Education, Michigan State Department of Education, states in Australia (Queensland, New South Wales), and schools in Singapore have adopted the AIW standards (or similar standards) and utilize versions of the rubrics. This suggests considerable face validity. I reinforce the ?reasonableness? factor of the rubrics by providing examples of tasks and lessons that scored at different levels on the scales contained within the rubrics in the next chapter. These examples supplement a wide range of examples already available through the various studies I have cited. This documentation enables the reader to judge whether the rubrics are being applied in a valid manner. Content validity is maximized when a researcher ensures that all the relevant content domains that are incorporated as part of a construct are clearly defined (Trochim, 2006). Strong content validity can mitigate some of the subjectivity associated with rubrics. When applied to authentic intellectual work, content validity means clearly defining what is meant by such domains as higher order thinking and substantive communication. Newmann?s research since the 1960s goes a long way towards meeting this requirement. A brief review of some of the critical works includes research on higher order thinking (Newmann, 1991a), substantive communication (Nystrand & 89 Gamoran, 1990), and student engagement (Newmann, Wehlage, & Lamborn, 1992). In applying the rubrics, I referred to these studies to clarify points of confusion. When Newmann field tested these rubrics he consulted subject matter experts (in writing and math for example) to enhance content validity. I ensured content validity in a similar manner. This study served as the pilot for a larger social studies inquiry project involving research sites across the nation. As part of this larger project, I worked with experienced social studies researchers and historians to ?norm? the use of the rubrics and establish how to legitimately score social studies instruction based on the degrees of higher order thinking, depth, conversation, and other elements represented in authentic intellectual work. 
The goal of this process was to ground scoring interpretations in the disciplinary knowledge of history and social studies. Another way to determine the validity of a construct is to determine if its presence leads to likely outcomes. In the case of authentic pedagogy, Newmann and others hypothesized that higher levels of authentic pedagogy would result in high quality student work as measured by the student work AIW rubric. Several studies have confirmed this relationship (Avery, 1999; King, Schroeder, & Chawszczewski, 2001; Newmann, Lopez, & Bryk, 1998; Newmann, Marks, & Gamoran, 1996). Given the strong validity of the rubrics, my main task in this study was to ensure that the use of the rubrics conformed to their use in earlier research. My affiliation with the larger national study (Social Studies Inquiry Research Collaborative [SSIRC]), which focused on the same basic research questions, enabled me to attend an authentic pedagogy workshop by Dr. Bruce King. Dr. King is one of the original designers of the AIW rubrics. At this workshop, he shared his knowledge of how to score tasks and observations. I feel confident that my use of the rubrics reflects the most current thinking on how to measure authentic pedagogy. Reliability. Another important issue to consider when using the rubrics is their reliability. In an effort to enhance the reliability of their use in this study, 22% of the lessons I observed were also rated by my advisor, John Saye. Dr. Saye served as the project director for the SSIRC. The lessons that he observed with me are listed in Table 9. As the table indicates, five of the twenty-three lessons were observed by a second rater. A slightly higher percentage of tasks were also evaluated by a second researcher from the SSIRC project. The degree of inter-rater reliability for the observations and tasks is depicted in the following ways: (1) the extent to which scorers had exact agreement on each of the standards and (2) the extent to which agreement was off by 1 point. In every instance, the raters were able to achieve agreement after discussion. Prior research has established a standard of greater than 65% exact agreement and agreement within 1 point exceeding 90% (Newmann & Associates, 1996). Table 10 shows the degree of inter-rater reliability for this study. The lower degree of inter-rater agreement for the substantive conversation and deep knowledge standards was often due to the raters intentionally sampling different groups during group activities. This made it easier to accurately reconstruct classroom events in field notes. However, it also reduced inter-rater reliability when one rater witnessed an interaction the other missed.

Table 9
Summary of Inter-Rater Reliability Observations

Date               Lesson
Apr. 2, 2008       Industrial Revolution
Apr. 28, 2008      Reformers lesson
Sept. 22, 2008     An Absolute Monarchy of Your Own
Sept. 29, 2008     Declaration of Independence Activity
Classroom Video    British Imperialism in India

Table 10
Inter-Rater Agreement on Instruction and Assessment Tasks

                                                   Exact Agreement (%)   Exact or Off by 1 (%)
Instruction (N = 5 lessons, 22% of total)
  Standard 1: Higher Order Thinking                80                    100
  Standard 2: Deep Knowledge                       60                    100
  Standard 3: Substantive Conversation             40                    100
  Standard 4: Connectedness to the Real World      80                    100
Tasks (N = 6 tasks, 25% of total)
  Standard 1: Construction of Knowledge            50                    100
  Standard 2: Elaborated Communication             100                   100
  Standard 3: Connection to Students' Lives        100                   100
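The two agreement percentages reported in Table 10 are straightforward proportions computed over paired ratings for each standard. The following minimal sketch (in Python) illustrates the calculation for a single standard; the rating values in the example are hypothetical and are not data from this study.

# Illustrative only: compute the two agreement statistics reported in
# Table 10 (exact agreement and agreement within one point) from paired
# ratings on one AIW standard. The example ratings are hypothetical.

def agreement_rates(rater_a, rater_b):
    """Return (percent exact agreement, percent exact or off by 1)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Ratings must be paired and non-empty.")
    pairs = list(zip(rater_a, rater_b))
    exact = sum(1 for a, b in pairs if a == b)
    within_one = sum(1 for a, b in pairs if abs(a - b) <= 1)
    n = len(pairs)
    return 100 * exact / n, 100 * within_one / n

if __name__ == "__main__":
    # Hypothetical 1-5 scores from two raters on five observed lessons.
    rater_one = [3, 4, 2, 5, 3]
    rater_two = [3, 3, 2, 5, 4]
    exact_pct, within_one_pct = agreement_rates(rater_one, rater_two)
    print(f"Exact agreement: {exact_pct:.0f}%")         # 60%
    print(f"Exact or off by 1: {within_one_pct:.0f}%")  # 100%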
Characteristics of the Task Rubric. The structure of the task rubric can be seen in Appendix E. The task rubric used in earlier AIW studies included seven standards organized into three broad categories (Newmann & Associates, 1996; Newmann, Secada, & Wehlage, 1995). Later studies revised the rubrics (Newmann, Bryk, & Nagaoka, 2001, see note 10) based on input from experts in specific disciplines. The scoring rubric used in this study included three standards: Construction of Knowledge, Elaborated Communication, and Connection to Students' Lives. Each standard has three levels, with the exception of elaborated communication, which has four. Tasks that score high in the construction of knowledge category require students to "interpret, analyze, synthesize, or evaluate information, rather than merely to reproduce information" (Newmann, King, & Carmichael, 2007; Schroeder, Braden, & King, 2001, p. 31). This might involve defending a position on a particular issue or developing a solution to a problem. The elaborated communication standard is a measure of the extent to which students must explain their understanding of the social studies concepts embedded in any particular task. This standard can be met in a variety of ways, including writing, oral presentations, and projects. The final category on the task rubric measures the extent to which the assignment has a connection to students' lives. High-scoring tasks on this standard must do two things: engage students in a problem or issue that has relevance in the real world and provide students with an opportunity to relate to it personally. All of these standards are important, and a task only achieves high levels of authenticity by scoring well in each area. Characteristics of the Instruction Rubric. The instruction rubric includes four standards: Higher Order Thinking Processes (HOTS), Deep Knowledge, Substantive Conversation, and Connectedness to the Real World. Each of the scales on the rubric has five levels. As in the task rubric, lessons need to score well on each standard to achieve a high level of authenticity. The first standard, higher order thinking, is demonstrated when students are asked to actively manipulate information to solve problems that cannot be solved by simply recalling previously learned material (Newmann, 1991a; Newmann, King, & Carmichael, 2007). This involves engaging students in such processes as analysis, synthesis, and evaluation. The second standard is deep knowledge. A lesson features deep knowledge when sustained attention is given to a significant disciplinary topic and students are able to demonstrate a thorough and complex understanding of the problem or topic under consideration. The substantive conversation dimension of the AIW framework is a scale that "measures the extent of talking to learn and to understand in the classroom" (RISER, 2000, p. 6). In evaluating this standard, I looked for discussions that featured sustained dialogue focused on disciplinary topics and concepts. Ideally, the dialogue included higher order thinking, the sharing of ideas among participants, and the development of coherent understandings. The final standard is connectedness to the real world. In order to score high in this category, teachers must successfully establish the relevance of a lesson to life outside of school. In addition, students must show a personal interest in the topic and attempt to use their knowledge to influence a larger audience other than their classmates (Newmann, King, & Carmichael, 2007). Applying & Scoring the Rubrics.
In applying the rubrics in this study, I first asked teachers to submit three tasks that they believed demonstrated their students' thinking at a high level about the subject matter of their course (see Appendix G). I referred to these tasks as their most "challenging" in conversations with teachers when asked for clarification about what to submit. I established "curricular validity" by collecting tasks that were designed and/or used by the social studies teachers at the study schools (Ladwig, Smith, Gore, Amosa, & Griffiths, 2007, p. 4). The tasks could be created by someone else (e.g., History Alive) or even be the same as another teacher's, as long as they represented the teacher's perception of an assignment that required students to demonstrate thinking at a high level.

Once tasks were submitted, I interviewed or emailed teachers to gain a better understanding of the broader context of how the tasks were used as part of instruction. The intent was to connect the three observations directly to the task or to the instruction immediately preceding the task. Another goal was to set up observations that spanned the course of a semester to provide teachers with a better opportunity to demonstrate the standards associated with authentic pedagogy. If a teacher taught both advanced placement courses and general level courses, I tried to observe at least one class of each type. If a teacher had three general level classes and one advanced placement course, then my observations were weighted more heavily towards the general level classes. Finally, I also took into consideration that a teacher might teach the same lesson differently to different class periods or blocks. In order to address this concern, I attempted to observe different blocks for each teacher. I also asked teachers about this issue during the interview.

Most of the important guidelines for scoring the tasks and instruction are provided on the rubrics themselves. However, a few points should be emphasized. First of all, scoring proceeds from the bottom category and moves up the scale for each standard. In scoring a task or an observation, the next level in a category is assigned only when sufficient evidence indicates that all of the requirements for that level are met. When in doubt, the procedure is to score down. Tasks are scored based on the materials provided by the teacher. The interview data and observations yielded additional insights into the dominant expectations a teacher had for any particular task. It was for this reason that I tried to score the tasks after the observations were complete.

The instruction score is based entirely on what is observed during the course of a single class period. The process for scoring instruction is fairly complicated when recording equipment or multiple observers are not available. My field notes usually included as much dialogue as I was able to capture, my comments or thoughts during the lesson, a class diagram with symbols to represent each student, and a marking system to record patterns of conversation. Once I observed a lesson, I sat down as soon as possible to complete my field notes while the information was fresh in my mind. I would then go through my notes and highlight or mark areas that represented higher order thinking, students demonstrating depth of knowledge, areas of substantive conversation, and any attempts by the teacher to connect the lesson to the real world.
The final step was to assign scores for each standard along with a written justification for each scoring decision.

The mathematical process for scoring requires the development of a composite authentic pedagogy score for each teacher. The scoring of teacher tasks is relatively straightforward. The task rubric is broken down into three components: Construction of Knowledge, Elaborated Communication, and Connection to Students' Lives. Each category is based on a three-point scale except for elaborated communication, which extends to four points. The scores on the three criteria are added together to obtain the authentic pedagogy score for a particular task. Possible scores therefore range from 3 to 10. The scores on the three most challenging tasks are averaged to obtain the overall task score, which is carried forward to the equation used to calculate the final authentic pedagogy score.

The observation rubric is a little different from the task rubric. It has four components: Higher Order Thinking, Depth of Knowledge, Substantive Conversation, and Connectedness to the Real World. Each scale has five levels. Scores for each category are added together to form the overall score for each observation. Scores range from 4 to 20. Once the scores on the three observations are determined, they are averaged to obtain the overall observation score. The final authentic pedagogy score is calculated by adding the average observation score to the average task score. Once this score is determined, a tenth grader's scores on the designated achievement measures can be compared with the intellectual rigor of the pedagogy he/she experienced in social studies during the ninth and tenth grades (Newmann, Marks, & Gamoran, 1996, p. 16). Additional sub-analyses were conducted on the final authentic pedagogy score for each teacher to determine whether task scores or instruction scores had a greater impact on the dependent variables.

Determining student prior knowledge. Data from four different sources were collected to control for student prior knowledge and abilities that have the potential to influence outcomes on the Alabama High School Graduation Exam and the higher order essay assessment. These measures included student end-of-semester grades in social studies for their eighth, ninth, and tenth grade years. They also included several reading achievement measures, because Newmann and others believe that strong readers have an advantage on standardized tests regardless of the content area being assessed (Newmann, Bryk, & Nagaoka, 2001). The first reading prior achievement measure was derived from the Alabama Reading and Mathematics Test (ARMT). This test is administered in the eighth grade and provides scaled scores that range from level 1 to level 4. The second reading prior achievement measure was the Stanford 10, also administered in the eighth grade. Finally, the Alabama High School Graduation Exam includes a reading component that students take in the tenth grade during the same week that they take the social studies exam. Ultimately, due to the high number of predictor variables and the small teacher sample, I only incorporated prior grades into my statistical analyses. This measure was the best determinant of student prior knowledge in social studies.
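To make the arithmetic of the composite authentic pedagogy score described above concrete, here is a minimal sketch in Python using hypothetical rubric totals; the averaging and addition follow the procedure just described, but the specific numbers are invented for illustration.

    # Hypothetical rubric totals for one teacher; not actual study data.
    task_totals = [6, 7, 5]            # three most challenging tasks (each 3-10)
    observation_totals = [12, 10, 14]  # three observed lessons (each 4-20)

    average_task = sum(task_totals) / len(task_totals)                       # range 3-10
    average_instruction = sum(observation_totals) / len(observation_totals)  # range 4-20

    # Final authentic pedagogy score: average task + average instruction (range 7-30).
    authentic_pedagogy_score = average_task + average_instruction
    print(round(authentic_pedagogy_score, 1))  # 18.0 for these illustrative values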
Assessing Student Performance. Two instruments were used to measure student achievement. The first was the Alabama High School Graduation Exam (AHSGE), which best captures student retention of lower order factual content knowledge and basic social studies skills. The second instrument was a researcher-developed editorial writing assessment used to measure higher order thinking objectives.

The graduation exam was an appropriate instrument for this study because it is the high-stakes social studies test that all high school students must take in the state of Alabama. The test is based on the Alabama Course of Study for Social Studies (Morton, 2004). According to Dr. Gloria Turner, who served as the Director of Assessment for the Alabama State Department of Education, the content standards are "considered to be minimum, required, fundamental, and specific" (G. Turner, personal communication, February 11, 2008). No actual versions of the test have been released to the public, making it difficult to conduct a thorough content analysis. The state has released eighty-four sample item specifications that let students know the general format of the test as well as the relative weight given to each objective in the social studies curriculum (Richardson, 2000). I used the item specifications bulletin to analyze the objectives, questioning format, and eligible content to see how they relate to the authentic intellectual work criteria. This analysis, described in chapter 5, and Dr. Turner's correspondence make me confident in describing the test as a measure of lower order content knowledge. Dr. Tommy Bice, the Assistant Superintendent of Education in Alabama, also confirmed this in a speech he gave at the Alabama Social Studies Conference in October 2008. It is therefore the most appropriate instrument to use for research question two.

The test itself covers seven U.S. history standards spanning American exploration through World War II. The 10th grade curriculum in Alabama consists only of U.S. history through 1877 and therefore does not cover all of the material on the test. Students would have last experienced post-1900 U.S. history content during the sixth grade. The test consists of 100 multiple-choice questions, each worth 1 point. The results, once scaled, range from 200 to 800 points. The mean score on the test is 500 with a standard deviation of 100 (S. Dubose, personal communication, 2006). Information on the reliability and validity of the test is not readily available to the public.

The high school graduation exam for tenth graders is meant to be a practice run that allows students to become familiar with the test. However, schools obviously want as many students as possible to pass in order to avoid the "train wreck" effect that can occur in the later grades when many students still need to pass the exam. The main test administration takes place during an entire week of the spring semester. Students typically take a graduation exam each day, with the testing period lasting all morning. Students usually take the tests in an assigned classroom accompanied by at least two test proctors. The state has strict testing procedures in place to prevent cheating and encourage standardization in how the test is administered. The week represents a marked break from the normal routine. This can affect student motivation, especially when the social studies exam is given later in the week. Also, some tenth graders may lack the sense of urgency felt by the seniors to put forth their best effort.

The higher order thinking assessment designed for this study provided an additional measure of student learning with a focus on several goals that are largely omitted from the graduation exam.
The two higher order instruments (one U.S. History, one A.P. European History) were meant to determine the extent to which students were able to analyze arguments made in source documents, weigh competing arguments to arrive at a decision, use historical evidence and prior knowledge to construct a persuasive argument, and apply historical knowledge and critical reasoning to contemporary issues.

The first instrument that I developed was administered to the regular 10th grade U.S. history classes. I asked the social studies teachers to select a common topic for the exam, preferably one later in the semester to maximize the potential benefits of instruction on students' performance. The teachers chose Manifest Destiny. This unit was the final unit in the semester before exam review for many of the students. I decided to have students consider the concept of Manifest Destiny through an analysis of the Mexican-American War. In designing this instrument, I kept several things in mind. I did not want to penalize students or restrict their ability to demonstrate higher order reasoning simply because their teacher did not spend as much time on the Mexican-American War. As a result, I provided students with two resources to assist their thinking: a timeline of critical events associated with the war and excerpts from two primary source documents. Students with greater prior knowledge could probably do more with these resources. However, I anticipated that a student with a basic understanding of Manifest Destiny, accustomed to classroom experiences that required critical analysis and higher level thinking, would be able to use the documents to comprehend the potential implications of this ideology and frame an argument that scored well.

The Manifest Destiny instrument is included in Appendices H and I. The instrument included two parts. Part I was a structured essay editorial in which the student assumed the role of a journalist from the 1840s. The central question asked: Is using Manifest Destiny to justify war [in Mexico] a violation of American ideals, or does pursuing Manifest Destiny in Mexico ultimately promote the greater good? Students answered this question while adhering to a format that required them not only to lay out their position, but also to address opposing points of view. Part II of the assessment was where students applied their knowledge of Manifest Destiny to contemporary times. The question asked the following: Consider the role of the United States in world affairs today. Does America still have a special destiny or mission in the world? If so, what is it and how should it be accomplished? If not, explain why you think it does not. This part of the assessment was used as a way to examine the connections students were able to make between an historical topic of study and contemporary times.

This was a valid assessment of student learning for several reasons. First of all, the content adhered to the required tenth grade curriculum, which covers U.S. History through 1877. Since the topic was suggested by the social studies teachers, I know that the students received instruction pertaining to Manifest Destiny and, to at least some extent, the Mexican-American War. The instrument was created in conjunction with my advisor and reviewed for face validity by two other social studies teacher educators and a secondary social studies classroom teacher. Each of these social studies professionals has significant experience and expertise in the field.
In their opinion, the instrument included appropriate content that was realistically formatted for students at this grade level. They also found the instrument to be an adequate measure of the types of higher order thinking that I envisioned. Having established content and face validity, the next concern was whether this instrument truly measured the types of higher order thinking processes commonly associated with authentic tasks. The instrument evaluated some of the same lower order knowledge as the graduation exam (e.g., the definition of Manifest Destiny). However, it provided a greater overall challenge by requiring students to take a position on a central question using extended writing. In order to do this effectively, students had to be able to extract important details from the supporting materials (timeline, documents) and synthesize them into a coherent argument. This required not only understanding the viewpoints represented in the documents, but also connecting the information to prior knowledge. Since students were ultimately evaluating the justness of America's policies, their answer required logical reasoning, generalizing from evidence, making distinctions, and a host of other possible higher order processes. In recognizing opposing arguments and responding to them, students were also demonstrating their ability to use dialectical reasoning. Dialectical reasoning is a central component in the process of building a decision-making model for critically evaluating public issues. It simply involves being able to critically analyze a problem and understand perspectives different from one's own (Parker, 1989, p. 9). Finally, by adding a real world component in which students connected the principles of Manifest Destiny to modern times, the essay included all three elements of an authentic task (construction of knowledge, elaborated communication, and a connection to students' lives). The task was challenging for tenth graders and adults alike.

In order to check the reliability of scoring on the higher order tasks, I had another doctoral student in social science education evaluate a random sample of editorials using the same rubrics. The percentage of agreement with my original scores was 55% on the German Unification editorial. Table 11 provides a breakdown of the inter-rater agreement for the various rubric categories. The degree of inter-rater reliability on the AP editorials for the persuasiveness standard (Part I, 26%) is a bit misleading. In certain cases, the scores that I assigned disagreed with those of the other rater, but both represented "minimal" persuasiveness (1 vs. 2 on the rubric). If minimal scores, whether 1 or 2, are counted as agreeing, then the level of exact agreement rises to 52%. Furthermore, out of the 23 editorials examined, we agreed that the majority (87%) represented adequate persuasiveness at best. The position statement inter-rater reliability score (43%) is also quite low. In our follow-up conference I determined that the other rater had misinterpreted the standard and was counting any statement that argued for unification as providing a clear position on the question. I was looking for students to argue for a particular vision of unification (e.g., the small German solution, Germany with Austria, etc.). This misunderstanding caused our level of agreement on this standard to be artificially low.
Table 11
Inter-Rater Agreement on Higher Order Editorial Tasks

German Unification AP Task (N = 23; 25% of total)    Exact Agreement (%)   Exact or Off by 1 (%)
Part I
  Standard 1: Position                                        43                   N/A
  Standard 2: Historical Context                              65                    97
  Standard 3: Persuasiveness                                  26                    78
  Standard 4: Low-Level Dialectical Reasoning                 30                    83
  Standard 5: Quality of Final Position                       70                   100
Part II
  Standard 1: Decision-making                                 87                   N/A
  Standard 2: Persuasiveness                                  65                    96

The second testing instrument, used for the Advanced Placement European History classes, focused on German unification. It is included in Appendices J and K. This instrument adheres to the same basic format as the U.S. History assessment. The topic of German unification is routinely covered in the AP curriculum and was suggested by the AP teachers involved in the study. The instrument was created in conjunction with my advisor and reviewed for face validity and content validity by a social studies teacher educator, a doctoral student with experience teaching a similar course, and a secondary social studies classroom teacher at the study school. In their opinion, the instrument included appropriate content and was a realistic assessment for the target population of students. They also felt it measured the higher order thinking objectives for which it was designed.

The essay question for AP students was the following: Should the unification of all Germanic peoples within one nation be endorsed (supported) by the German people? Would other nations likely support it? Students were provided a timeline of significant events leading up to 1870, the decision point in this exercise. They were also provided with primary documents that advocated unification on different terms. These were used to evaluate potential courses of action (i.e., a Lesser or Greater Germany) for solving the German question. Students essentially had to take a stand on the principles that should guide unification. Should unification be based on nationalism or the self-determination of peoples? Of course, students could also argue against unification. The AP essay also included a connection to contemporary issues. In this case, students were asked to answer the following question: To what extent, if any, should the U.S. support the ambitions of ethnic, cultural, or religious groups seeking to secure their own nation-states today? The students were provided with several examples of groups seeking independence to help them better understand the issue and frame a response (e.g., efforts to secure a Palestinian state, Kurdish uprisings in Iraq). This task places cognitive demands on students similar to those of the previously described U.S. history assessment. Students must construct persuasive arguments regarding German unification and modern nation building. In doing so, they engage in extended writing about an issue with real world significance. The task can therefore be considered authentic.

Both higher order assessments were administered by the classroom teachers who participated in this study. I provided instructions for the teachers to read to their students as part of this process. These instructions are provided in Appendix L. Students had one hour to complete the assessment. Once they finished, the exams were collected by the classroom teacher and forwarded to the department head. I then picked up the exams for scoring. The editorials were anonymous to me since they contained only a student number and no reference to the class or the teacher.
I scored all of the editorials before entering the results in my database. This helped to ensure my scoring was not biased to favor students from a particular teacher. The rubric that I developed for the editorials was most influenced by the scoring guide created by Newmann to evaluate persuasive writing (Newmann, 1990). I also decided to incorporate scoring elements from two other relevant studies that measured competencies, such as dialectical reasoning, that were present in the editorial task (Parker, Mueller, & Wendling, 1989; Saye & Brush, 1999b).

The AP editorial rubric evaluated students in five categories for Part I (see Appendix M). The first category was the position statement. Students received a point if they provided a clear statement of one to two consecutive sentences that explained their stance on the question. For instance, "The unification of all German peoples within one nation should be endorsed by the German people because . . . ." In many cases, I was able to infer a student's stance based on statements made throughout the editorial. However, I asked students to provide an explicit statement. If students did not follow the instructions, they did not receive the point for this category.

The next scoring category evaluated how well students set up their editorial in the first paragraph. The historical context scale extended from 0 to 2. Students who provided no background information whatsoever received a 0. Level one scores required at least some historical context. This typically consisted of one or two sentences of background information that closely followed the language used in the timeline provided with the task. Simply mentioning relevant events like the unification wars or Bismarck's influence on the unification process would qualify. Level two scores were reserved for students who demonstrated some knowledge beyond what was provided on the timeline or for students who provided a more detailed introduction that used language that differed from the timeline. Level two introductory paragraphs had to clearly present accurate information to set up the student's position statement.

The next category on the scoring rubric was persuasiveness. The persuasiveness score was derived from a close reading of the entire editorial, even though paragraph two was the main paragraph designated for supporting arguments in the assignment instructions. In order to evaluate persuasiveness, I generated a list of plausible arguments that could be made to support either side of the focus question. This list was consulted when making decisions about the number of distinct arguments being made in any particular editorial. The persuasiveness scale had five possible levels. Editorials that scored at the first two levels were considered minimally persuasive and unlikely to persuade the reader. In order to receive a "1", the student had to provide one persuasive argument to back up his/her stance on the question. The argument could conceivably consist of only one sentence and did not require any elaboration or inclusion of historical evidence from the source documents. Level "2" scores were usually assigned to students who misunderstood the question. These students provided multiple arguments related to the pros or cons of unification instead of defending a position regarding nationalism and the territory a unified Germany should include. Students who made this mistake could not earn a higher score than a 2.
A level 3 "adequate" persuasiveness score was assigned when students were able to provide two reasons to support their position or one reason that included useful elaboration. The main consideration in assigning a 3 was whether the editorial "had a chance of persuading the reader" given the elaboration provided by the student. The next scoring level, "elaborated," required the student to provide either more elaboration (e.g., citing historical evidence, using examples) or additional reasons to back up his/her stance (at least three). Elaborated editorials were considered likely to persuade the reader. The level 5 scoring category was referred to as "exemplary." Level 5 scores were even more persuasive than level 4 editorials, mainly due to especially clear and coherent argumentation. These editorials were polished enough (i.e., no major grammatical mistakes) to be considered for public display as outstanding accomplishments for tenth grade students. In general, the persuasiveness score reached at least the adequate level when students were able to accurately reference the primary source documents or integrate valid historical analogies or examples into their writing. However, students could also hurt their score by providing inaccurate statements or statements that undermined their overall argument.

The final two scoring categories were low-level dialectical reasoning and quality of the final position, a standard that measured the ability of students to engage in high level dialectical reasoning while crafting a persuasive closing argument. To engage in low level dialectical reasoning, students had to correctly identify and explain opposing viewpoints. The scale for this category went from 0 to 3. A score of "1" required students to correctly state an opposing view in minimal terms. For example, a student who argued for the "Greater German solution" might discuss the opposing view that nationalism could promote further warfare as Germany sought to incorporate German-speaking territories not presently under its control. I considered a response "minimal" when the student provided a single-sentence explanation of the opposing perspective. Students who provided multiple opposing viewpoints that were briefly articulated received a two. The highest score in this category was reserved for students who explained at least one opposing view in greater detail by providing examples and evidence from the supporting documents.

In looking at the quality of the student's final position, I analyzed the level of persuasiveness and dialectical reasoning demonstrated in the fourth paragraph. I was looking for students to frame their closing arguments around a thoughtful response to the critics. Students also needed to restate their thesis and most significant points. Students who did not provide a fourth paragraph received a 0 for this scoring category. A "1" score required students to respond to the arguments of critics and briefly mention or restate at least one key point from the editorial. This "adequate" conclusion represented a minimal response that did not add much to the persuasiveness of the overall editorial. A "2" conclusion required either a particularly strong (elaborated and persuasive) response to the critics or a more detailed summary of the key arguments from the editorial. A level 2 paragraph was more persuasive than a level 1, but did not feature the advanced dialectical reasoning needed for the highest score in this category.
Level 3 scores were reserved for conclusions with tight argumentation and genuine consideration of opposing views. After reading a paragraph at this level, the reader should have very few, if any, unanswered questions. Advanced dialectical reasoning is demonstrated when students fairly characterize the views of critics and respond to them in a thoughtful and respectful manner (e.g., "While my opponents make some valid points, I still feel that . . .").

Part I of the Manifest Destiny editorial was evaluated using a similar rubric (see Appendix N). The range of scores in each category was identical. In the historical context category, students had to provide at least some information about the border dispute that directly preceded the Mexican-American War to receive the maximum points. A level 2 score on the persuasiveness scale once again mainly captured those students who did not quite understand the question. In this case they did not provide any comments related to Manifest Destiny, choosing instead to provide multiple arguments for or against going to war with Mexico. In order to receive a higher persuasiveness score, students had to relate their response to the concept of Manifest Destiny. The wording in the "quality of final position" category is slightly different for this rubric, mainly because many of the students responded to critics in paragraph 3 instead of the final paragraph. However, students were still evaluated on persuasiveness and advanced dialectical reasoning.

In Part II of both editorials, the goal was for students to connect their historical knowledge to a modern issue. I evaluated the question provided to the AP students using two scoring categories: decision-making and persuasiveness. The decision-making category was similar to the position category in Part I. I looked for how clearly the student defined his/her position on the question of whether the U.S. should support the formation of new nation-states. Students who took a clear stance received a point. The persuasiveness scale was essentially the same as the one used in Part I, but had four levels instead of five (the "2" score was removed from the Part I scale). The question for the regular students was evaluated based on the extent to which students appeared to make connections in their response between modern ideas of American exceptionalism and "mission" and the historic concept of Manifest Destiny. Scores were broken down into three levels: 0 = no connection, 1 = possible connection, and 2 = explicit connection. The "no connection" responses did not provide any indication that the student recognized any parallels between U.S. actions today and the ideas associated with Manifest Destiny. These responses were sometimes completely off topic, reflecting a misunderstanding of the question. The "possible connection" score was assigned to students who made some valid historical connections in their response or perhaps touched on some of the themes associated with American exceptionalism. The "explicit connection" score was reserved for students who compared America's modern mission (as they perceived it) directly with its historic destiny as it was conceived by advocates of Manifest Destiny in the 1800s. Students who referenced Manifest Destiny in their response in some valid way could receive a "2" score.
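One way to keep the Part I categories and their score ranges straight is to represent the rubric as a simple lookup structure. The sketch below (Python) is purely illustrative: the category names paraphrase the rubric in Appendix M, the ranges follow the descriptions above, and the validation helper and sample scores are hypothetical additions rather than part of the actual scoring procedure.

    # Hypothetical representation of the AP editorial Part I rubric (Appendix M);
    # score ranges follow the category descriptions in the text.
    PART_ONE_RUBRIC = {
        "position": (0, 1),
        "historical_context": (0, 2),
        "persuasiveness": (1, 5),            # minimal (1) through exemplary (5)
        "low_level_dialectical_reasoning": (0, 3),
        "quality_of_final_position": (0, 3),
    }

    def validate_scores(scores):
        """Check that each category score falls within its rubric range."""
        for category, (low, high) in PART_ONE_RUBRIC.items():
            value = scores[category]
            if not low <= value <= high:
                raise ValueError(f"{category} score {value} is outside {low}-{high}")
        return scores

    # One student's hypothetical Part I scores, checked against the ranges above.
    validate_scores({
        "position": 1,
        "historical_context": 2,
        "persuasiveness": 3,
        "low_level_dialectical_reasoning": 1,
        "quality_of_final_position": 1,
    })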
Researcher as Instrument. As in any study that includes a qualitative component, the researcher is an important instrument to consider in the analysis. Most of my professional background in teaching social studies is based on an inquiry model. I have a bias towards this instructional approach and feel that its objectives are largely at odds with the type of learning encouraged by standardized, multiple-choice tests. Recognizing this bias, I attempted to mitigate the impact of my personal feelings by confirming my analyses with other researchers (inter-rater reliability) and through the use of multiple sources of data (tasks, interviews, observations, standardized tests), which supported triangulation to corroborate findings made in the study.

Study Phases

This section describes the various phases of the study along with the data collection process. The study was broken down into four phases. The first phase was the planning and design refinement stage of the research. This involved selecting a meaningful topic, clearly defining the purpose of the research, and developing the research questions. During this phase I also prepared the basic study design while obtaining approval to proceed with the study from the school system and the Institutional Review Board (IRB). In Phase II, I implemented the study by beginning the process of collecting teacher data. At the same time, the school system began to organize the necessary student data into spreadsheets for the 2007/08 data set. Phase III began in the fall of 2008 and continued through the school year. During this time I collected the remaining teacher data while also administering the higher order essays. I also collected the student data (class rosters, demographics, test scores) for the 2008/09 data set. The final phase was the data analysis stage. While this is presented as a distinct phase, it actually occurred throughout the study. This information is summarized in Figure 2.

Figure 2. Summary of Research Phases

Phase I: Planning and Design Refinement. This study was first conceptualized in 2005. During the process of developing the topic, I reviewed the relevant research and narrowed down a list of potential research questions. The specifics of the study design evolved over time and were finalized in 2007. A four-hour consultation with Dr. Bruce King at the November annual conference of the College and University Faculty Assembly (CUFA) of the National Council for the Social Studies (NCSS) helped with this process. During the fall semester of 2007, I approached school system officials with the idea for the study. I met with the assistant superintendent, central office personnel, and the school principals. Once I had IRB approval and their support, I began the process of recruiting the 9th and 10th grade social studies faculty. In January 2008, I conducted separate meetings with the 9th grade social studies faculty at the junior high and the 10th grade social studies faculty at the high school. I went over the details of the study using a briefing script prepared for the IRB as a guide (Appendix B). Teachers were encouraged to ask questions and informed of their right to not participate or to opt out of the study at any time. Each teacher agreed to participate and signed the consent form. I did not have to recruit students since the student data are anonymous secondary data that do not require participant consent. The student data consisted of demographic and achievement reports already collected by the system or collected as part of a system-sponsored pilot assessment. I obtained student results in a coded form that prevented me from knowing any student names.
Observations were not videotaped or recorded in any way other than through general field notes, so student anonymity was maintained throughout the study. As the study progressed, I had to make occasional adjustments to the initial plan based on unforeseen circumstances (e.g., changes in teachers). I also worked through the process of finalizing specific instruments or protocols for later stages of the study. The process of design refinement was therefore initiated in the first phase and returned to throughout much of the study.

Phase II: Implementation (2007/08 School Year). Phase II began in February 2008. The first step in the data collection process for this phase involved the collection of three tasks from the study teachers. This was a departure from previous AIW studies in several ways. Earlier studies required teachers to submit examples of both typical assignments and challenging assignments. During the AIW workshop, Dr. King noted that this did not substantially alter the types of assignments submitted by the teachers. On his recommendation, I decided to forgo the request for typical tasks in order to have teachers focus on choosing the three tasks that they felt best represented their students' thinking at a high level. The language used in the protocol requesting these tasks (see Appendix G) mirrors what was used in the previous authentic intellectual work studies conducted by Newmann. Teachers were told that they could either submit their tasks electronically or arrange a time for me to collect them in person. I set a deadline of February 15, 2008, for submitting tasks. Some teachers needed additional reminders, causing tasks to be submitted sporadically throughout the semester.

Another decision made in consultation with Dr. King involved the amount of data to collect from teachers. Newmann and King tried different approaches for collecting teacher data and found that additional tasks and observations (beyond three each) did not enhance their ability to differentiate between teachers. The adoption of the three task/three observation design thus stems from "lessons learned" in earlier AIW research and my desire to minimize the demands on teachers during this study. One other difference between this study and previous ones of this type was the attempt to link the tasks submitted by teachers to observations. Dr. King recommended this due to difficulties associated with interpreting tasks as stand-alone artifacts. The evaluation of tasks in this study generally followed the observations. I restricted my analysis of tasks primarily to the materials a teacher submitted. However, in judging the overriding instructional intent of a teacher for a particular task, I did take into consideration insights from the lesson observation.

I negotiated with teachers to schedule the lesson observations. Written tasks, such as essays or reports, provided little opportunity for me to observe the criteria specified on the AIW observation rubric. In order to afford each teacher the opportunity to score well on the observations, I employed the following strategy: when the task involved a debate, simulation, or some other form of "live" presentation, I observed the actual class period associated with the task unless the teacher had a specific rationale for observing another day. However, if a task involved a written assignment that was difficult to observe, I negotiated with the teacher to observe the day that best demonstrated how students were prepared to complete the task.
The collection of some teacher data had to be pushed back to phase III for several reasons. First, three teachers had interns during the spring semester of the 2007/08 school year. The interns were teaching units that in some cases corresponded with the challenging task(s) the teacher wished to submit. It was also evident that scheduling observations was going to be difficult due to the intern observation schedule. The second factor that caused the collection of some teacher data to be postponed related to delays in initiating the study. Data collection did not formally begin until mid-February. Some teachers submitted challenging tasks that they had already taught to their students. I tried to allow teachers to stick with their original selection unless the teacher legitimately felt another task was just as challenging. Also, I wanted to observe tasks and lessons that were spaced throughout the semester, recognizing that it might be more difficult for teachers to score well on the AIW standards early in the semester. Finally, despite repeated communication attempts, some teachers did not provide their tasks in a timely fashion or did not respond to attempts to schedule observations.

In addition to collecting tasks and conducting observations, I also completed interviews with some of the study teachers (see interview script, Appendix A). My study design, approved by the IRB, only permitted one brief interview of approximately fifteen minutes instead of the pre/post interview schedule eventually adopted by the larger SSIRC study. I preferred to conduct the interview after all the teacher data had been collected. However, it was more difficult to negotiate observation dates with some teachers than I originally anticipated. As a result, I set up some meetings during phase II in which observation dates were finalized and the interview questions were completed at the same time. I also conducted interviews with some teachers after the majority of their data had been collected. I was unable to conduct an interview with one teacher who retired during the study. I was also only able to collect background data from another teacher.

Finally, I worked to improve the inter-rater reliability of the rubrics during this phase and subsequent phases. A steering committee conference of the Social Studies Inquiry Research Collaborative was held at Auburn University on March 5-7, 2008. This meeting included substantial time for practice scoring with the AIW rubrics. The observation rubric was used in conjunction with video footage and an actual classroom visit. A subsequent meeting, which I was not able to attend, was held at the American Educational Research Association (AERA) conference on March 26. This meeting also included scoring practice with an on-site evaluation of a class in New York. I benefited from the minutes and discussion that came out of this conference. Following each of the sessions noted in Table 12, I revisited my field notes for completed observations. My field notes included a specific rationale for each scoring decision on the AIW rubric. When my thinking was "normed" by more precise interpretations of the rubrics, I could easily determine whether a score needed to be adjusted.

Phase III: Implementation (2008/09 School Year). During phase III, I focused on collecting the remaining teacher data, analyzing phase II data, and creating and implementing the higher order assessments. I worked on the Manifest Destiny assessment in the fall and provided it to the tenth grade U.S. history teachers in December.
A total of 184 students took the exam. The exam for the Advanced Placement students was administered during the spring semester.

Table 12
Summary of Inter-Rater Reliability Sessions

Date | Location | Purpose | Role
Nov. 2007 | San Diego, CA | CUFA meeting at NCSS: consultation with Dr. Bruce King | Participant
Mar. 5-7, 2008 | Auburn, AL | Steering Committee of SSIRC: task and observation rubric practice using actual tasks, video, and an on-site classroom visit | Participant
Mar. 26, 2008 | New York | AERA Conference: task and observation rubric norming using actual tasks, video, and an on-site classroom visit | Access to minutes
June 19, 2008 | Internet video conference | Steering Committee of SSIRC: observation rubric norming using Geovanis video | Access to minutes
July 24, 2008 | Internet video conference | Steering Committee of SSIRC: observation rubric norming using Eubanks video | Participant
Sept. 22, 2008 | Auburn, AL | Norming session of task and observation rubrics | Participant
Oct. 3, 2008 | Auburn, AL | Alabama Conference of SSIRC: observation and task rubric norming | Participant
Nov. 13-14, 2008 | Houston, TX | CUFA meeting at NCSS: task and observation rubric norming | Participant
Dec. 15, 2008 | Internet video conference | Online norming session | Participant
Jan. 17, 2009 | Charlottesville, VA | CUFA retreat at the University of Virginia | Access to audio recording
Apr. 15, 2009 | San Diego, CA | AERA Conference: task rubric norming using a task collected by a SSIRC researcher | Access to minutes

Phase IV: Final Data Analysis. Data analysis was an iterative process during this study. It began during phase II with the collection of teacher data and continued until the end of the study. Phase IV began in July 2009 with the culmination of the second year of data collection.

Data Analysis Procedures

The data analysis process involved the analysis of student and teacher data to ascertain the impact of instruction on social studies learning outcomes. My analysis focused specifically on five research questions. The questions are listed below:

Research Question 1: To what extent do teachers utilize authentic pedagogy and how much variation exists within the sample of teachers in this study?

Research Question 2: Do students that have been taught by teachers demonstrating higher levels of authentic pedagogy score higher on the AHSGE than students taught by teachers with lower levels of authentic pedagogy?

Research Question 3: What is the impact of authentic pedagogy on student performance on an assessment that requires them to apply knowledge from a previous unit to a challenging new task?

Research Question 4: Does the ability to apply knowledge in these situations improve with repeated exposure (multiple courses) to classroom experiences that require students to perform challenging intellectual tasks?

Research Question 5: To what extent does authentic pedagogy bring different achievement benefits to students of different social and academic backgrounds?

Table 13 depicts the hypotheses associated with each of these questions. It also provides an overview of the data analysis methods.

Table 13
Summary of Research Questions and Data Analysis Methodology

Research Question 1: To what extent do teachers utilize authentic pedagogy and how much variation exists within the sample of teachers in this study?
  Hypothesis: The mean score of the teacher sample will not reach the mean of the authentic pedagogy scale.
  Method of Analysis: Application of Newmann's task and instruction AIW rubrics; analysis of descriptive data.
Research Question 2: Do students that have been taught by teachers demonstrating higher levels of authentic pedagogy score higher on the AHSGE than students taught by teachers with lower levels of authentic pedagogy?
  Hypothesis: Students who experience higher levels of authentic instruction in their tenth grade social studies courses will on average achieve higher scores on the graduation exam.
  Method of Analysis: Multiple regression using students as the unit of analysis and a comparison of classes using one-way ANOVA.

Research Question 3: What is the impact of authentic pedagogy on student performance on an assessment that requires them to apply knowledge from a previous unit to a challenging new task?
  Hypothesis: Students who experience higher levels of authentic pedagogy in their social studies class will on average achieve higher scores on the higher order essay assessments.
  Method of Analysis: Factorial MANOVA analyzing the performance of students who received minimal, limited, or moderate levels of authentic pedagogy.

Research Question 4: Does the ability to apply knowledge in these situations improve with repeated exposure (multiple courses) to classroom experiences that require students to perform challenging intellectual tasks?
  Hypothesis: Achievement benefits associated with authentic pedagogy will be enhanced by increased exposure (multiple semesters) to social studies coursework that meets the standards of high quality designated by this construct.
  Method of Analysis: Analysis of post-hoc tests from one-way ANOVA and multiple regression analysis.

Research Question 5: To what extent does authentic pedagogy bring different achievement benefits to students of different social and academic backgrounds?
  Hypothesis: Achievement gains associated with higher levels of authentic pedagogy will be equitably distributed among the student population associated with this study.
  Method of Analysis: Analysis of bivariate correlations.

Data Preparation. Data were received from the school system in the form of spreadsheets that had to be reorganized and merged into a coherent database. The data collection process proceeded incrementally since the study overlapped two school years. Standardized test results were only available at certain times based on the reporting cycle followed by the state. I also had to follow the district schedule to obtain student grades, course schedules, and other information. A consequence of obtaining student data piecemeal was an increased likelihood that the spreadsheets would not match perfectly and that some data would therefore be missing. Whenever possible, I tried to reconcile discrepancies and obtain missing data from the district. However, the final dataset was still incomplete in some areas. When running statistical tests, I only included students with complete records for the variables under analysis. Once I had a coherent database that incorporated all of the different spreadsheets I had received from the school system, I began the process of preparing the data for analysis in SPSS. I created new categorical variables for race, gender, SES, limited English proficiency, and special education based on the mean Alabama High School Graduation Exam scaled scores of students in these categories.

Analyzing Teacher Data. In order to address the first research question, I assigned authentic pedagogy scores to the teachers based on task and observational data. The process for assigning these scores is described in the instrumentation section of this chapter.
The final authentic pedagogy scores (average task score plus average instruction score) were used as the basis for categorizing teachers into four groups: minimal authentic pedagogy, limited authentic pedagogy, moderate authentic pedagogy, and substantial authentic pedagogy. These categories were developed by evenly breaking the final authentic pedagogy scale (7-30) into quartiles as follows: Q1 = between 7 and 11.99, Q2 = between 12 and 17.99, Q3 = between 18 and 23.99, and Q4 = above 24.

Analyzing Student Learning Outcomes. Multiple levels of analysis were necessary to evaluate the impact of authentic pedagogy on student performance on the social studies graduation exam. This was mainly due to the small size of the teacher sample at grade 10 (N = 4), which made it more difficult to determine whether the effects of instruction were significant. My initial examination of the data focused on students as the unit of analysis (N = 805) and utilized a multiple regression model whereby student variables known to influence achievement were controlled to reveal the independent effects of authentic pedagogy on the dependent variable (Alabama High School Graduation Exam scaled score). The independent variables listed in Table 14 were analyzed when addressing research question two and the other research questions.

Table 14
Overview of Independent Variables Used During Regression Analyses

Variable                          Coded Name             Value
Gender                            Sex                    M = Male, F = Female
Ethnicity                         Race                   W = White, A = Asian, B = Black, H = Hispanic, N = Not Reported
SES                               Lunch                  1 = Free, 2 = Reduced, 3 = Paid
Disability Status                 SpEduc                 0 = No, 1 = Yes
English Proficiency               LEP                    0 = No, 1 = Yes
Course Name                       CourseName             AP, US, USAlt., USCo.
Course Type                       CourseType             1 = Fall, 2 = Spring, 3 = All Year
Prior Grades (10)                 Average10              0-100
Prior Grades (9)                  Average9               0-100
Average Task Authenticity         TaskComposite          3-10
Average Instruction Authenticity  InstructionComposite   4-20
Authentic Pedagogy Score          APScore                7-30

Note. Not all variables were retained in the final analysis.

I recoded the first seven variables as criterion-coded variables using the mean scaled AHSGE scores for students in each particular category (e.g., the mean scores for males and females). In order to investigate the relationship between the many predictor variables, I conducted the regression sequentially. In step/block one, I entered the student demographics. Then, in step/block two, I entered the prior achievement measure of student grades in social studies. Finally, in step/block three, I entered the authentic pedagogy variables. In trying to determine the usefulness of the various predictors, I utilized the following criteria. First, I verified that the regression model itself had a high F (ANOVA) that was unlikely to occur by chance. During each stage of analysis, I examined the R² of the predictor variables; a high R² with a low standard error of estimate was desired. I also looked for the variables that had the highest Betas with a significant t-value (p of .05 or less). Finally, I looked for variables with a high semi-partial correlation. This was probably the best indicator because it showed how much a variable contributed on its own to predicting the criterion variable. After looking at each of these indicators, I was able to determine the extent to which each variable influenced the achievement outcomes of this study and to rank order the predictor variables in order of their importance.
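As a rough illustration of this block-wise regression (and of the criterion coding of the categorical predictors), the sketch below uses Python's pandas and statsmodels in place of SPSS. The file name, column names, and coding steps are placeholders standing in for the variables in Table 14, not the actual analysis files.

    # Sketch only: statsmodels stands in for SPSS; column names are placeholders
    # for the variables listed in Table 14.
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("students.csv")  # assumed file: one row per student

    # Criterion coding: replace each categorical predictor with the mean AHSGE
    # scaled score of the students in its category (e.g., mean score for males).
    for col in ["sex", "race", "lunch", "sped", "lep"]:
        df[col + "_coded"] = df.groupby(col)["ahsge_scaled"].transform("mean")

    blocks = [
        "sex_coded + race_coded + lunch_coded + sped_coded + lep_coded",  # block 1: demographics
        "average10",                                                      # block 2: prior grades
        "task_composite + instruction_composite",                         # block 3: authentic pedagogy
    ]

    terms, previous_r2 = [], 0.0
    for i, block in enumerate(blocks, start=1):
        terms.append(block)
        model = smf.ols("ahsge_scaled ~ " + " + ".join(terms), data=df).fit()
        print(f"Block {i}: R-squared = {model.rsquared:.3f} "
              f"(change = {model.rsquared - previous_r2:.3f})")
        previous_r2 = model.rsquared
    print(model.summary())  # coefficients, t-values, and p-values for the full model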
I tested four critical assumptions associated with this statistical approach (Osborne & Waters, 2002). I checked whether the variables were normally distributed, whether a linear relationship existed between the independent and dependent variables, whether the variables were reliably measured, and whether the residuals were normally distributed across the independent variables. Each of these assumptions was met. I also analyzed the correlation matrix and collinearity statistics within the SPSS reports to ensure that the predictor variables were not highly correlated with each other. The multiple regression analysis related to this research question determined the extent to which a relationship existed at the individual student level between authentic pedagogy and student performance above and beyond any of the other variables. My conclusions must be viewed as extremely tentative since teacher characteristics were not controlled. Also, the variability of instruction when using students as the unit of analysis was very limited since all of the students were associated with only four high school teachers.

In order to make a stronger case regarding the impact of authentic pedagogy on student performance, I also ran some analyses using the classroom as the level of analysis. I used one-way ANOVAs to compare classes from specific authentic pedagogy categories (minimal, limited, moderate). I was very careful to identify classes for these comparisons that had similar students. I did this by generating contingency tables in SPSS using the crosstab command. The crosstab command produces a Pearson chi-square test for each variable, making it easy to identify statistically significant differences between classes. In addition to matching classes based on student characteristics (demographics, prior achievement), I also attempted to match classes based on some teacher characteristics. Ultimately, this process held the teacher and student characteristics constant, thus focusing the ANOVA on the specific impact of authentic pedagogy on student performance. The combination of one-way ANOVAs and multiple regression analysis enabled me to more effectively address the second research question.
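The contingency-table check used to verify that matched classes (and, below, the three teacher groups) did not differ significantly on a demographic variable can be approximated as follows; scipy's chi2_contingency stands in for the SPSS crosstab output, and the class labels and counts are hypothetical.

    # Hypothetical crosstab: counts of students by gender in two classes being
    # considered for a matched comparison (stand-in for SPSS crosstabs output).
    from scipy.stats import chi2_contingency

    #            male  female
    observed = [[14,   16],   # class A
                [13,   18]]   # class B

    chi2, p_value, dof, expected = chi2_contingency(observed)
    print(f"Pearson chi-square = {chi2:.2f}, p = {p_value:.3f}")
    # A non-significant result (e.g., p > .05) suggests the classes do not differ
    # meaningfully on this variable, supporting their use in a matched comparison.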
Research question three focused on the ability of students to apply the knowledge, skills, and dispositions gained from instruction on a particular topic to a challenging new task. In this case, the new task was an editorial writing assignment that required a good deal of higher order thinking and elaborated communication. The task was also structured to measure the ability of students to connect historical knowledge to contemporary events and issues. Higher order tasks of this nature fit more readily into the stated goals of authentic intellectual work. Two different essays were prepared for the students. The advanced placement students taking European History completed an essay focused on German unification. The regular education students in the U.S. History courses were administered an essay on Manifest Destiny and the Mexican-American War. These essays were evaluated using the rubrics in Appendices M and N.

The procedure for analyzing this research question involved several steps. First, I isolated the sample of students who had taken the higher order editorial by filtering out the other students in the database. Then, I created a new teacher grouping variable based on the authentic pedagogy categories I had previously established (1 = minimal; 2 = limited; 3 = moderate). I combined the students who received minimal authentic pedagogy (from Andy and Jason) into one group. The other two groups (limited and moderate) included one teacher each. Next, I ran an analysis using SPSS to determine the extent to which significant differences existed between the three groups of students on certain demographic variables (gender, SES, and ethnicity). This process was essentially the same as what I did for research question two. I generated contingency tables for each variable using the crosstab command. The resulting Pearson chi-square was used to identify statistically significant differences between the three groups. I also ran a one-way ANOVA to determine whether significant differences existed between the groups based on their social studies grades from the current year. The goal of each of these steps was to control for as many factors as possible, other than authentic pedagogy, that could influence student performance on the higher order editorial. Finally, I ran a MANOVA to test the hypothesis that students who experience higher levels of authentic pedagogy achieve higher scores on the higher order assessment. The dependent variables were the rubric categories associated with the higher order assessment. The fixed factor was the level of authentic pedagogy students experienced (represented by the three teacher groups I had established). Although the advanced placement and regular editorials were formatted similarly and the rubrics were virtually the same, I decided to analyze the work separately. I did not feel confident comparing the performance of these students because I could not be certain that the challenge they experienced was the same. I did, however, apply the same statistical procedures to each set of data.

Research question four focused on whether there was a performance benefit associated with taking multiple social studies courses that featured higher levels of authentic pedagogy. In attempting to address this question, I did not observe firsthand the type of instruction a group of students received over the course of two years. Instead, I collected teacher data for the entire sample of eight teachers for one year and performed my analyses (on two years' worth of student achievement data) based on the assumption that teachers do not radically alter their instruction between semesters or consecutive school years. For example, I placed one of the teachers in the limited authentic pedagogy category based entirely on observations during the spring 2008 semester. I made the assumption that students who had this same teacher during a different semester also experienced limited amounts of authentic pedagogy. I created a new "prior moderates" variable for this research question that measured the number of social studies courses each student experienced that were at the moderate authentic pedagogy level. The prior moderates variable had three possible values. A student with a "0" designation did not experience any social studies courses at the moderate authentic pedagogy level in the ninth or tenth grade. A "1" indicated that the student had at least one course at the moderate level in either the ninth or tenth grade. Finally, a "2" meant that both the ninth grade and tenth grade social studies courses the student took were at the moderate level.
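Coding this variable amounts to counting how many of a student's two courses (ninth and tenth grade) were taught by a teacher in the moderate category. A minimal sketch follows; the teacher names in the lookup set are hypothetical placeholders, not the study teachers.

    # Hypothetical lookup of teachers in the moderate authentic pedagogy category.
    MODERATE_TEACHERS = {"Teacher_A", "Teacher_B"}

    def prior_moderates(grade9_teacher, grade10_teacher):
        """Return 0, 1, or 2: how many of the student's ninth and tenth grade
        social studies courses were taught at the moderate level."""
        return sum(t in MODERATE_TEACHERS for t in (grade9_teacher, grade10_teacher))

    print(prior_moderates("Teacher_A", "Teacher_X"))  # 1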
Research question four focused on whether there was a performance benefit associated with taking multiple social studies courses that featured higher levels of authentic pedagogy. In attempting to address this question, I did not observe firsthand the type of instruction a group of students received over the course of two years. Instead, I collected teacher data for the entire sample of eight teachers for one year and performed my analyses (on two years' worth of student achievement data) based on the assumption that teachers do not radically alter their instruction between semesters or consecutive school years. For example, I placed one of the teachers in the limited authentic pedagogy category based entirely on observations during the spring semester of 2008. I assumed that students who had this same teacher during a different semester also experienced limited amounts of authentic pedagogy. I created a new "prior moderates" variable for this research question that measured the number of social studies courses each student experienced at the moderate authentic pedagogy level. The prior moderates variable had three possible values. A student with a "0" designation did not experience any social studies courses at the moderate authentic pedagogy level in the ninth or tenth grade. A "1" indicated that the student had at least one course at the moderate level in either the ninth or tenth grade. Finally, a "2" meant that both the ninth grade and tenth grade social studies courses the student took were at the moderate level. I ran several one-way ANOVAs to analyze this question. In each case, I excluded students who had more than one social studies course during either the ninth or tenth grade. The first one-way ANOVA included all of the students in the sample. The second removed the advanced placement students and looked only at the impact of multiple years of moderate authentic pedagogy on regular education students. The final ANOVA featured only the advanced placement students. In each instance, the ANOVA provided an indication of whether the overall model was significant. I ran several post-hoc tests to see whether any significant differences in performance existed between students with 0, 1, or 2 courses featuring moderate levels of authentic pedagogy. In addition to the ANOVAs, I also ran a sequential multiple regression analysis that was identical to the one described for research question two. However, instead of including the task and instruction authentic pedagogy variables in step 3, I entered the prior moderates variable. This provided a measure of the unique impact of the prior moderates variable above and beyond the influence of the demographic and prior achievement variables. Finally, I was interested in determining the achievement effects of authentic pedagogy for specific subgroups of students. Using the demographic and achievement data obtained from the school system, I ran a series of bivariate analyses in SPSS to determine whether the achievement benefits associated with authentic tasks and authentic instruction were equitably distributed among the students at Central High School. I analyzed the impact of authentic pedagogy based on gender, race, SES, and prior academic achievement in social studies. Each bivariate analysis resulted in a Pearson correlation statistic that served as the main indicator of whether a correlation was significant. I compared the direction of correlation and the level of significance for each subgroup of students to determine whether certain students (e.g., males) were more or less likely to be advantaged by higher levels of authentic pedagogy.
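A compact sketch of these last two procedures is shown below: coding the prior moderates variable, running a one-way ANOVA with a post-hoc comparison across the 0/1/2 groups (Tukey HSD is used here as one plausible choice, not necessarily the test used in the study), and computing the subgroup Pearson correlations. As before, the DataFrame layout and column names (grade9_ap_level, exam_score, ap_score, and so on) are hypothetical placeholders rather than the actual study data.

# Sketch of the prior-moderates coding, post-hoc comparison, and subgroup
# correlations. All column names are hypothetical placeholders.
import pandas as pd
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

MODERATE = "moderate"

def code_prior_moderates(df: pd.DataFrame) -> pd.Series:
    """0, 1, or 2 social studies courses at the moderate authentic
    pedagogy level across the ninth and tenth grades."""
    return ((df["grade9_ap_level"] == MODERATE).astype(int)
            + (df["grade10_ap_level"] == MODERATE).astype(int))

def anova_with_posthoc(df: pd.DataFrame):
    """Overall one-way ANOVA plus a Tukey HSD comparison of the 0/1/2 groups."""
    groups = [g["exam_score"].dropna()
              for _, g in df.groupby("prior_moderates")]
    overall = stats.f_oneway(*groups)
    tukey = pairwise_tukeyhsd(df["exam_score"], df["prior_moderates"])
    return overall, tukey

def subgroup_correlations(df: pd.DataFrame, subgroup_col: str):
    """Pearson correlation between authentic pedagogy score and exam
    performance within each subgroup (e.g., by gender or race)."""
    return {level: stats.pearsonr(g["ap_score"], g["exam_score"])
            for level, g in df.groupby(subgroup_col)}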
Conclusion

In conclusion, this was a four-phase study focused on understanding the relationship of authentic intellectual work to student learning outcomes (both higher and lower order) in social studies. It was a mixed-methods analysis of the social studies instruction at two study schools in East Alabama. The study isolated the impact of authentic pedagogy on student performance primarily through regression analyses, controlling for a variety of predictor variables that were likely to have at least some impact on achievement. The study ultimately included 805 students, eight teachers, and data collected over the course of two school years.

CHAPTER FOUR: TEACHER USE OF AUTHENTIC PEDAGOGY

The purpose of this chapter is to present findings related to my first research question: to what extent do teachers utilize authentic pedagogy, and how much variation exists within the sample of teachers in this study? This chapter includes raw scores from my analysis of this question as well as descriptive accounts of teacher practice at various levels of the authentic pedagogy continuum. These accounts are intended to help the reader form a more complete understanding of the types of intellectual challenges students experienced in their history courses. As discussed in the previous chapter, the authentic pedagogy scores are based on an analysis of tasks and instruction. Teachers were asked to submit three tasks that best indicate how well students understand their subject at a high level. These tasks were then each linked to a classroom observation. The observation sometimes featured students actually engaged in the associated assignment. In other cases, I observed the instruction that prepared students to be able to do the task. The average task score and average observation score were added to develop a final authentic pedagogy score (which could range from 7 to 30). Table 15 depicts the final authentic pedagogy (AP) scores along with demographic information associated with each teacher.

Table 15
Teacher Profiles

                Roy     Andy    Jason   Amy     Phillip  Lauren  Ryan    Lee
AP Score        9.6     10.9    11.6    12.9    13.3     18      20.9    21.2
Age             26-35   36-35   36-45   46-55   26-35    46-55   26-35   36-45
Ethnicity       White   White   White   White   White    White   White   White
Experience      4       11      14      15+     6        15+     11      12
Grade Taught    9       10      10      9       10       9       10      9

It should be emphasized that teachers in this study were not labeled as "authentic" or "traditional." The authentic pedagogy scores represent a continuum. Teachers who scored at the high end of the continuum occasionally used strategies that would be considered more traditional (e.g., lecture, multiple-choice tests). However, the observation and interview data suggested that this type of instruction was not their dominant practice. These teachers seemed to have a fundamentally different conception of high-level understanding than their peers. The next chapter will explore the extent to which this resulted in differences in student learning on the outcome measures. When considering the first research question, I hypothesized that the mean score of the teacher sample would not reach the midpoint of the authentic pedagogy scale. I also believed, based on my purposeful selection of the research site to increase the likelihood of having some high scoring teachers, that enough variation would exist among the teachers in the sample to ascertain the impact of authentic pedagogy on student learning. My hypothesis was supported. The average authentic pedagogy score of 14.8 did not reach the midpoint of the authentic pedagogy scale, which is 18.5. There was enough spread among the teachers to be able to address my other research questions. The final authentic pedagogy scores were organized into four categories: minimal, limited, moderate, and substantial. The cut scores for these categories are listed in Table 16. The dividing points represent a breakdown of the totality of possible scores into approximate quartiles. They also correspond with those used by the broader Social Studies Inquiry Research Collaborative (SSIRC) study.

Table 16
Cut Scores

              Average Task    Average Instruction    Cut Offs
Minimal       3-3.99          4-8                    7-11.99
Limited       4-5.99          8-12                   12-17.99
Moderate      6-7.99          12-16                  18-23.99
Substantial   8-10            16-20                  Above 24

Some general statements and trends are evident based on an analysis of these data. First, when applying the cut scores to the authentic pedagogy scores in Table 15, it is evident that no teachers reached the highest "substantial" category of authentic pedagogy. This finding is not surprising given the difficulty associated with achieving the top levels of the rubrics. Three teachers, however, did score in the moderate range. These teachers were Lauren, Ryan, and Lee. They scored a good deal higher than the rest of the sample. Roy, Andy, and Jason were on the opposite end of the continuum in the minimal authentic pedagogy category. The remaining teachers in this sample could best be characterized as using limited authentic pedagogy.
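Because the final score is simply the sum of the average task score and the average instruction score, mapping a teacher onto these categories is mechanical. The short sketch below shows one way to express that mapping; the function names are mine, but the cut points come directly from Table 16.

# Sketch of the authentic pedagogy (AP) scoring and categorization.
# Cut points follow Table 16; function names are illustrative only.

def final_ap_score(task_scores, instruction_scores):
    """Average task score (3-10) plus average instruction score (4-20),
    yielding a final AP score between 7 and 30."""
    avg_task = sum(task_scores) / len(task_scores)
    avg_instruction = sum(instruction_scores) / len(instruction_scores)
    return avg_task + avg_instruction

def ap_category(score):
    """Map a final AP score onto the four-level continuum."""
    if score < 12:
        return "minimal"      # 7-11.99
    if score < 18:
        return "limited"      # 12-17.99
    if score < 24:
        return "moderate"     # 18-23.99
    return "substantial"      # 24 and above

# Example: Roy's three tasks and lessons (see Table 17) yield roughly
# 4.7 + 5.0 (reported as 9.6 in the study), which falls in the minimal band.
print(ap_category(final_ap_score([4, 6, 4], [5, 6, 4])))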
Lauren barely reached the moderate category with a score of 18. However, this score was likely influenced by circumstances surrounding the collection of her data. Lauren retired during the study before I could observe the tasks she submitted. In order to document the degree of authentic pedagogy students experienced in her class, I rated her instruction based on two videotaped lessons that were part of a previous lesson study project. During this project, Lauren created an inquiry-based lesson with the assistance of her 9th grade social studies colleagues as well as teacher educators and historians from Auburn University. As a result, the task scores associated with the videotaped lessons were very high. The three original tasks that Lauren submitted were not as authentic. It is likely that her final authentic pedagogy score was inflated. Nevertheless, Lauren represented a teacher who required students to complete some intellectually challenging tasks during the course of the semester. The two highest scoring teachers in this study were formally trained in inquiry-based instructional practices. Lee had a master's degree in social studies education, while Ryan earned his doctorate in the same field during the study. In addition to their formal training, Lee and Ryan also had extensive experience applying this knowledge in the classroom. They were actively involved in inquiry-based professional development programs as participants, leaders, and mentor teachers. Their scores in this study suggest that a combination of graduate work and experience, filtered through a disposition that is amenable to the assumptions of inquiry-based instruction, may contribute to higher levels of authentic pedagogy. These factors will be discussed further in the generalizations section of this chapter. The teachers in the lowest category, by way of comparison, did not have the same extended experience with inquiry-based teaching. Roy and Andy held graduate degrees in fields other than social studies: administration and special education. Andy was a veteran teacher who was actively transitioning into an administrative role during the study. He had attended at least one inquiry-based professional development workshop, but there was less evidence that he applied what he had learned in his teaching. Roy was the novice teacher in the study, having taught social studies for only two years. Roy's understanding of inquiry-based teaching did not appear to be much different from that of some of the teachers in the limited category. However, his ability to implement this type of instruction was likely influenced by the growing pains associated with being a new teacher. Roy and Andy could best be described as teachers who predominantly used a traditional instructional approach. As Andy told me during one of our meetings, "I pretty much just lecture." Jason was also an experienced teacher with over ten years in the classroom. However, most of this experience was in another state. In his two years at the study school, he had participated in at least one inquiry-based professional development workshop. Jason seemed comfortable using certain inquiry-oriented strategies, but still maintained a predominantly traditional instructional approach. Amy and Phillip were in the limited authentic pedagogy category. Amy was a veteran teacher who had recently been involved in an intensive inquiry-based lesson study project. There was some evidence that she was becoming more comfortable and proficient in implementing problem-based lessons.
Phillip was less experienced, but had more formal training in this type of instruction, since he had graduated from an undergraduate social studies program that emphasized this approach. The next section includes examples of the types of intellectual challenges students experienced at each of the levels of authentic pedagogy represented by this sample. The examples do not contain the level of detail associated with audio or videotaped transcripts. However, they do help to explain each teacher's placement on the authentic pedagogy continuum. In each category (minimal, limited, moderate), at least one teacher's authentic pedagogy task scores are described to provide the reader with a general sense of the type of tasks that were submitted. This is followed by a more detailed explanation of a specific task/lesson combination, which describes what students experienced and the scoring rationale.

Minimal Authentic Pedagogy

Roy was the best example of a teacher who utilized minimal authentic pedagogy. Roy taught 9th grade World History to three blocks of students each day. The classes I observed typically averaged about 18 students. His students represented a broad range of ability levels, including some special education students. Roy was a younger teacher (between 26 and 35), but not fresh out of college. His undergraduate teaching preparation was in social studies education. He later attained a master's degree in mild/moderate disabilities. Roy's first two years of teaching at the study school were in special education. He was completing his first year of social studies teaching when this research project was initiated. In addition to his teaching responsibilities, Roy served as a football and baseball coach. In many ways, Roy seemed to be still getting the feel for teaching. He was not as confident as his veteran colleagues in maintaining classroom discipline. His style tended to be inconsistent (of course, this could have been due to my presence in the classroom). At times, he was overt and somewhat aggressive when addressing behavioral issues. In other situations, he was overly permissive with students. Routine administrative tasks and minor behavioral issues consumed a good deal of this teacher's time and energy. This possibly played a role in his hesitancy to adopt a more student-centered classroom environment. Table 17 depicts the tasks that Roy submitted and the final scores that were assigned using the task and instruction rubrics (Appendices C-F). Roy selected a political cartoon, an illustrated timeline, and an activity where students taught their classmates as the three tasks that most represented students thinking at a high level. The intent of the political cartoon activity, as explained in the interview, was for students to demonstrate a deeper understanding of some of the causes of World War I. Roy wanted students to show they understood complex terms like militarism and nationalism. The second task, the illustrated timeline, was part of a unit on the Industrial Revolution. In this lesson, students were asked to draw the larger significance of a list of events, inventions, people, or concepts associated with this time period. Finally, the "teach a lesson" task involved students teaching their classmates about events in a chapter on nationalism. Roy believed the students would have to use higher order thinking to prepare an activity for the class to complete.
Table 17
Overview of Roy's Authentic Pedagogy Scores

Task Name                                    Task Score (3-10)   Instruction Score (4-20)   Final Authentic Pedagogy (7-30)
Political Cartoon                            4                   5
Industrial Revolution Illustrated Timeline   6                   6
Teach a Lesson                               4                   4
Average Task/Instruction Scores              4.6                 5                          9.6

Each of these tasks had the potential to be intellectually challenging. Political cartoons, for instance, can be used to convey complex messages and subtle nuances about a topic. With proper scaffolding, students can learn how to question cartoons as they would any historical artifact. A great deal of higher order thinking is often needed to uncover the potential bias of an artist or interpret the meaning of symbols. Teachers can lead students to question the artist's intent in including certain design features (e.g., color, symbols) or even have them create a cartoon expressing an opposing viewpoint. As part of an inquiry-based activity, students might investigate the details surrounding an ill-structured problem or question. They might construct a cartoon that supports their view or the view of some historical group associated with the topic. In this case, the cartoon's message constitutes the basis of an argument, something the students can articulate as part of a class discussion or debriefing. Symbols and other features of cartoons are used to convey a message that has some real depth. Roy's tasks did not live up to their description in the interview. Challenging tasks usually require detailed instructions and precise scaffolding to help students successfully think at a higher level. These features were noticeably absent in most of the assignments Roy provided to his students. In some instances, Roy seemed to confuse comprehension-level tasks with higher order thinking. There was an overall disconnect between Roy's stated intentions and his ability to implement lessons that required the types of thinking envisioned by the authentic pedagogy model. Roy's students had very little opportunity to complete assignments that required construction of knowledge or elaborated communication. The construction of knowledge scores for the three tasks (1, 2, 1 on a three-point scale) suggest that Roy's dominant expectation was for students to "reproduce information gained by reading, listening, or observing" (instruction rubric). This was especially true of the "teach a lesson" task that will be described later in this section. Roy's use of a political cartoon activity and illustrated timeline showed that he was willing to try alternative forms of assessment. Visual tasks such as these can require a great deal of elaborated communication on the part of students. However, Roy's scores in this category were also fairly low (2, 3, 2 on a four-point scale). His visual tasks were not as demanding as similar ones used by other teachers in this study. The political cartoon activity, for example, required students to draw a picture of one of the causes of World War I (a subject just covered in class). This does not really constitute a political cartoon. Political cartoons usually involve attempts to persuade others or convey a point of view. The WWI drawings in the class that I observed were very simple. When students presented them, they usually could describe their idea in one or two sentences. Most of Roy's tasks could best be described as the equivalent of a short answer exercise (level two on the task rubric). Finally, all three of Roy's tasks had virtually no connection to students' lives (1, 1, 1 on a three-point scale).
The tasks did not require students to explore the modern relevance of historic events like World War I and the Industrial Revolution. They also did not provide students with much of an opportunity to study these topics in a way they would find personally meaningful. The task that most exemplifies the type of instruction students received in this category was called "teach a lesson." The scores for each standard of the task and instruction rubrics are provided in Table 18. Most of the standards received the lowest possible score.

Table 18
Scores for "Teach a Lesson" Task

Task Scores                             Instruction Scores
Construction of Knowledge        1      Higher Order Thinking             1
Elaborated Communication         2      Deep Knowledge                    1
Connection to Students' Lives    1      Substantive Conversation          1
                                        Connectedness to the Real World   1
Total Task: 4                           Total Instruction: 4

Roy had used the "teach a lesson" activity at other times during the semester with different topics in the curriculum. The instructions he provided to students on this particular day were very brief (see Figure 3). The students worked on this task during the last day of a seven-day unit. It was designed to prepare students for an upcoming test. Roy's other classes did not do this assignment. He felt that it would work best with his first period, his strongest group of students.

Teach a Lesson
You will be assigned a section from the Nationalism unit.
1. As a group, you need to decide on 10 facts you and your classmates need to know. You need to write these 10 facts on your paper to present to the class.
2. Come up with an activity for the class to do. You must also create an example of your activity for the class to see.

Figure 3. The "Teach a Lesson" task.

When Roy implemented this task, he organized students into five groups of three to four students each. The groups were assigned one of the sections from the textbook chapter they had been covering. Their task was to identify ten facts for their classmates to know. Once identified, the facts were to be transferred to "cheerleader" paper for display during the presentation phase of class. In addition to identifying facts, each group was responsible for preparing an activity. The activity portion of the assignment could involve a demonstration of some sort by the group or a more interactive activity that involved the class. Students worked on the activity for an hour. The groups divided up the task as might be expected. Those with better handwriting or artistic ability did the fact poster. The other group members thought of the activity or found the facts. In each group, it was usually one or two students who really looked through the materials (textbook, worksheets, notes) for the facts to incorporate into the presentation. This activity did not require students to justify why they chose certain facts as important. As I watched the lesson, the students mostly pulled sentences verbatim from their source. A girl seated near me opened up the textbook, identified the section, and looked at the sub-topics. She then said, "We can just do two from that, that, that, that, and that." In this simple manner, the student selected two facts from each of the highlighted topics in the section. I asked a student from another group how she identified facts to include on the poster. She told me she was basically pulling the main points from the class notes. I observed similar patterns in the other groups. The talk within the groups was almost entirely procedural and often off topic. Each student had their own laptop computer.
Several of the groups typed rough drafts and then read the facts to the person writing on the butcher paper. This process was very inefficient considering that the class was equipped with an electronic whiteboard. Rather than use butcher paper, the files could easily have been pulled up on the whiteboard for everyone to see. Students seemed to sense the lack of urgency in the lesson and stretched out the task to take full advantage of the time provided by the teacher. The presentations took up the final portion of class. Each group read their ten facts to the class. The first group played a game of "trashketball." One student read multiple-choice questions from a worksheet while the other two students in the group took turns answering. Approximately four questions were asked (related to the ten facts). A point was awarded for each correct answer. A correct answer also entitled the student to shoot a wadded-up piece of paper into the trashcan. If the student made the basket, he or she got another point. The class looked on while the group played the game. The second group also played a short game with multiple-choice questions. This time, two questions were asked of the class. If a student got the question right, he or she got to "make a beat" (by banging on the desk). The third group had a word find that appeared to come from a textbook or perhaps online (not created by the students). The teacher showed it to the class. However, neither the class nor the group did anything with it. The fourth group located a map to support their facts. The teacher had one of the students explain the map. It showed the size and power of the Ottoman Empire. Finally, group five asked the class to write a half-page diary entry on a serf. The class ended immediately after this presentation, and there was no indication that the students were going to actually complete this assignment. The instruction rating for this task was fairly straightforward. It received a one on the higher order thinking scale because I did not notice any students engaged in higher order thinking during the lesson. The students simply took facts from their textbook, notes, or worksheets and transferred them to butcher paper. The activities created by the groups were often games that involved questions taken directly from worksheets. Since the entire purpose of the activity was to help students memorize content for the test, the depth of knowledge standard also received a one. It clearly corresponded with the instruction rubric statement that "students were involved in the coverage of simple information which they are to remember." The students were not required to organize their facts into any sort of argument to assist with learning. The substantive conversation standard received a one because the on-topic conversation during class was almost entirely procedural. I did not witness any instances of students grappling with the meaning of an idea or concept in the unit. They did not argue within their groups over which facts should be included in the presentation. The lesson also received the lowest score on the connectedness standard. This was a strictly "school" task. At no point did the teacher justify it beyond doing well in the course. Other tasks in the minimal category provided a similar level of intellectual challenge to students. A comparison of Roy's task with one submitted by Andy (see Figure 4) illustrates this point.
Both teachers seemed to equate high-level student understanding with activities that mainly required students to master large amounts of factual material. Roy's "teach a lesson" activity had students pull ten facts from the book. Andy's Reformers PowerPoint asked students to describe specific information related to reforms enacted in the 1800s. In each instance, the teacher used the task to help students learn nearly an entire chapter of information from the textbook. Andy's task was a little more demanding than Roy's in the elaborated communication category simply because students had to include more information in their presentations. However, the quality of the information was essentially the same. The students were either summarizing information or listing facts from the textbook. They were not asked to form a generalization about the time period and back it up with supporting evidence. The task did not require analysis or persuasion and therefore fell short of the type of elaborated communication envisioned by the authentic pedagogy model. Finally, the task did not score very well on the connection to students' lives standard since it was focused entirely on life in the 1800s. The final scores for this task were 1, 3, 1.

Reformers of the 1800s
Rationale: Students are to create a PowerPoint presentation that identifies 5 areas of the reform movement during the 1800s. This will give students an understanding of the social changes America experienced during the rapid growth of urbanization in the 1800s. Students will be able to better understand the concepts, developments, and consequences of industrialization and urbanization.
Procedure: You are to identify 5 areas in which significant reform occurred during the 1800s and create a 20-slide PowerPoint presentation. In each area, identify the most significant leaders, 3 supporting facts or reasons for the reform, and the legislation that was enacted because of the reform.
Set Up:
Slide One - Name of Reform
Slide Two - Leaders of the Movement
Slide Three - Supporting facts/reasons for the reform
Slide Four - Legislation
Due Date: TBA. This should be delivered to my R drive and also be located on the student's P drive.
Rubric for Assignment    Name_______________________________
This assignment is worth 100 points as a major test grade.
CATEGORY            POSSIBLE PTS.    POINTS AWARDED
Set Up Criteria     40               __________________
Accuracy            40               __________________
Timely Delivery     10               __________________
Creativity          10               __________________
TOTAL               100              __________________

Figure 4. Reformers of the 1800s task.

Limited Authentic Pedagogy

The teachers in the limited authentic pedagogy category generally had higher task scores than the teachers in the previous category. However, their instruction scores remained relatively low. They struggled to provide the support needed for students to accomplish their higher order thinking goals. This section uses Amy's tasks as the basis for understanding the types of intellectual challenges students experienced in classrooms featuring limited authentic pedagogy. Amy was a veteran teacher with over fifteen years of teaching experience. She had taught World History at the study school for five years. Her professional training was in elementary education. The undergraduate and graduate programs she completed provided enough social studies credits for her to be certified to teach in this field. As might be expected, Amy's classroom had a different feel from Roy's in terms of basic management.
Amy was an excellent classroom manager who exhibited a no-nonsense approach to instruction. All three of her tasks involved substantial group work and movement of students within the class. She was able to transition seamlessly through the different stages of these lessons with little difficulty. The classroom atmosphere was relaxed, yet focused. Her students seemed to genuinely enjoy coming to class. Some veteran teachers settle into a teaching routine and become resistant to ideas that challenge their status quo. Amy was not this type of teacher. She sought out opportunities for professional development and growth. Amy was involved in the same inquiry-based lesson study project as Lauren and some of the other teachers in this study. The teachers in this project worked closely with a professional historian to develop in-depth content knowledge of several major topics from the World History curriculum. They then used this knowledge to prepare inquiry-based lessons. In observing Amy's instruction, it was evident that she knew her subject very well. It was also obvious that she had incorporated specific strategies from the lesson study project into her instruction. However, the tasks she submitted for this project suggested that her adoption of an inquiry-based approach was still mostly limited to the lessons she had developed with her peers. Table 19 provides an overview of the three tasks she submitted and how they scored on the task and instruction rubrics.

Table 19
Overview of Amy's Authentic Pedagogy Scores

Task Name                         Task Score (3-10)   Instruction Score (4-20)   Final Authentic Pedagogy (7-30)
Absolute Monarchy of Your Own     4                   8
Ideal Form of Government Debate   9                   10
Renaissance Ball                  4                   4
Average Task/Instruction Scores   5.6                 7.3                        12.9

Amy had two lower scoring tasks and one that was significantly higher, which helped her overall authentic pedagogy score extend into the limited range. The first task she submitted was called "Absolute Monarchy of Your Own." The purpose of this task was to reinforce students' understanding of the term "absolute monarchy" and how one would likely function. Amy's intent was to have students synthesize what they had learned about various absolute monarchies in history into a fictitious example that they could relate to personally. She also wanted students to evaluate absolute monarchies and the effect they could personally have on others. Students worked in groups to create a fictitious kingdom in which they were the absolute ruler. The handout for this activity required students to explain how their kingdom would function (e.g., how will you direct your subjects to worship?). This allowed them to see the type of power a king would actually have under this system. Amy's second task was a debate in which students represented the views of different philosophers (Locke, Plato, etc.) and discussed the ideal form of government. At the end of the debate, they had to step out of their assigned roles to argue for the form of government they considered the best. The final task was another perspective-taking exercise in which students assumed the role of a Renaissance figure to participate in a Renaissance Ball. Amy wanted students to empathize with the historical figures and understand what it was like to live during this time period. The activity included an initial "meet and greet" session where students got to know the cast of characters at the Ball.
Then, students settled into their seats and were called on by the teacher to discuss their greatest accomplishments, using various props to support their presentations. At first glance, the tasks submitted by Amy appeared to differ significantly from those of Roy or Andy. They certainly required a good deal of active participation and engagement on the part of the students. However, two of the tasks were essentially creative alternatives to lecture that required little construction of knowledge (1, 3, 1). The rubric for the absolute monarchy assignment provided the greatest insight into the teacher's expectations for this task. In order to get full credit, students simply had to follow directions and include each of the required elements listed on the assignment handout. The task itself was not very intellectually challenging because students could develop their monarchy in any way they deemed appropriate (which admittedly was the point). It was perhaps challenging from a creativity standpoint, but it did not require the disciplined use of higher order thinking processes to solve a problem. Amy attempted to address the synthesis and evaluation goals for this assignment during the lesson, and this is reflected in her instruction score. The Renaissance Ball task mainly involved students reporting factual information about their character. The students were not trying to master the information in order to formulate an argument related to a problem or central question. The manner in which this task was implemented suggested that its dominant purpose was to help students remember the main achievements of each Renaissance figure in order to perform well on the upcoming unit test. The final task, the Ideal Form of Government, did involve substantial construction of knowledge and will be discussed later in this section. Amy's elaborated communication scores were also fairly low (2, 4, 2). The Absolute Monarchy and Renaissance Ball tasks required students to do a great deal. For instance, as part of the Renaissance Ball, students prepared a short poem, a mask of their Renaissance figure, and a bust of their figure with accomplishments listed on it. They also designed props for their presentations and in some cases wore costumes. However, the elaborated communication standard is not as concerned with how much students do as with the extent to which they are required to explain and defend their understanding of historical concepts. Both of these tasks elicited very brief responses from the students in Amy's class. In most cases, the students answered the questions in a couple of sentences. Finally, most of the tasks did not require students to connect what they had learned to something significant in their lives (1, 2, 1). Why might it be important to understand absolute monarchies or the lives of Renaissance figures? It is likely that some students recognized the contemporary relevance of these topics, but the tasks themselves did not press them to investigate these connections in any detail. Amy's highest scoring task featured a debate on the ideal form of government. The class I observed was fairly evenly divided between boys and girls and included a total of sixteen ninth grade students. The students were mostly white (11 of 16) and considered regular education in terms of their ability. The task was introduced near the beginning of a unit on the Enlightenment. The students prepared for the debate during the course of several class periods. I observed the debate itself on the last day of the unit.
Table 20 provides a breakdown of the authentic pedagogy scores associated with this task.

Table 20
Scores for "Ideal Form of Government" Task

Task Scores                             Instruction Scores
Construction of Knowledge        3      Higher Order Thinking             3
Elaborated Communication         4      Deep Knowledge                    3
Connection to Students' Lives    2      Substantive Conversation          3
                                        Connectedness to the Real World   1
Total Task: 9                           Total Instruction: 10

The teacher's intent was for students to learn the views of nine historic thinkers on the ideal form of government and their beliefs regarding the role people should play in governing. After considering the various perspectives, students were to evaluate the different forms of government in order to decide which one they considered to be the best. Amy had the students defend their choice through an editorial assignment. The ideal form of government task is based on a History Alive activity. Amy's editorial was added in place of the debriefing. Amy set up the debate so that students were arranged in a two-deep semicircle with "actors" seated in front of their "press agents." The actors wore paper masks resembling the historic philosopher they were attempting to portray. Some also wore togas. They had nameplates on their desks for easy identification. Due to absences, Amy allowed some of the more academically able students to go without a press agent. At the beginning of class, the teacher passed out a data retrieval chart for students to complete during the debate. The press agent's responsibility was to introduce the assigned thinker. Each agent delivered a prepared statement with pertinent background information for their actor. During the debate, the press agents really did not participate other than taking notes. Amy served as the moderator of the debate. She would call on one of the actors playing an historic figure, the press agent would do the introduction, and then the actor would explain the symbol on his or her nameplate. The symbol had to represent the thinker's views on the ideal form of government. Amy would then ask for questions. After a few questions and some discussion, she would move to the next historical figure and repeat the process. The discussion surrounding each philosopher was usually 8-10 minutes in duration. The following dialogue provides a general sense of how this lesson was implemented and the type of discussion that took place. It is not a verbatim transcript, but it does capture the essence of what was said during the discussion.

Teacher: Any more questions? Alright, Hobbes. Your press agent isn't here today. Can you tell us a little bit about yourself?
(Student playing Hobbes reads his biography from a card.)
Teacher: What is your symbol?
Hobbes: People with crowns on their heads, governing themselves. I resent that.
Teacher: What is the ideal form of government? Why do you consider it ideal, and can we trust people to govern themselves?
Hobbes: The ideal is an absolute monarchy. People can't govern themselves.
Teacher: Why?
Hobbes: People are selfish. They are out for their own selfish interest.
Plato: Doesn't that contradict your previous statement?
Teacher: What do you mean?
Plato: The absolute monarch is a person. Could he be corrupt himself?
Hobbes: One person is no big deal. If everyone is corrupt and in charge, things are worse.
John Locke: But wouldn't he act selfishly?
Hobbes: (hard to understand his response; he seems to be trying to understand the question)
Locke: If the absolute monarch is selfish and corrupt, how would that work out?
Hobbes: If one person is in charge, even if he is corrupt, it is still better than people governing themselves.
Plato: Why do you think people are so corrupt?
Hobbes: People have to carry guns and lock their doors.
Plato: I don't carry a gun or lock my door.
Hobbes: The high rate of crime.
Wollstonecraft: The absolute monarch is better than the people, so he should rule? Is that what you are saying?
Hobbes: (mumbled response; the teacher had to ask the student to speak up)
Rousseau: How do you believe in passing on a monarchy? Would that be fair?
Hobbes: (response isn't clear)
John Locke: Everyone is corrupt. Is that correct?
Hobbes: Yes.
Locke: Does that include you?
Plato: You seem to put a lot of trust in one person.
Hobbes: (keeps saying the same thing: one ruler is better than everyone governing themselves)
Locke: (uses a tug of war analogy, suggesting that people pulling on both ends would reach a position in the middle, negating some of the corruption, whereas if only an absolute monarch were on the line there would be nothing to counter his selfishness)
Teacher: Hobbes is staying true to his beliefs despite much controversy. Any questions? (a few more comments similar to the previous ones)
Teacher: O.K. Whether we agree or not, I think we understand Hobbes' position.

The full segment lasted for approximately eight minutes. Much of this time was spent with students trying to get Hobbes to justify why an absolute ruler could be trusted to lead. This excerpt illustrates the redundant nature of some of the dialogue. It also shows how some students struggled to fully grasp their character's perspective. In this case, the student playing Hobbes had a basic understanding of his beliefs, but when pressed to defend his position he was unable to discuss factors that made his rule legitimate (i.e., divine right). He also had a difficult time supporting his statements with historical evidence. Other students were more proficient. The teacher allowed students to debate freely with each other. This made the debate feel more authentic. Most actors accurately portrayed the point of view of their historic figures. The students were respectful toward each other, and some genuinely seemed interested in trying to understand the perspective of the other thinkers. Certain students dominated the question-and-answer period and were obviously more knowledgeable about the subject. In particular, the student playing John Locke succeeded in raising some significant questions that led to some higher order conversation. A debriefing that allowed students to step out of character would have been helpful in enabling the class to come to some shared understandings of what these different philosophers believed. This would also have allowed the teacher to discuss some of the interesting comments made during the debate and address any misconceptions prior to the editorial assignment. Amy seemed to expect students to be able to synthesize and form important connections from this lesson on their own. She may have underestimated the complexity of the activity and the level of scaffolding needed to help students accomplish its higher order goals. Another possibility is that, having worked with these students for a while, she may have formed preconceptions regarding their abilities. Amy might not have realized how powerful a good debriefing could be in helping students, across a range of abilities, to achieve at a higher level.
Overall, the class seemed to really enjoy the assignment, and most students were paying attention even if they did not contribute to the conversation. When scoring the lesson, I worked from the bottom level of the instruction rubric to the top. The difference between levels, particularly for a three or higher, often comes down to a numbers game. In order to assign a "five" to a standard, I had to observe "almost all" of the students engaged in the desired behavior (e.g., higher order thinking, substantive conversation). Some categories, such as substantive conversation, are easier to score because they involve overt behaviors. The higher order thinking standard is probably the most difficult. During the ideal form of government debate, I had to listen closely to student comments for indicators that higher order thinking was taking place (e.g., synthesis, evaluation, analysis). It is likely that comments made during the debate provoked higher order thinking in members of the audience (i.e., the press agents), but they did not have the opportunity to make any comments during the lesson. My scoring on this standard and the others was limited to what I could physically observe. The Ideal Form of Government lesson received a three on the higher order thinking standard because I did not observe "many" students engaged in higher order thinking, a requirement for the next level. The rubric defines "many" as at least one third of the class. In this case, the one-third standard required roughly five of the sixteen students to demonstrate higher order thinking. This standard was difficult to meet because the press agents did not participate in the debate. The omission of the press agents left nine remaining students on the panel of historic characters. Three of these students provided little input during the debate beyond their own presentations. The six panelists who were active throughout the debate demonstrated higher order thinking sporadically at best. Most of the opening statements were scripted. The higher order thinking mainly occurred when students asked original probing questions to uncover weaknesses in another character's argument or when they justified their own position on the ideal form of government. The criterion for a three, "some students perform some HOT operations," best describes what I observed in this lesson. The depth of knowledge standard also received a three. This standard measures the extent to which students achieve a nuanced understanding of the lesson content. A lesson can also score relatively well if the teacher demonstrates deep knowledge. A level three score indicates that knowledge was treated "unevenly" during the lesson. During the ideal form of government debate, students demonstrated deep knowledge in some areas, but only superficial understanding in others. Most of the students seemed to be able to describe the key differences between the various forms of government featured in the debate. They also seemed to understand the idea of the divine right to rule. However, it was evident that the students playing Montesquieu, Hobbes, Plato, and perhaps others did not have a deep understanding of their character. They could not address questions that required them to go off script and make generalizations based on information from their research. High scoring lessons in the depth of knowledge category must also maintain a sustained focus on a significant topic. This lesson initially seemed to meet this criterion since it was oriented around an important central question.
However, the bulk of the class period was spent exploring the views of the different philosophers. The debate was segmented to ensure enough time was allocated to hear from all of the actors. As a result, the students had a limited opportunity during class to synthesize and apply their knowledge to the central question. The lesson could have reached a four if the teacher had taken a more active role in asking probing questions and challenging students to defend their point of view at some point during the lesson. I also assigned this lesson a three for substantive conversation. A level four requires all elements of substantive conversation to be present (sharing, coherent promotion of collective understanding, higher order thinking). The coherent promotion of collective understanding was missing in this lesson due to the fragmented nature of the debate (moving from philosopher to philosopher with no free debate out of character). The students did not really form any conclusions on the central question during the class. The lesson definitely featured sharing of ideas between students and at least one example of sustained conversation, which is defined as at least three consecutive interchanges (a statement by one person and a response by another). As a result, it best fit the requirements for a level three score. The final scoring category is value beyond school. Amy never provided any justification for studying the philosophers' views on government. According to the instruction rubric, in a class with little value beyond school, "activities are deemed important for success only in school (now or later), but for no other aspects of life. Student work has no impact on others and serves only to certify their level of competence or compliance with the norms and routines of formal schooling." The editorial assignment seemed to fit this description, certifying competence according to the norms of formal schooling. Students may have realized this lesson had value outside of school, but they did not verbalize this understanding during class, and as a result I assigned this category a one. The big difference between the teachers in the limited category and those in the moderate category was in the instruction students experienced. Teachers in the limited category developed some challenging tasks, but they were not as successful in helping students accomplish their objectives. Phillip's tasks provide another example of this overall theme. His authentic pedagogy scores are summarized in Table 21. Phillip submitted a document analysis task, a 19th-century reformers task similar to Andy's, and a painting analysis. The first task was an analysis of George Washington's Farewell Address. Students read his address and answered comprehension questions. The class then discussed Washington's views regarding foreign alliances and political factions. After the discussion, the students read an article about an Iraq war appropriations bill being considered by Congress. The concluding discussion focused on whether the United States was doing a good job of heeding Washington's advice today. Phillip's second task had students research a portion of the textbook chapter dealing with reformers from the 1800s. Students had to develop a PowerPoint presentation describing the three most significant ways an individual or event contributed to reforming America.
The central question associated with this task asked: Did the era of reform serve to better America in ways that are still represented in today's United States? The final task was an analysis of the painting "American Progress" by John Gast (see Appendix P). Phillip used this painting to help students understand the concept of Manifest Destiny. The class examined positive and negative consequences associated with America's expansion during this time period.

Table 21
Phillip's Authentic Pedagogy Scores

Task Name                            Task Score (3-10)   Instruction Score (4-20)   Final Authentic Pedagogy (7-30)
Washington's Farewell Address        8                   8
Reformers Lesson                     7                   4
Manifest Destiny Painting Analysis   6                   7
Average Task/Instruction Scores      7                   6.3                        13.3

Phillip's tasks scored at the midpoint of the construction of knowledge scale (2, 2, 2 out of 3). Each task included at least "some expectation for students to interpret, analyze, synthesize, or evaluate information, rather than merely to reproduce information." The Manifest Destiny Painting Analysis task included a series of questions that students were to answer in order to better understand the artist's perspective on the time period. One of the more challenging questions called for students to compare Gast's painting to Emanuel Leutze's "Washington Crossing the Delaware," a painting they had studied earlier in the semester. The task probably would have reached a level three if the questions had more consistently called for students to defend their responses. The document analysis task also involved at least some construction of knowledge. Students had to apply their knowledge of Washington's address to a contemporary foreign policy issue. The final task, based on the 19th-century reformers, barely met the standard for a two in this category. Phillip's intent was for students to develop an argument justifying the significance of their reformer. Each of Phillip's tasks scored near the top of the scale in the elaborated communication category (3, 3, 3). His tasks required students to explain their understanding of historical concepts in ways that exceeded short answers or one-word responses. In most cases, the students had to provide a short summary of their conclusions. The limited space for student responses on the questioning scaffolds for the document analysis and painting analysis provided the best indication of the teacher's expectations. The tasks did not require extended responses in which students had to make generalizations and support them with evidence. The final category was connection to students' lives. Phillip's scores spanned the entire range of the scale (3, 2, 1). His highest scoring task was the analysis of Washington's Farewell Address. This task had students consider issues of contemporary relevance: the influence of political factions and U.S. involvement in foreign alliances. Students expressed their own views on these topics while considering whether the U.S. was doing a good job of following Washington's warnings. The Reformers lesson also provided students with at least some opportunity to consider the modern significance of reforms enacted in the 1800s. The lowest scoring task in this category was the painting analysis, which was situated entirely in the past. In summary, the tasks associated with the limited authentic pedagogy category were typically more ambitious than those from the minimal category. They were more likely to elicit at least some higher order thinking from the students.
In implementing the tasks, the teachers sometimes struggled to maximize their instructional value. Important learning opportunities were missed for a variety of reasons (e.g., inadequate scaffolding, the absence of a debriefing). The teachers seemed open to engaging students in inquiry, but had not reached the level of expertise of their peers at the moderate level.

Moderate Authentic Pedagogy

The final authentic pedagogy category featured three teachers from the sample. The tasks submitted by these teachers often achieved the highest levels of the construction of knowledge and elaborated communication standards. Students were engaged in activities that required higher order thinking and significant writing designed to argue, convince, or persuade rather than just summarize or report information. Lee and Ryan, in particular, seemed to effectively couple these tasks with rigorous instruction. It takes a great deal of skill to exceed a level three score for any category on the AIW instruction rubric. These teachers routinely received fours, and Ryan was the only teacher to achieve the maximum score for any of the authentic pedagogy standards. Ryan's authentic pedagogy scores were fairly representative of this category. As a result, I will focus on describing what students experienced in his class. Ryan was a white, male teacher in his mid-thirties. His professional degrees were in general social science education. During the study, he completed his doctorate. Ryan had over eleven years of experience and was responsible for teaching both regular U.S. History and Advanced Placement European History classes. He also had experience teaching undergraduate classroom management and social studies methods courses. Like Amy, Ryan was very effective in managing the learning environment. This went well beyond avoiding disruptions and ensuring students were on task. Ryan's classroom conveyed his passion for learning and discovery. It was essentially a miniature library. Ryan's desk was surrounded by stacks of books covering a wide range of social studies topics. Another set of books spanned the row of desks situated along the entire back wall of the classroom. The overall classroom environment suggested that the teacher was probably well read (most of the books were his) and that the students used more than just the textbook to understand the past. This was confirmed during one observation when I watched students use these books to look up information. Ryan's classroom environment, in and of itself, likely triggered the intellectual curiosity and interest of at least some students. In general, Ryan was very good at motivating students to think. His students appeared to love his lectures because they included a good mixture of humor and sarcasm. However, the lectures also required significant student involvement and discussion. Ryan pushed his students to consider more than just the facts. He asked the hard questions and required his students to clearly explain their thinking. His discussions were also demanding. He acted as a true facilitator, allowing students to do most of the work while occasionally intervening to ask probing questions or shift the conversation in a new direction. Two of the tasks Ryan submitted for this study were used primarily with advanced placement students. The first task was a think aloud activity in which students assumed the role of Czar Nicholas during World War I.
The students read a document (the think aloud) which provided a monologue of Czar Nicholas considering Russia's problems and the various options at his disposal. The students used this document to formulate a realistic decision for improving the situation facing Russia in 1916. The second task was a World War II political cartoon analysis. The cartoon featured the major leaders of WWII seated around a dominoes table (Roosevelt, Churchill, Stalin, Hitler, Mussolini, Tojo). The positions of the leaders and the dominoes on the table suggested that the Allies were winning the war and the Axis powers were nervous (see Appendix Q). The class analyzed the cartoon, and then the students had to write an original dialogue featuring all of the leaders. Each student was assigned a writing prompt from either an Allied or Axis perspective. The students had to comply with a number of requirements in writing the dialogue. The third task was used in both AP and regular classes. Students completed the "Me Card" task during the first few days of school. The task required students to design a three-by-five card that answered the question: How should the world see me? The card could include virtually anything (e.g., collage clippings, origami, pictures, quotes). The students also had to answer eight sets of follow-up questions that covered a wide range of topics (e.g., Is the card a primary or secondary account? Would the collection of these cards be an accurate depiction of this class?). The students presented the cards in class, and then Ryan led a debriefing using the follow-up questions as a guide. The discussion introduced students to some of the challenges associated with interpreting the trustworthiness of historical artifacts. The task exposed students to the epistemological foundations of the discipline. The authentic pedagogy scores for these tasks are listed in Table 22. The three tasks scored very well on the construction of knowledge scale (3, 3, 2 on a three-point scale). Tasks that predominately engaged students in higher order processes such as interpretation and evaluation of information received the highest score on this standard. The Czar Nicholas think aloud was a three because students had to evaluate a situation and determine a solution to a historic problem. The political cartoon analysis also placed significant higher order thinking demands on the students. Students had to assume a particular perspective (Axis or Allied), synthesize relevant factual information, and generate a plausible dialogue addressing a significant issue from WWII. The students had to understand and accurately represent competing views in order to include all of the leaders in the dialogue. The only task that did not achieve the maximum score for the construction of knowledge category was the Me Card. This assignment featured some interpretation and synthesis of information. However, in order for a task to reach a three, it must push students to consider the nuances of a topic beyond surface-level exposure or familiarity. This task was part of an introductory lesson for the semester. It was designed simply to introduce students to some significant historical thinking concepts. Ryan intended to build on this knowledge as the semester progressed.
Table 22
Ryan's Authentic Pedagogy Scores

Task Name                         Task Score (3-10)   Instruction Score (4-20)   Final Authentic Pedagogy (7-30)
Czar Nicholas Think Aloud         8                   14
Political Cartoon Analysis        8                   12
Me Card                           7                   14
Average Task/Instruction Scores   7.6                 13.3                       20.9

The scores for the elaborated communication category were similar (4, 4, 2 on a four point scale). The first two tasks required significant writing. In order to achieve a four, a task must call for generalization and support. In the first task, students were presenting an argument regarding what the Czar should do to improve Russia's situation using historical evidence as support. The WWII dialogue also went beyond just reporting or summarizing information. The students had to ground their interpretation of the focus question/prompt in factual information and details from the time period. The "Me Card" assignment could best be described as a short answer exercise, which fits the criteria for level two.

Ryan's authentic pedagogy scores were lower in the connection to students' lives category (1, 1, 3 on a three point scale). The problems or questions associated with the first two tasks were not the type that students are likely to encounter in their own lives. Students won't have to figure out a way to save Russia in 1916. These tasks fit the criteria for a level one score because they "offer very minimal or no opportunity for students to connect the topic to experiences, observations, feelings, or situations significant in their lives." In the cartoon task, the students were writing entirely from the perspective of the WWII leaders. The same was true for the Czar Nicholas task. Students might find a way to personally relate to the Czar's circumstances, but the task did not require it. The Me Card task was designed to promote a personal connection with the student and to help students understand how historical thinking skills might apply today. Significant elements of the students' lives were used as the basis for this activity. One question in particular had students evaluate the trustworthiness of information on the cards. This was something that students will have to do in their daily lives.

I've highlighted Ryan's WWII political cartoon analysis task as an example of a lesson in the moderate authentic pedagogy category. Scores on this task are provided in Table 23. This task was implemented with an Advanced Placement European History class. The class was relatively small. Sixteen students were present on the day I observed. The students were predominately white and female. However, some Asian and African American students were also in the class.

Table 23
Scores for "WWII Political Cartoon Analysis" Task

Task Scores                            Instruction Scores
Construction of Knowledge       3      Higher Order Thinking              3
Elaborated Communication        4      Deep Knowledge                     4
Connection to Students' Lives   1      Substantive Conversation           4
                                       Connectedness to the Real World    1
Total Task: 8                          Total Instruction: 12

The political cartoon task was implemented after students had already received a good deal of instruction on World War II. They had covered America's initial entry into the war and the major events through the victory in Europe. Ryan intended to discuss the war in the Pacific after the cartoon activity. The class began with students taking a practice Advanced Placement test. The test lasted for approximately thirty minutes. When all the students were finished, Ryan went over the answers and provided the students with some test taking tips and strategies.
The next portion of class focused on the political cartoon analysis activity. The cartoon analysis lasted for nearly half an hour. At the beginning of the analysis, the class attempted to determine the cartoon?s source, its context (time frame), and its bias (Axis or Allied?). Ryan listened to initial ideas and then the class examined the details of the cartoon more closely. The students identified all of the World War II leaders seated at the table in the cartoon. They argued about specific elements of the drawing (i.e. is that a bead of sweat on Mussolini?s forehead or a strand of hair?). Once students identified what was being portrayed in the cartoon, Ryan 167 pushed students to analyze the details in greater depth. For instance, he asked why the artist decided to use dominoes instead of cards. He also had students consider the positions of the various leaders in the picture (why is Stalin standing behind Churchill?). Each of these questions elicited significant discussion. Ryan emphasized that nothing was an accident in a created piece of artwork. The use of a heuristic (Source, Analyze, Contextualize, Corroborate, Think Deeply) helped guide the discussion and prevent it from proceeding in a random, linear fashion. Instead it was more recursive. The students took an initial guess at whether the cartoon was public or private, the time period, and what the artist was attempting to convey. Then, they examined elements of the cartoon more closely. As they understood more of the symbolism, they returned to aspects of the heuristic and re-examined earlier comments. The class understanding of the cartoon built throughout the activity. Ryan guided the discussion by asking questions and modeling the analytic process. When student comments were brief, he asked follow-up questions to force them to elaborate and support their opinion. The following brief dialogue regarding whether the cartoon was public or private illustrates this point: Student: Public. Teacher: convince me. Student: The drawing isn?t very exceptional. It doesn?t look good enough for someone to have commissioned it. Teacher: O.K. - the old copy of the drawing hurts. It does look grainy. Student2: Maybe it was published in a newspaper. That might explain the poor quality of the image. 168 Teacher: Does this look like something you might expect to find in a newspaper? Several students agree and provide reasons to support their opinion. Teacher: We touched on the point of view or possible bias in this picture earlier (how Hitler was being depicted). Public seems to be a good guess. Student comments during the cartoon analysis were often more than just a couple of words. The analysis felt more like a true discussion. While Ryan was clearly in charge, the student comments reflected sensitivity to the ideas of others. For instance, one student began a statement about why she thought an element of the cartoon represented Churchill?s last gamble by saying ?I?d like to add to Maggie?s comment about the U.S. holding all the dominoes.? At another point in the analysis, a student admitted she was confused about part of the cartoon and several students explained it to her further. This type of student to student interaction was more commonly observed in Ryan?s classes than the other classes I observed from this sample of teachers. Once most of the major ideas in the cartoon had been teased out, the teacher introduced the writing assignment. The students were to complete a dialogue based on the scene in the political cartoon. 
Ryan described it as a movie short (imagine a camera that zooms from face to face). The students were assigned a particular perspective (Axis or Allied) and a central question to address. Students worked on this assignment for the remainder of the class period. This political cartoon task was similar to the Manifest Destiny painting analysis task (from the limited category) taught by Phillip. Both activities involved the analysis of visual media. Ryan perhaps had some advantages when he implemented the cartoon analysis lesson. His students seemed motivated to practice a skill they would use on the 169 AP exam. They also had more background knowledge at their disposal since they experienced the activity later in a unit. Taking this into account, Ryan?s lesson still seemed more effective. This was mainly due to the way he used scaffolding to support student inquiry. Ryan?s method of using open-ended discussion, guided by a heuristic, seemed to engage the students more than the painting analysis scaffold used in Phillip?s class (see Appendix P). Phillip?s scaffold was used more as a worksheet to record answers. Phillip?s class never really analyzed the context of the American Progress painting. Phillip told the students it was painted in 1870 by John Gast. The students never tried to figure out who John Gast was or his motives for painting the picture (was it commissioned?, etc.). They didn?t discuss the time period of the painting and how it might have influenced Gast?s worldview. Ryan?s heuristic seemed to help students develop a deeper interpretation of the cartoon. In Ryan?s class, sourcing and contextualization was a central feature of the discussion. In Phillip?s class, they were largely omitted. The cartoon analysis was a fairly strong example of authentic intellectual work. The task received eight out of ten possible points. The instruction score was also above average (12 out of 20 points). It probably would have scored higher if the first part of class was not dedicated to the practice exam. A review of the instruction scores provides a better sense of what made the lesson score so well. The first standard is higher order thinking. I assigned this lesson three out of five possible points. A level three score indicates that the majority of the class was spent with students engaged in lower order thinking, but one significant question caused some students to engage in higher order 170 thinking. The cartoon analysis fits the description for this level since higher order thinking was more than just a minor diversion in the lesson. The class worked together for an extended period of time to apply the heuristic and determine the meaning of the cartoon. They applied the knowledge they had learning previously in the unit to a new task. In doing so they utilized a range of higher order operations. The students were able to deduce the time range of the cartoon based on nuances in the picture (i.e. Mussolini is still there, so it can?t be later than 1943?). The students also engaged in higher order thinking when they tried to determine the artist?s motives (i.e. why did the artist use dominoes instead of cards?) and perspective (was this published in Britain or the United States? Why?). In order to assign a level four score, many students must be engaged in higher order thinking for a substantial period of time (at least 1/3 of the lesson). The cartoon analysis by itself lasted for 26-27 minutes. 
The AP exam and writing activity made it impossible for this lesson to meet the substantial portion threshold. It is possible that these activities evoked at least some HOT, but this could not be observed. The lesson received a four on the depth of knowledge standard. It clearly met the criterion for this standard. The students analyzed a complex political cartoon which required a nuanced understanding of events from WWII and the ability to recognize the perspectives of the major leaders. The analysis was sustained for a significant period of time. Many students were actively engaged in this activity (at least 6 of the 16). Most importantly, they were doing most of the work. Ryan resisted the temptation to give students his own interpretations. The students formed reasoned, supported conclusions about the meaning of the cartoon with limited guidance and support. The follow-on 171 movie short activity was a perspective-taking exercise which also required a great deal of background knowledge and the ability to empathize with the views of historical figures. This standard did not reach a five because "almost all" of the students had to demonstrate depth of knowledge. This is very difficult to achieve. As in most classes, certain students dominated the discussion. These students seemed to have a strong understanding of the content, but I was less certain about the others. I could not state with confidence that 14 out of the 16 students? reasoning during the main activity reflected ?fullness and complexity of understanding.? As previously stated, Ryan was particularly good at leading productive discussions. This was reflected by the substantive conversation score (4 out of 5) for this lesson. Many students actively participated in the cartoon analysis and the dialogue writing activity. I witnessed the sharing of ideas, students making distinctions and building off the comments of their peers, and higher order thinking. Like the depth of knowledge standard, this lesson did not reach a five due to the ?almost all? students requirement. The main weakness in this lesson was the connectedness to the real world standard (1 out of 4). This means the lesson didn?t have a clear connection to anything beyond school. The most obvious connection for these students was the activity?s relation to the AP exam. It was evident that the activity was good practice for analyzing visual media. The score on this standard could have been improved if Ryan connected the ability to interpret WWII era cartoons to effective citizenship perhaps by providing examples of how visual media is currently used to shape public opinion. The themes 172 embedded in the task also represent some persistent problems that could have been mentioned. In summary, Ryan?s cartoon analysis task resulted in students creating an original dialogue among the WWII leaders. Students successfully analyzed and interpreted the meaning of the cartoon and responded to some of the major themes associated with WWII (i.e. why didn?t the U.S. enter the war sooner?). The lesson required elaborated communication both orally and in writing. The task didn?t afford students much of an opportunity to form a personal connection to the content and its relevance to life beyond the classroom was not explicitly established. Nevertheless, it was still quite challenging and Ryan was able to effectively structure the lesson so students were able to meet his expectations. Lee?s ?Truman Think Aloud? 
provides another strong example of the type of intellectual challenges students experienced in classes with moderate authentic pedagogy. This task was a perspective taking exercise where students attempted to get into the mind of President Truman during a pivotal event of the Cold War (See Appendix R). It involved an in-depth analysis of the Berlin Crisis. Students analyzed four historically authentic courses of action for dealing with this event and ultimately had to assume the role of President Truman to decide the best way to resolve the crisis. The think aloud reached the highest level of the construction of knowledge standard (3) on the task rubric. The dominant expectation was for students to analyze an historic problem and evaluate the various options available to President Truman. The students ultimately had to present an argument representing the best possible approach 173 for resolving the Berlin crisis. Lee provided students with ample materials and resources to be able to formulate a deep, nuanced position. The think aloud also scored well on the elaborated communication standard (4 on a scale of 4). Students had to write a speech explaining their solution to the Berlin crisis and in doing so they were making claims and supporting them with evidence from the various advisors. The task clearly exceeded a fill in the blank or short answer activity. It was also more than a report or summary. Students were engaged in writing meant to ?convince or persuade? others. The only area where the task didn?t score well was in the connection to students? lives standard. The think aloud did not explicitly require students to discuss the modern significance of historical events. The scenario was entirely situated in the Cold War time period. Table 24 depicts the scores Lee received on both the task and instruction rubrics. Table 24 Scores for ?Truman Think Aloud? Task Task Scores Instruction Scores Construction of Knowledge 3 Higher Order Thinking 3 Elaborated Communication 4 Deep Knowledge 4 Connection to Students? Lives 1 Substantive Conversation 3 Connectedness to the Real World 1 Total Task: 8 Total Instruction: 11 Generalizations The difficulties associated with implementing instruction consistent with the authentic pedagogy model are well documented (Onosko, 1991; Rossi, 1995; Saye & Brush, 2004). The authentic pedagogy scores from this study suggest that some teachers 174 were more successful than others in overcoming these challenges. In this section, I identify and discuss some trends that were apparent among the teachers in each of the authentic pedagogy categories. Due to the inherent complexity associated with teaching and the classroom environment, these assertions should be viewed as very tentative. Roy and Andy?s practice varied the most from the authentic pedagogy model. Andy seemed to equate good teaching with getting students to pass the graduation exam. When asked to submit tasks that demonstrated students thinking at a high level, he selected a research project, powerpoint presentation, and a chapter worksheet. The nature of these tasks suggested that he assigned greater significance to tasks that required the mastery of larger quantities of factual information. Higher order thinking and reasoning goals were noticeably absent. Like Andy, Roy?s tasks also tended to reinforce basic knowledge (i.e. teach a lesson) with the possible exception of the illustrated timeline task. 
It is possible that the pedagogical beliefs of these teachers conflicted with elements of the AIW model. Their decision to focus primarily on transmitting factual knowledge to students could stem from any number of factors (influence of the high stakes test, perceived ability of students, belief that basics must be learned before advanced work can be considered, time involved for inquiry lessons, heritage based view of history, etc.). It was difficult to determine exactly what motivated their instructional decision-making since I was unable to schedule an interview with Andy and Roy?s interview was relatively brief. My personal sense from observing these lessons was that these teachers were engaged in defensive teaching (McNeil, 1986). Teachers who engage in defensive teaching limit the knowledge they make accessible to students in order to efficiently 175 cover information and maintain classroom control. Roy?s defensive teaching probably stemmed from his inexperience and difficulties with classroom management. He didn?t appear willing to ?rock the boat? very often by pushing students towards more challenging work (Sizer, 1984). I believe that students become more proficient in completing authentic tasks and respond more favorably to them with routine exposure and coaching from the teacher. I believe this is one reason why Lee (from the moderate category) seemed to have more success in implementing the same illustrated timeline activity Roy used with his class. Lee?s students had encountered similar activities in the past and had a better sense of what was expected. Andy?s motives for engaging in defensive teaching might have been similar to Roy?s since his classes tended to be large and fairly diverse. However, the defensive teaching technique that was most noticeable during observations of his lessons was simplification of knowledge (McNeil, 1986). Andy appeared to possess relatively strong content knowledge, but he simplified topics in order to more efficiently move through the curriculum. Lesson material was covered with relatively little debate or discussion. I didn?t get the sense that Andy was trying to engage students in the examination of conflicting interpretations of the past. Andy?s goal, from what I could tell, was to strictly stick to the requirements identified in the course of study. Jason?s teaching was on the borderline between minimal and limited authentic pedagogy. Some of his tasks were challenging (i.e. rewriting the Declaration of Independence), but they either weren?t very interesting to students or they didn?t include enough support to help most students be successful. While Jason seemed open to allowing more student inquiry, his dominant instructional approach was likely more 176 traditional. Student comments during the observed lessons, sometimes very revealing, supported this conclusion. In summary, students who experienced minimal levels of authentic pedagogy were rarely pushed to think and reason at levels beyond basic recall and comprehension. The tasks submitted by teachers in the limited authentic pedagogy category (Amy and Phillip) were not always that different. However, there was evidence that these teachers had internalized certain elements of the authentic pedagogy model and were more receptive to an inquiry-based instructional approach. Their tasks were more likely to include higher order thinking elements and elaborated forms of communication. 
Students had the opportunity to create documentaries, participate in debates, and analyze paintings; tasks that were very different from those in the minimal category. While students were afforded these opportunities, the teachers in the limited category sometimes struggled to provide the support necessary for quality work. The lessons didn?t reach their full potential for a number of reasons. In Amy?s case, most of her activities came from pre-packaged curriculums (i.e. History Alive). These activities promoted active learning and at least some higher order thinking. However, the teachers at the moderate level likely achieved higher implementation scores, in part, because they were involved in the creation (or at least modification) of their tasks. They were able to take their lessons to a deeper level as a result of the ?sweat equity? involved in the creation process. The lack of a debriefing also caused some lessons to not reach their full potential. There may have been an assumption that if thought provoking ideas were presented during the course of a lesson, then students could synthesize them on their own. During 177 one of Amy?s lessons in particular, students debated the views of various philosophers regarding the best form of government. However, they were not afforded the opportunity to step out of character and discuss their own perceptions of the ideas being expressed. They were expected to complete a demanding follow-up essay without any closure to the lesson. Successful inquiry teachers view the debriefing of a lesson as the big ?pay off?; the time when students are pushed to make connections and think at a higher level. Attempts to elicit higher order thinking were also sometimes undermined by the actions of the teacher. For example, Phillip moved quickly through the Manifest Destiny material in order to be able to spend more time on the Civil War. This sent a message to students about the relative importance of the challenging task. Even in instances where the task was fully implemented, the teachers in this category sometimes sent an unintentional message through their assessment practices by emphasizing the easier to grade lower order aspects of the activity. The teachers in the moderate authentic pedagogy category probably had greater success achieving higher scores because their vision of powerful social studies instruction most closely aligned with the authentic pedagogy model. Lee and Ryan were both very articulate in expressing their goals for social studies instruction. They clearly identified the need to create competent citizens as the overarching purpose for social studies instruction. Their curricular decision-making was driven by goals related to this purpose. They wanted students to not only build content knowledge, but the capacity to think and make decisions. In addition, they wanted students to connect history instruction with contemporary issues. 178 The experience for students in classrooms which featured moderate levels of authentic pedagogy was different from those mentioned previously in several important ways. First, students in Ryan and Lee?s classroom were more likely to be challenged with meaningful historical problems that required higher order thinking (Czar Nicholas Think Aloud, Berlin Crisis Think Aloud, Industrial Revolution Editorial, etc.). The higher order elements of the activities usually took precedence over everything else. 
The students appeared to be accustomed to these types of challenges and aware that the challenging aspects of the assignment were going to be evaluated. The student experience was also different in terms of the support they received during the learning process. Ryan and Lee had a keen understanding of the cognitive demands being placed on their students. They helped students manage these demands in a number of ways. They provided thorough instructions and made their expectations clear by going over examples and non-examples of quality student work. Students received hard scaffolds to help them with the thinking tasks embedded in the activities. These were not the worksheets or handouts associated with pre-packaged curriculum materials. These were frequently designed by the teacher and placed at strategic points in the lesson. When the hard scaffolding wasn?t enough, Ryan and Lee were able to effectively diagnose student difficulties and provide timely scaffolding without diluting the overall challenge of the task. Finally, the students were more likely to participate in a debriefing after challenging activities. The debriefings were another form of scaffolding. They were instrumental in helping the class develop some shared understandings about the lesson prior to follow-on individual assignments. 179 Ryan and Lee clearly understood the challenges and opportunities associated with inquiry based instruction. They possessed key dispositions often associated with successful inquiry teachers. They were open to constructivist ideas about learning and the nature of historical knowledge. They also possessed the internal drive and intelligence needed to successfully negotiate the cognitive challenges associated with inquiry based instruction. The scores they achieved in this study were likely the result of sustained professional development through the pursuit of advanced degrees and active involvement in a professional learning community focused on advancing inquiry based teaching. This chapter has described the range of instructional experiences students encountered in their social studies classes. It seems clear that the experiences of students in classrooms with higher levels of authentic pedagogy were quite different from those in the lowest category. The next chapter will present research findings related to the effects of different levels of authentic pedagogy on the acquisition of basic content knowledge. It will also discuss the results of the higher order editorial assessment. 180 CHAPTER FIVE: STUDENT LEARNING OUTCOMES In the previous chapter, I organized the sample of teachers in this study into three categories (minimal, limited, and moderate) based on their authentic pedagogy scores. In doing so, I provided examples of the tasks and instruction students received at these levels to highlight different ways teachers conceptualized intellectual challenge. The purpose of this chapter is to analyze the effects of authentic pedagogy on student learning. I begin with a review of the sample used in this study. This is followed by the results of the study presented in order by research question. Description of the Sample This study included eight social studies teachers. Four of the teachers taught 9th grade World History at the junior high level. The remaining teachers taught 10th grade social studies. Every 10th grade social studies teacher at the high school in 2008 was a participant in the study. Information about the teacher sample (i.e. 
demographics, experience) was presented in the previous chapter and is reproduced in Table 25 for easy reference. A broad range of student level data (anonymous to the researcher) was collected as part of this study. Table 26 presents descriptive statistics associated with the student database. Additional information about the student database is provided in Appendix T.

Table 25
Teacher Profiles

               Roy     Andy    Jason   Amy     Phillip   Lauren   Ryan    Lee
AP Score       9.6     10.9    11.6    12.9    13.3      18       20.9    21.2
Age            26-35   36-35   36-45   46-55   26-35     46-55    26-35   36-45
Ethnicity      White   White   White   White   White     White    White   White
Experience     4       11      14      22      6         15+      11      12
Grade Taught   9       10      10      9       10        9        10      9

Table 26
Descriptive Statistics for Student Sample

                                      2008 (N=351)   2009 (N=454)
                                      Percent        Percent
Gender
  Male                                49.3           50.0
  No Data                             5.1            2.4
Ethnicity
  White                               63.0           55.7
  African American                    24.8           26.7
  Asian                               6.0            5.5
  Hispanic                            1.1            1.5
  Not Reported/No Data                5.1            10.5
SES (based on lunch status)
  Paid                                74.1           69.8
  Reduced                             4.0            4.8
  Free                                16.8           15.2
  No Data                             5.1            10.1
Special Education
  Yes                                 6.0            4.8
  No/No Data                          94.0           95.2
English Proficiency
  Limited                             3.4            3.1
  Proficient                          91.5           85.9
  No Data                             5.1            11.0
New to System (arrived 2006-2009)
  Yes                                 17.7           15.4
  No                                  76.6           74.4
  No Data                             5.7            10.0

Results of Inferential Analyses

Research Question II. Do students who have been taught by teachers demonstrating higher levels of authentic pedagogy score higher on the Alabama High School Graduation Exam (AHSGE) than students taught by teachers with lower levels of authentic pedagogy?

Null Hypothesis: The level of authentic pedagogy a student receives in their tenth grade social studies course does not have a statistically significant effect on students' graduation exam scores.

The first step in analyzing this question was to conduct a content analysis of the graduation exam. The content analysis, using item specifications released to the public, confirmed that the test was a measure of lower order knowledge and was therefore appropriate for use in this study. Results of the analysis are explained in greater detail in Appendix S. I used two different statistical approaches to address this research question. First, I used multiple regression with students as the unit of analysis. This had some significant limitations since the students were only associated with four possible teachers. In an attempt to gain more meaningful results, I also ran ANOVA tests comparing specific classes with varying levels of authentic pedagogy. The results of both of these approaches are described in sequence in this section.

Regression. In conducting the multiple regression analysis, several initial models were produced to ascertain whether any predictor variables overlapped in explaining student performance. After identifying and eliminating the areas of overlap (see Appendix U for further discussion of this process), the final model included 427 students who took regular 10th grade United States history over the course of the two years covered by the study. The results of the regression analysis are provided in Table 27. The overall model was able to account for 44% of the variance in the social studies graduation exam scores. Demographic variables had the most influence on achievement (26%). When tenth grade social studies averages were added, another 15% was explained. With all of these variables controlled, authentic pedagogy was able to contribute an additional 3%.
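For readers who wish to see how a block-entry model of this kind can be assembled, the sketch below illustrates the general procedure in Python. It is a minimal illustration rather than the analysis actually run for this study, and every file and column name in it (student_data.csv, ahsge, gender, grade10_avg, task_score, instruction_score, and so on) is a hypothetical stand-in for the coded student database described in Appendix T; predictors would also need to be standardized to reproduce the beta weights reported in Table 27.

# Minimal sketch (not the analysis software actually used): fitting the three
# regression blocks and reporting the R-square change at each step. All file
# and column names are hypothetical stand-ins for the coded student database.
import pandas as pd
import statsmodels.api as sm

def fit_block(df, outcome, predictors):
    """Ordinary least squares fit for one block of predictors."""
    X = sm.add_constant(df[predictors])
    return sm.OLS(df[outcome], X).fit()

students = pd.read_csv("student_data.csv")  # hypothetical coded data file

block1 = ["gender", "ethnicity", "lep", "special_ed", "ses"]   # demographics
block2 = block1 + ["grade10_avg"]                              # + prior achievement
block3 = block2 + ["task_score", "instruction_score"]          # + authentic pedagogy

m1, m2, m3 = (fit_block(students, "ahsge", b) for b in (block1, block2, block3))

print(f"Block 1 R2        = {m1.rsquared:.3f}")                # demographics alone
print(f"Block 2 R2 change = {m2.rsquared - m1.rsquared:.3f}")  # added by 10th grade average
print(f"Block 3 R2 change = {m3.rsquared - m2.rsquared:.3f}")  # added by authentic pedagogy
print(m3.summary())                                            # coefficients for the full model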
Table 27
Sequential Multiple Regression Analyses Predicting Impact of Authentic Pedagogy on Graduation Exam Results

Model                    Variable       R Square/Change   Beta       Semi-partial
1. Demographics                         .261***
                         Gender                           .235***    .229***
                         Ethnicity                        .189***    .139***
                         LEP                              -.047      -.047
                         Special Ed                       -.141***   -.138***
                         SES                              .068       .050
2. Achievement                          .149***
                         10th Average                     .444***    .380***
3. Authentic Pedagogy                   .027***
                         Task                             -.151***   -.134***
                         Instruction                      .163***    .144***
OVERALL MODEL                           .437***

Note. N = 427; *p < .05, **p < .01, ***p < .001

The best predictor of scores on the graduation exam was a student's tenth grade average in social studies. This was followed by a student's gender and then the level of authentic instruction they received. Authentic instruction had a positive effect on student graduation exam scores while authentic tasks had a negative influence. In both instances the relationship was significant, although the negative influence of authentic tasks was not very strong when compared to the ethnicity and special education variables.

ANOVA. In order to better gauge the effects of authentic pedagogy, I also tried using the class as the unit of analysis. A class level analysis required selecting two similar classes for comparison from teachers who utilized different levels of authentic pedagogy. I first paired a class from the minimal authentic pedagogy category (Andy) with one from the limited authentic pedagogy category (Phillip). I conducted statistical tests to ensure significant differences did not exist between the classes on key variables likely to influence achievement (i.e. demographics, social studies grades, etc.). These tests are described in Appendix V. The one-way ANOVA comparing achievement on the graduation exam between the minimal and limited authentic pedagogy classes indicated that the minimal authentic pedagogy class performed significantly better, F(1, 44) = 9.516, MSE = 2591.5, p = .004, η² = 0.18. Table 28 provides additional details regarding the performance of the two classes.

Table 28
One-way ANOVA Comparing Graduation Exam Scores for Minimal & Limited Classes

Class        Mean SS AHSGE   SD       Range              Difference
Limited AP   512.27          46.809   491.52 to 533.03   F = 9.516
Minimal AP   558.62          54.380   535.66 to 581.59

This finding should be considered with some caution. While a number of variables were controlled, it is still possible that uncontrolled variables played a role in contributing to the difference in outcomes (i.e. teacher variables such as experience in the classroom, etc.). Also, the two teachers were fairly close on the authentic pedagogy scale (10.9 compared to 13.3). A subsequent analysis comparing Andy's class with another period of Phillip's yielded similar results that were not statistically significant.

I also compared Andy's minimal authentic pedagogy class with a class taught by the highest scoring tenth grade teacher (Ryan). Both classes were regular U.S. History courses, although Ryan taught a class that met on alternate days for the entire school year. Ryan and Andy's classes were similar in terms of gender, SES, and prior social studies achievement (see Appendix V). The main area of concern was race. The difference between classes was significant with Andy's class having the larger number of African Americans. In order to make a more valid comparison, I focused my analysis on white students only.
The results of this analysis revealed a difference in graduation exam scores that differed very slightly in favor of the low authentic pedagogy class, but the results were likely due to chance, F(1, 29) = .000, MSE = 3033.055, p = .986. The moderate authentic pedagogy class had the advantage of a year round schedule and their social studies grades were slightly higher. This might suggest that there should have been a greater difference in the mean scores in favor of the moderate AP class. On the other hand, authentic instruction focuses primarily on learning outcomes that are not measured by the graduation exam. Since the cut score on the graduation exam was a 509 and the mean 186 score of the moderate class was 558, white students in Ryan?s class were clearly not put at a disadvantage on this test. Table 29 One-way ANOVA Comparing Graduation Exam Scores for Minimal & Moderate Classes Class Mean SS AHSGE SD Range Difference Moderate AP 558.45 51.855 534.18 to 582.72 F=.000 Minimal AP 558.82 60.720 518.03 to 599.61 Once again, caution is in order in interpreting the results of this analysis. A variety of additional factors that were not controlled in this analysis could explain the difference in performance between these two classes. The next step after analyzing the impact of authentic pedagogy on lower order outcomes was to determine its effect on another type of assessment designed to measure more advanced thinking processes. Research Question III. What is the impact of authentic pedagogy on student performance on an assessment that requires them to apply knowledge from a previous unit to a challenging new task? Null Hypothesis: The level of authentic pedagogy a student receives in their tenth grade social studies course does not have a statistically significant effect on their ability to apply knowledge from a previous unit to a challenging new task. In order to address this research question I created a writing task that required students to construct an editorial based on an authentic historical problem. The tasks and rubrics are discussed in greater detail in chapter three. The regular and AP editorials 187 were analyzed separately. I will begin by discussing the results of the regular U.S. History assessment. I began my analysis by organizing the database of students who took the Manifest Destiny higher order editorial into three groups based on the level of authentic pedagogy they experienced (1=minimal, 2=limited, 3=moderate). The minimal group included all the students who took history from Andy or Jason. The limited group included all the students who took history from Phillip. The remaining students in the moderate group took Ryan?s history classes. The unit of analysis was students within the three large groups, not specific classes. After establishing the three groups, I wanted to see if they differed significantly on specific demographic characteristics (race, gender, SES). In taking this step I was attempting to control for factors, other than authentic pedagogy, that might influence student performance. Appendix W provides more information regarding the statistical tests I used to establish the comparability of the groups. The final step was to run a factorial MANOVA using the rubric sub-categories for Part I of the editorial as dependent variables: position, context, persuasiveness, low level dialectical reasoning, and quality of final position (see rubric in Appendix N). 
The independent variables were the level of authentic pedagogy students experienced (as represented by the three teacher groups) and race. Race was included as an independent variable because I was not able to establish that the three groups had similar black/white ratios (Hotelling's Trace p = .039). The results associated with the descriptive statistics are presented here first before examining the outcome of the factorial MANOVA.

The total score from Part I of the editorial had a possible range of 0 to 14. The distribution of scores is provided in Table 30. The total score is derived from several scoring categories on the rubric. Some of these, like position and historical context, can be a little deceptive because students can score relatively well if they simply follow directions. The persuasiveness and low-level dialectical reasoning scales provide a clearer window into what students were able to do on this assessment. It is for this reason that I decided to highlight these categories. The table indicates that nearly half of the students scored between 0 and 3 on Part I. No students reached the top end scores of 11-14. Analysis of the persuasive and dialectical categories revealed a similar pattern. Most students did not reach the upper end of these scales. Student editorials, for the most part, were not very persuasive. Students struggled to provide elaborated arguments that were backed by historical evidence and/or examples. The low-level dialectical reasoning scale measured the extent to which students were able to identify and explain the viewpoint opposing the one they were arguing. Most students (53%) either did not include opposing arguments or did not provide enough information to clearly demonstrate they understood an opposing point of view. Students who provided opposing views often immediately refuted them in the same paragraph without ever clearly laying out a well developed opposing perspective. The lack of a strong third paragraph made it difficult for students to achieve a high score on the final paragraph where more advanced dialectical reasoning was measured.

Part II of the editorial assessed whether students saw any connection between Manifest Destiny and contemporary U.S. policies. The highest scores (level 2) were reserved for students who explicitly mentioned Manifest Destiny in their response and tied it to their explanation of America's mission (or lack of a mission) in the world today. A small number of the responses made this type of connection (4.5%).

Table 30
Distribution of Manifest Destiny Editorial Scores for Part I and Select Rubric Sub-Categories

Total Score - Part I        Persuasiveness Score (Part I)    Low-Level Dialectical Reasoning Score (Part I)
Score    Percent            Score    Percent                 Score    Percent
0        7.1                0        27.7                    0        52.9
1-3      40.6               1        32.9                    1        33.5
4-6      34.8               2        18.7                    2        12.9
7-9      14.8               3        19.4                    3        .6
10       2.6                4        1.3
11-14    0                  5        0

Note. N = 155

The results of the factorial MANOVA are described in Table 31. Note that while race was included as a variable in this analysis, it did not reach significance in terms of its impact on student performance (Hotelling's Trace p = .107). A statistically significant relationship was found between the level of authentic pedagogy a student experienced (minimal, limited, or moderate) and their academic performance on the regular U.S. History higher order editorial task. I ran Bonferroni post-hoc tests to determine more specifically how the level of authentic pedagogy influenced achievement.
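To make the shape of this analysis concrete, the sketch below shows one way a factorial MANOVA with follow-up Bonferroni comparisons might be coded. It is an illustration under assumed data, not the software used in the study; the file name manifest_destiny_scores.csv and the columns ap_group, race, position, context, persuasiveness, dialectical, and final_position are hypothetical stand-ins for the rubric scores discussed above, and the post-hoc step applies a simple Bonferroni correction to raw t-tests rather than the model-based comparisons a statistical package would report.

# Minimal sketch of a factorial MANOVA followed by Bonferroni-adjusted pairwise
# comparisons. All file and column names are hypothetical.
from itertools import combinations
import pandas as pd
from scipy import stats
from statsmodels.multivariate.manova import MANOVA

editorials = pd.read_csv("manifest_destiny_scores.csv")  # hypothetical file

# Five rubric sub-scores as dependent variables; pedagogy group and race as factors.
manova = MANOVA.from_formula(
    "position + context + persuasiveness + dialectical + final_position"
    " ~ C(ap_group) * C(race)",
    data=editorials,
)
print(manova.mv_test())  # Hotelling's trace, Wilks' lambda, and related statistics

# Bonferroni-adjusted pairwise comparisons on a single sub-score (context).
groups = ["minimal", "limited", "moderate"]
pairs = list(combinations(groups, 2))
for a, b in pairs:
    x = editorials.loc[editorials["ap_group"] == a, "context"]
    y = editorials.loc[editorials["ap_group"] == b, "context"]
    t, p = stats.ttest_ind(x, y)
    adj_p = min(p * len(pairs), 1.0)  # multiply each p-value by the number of comparisons
    print(f"{a} vs. {b}: t = {t:.2f}, Bonferroni-adjusted p = {adj_p:.3f}")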
The results indicated that the moderate authentic pedagogy group performed significantly better than the minimal group (p = .001) and the limited group (p = .018) on the context scale. In addition, the moderate authentic pedagogy group performed significantly better than the limited group (p = .001) on the persuasiveness scale. However, there was not a significant effect when comparing the moderate group with the minimal group on this component of the rubric. The moderate group did perform at a higher level, but this could have been due to chance. These results should be viewed with some caution. While inter-rater reliability figures are available for the advanced placement editorial, they are not for this editorial. It is possible that another rater could come to different conclusions. Two additional social studies graduate students did read numerous Manifest Destiny editorials and assisted with the process of refining the rubric.

Table 31
Factorial MANOVA Comparing the Performance of Authentic Pedagogy Groups on the Manifest Destiny Editorial

                                   Minimal         Limited        Moderate
Variable                           Mean (SD)       Mean (SD)      Mean (SD)       F
Position                           .71 (.457)      .67 (.480)     .71 (.457)      .078
Context                            .46 (.703)      .48 (.643)     .92 (.651)      6.059*
Persuasiveness                     1.29 (1.175)    .78 (.847)     1.69 (1.071)    4.164*
Low-level Dialectical Reasoning    .58 (.792)      .30 (.465)     .81 (.754)      2.886
Quality of Final Position          .49 (.653)      .44 (.577)     .81 (.754)      2.382

Note. An overall multivariate comparison resulted in a Hotelling's Trace of .163 (p = .019). *p < .05.

In reading through the Manifest Destiny editorials some general trends were apparent. Quite a few students didn't have a firm grasp of the historical context associated with the conflict with Mexico leading up to the decision point in 1846. Introductory paragraphs reflected confusion over a number of factual details. Students demonstrated misconceptions over who won the Battle of the Alamo, Mexico's relationship with Texas (i.e. the fact that Texas was a part of Mexico before the conflict with the U.S.), and whether the Americans who migrated to Texas were invited or not. There was also the tendency to overlook the border dispute which most directly precipitated the Mexican-American War. Some students also used the term "Manifest Destiny" in strange and awkward ways suggesting a lack of in-depth understanding of what it meant (i.e. the U.S. will use the Manifest Destiny on you). Students who did use the term correctly weren't necessarily able to provide much more than a basic definition. I expected to see more instances of students making historical connections to reinforce the notion that America had a special God-given destiny (i.e. a City upon a Hill reference from earlier in the U.S. History 10 course).

The writing prompt for the Manifest Destiny task also seemed to confuse many students. They were inclined to argue for or against the Mexican-American War without discussing the concept of Manifest Destiny. This was a substantial problem because I really expected students to focus their response around their understanding of Manifest Destiny and how this term was being used at the time. The students who wrote a limited response without discussing Manifest Destiny received a maximum of two points on the persuasiveness scale for Part I. Students used a number of arguments to support their position and write a persuasive editorial.
Those who believed American actions related to the Mexican- American War were justified often claimed that Mexico was at fault for refusing to meet with the American representative, John Slidell or refusing to sell land to the United States. Some students argued that Mexico was really the aggressor by attacking U.S. troops in the disputed territory. When students integrated Manifest Destiny into their response (required for a higher score), they argued to a greater or lesser extent that U.S. actions were for the greater good because America would bring advances (i.e. democracy) to a land that couldn?t seem to stabilize its government following its 192 independence from Spain (an argument from the Boston Times source document). Those who argued against America?s actions towards Mexico frequently used the argument by Albert Gallatin that America was simply acting out of greed. This was often paired with the idea that Manifest Destiny was being used as a ?cover? to obfuscate America?s real intentions. Of course the argument that God didn?t really support Manifest Destiny was commonly used as well. There was little evidence of students critically examining and weighing the arguments contained in the documents. For example, students used the statement ?before this unfortunate war, [America] always acted with justice?The use of military force was always in self-defense?.? without ev er seeming to question its veracity. Several students used a phrase from the Boston Times article justifying American actions because they would better the lives of the ?great mass of the people, who have, for a period of 300 years been the slaves of an overbearing foreign race.? While this was a legitimate argument of the time, it is interesting to note that only one student referred to America?s own problem with slavery. German Unification Editorials. I used a similar procedure to evaluate the advanced placement editorials. Instead of having three authentic pedagogy groups, I only had two since I didn?t have an advanced placement teacher score in the minimal range. Once again, I attempted to control for any differences between the groups on factors that might impact achievement on the editorial assessment. The limited and moderate authentic pedagogy groups were not significantly different in terms of race, gender, or socio-economic status (see Appendix W). The descriptive statistics associated with this assessment are reported in Table 32. Nearly half of the students either did not provide a clear position statement in their 193 editorial or their statement focused on whether Germany should unify instead of how unification should be accomplished (i.e. include Austria?). Most of the students provided at least some background context in setting up the editorial (>70%). Most students scored at the two or three level on the persuasiveness scale indicating editorials that were adequate at best in convincing the reader. Adequate editorials generally included two or more persuasive reasons that were described without much elaboration or support. In terms of lower level dialectical reasoning, most students (52.2%) did not provide enough information in the third paragraph to indicate they had a solid understanding of an opposing perspective. The final position standard of the rubric evaluated the ability of students to provide a persuasive conclusion that also included higher level dialectical reasoning: genuine consideration of opposing viewpoints. 
Most students (58.9%) scored at the adequate level (1) indicating that they included a basic conclusion that restated some key points. A small group of students (7.8%) scored at a higher level (2) by providing a more elaborate conclusion that added to the overall persuasiveness of the editorial. No students scored a three, which would have required evidence of advanced dialectical reasoning. The scores for Part I of the editorial had a possible range of zero to fourteen. No students scored over ten. Most students scored in the four to six range (47.9%). A fairly sizeable group (24%) scored from seven to nine. Only 1.1% achieved a score of ten. Part II of the assessment was evaluated on the basis of decision-making and persuasiveness. The majority of the students scored at the two level out of a possible five points. When combining parts I and II, the range of possible scores was from one to nineteen. No students earned a score over thirteen.

Table 32
Distribution of German Unification Editorial Scores

Position Sub-Scale      Context Sub-Scale      Persuasiveness Sub-Scale      Dialectical Reasoning Sub-Scale
Score   Percent         Score   Percent        Score   Percent               Score   Percent
0       44.5            0       26.7           0       6.7                   0       52.2
1       55.6            1       48.9           1       20.0                  1       28.9
                        2       24.4           2       46.7                  2       15.6
                                               3       25.6                  3       3.3
                                               4       1.1
                                               5       0

Part I                  Part II                Total Score
Score   Percent         Score   Percent        Score   Percent
1       6.7             0       21.1           1       4.4
2       7.8             1       14.4           2       4.4
3       12.2            2       31.1           3       6.7
4       16.7            3       24.4           4       7.8
5       15.6            4       8.9            5       5.6
6       15.6            5       0              6       16.7
7       14.4                                   7       12.2
8       6.7                                    8       7.8
9       3.3                                    9       20.0
10      1.1                                    10      6.7
11-14   0                                      11      3.3
                                               12      2.2
                                               13      2.2
                                               14-19   0

Note. N = 90

As with the Manifest Destiny editorials, I ran a factorial MANOVA to determine if any differences existed in the performance of students from the authentic pedagogy groups (limited & moderate) on the higher order editorial. In this analysis, I examined the rubric sub-categories from Part I. In virtually all instances, the moderate group achieved higher mean scores than the limited group. However, this result did not reach significance in any scoring category. Table 33 provides additional information associated with this analysis.

Table 33
MANOVA Comparing the Performance of Authentic Pedagogy Groups on the Advanced Placement German Unification Editorial

                                   Limited         Moderate
Variable                           Mean (SD)       Mean (SD)      F
Position                           .50 (.508)      .59 (.496)     .673
Context                            1.00 (.696)     .96 (.738)     .052
Persuasiveness                     1.76 (.890)     2.05 (.862)    2.320
Low-level Dialectical Reasoning    .56 (.786)      .79 (.889)     1.502
Quality of Final Position          .62 (.551)      .82 (.606)     2.556

Note. An overall multivariate comparison resulted in a Hotelling's Trace of .062 (p = .400).

Returning to the null hypothesis for this research question, the results suggest higher levels of authentic pedagogy had a positive impact on student performance on the higher order writing task. However, the difference in performance for AP students could be due to chance. The null hypothesis was rejected for the regular history classes and retained for the advanced placement groups.

As with the Manifest Destiny editorial, the biggest issue I encountered was off-topic responses. Instead of addressing the question of whether a unified German state should include all Germans, the students often argued the merits of unification itself. It was common for students to discuss how a larger unified state would increase Germany's military might and prestige among the nations of Europe.
In this context, they would often incorporate Mohl?s statement (from the primary source document) of how a Reich with 70 million people could not be challenged. The practical problems associated with forming a unified country encompassing German speaking territories in Austria were generally not discussed or if they were, only superficially. Students also discussed the economic benefits of being united and perhaps how a country formed by people with similar customs, language, and beliefs could function more smoothly. Some of the off- topic papers were fairly well written, but they did not exceed a two (out of 5) on the persuasiveness scale because they did not really address the issue of nationalism and the extent to which it should serve as the basis for German unification. Among the students who did address the appropriate question, most editorials did not exceed a level three, ?adequate? score for persuasiveness (see Appendix X for examples). When scoring persuasiveness I evaluated the entire editorial. However, the main portion of the editorial dedicated to providing supporting argumentation was paragraph two. Two of the better supporting paragraphs are provided in Figure 5. These paragraphs demonstrate some of the arguments constructed by students and how the source documents were integrated into the editorials. The excerpt on the left provides an argument that unification only serves the interests of Prussia and Bismarck. It suggests that unification would not follow the liberal constitutional course favored by the people. 197 Despite a factual error, this editorial is noteworthy in that only one other student made this sort of argument. Also, the student included relevant information from the textbook that was not in any of the source documents. Later in the editorial (not in the second paragraph) the student also makes a political argument. The language used in the ?opposing? editorial on the left was not particularly eloquent or clear. Also, the student could have taken a more decisive stance on the focus question. As a result an adequate (3) persuasiveness score was assigned. The excerpt on the right makes military and economic arguments for unification. While it did not contain lengthy arguments, it did contain some original elaboration particularly in the area that discusses ports in the Mediterranean and coal mines in the Saar. The overall editorial earned an elaborated (4) persuasiveness score. The use of the primary source documents by the students often amounted to pulling brief passages to supplement an argument or to represent an opposing view. The stronger students integrated the quotes into their argument with a short explanation. In a number of instances, however, students simply inserted a quote in the paragraph and moved on. This lack of elaboration and some of the clearly inaccurate statements made in conjunction with quotes suggested that the documents were fairly difficult for many of the students to understand. The students particularly had a hard time with the following statement: ?The highest and most fundamental idea in the political life of a state must be the internal satisfaction of peoples through their institutions, their right to self- determination.? This sentence could have been used to frame an argument against basing a state solely on nationalism, but most students missed the point Bebel (the author of the speech) was trying to make. 
Instead, some students interpreted the statement to mean 198 that the goal of a state is to ensure people are happy and unification would accomplish this objective. Opposing Unification Supporting Unification of All Germans In the 1862s Prussia wanted to increase military power. In order to do that, the parliament had to appropriate the budget to finance it. The parliament disapproved such action, which shows that most people of Prussia didn?t want it. But, Bismarck spent government money on military strength, ignoring the parliament. It shows that the decision was by few authority, not the people. And in 1848, there was an assembly called Frankfurt. It was a meeting of German states to propose a plan for the unification of Germany. The plan was to have a constitutional monarch, that is to be more liberal than before. Prussian king refused, saying that unification should be obtained by blood. If Prussia actually wanted unification for the German states, they would have agreed at the assembly. But since they wanted a strong power centered in Prussia, they decided to obtain unification through war. The two evidence shows that Prussia was solely decided and the decision was by a few authorities. Thus, it doesn?t need to be supported, not only be German people, but also by foreign nations. The unification of Germanic peoples should be endorsed because of the potential military they would gain from such a unification. If Germany were to unify they would, according to document 2 and the words of Mohl, have a Reich of seventy million people. This new Reich would be able to stand up to even Russia with her sixty-six million, and France with her thirty-six million. Having such a superior force will make Germany powerful and that means unification will help them a lot. Another consequence that I believe makes unification good for Germany is the economic consequence. With the unification of such a vast amount of land it stands to reason that they would also gain many different forms of economic stimulus. They would gain ports in the Mediterrania in Austria-Hungary and coal mines in Saar. This alone will much help the German economy and make unification well worth it. Figure 5. Examples of Supporting Arguments Provided for Part I of the German Unification Editorial As with the Manifest Destiny editorials, I occasionally got the sense that students were appropriating arguments from the primary source documents without really examining their underlying logic. For example, students cited the same Bebel speech in 199 their editorial to demonstrate that nationalism would lead to war and that a unified state based on nationalism would require Germany to cede territory away (i.e. Slavic-speaking areas). It is true that this point was made in the primary source document. However, I thought at least one student would question this claim. Would those seeking to unify Germany really give away territory in the name of nationalism? It seems doubtful that any state would voluntarily do this. One student did bring up the scenario of ethnic cleansing to develop a pure German state. This argument is probably not very authentic for a citizen living in 1870, but it does suggest the student was thinking about the implications of the arguments being made in the source document. Research Question IV. Does the ability to apply knowledge on the graduation exam improve with repeated exposure (multiple courses) to classroom experiences that require students to perform challenging intellectual tasks? 
Null Hypothesis: Repeated exposure to authentic classroom experiences that require students to perform challenging intellectual tasks has no impact on student performance on the Alabama High School Graduation Exam. A one-way ANOVA was used to test whether repeated exposure to moderate levels of authentic pedagogy resulted in higher scores on the Alabama High School Graduation Exam. I created a variable representing the total number of social studies courses each student had at the moderate authentic pedagogy level. This resulted in three possibilities: a student could have 0, 1, or 2 social studies classes at the moderate level in their 9th and 10th grade years. Students who had more than one social studies teacher in any particular year were filtered out of the analysis. The resulting ANOVA included 328 200 students with no exposure to moderate pedagogy, 292 with one class, and 58 students in the group who experienced two moderate authentic pedagogy courses. Table 34 provides a breakdown of the results (first row of data). Repeated exposure to courses featuring moderate authentic pedagogy was found to have a significant effect on student achievement on the graduation exam (p < .001). Although the ANOVA showed that the means were significantly different, the effect size was very small (?2 = .04). The Eta squared was just .04 when .10 is needed for a small effect according to Cohen (Cohen, 1992, p. 157). Hochberg?s GT2 post-hoc comparisons of the three groups indicated that the group with two moderate authentic pedagogy classes had significantly higher scores on the graduation exam than students with one. This group also performed significantly better than students who did not have any experiences at the moderate level (p < .001). The same relationship was found when comparing students who experienced one moderate authentic pedagogy course with those who didn?t experience any at the moderate level. The students with one course performed significantly better, p = .004. Table 34 Analysis of the Impact of Courses Featuring Moderate Authentic Pedagogy on Graduation Exam Results No moderate AP classes Mean (SD) One moderate AP class Mean (SD) Two moderate AP classes Mean (SD) F Advanced Placement Students included 539.08 (70.257) 556.38 (65.686) 585.91 (47.924) 14.13*** Advanced Placement Students excluded 536.16 (70.263) 534.12 (64.369) 569.53 (51.762) 2.121 Note. ***p < .001 201 Each group in this analysis included regular and advanced placement history students. However, a larger percentage of advanced placement students were in the ANOVA group that had the most repeated exposure to authentic pedagogy. It is possible that the results of the analysis were influenced by an advanced placement effect instead of just authentic pedagogy. This seems likely since advanced placement students tended to do better on the graduation exam than students in the regular U.S. history course. In order to more precisely examine the research question, I ran another one-way ANOVA that excluded the advanced placement students (second row of data in Table 34). Only 17 students were in the group that experienced two social studies courses at the moderate authentic pedagogy level. The test did not indicate a statistically significant difference between the three groups. Figure 6 graphically displays the results of these two tests as well as an ANOVA that included only advanced placement students. Figure 6. Effect of repeated exposure to moderate authentic pedagogy on student achievement. The green ?all students? 
line was the only one to reach statistical significance. Results were significant at the .01 level from 0 to 1 and 1 to 2. 202 Finally, I applied the same predictor variable to a sequential regression model similar to the one I used to address research question two. The prior moderate variable was entered by itself in model three. In this analysis, I included advanced placement students since the previous ANOVA indicated that only a very small sample of non-AP students had multiple classes with moderate authentic pedagogy. Multiple social studies courses with moderate authentic pedagogy had a slight positive impact on student achievement on the graduation exam. The results are displayed in Table 35. Based on the results of the ANOVA models and the regression analysis, the null hypothesis was rejected. 203 Table 35 Sequential Multiple Regression Analyses Predicting Impact of Repeated Exposure to Moderate Authentic Pedagogy on Graduation Exam Results Model Variable R Square/ Change Beta Semi-partial 1. Demographics .263*** Gender .194*** .190*** Ethnicity .169*** .129*** LEP -.102*** -.102*** Special Ed -.106*** -.104*** SES .097** .073** 2. Achievement .147*** 10 th Average .441*** .389*** 3. Authentic Pedagogy .016*** Prior Moderate .131*** .128*** OVERALL MODEL .426*** Note. *p<.05, **p<.01, ***p<.001 Research Question V. To what extent does authentic pedagogy bring different achievement benefits to students of different social and academic backgrounds? Null Hypothesis: Authentic Pedagogy will result in statistically significant differences in achievement on the graduation exam for students from different social and academic backgrounds. I used several bivariate correlation tests to address this question. In order to maintain consistency with previous analyses, I excluded students who had more than one 204 social studies teacher in the tenth grade and students in advanced placement courses. I also decided to analyze this question using three variables: authentic pedagogy, authentic tasks, and authentic instruction. I examined the influence of authentic tasks and instruction independently to gain a more nuanced understanding of how the components of authentic pedagogy impact the performance of various subgroups of students. Table 36 depicts the bivariate correlation examining the relation between authentic pedagogy and student performance. The results suggest that authentic pedagogy positively impacted graduation exam performance for all subgroups. The correlation for white, male students reached statistical significance at the .05 level. The effect size in both cases (gender and race) was small. A significant difference in achievement benefit did not exist based on socio-economic background or prior academic achievement. The results of the bivariate correlations for authentic pedagogy led me to accept the null hypothesis for gender and race, and to reject the null hypothesis for SES status and prior achievement. The analysis of authentic tasks by themselves yielded a different outcome. Based on this limited sample of teachers, authentic tasks were often negatively correlated with student performance on the graduation exam. There was not a statistically significant performance benefit associated with authentic tasks based on a student?s gender, SES, or prior social studies achievement. The performance of African American students was negatively associated with authentic tasks at the .05 level, but the effect size was small. 
White students experienced a positive effect, but they could have just as easily experienced a negative one since the outcome was not statistically significant. 205 Table 36 Bivariate Correlations Examining Relation between Authentic Pedagogy and Achievement by Subgroups AP Mean Pearson Correlation Effect Size Gender Female (N=214) 14.170 .107 0.01144 Male (N=280) 13.388 .139* 0.01932 Ethnicity White (N=281) 13.932 .127* 0.01612 African-American (N=168) 13.407 .051 0.00260 SES Paid Lunch (N=334) 13.840 .095 0.00902 Free/Reduced Lunch (N=135) 13.565 .136 0.01849 Social Studies Achievement A/B Student (N=305) 14.025 .081 0.00656 C/D/F Student (N=174) 13.264 .010 .0001 Note. *p<.05 Table 37 Bivariate Correlations Examining Relation between Authentic Tasks and Achievement by Subgroups Task Mean Pearson Correlation Effect Size Gender Female (N=214) 6.675 -.110 .0121 Male (N=280) 6.508 .025 .000625 Ethnicity White (N=281) 6.611 .005 .000025 African-American (N=168) 6.549 -.180* 0.0324 SES Paid Lunch (N=334) 6.601 -.065 0.00422 Free/Reduced Lunch (N=135) 6.569 -.058 0.00336 Social Studies Achievement A/B Student (N=305) 6.634 -.043 0.00184 C/D/F Student (N=174) 6.483 -.158 0.02496 Note. *p<.05 206 Finally, I examined the performance benefits of authentic instruction as it related to achievement among the same subgroups of students. Authentic instruction was positively correlated with student achievement on the graduation exam, regardless of a student?s demographic profile or social studies achievement record. The more authentic instruction students received, the better they performed on the graduation exam. The results achieved statistical significance for both genders, white students, and more advantaged students (based on paid lunch). They approached significance for low SES students (p=.054). Authentic instruction was positively associated with performance for students with both low and high social studies averages, but the results could have been due to chance. The comparison for each variable did not reveal any drastic differences among subgroups in how authentic instruction influenced performance. Table 38 Bivariate Correlations Examining Relation between Authentic Instruction and Achievement by Subgroups Instruction Mean Pearson Correlation Effect Size Gender Female (N=214) 7.495 .160* 0.0256 Male (N=280) 6.880 .157* 0.024649 Ethnicity White (N=281) 7.320 .149* 0.022201 African-American (168) 6.858 .131 0.017161 SES Paid Lunch (N=334) 7.239 .134* 0.017956 Free/Reduced Lunch (N=135) 6.996 .185 0.034225 Social Studies Achievement A/B Student (N=305) 7.390 .109 0.011881 C/D/F Student (N=174) 6.782 .072 0.005184 Note. *p<.05 207 Summary In this chapter I?ve provided the results of analyses related to four research questions. My objective was to better understand how authentic pedagogy influences student learning in history classrooms. The findings suggest that authentic pedagogy has a small, but positive impact on student performance on the Alabama High School Graduation Exam. However, other factors such as grades in social studies and gender are stronger predictors. Classroom level comparisons suggest that students who receive higher levels of authentic pedagogy are not put at a significant disadvantage on the AHSGE. It is important to note that they also did not experience the sort of performance benefit that would be consistent with the outcomes reported in Newman?s 2001 study of authentic pedagogy and standardized tests. 
In Newman?s study, students who received high quality assignments were likely to outperform their peers on the standardized tests by significant margins (i.e. an achievement benefit of 32 points on the IGAP reading section). The results of my analysis should be viewed with caution. This study dealt with a very limited sample of teachers (N=4). The spread of these teachers along the authentic pedagogy continuum was also limited with no teacher reaching the substantial level and only one teacher in the moderate category. This study also had an equity component. I was interested in whether any performance benefits associated with authentic pedagogy would be equally distributed among the students. Authentic pedagogy was found to have a positive impact on student performance for all social and prior achievement groups. White, male students experienced a greater achievement benefit than African-Americans and females. I also examined the sub-components of authentic pedagogy: authentic tasks and authentic 208 instruction. When the analysis focused solely on authentic tasks the results, in most instances, suggested a negative influence on student performance on the graduation exam when students experienced tasks that scored higher in authenticity. Male and female students were impacted about the same, as were students from different socio-economic and academic backgrounds. The negative impact on African-American students reached statistical significance when compared to the very small positive impact of authentic tasks on the performance of white students. The impact of higher levels of authentic instruction on student performance, on the other hand, was positive in most instances. The results of bivariate correlations revealed no significant difference in performance benefit based on gender or prior academic achievement. African American students were positively impacted by higher levels of authentic instruction, but not to the same extent as whites. The same was true for students from lower socio-economic backgrounds based on free or reduced lunch when compared to students that paid for their lunch. Another area I chose to investigate was the impact of exposure to multiple classes with higher levels of authentic pedagogy. Would students who experienced two courses of moderate authentic pedagogy perform better on the graduation exam than students who only receive one or none at all? The answer to this question was ?yes? based on my analysis. However, the results of this question are a little more complicated due to the nature of the student sample. Many of the students who were in the group who received multiple classes of moderate authentic pedagogy were also advanced placement students. It was difficult to separate the effect of authentic pedagogy versus a possible advanced placement effect. 209 Finally, I looked at the impact of authentic pedagogy on higher order learning outcomes. I chose to focus my statistical analysis for both tasks on the rubric sub- categories associated with Part I because they were the most reliable in terms of scoring and they offered a targeted look at specific higher order skills (i.e. persuasiveness). The cumulative scoring categories were less meaningful because they could be influenced to a greater degree by procedural aspects of the scoring rubric. When looking at the editorials written by the regular U.S. 
History students, the students who received moderate authentic pedagogy wrote more persuasive editorials than classmates who received limited authentic pedagogy. The same was true when comparing the moderate group with students who received minimal authentic pedagogy, but the results could have been simply due to chance. There was not a statistically significant difference among classes of advanced placement students on the German Unification task. The students in the limited authentic pedagogy group generally did not perform as well as the students in the moderate authentic pedagogy group on the rubric sub-categories associated with Part I, but this could have been due to chance. The next chapter will provide a more extended discussion of these findings. 210 CHAPTER SIX: SUMMARY, LIMITATIONS, & IMPLICATIONS This study investigated the impact of authentic instruction on student learning in social studies classrooms. As discussed in the literature review, there is evidence to suggest that today?s high-stakes tests serve as a disincentive for those who want to provide more in-depth learning experiences for their students. Teachers need reassurance that they are not hurting their students? chances on high stakes tests when they pursue more ambitious, and often time consuming, inquiry-oriented activities. Work by Newmann in other subject areas provides evidence that authentic pedagogy can enable students to achieve positive results on basic skills tests while also producing complex intellectual learning outcomes. This study was an effort to extend this line of inquiry in the social studies. I wanted to more fully understand the types of learning outcomes students demonstrate when they receive higher levels of authentic pedagogy in their history classes. In order to operationalize and analyze the instruction students experienced, I used Newmann?s Authentic Intellectual Work (AIW) model. This framework places greater value on teaching that encourages higher-order thinking, in-depth knowledge, substantive communication, and real world application - characteristics commonly associated with inquiry-based instruction. Participating teachers in the study were categorized using AIW rubrics and placed on a continuum according to the level of authentic pedagogy they provided to their students. 211 Once teachers were categorized, I created a database that included students from the participating teacher?s classes. Each student record included demographic information, prior achievement data, and social studies graduation exam results. I conducted statistical analyses using the database to determine how students that experienced varying levels of authentic pedagogy performed on measures of lower and higher order knowledge. In previous chapters I?ve discussed the theoretical basis for this study, its methodology, and findings. This chapter offers a more extended discussion of some of the major findings. It places the results within the context of those from similar studies. Alternative explanations for the results of the study are provided as well as suggestions for further research. Summary This study included five research questions. The first research question was: To what extent do teachers utilize authentic pedagogy and how much variation exists within the sample of teachers in this study? I concluded that high levels of authentic pedagogy were not very prevalent in the study schools. 
The range of possible authentic pedagogy scores (7-30) was broken down into four categories to reflect a continuum from minimal use of authentic pedagogy to substantial. No teachers in this sample provided substantial authentic pedagogy. However, a good deal of variation still existed among study teachers with the lowest score being 9.6 and the highest 21.2. Three teachers were in the moderate authentic pedagogy category, two in the limited, and three in the minimal category. The average score in this sample was a 14.8 which was below the mean of the scale of 18.5. 212 The second research question focused on lower order learning outcomes associated with authentic pedagogy. The question asked: Do students that have been taught by teachers demonstrating higher levels of authentic pedagogy score higher on the Alabama High School Graduation Exam (AHSGE) than students taught by teachers with lower levels of authentic pedagogy? I analyzed this question in several ways. The most precise analysis involved comparing a class that experienced minimal authentic pedagogy with one that received moderate authentic pedagogy. The class that experienced minimal authentic pedagogy outperformed the moderate authentic pedagogy class on the graduation exam, but the results were not statistically significant. The results of this analysis led me to conclude that authentic pedagogy did not cause students to perform at a higher level on this test of basic historical knowledge. However, it did not appear to hurt students? chances either. With only four teachers in this analysis, these results should be viewed as very tentative. I also examined the same question using students as the level of analysis instead of intact classes. The results of this broader analysis suggested that authentic pedagogy played a small positive role in explaining student performance. When the elements of authentic pedagogy were analyzed independently, authentic tasks were found to have a negative impact on student performance, while authentic instruction had a positive impact. The results in both cases were statistically significant. The third research question examined the impact of varying levels of authentic pedagogy on higher order learning outcomes. The question asked: What is the impact of authentic pedagogy on student performance on an assessment that requires them to apply knowledge from a previous unit to a challenging new task? The ?challenging new task? 213 was an editorial assignment that required students to take and defend a stance on a historical problem. Regular students completed an editorial focused on Manifest Destiny while the advanced placement students did a similar assignment on German Unification. Most students struggled on the editorial assignment, regardless of their assignment as advanced or regular. Analysis of student editorials revealed a statistically significant difference in the three group?s performance on the Manifest Destiny editorial. Students in the moderate authentic pedagogy group were able to write editorials that contained better introductory paragraphs with more historical context than those in the minimal and limited groups. They also wrote more persuasive editorials, although the result only reached statistical significance when the moderate group was compared with the limited authentic pedagogy group. In analyzing the advanced placement editorials, students were organized into two groups: limited and moderate authentic pedagogy. 
The students in the moderate authentic pedagogy group had higher mean scores on most of the rubric categories (i.e. persuasiveness, dialectical reasoning, etc.) that comprised Part I of the assessment task, but the differences were not statistically significant. In general, both the general and advanced students were able to use the documents provided to construct basic arguments for or against the question under consideration. The fourth research question asked whether the ability to apply knowledge on the graduation exam improved with repeated exposure (multiple courses) to classroom experiences that required students to perform challenging intellectual tasks. Students performed better on the graduation exam when they had two classes of moderate pedagogy as compared to having just one experience or none at all. When all students 214 were included in the analysis the results were statistically significant. However, most of the students who experienced two classes at the moderate level were in the Advanced Placement course in the tenth grade. It is difficult to know whether the higher scores on the graduation exam were because students experienced two moderate level courses or simply because the students were more academically advanced. When the AP students were eliminated from the analysis, the results were not statistically significant. Finally I asked: To what extent does authentic pedagogy bring different achievement benefits to students of different social and academic backgrounds? I analyzed the achievement benefits associated with authentic pedagogy, authentic tasks and authentic instruction as part of this question. Authentic pedagogy was positively correlated with achievement on the graduation exam regardless of a student?s prior academic ability or demographic group. The achievement benefits were equitably distributed for students based on SES status and academic ability (prior social studies grades). However, white, male students experienced a greater achievement benefit than African-Americans and women. When authentic tasks were analyzed independently as a sub-component of authentic pedagogy, the resulting output indicated a negative correlation for the SES and prior academic achievement variables. In other words, higher scoring tasks were associated with lower performance on the graduation exam for students with these demographic and achievement characteristics. The correlation did not reach statistical significance for either variable (free/reduced vs. paid lunch or A/B students vs. C,D,F students). When analyzing the influence of authentic tasks based on gender, the correlation with student performance was positive for males, but negative for females. 215 However, the difference was not statistically significant. Finally, the correlation between authentic tasks and student performance on the graduation exam for ethnicity was positive for whites and negative for African-Americans. The difference was significant at the .05 level. This suggested that the use of authentic tasks had a greater impact on African-Americans than whites, and was associated with lower achievement for this group of students. However, the effect size for this analysis was small (0.03). Authentic instruction, the other sub-component of authentic pedagogy, was associated with improved performance for all groups. 
The more authentic instruction students received, the better they performed on the graduation exam regardless of their demographic profile or prior achievement, although the correlation did not always reach statistical significance. The lack of statistical significance was particularly evident for African-American students and those from lower socio-economic backgrounds. Discussion and Alternative Explanations AIW and Lower Order Achievement Outcomes. One of the main aspects of this study was determining the impact of authentic pedagogy on the Alabama High School Graduation Exam (AHSGE). The results of my study are consistent with other AIW studies that suggest authentic pedagogy does not hurt student performance on standardized tests (D'Agostino, 1996; Lee, Smith, & Croninger, 1997). However, they do not support the findings from the study that most directly addressed this relationship (Newmann, Bryk, & Nagaoka, 2001). Newmann?s 2001 study indicated that students (grades 3, 6, 8) who received higher quality authentic tasks performed at higher levels on basic skills tests in reading, writing, and math. These results were explained in terms of vocabulary acquisition and motivation to learn. Newmann argued that AIW?s ability to 216 promote these benefits essentially offset any limitations imposed by reduced coverage of testable material. Why did AIW not have the same impact on student retention of lower order knowledge in this study? One possibility is simply the fact that I had such a small sample of teachers at the grade level the graduation exam was administered (N=4). Larger samples may have yielded results more similar to those of past AIW studies. The sample was also less than ideal since it did not include any teachers at the substantial authentic pedagogy level. Perhaps higher levels of authentic pedagogy among the teacher sample are needed to achieve the outcomes found in Newmann?s study. Another explanation could have to do with the outcome measure itself. The graduation exam covers a significant period of U.S. history (Beginnings to WWII) and measures retention of specific information. It is possible that Newmann?s theory regarding vocabulary acquisition and motivation doesn?t hold true for high school achievement tests of this nature. This study differed from most AIW studies in that it examined both tasks and instruction in determining the authentic pedagogy scores. This research design enabled me to examine the impact of tasks and instruction independently and in conjunction with the overall authentic pedagogy score. The negative impact of authentic tasks on student performance was initially puzzling to me, but upon further reflection makes sense. It is possible that the teachers adopted more challenging tasks, perhaps as a result of professional development or to impress me, without altering their usual instruction to any great extent. A similar theory was suggested in some of the Gates Foundation research that used the authentic intellectual work model to analyze school reform (AIW/SRI, 217 2007). If this is true, the students might have been unable, due to inadequate preparation, to fully take advantage of the learning opportunity represented by the authentic tasks. Not only would they struggle to achieve the higher order objectives associated with the task, but they might also grow frustrated or confused to the point where the lower order objectives were compromised. 
There are examples from the study of moderate scoring tasks coupled with minimal levels of authentic instruction (see Appendix O ? Jason and Phillip). It seems logical that challenging tasks, by themselves, would not really provide a big boost in student achievement. It is very difficult for teachers who do not routinely challenge their students to produce positive outcomes with challenging tasks ?on demand? or immediately. Students may need opportunities to build up to the challenging tasks. When I observed certain tasks being implemented, it was apparent that the students were experiencing something out of the norm. For instance, when observing Jason?s lesson that required students to rewrite the Declaration of Independence in contemporary language, it seemed clear that students were being confronted with a task that was much more challenging than usual. A student asked Jason whether they were going ?back to hard.? Jason replied by saying that it [the lesson] could be challenging, or simple and enjoyable depending on the day. My sense was that Jason offered occasional instructional challenges, but this was not a consistent focus of the class. I got a much different impression when watching the teachers in moderate authentic pedagogy classes. It is very difficult to successfully plan, scaffold, and implement authentic tasks. The moderate authentic pedagogy teachers might have scored higher on authentic instruction because it was something they more routinely did with their students. It is possible that the routine helped hone the skills of the teacher while also conditioning students to react 218 more favorably when challenged. My hypothesis is that authentic tasks would have a more positive impact on student performance if they were a more consistent focus of the teacher. Another aspect of this study was determining if exposure to multiple courses at higher levels of authentic pedagogy resulted in improved learning outcomes. Analysis revealed that students who had multiple years of authentic pedagogy at the moderate level generally had higher graduation exam scores than their peers in classes with lower levels. However, the finding was not significant when advanced placement students were removed from the equation. The results of this study are not nearly as strong as those identified in Klentschy?s research. Klentschy and his associates compared the performance of elementary science students on two standardized assessments based on whether they experienced a constructivist based, hands-on science program or a more traditional curriculum. The students in the constructivist oriented program outperformed the students who did not experience the program and their performance improved steadily as they experienced more years of the program (Klentschy, Garrison, & Amaral, 2001). The disparity between the results in my study and those reported by Klentschy et. al. could simply be a result of the small sample size in my study or the fact that the grade level and discipline were different. It was also very difficult to separate and clearly define the impact of authentic pedagogy over multiple years from the impact of advanced placement courses. Perhaps a different study design would have yielded better results. AIW and Higher Order Achievement Outcomes. As mentioned at the beginning of this chapter, the higher order research findings were based on an analysis of a writing task completed by the students. 
The task provided to the advanced placement students focused on the issue of German unification and the regular U.S. History students 219 completed an editorial focused on Manifest Destiny and the Mexican-American War. My hypothesis in designing the editorial assessments was that students who routinely experienced instruction requiring them to critically examine ideas and formulate arguments would be able to develop a well reasoned, persuasive editorial based on a problem they hadn?t encountered previously. They would be able to recognize holes in logic or the implications of arguments being made in the source documents even if they were not particularly well versed in all of the details associated with German unification or Manifest Destiny. Overall, there was not a great deal of evidence of this type of thoughtful reflection to support my hypothesis. However, this could be a result of a number of extraneous factors that have little to do with the overall ability of the students to engage in this type of thinking. The results could be attributed to the novelty of the task. The AP students, in particular, were accustomed to document-based questions. A scaffolded-essay of this type might have appeared foreign to them. My own experience with students of this age suggests that the students in either group (regular or advanced) were probably capable of providing better responses with more guidance. In attempting to standardize the assignment instructions provided by the teacher, I was left ?out of the loop? and thus unable to answer questions or intervene to address misconceptions students might have had about the assignment (i.e. clarifying what the question was asking). In retrospect, I should have attempted to gain permission to administer the assignment myself. Teachers were also told to provide incentives for the students so they would try hard on the assessment. At a minimum I wanted the assignment to be graded so students would have some stake in their performance. I have no way of knowing if all of the 220 teachers followed through on this request. It is possible that some classes were more motivated to give their best than others. Another complication is that I did not observe the instruction students received on this topic. Although I made a careful effort to consult with the teachers on the broad topic of each editorial, it is possible that they emphasized different aspects of Manifest Destiny and German Unification in their classes. Some students, in covering Manifest Destiny for example, may have received a blow by blow account of the battles associated with Texas? independence and the Mexican-American War with little discussion of the motives of the participants. I also don?t know how much instructional time was dedicated to each topic. If the students were really uncomfortable with the topic, they might not have performed up to their true potential. Finally, although every effort was made to make the editorials as engaging and relevant as possible it is possible that this type of task did not appeal to some students and this could have also influenced their effort. Even with these considerations, the results of the higher order task were still revealing. It would be interesting to compare student scores on the Manifest Destiny editorial with their subsection scores from the graduation exam that dealt with the same topic. 
It is possible that many of the students might be able to correctly answer multiple choice questions, but the editorials suggest that most students do not adequately understand this time period or the concept of Manifest Destiny. While the editorial task is more challenging to score, it certainly provides a better window into student misconceptions of history. Another conclusion I draw from reading the editorials, both AP and regular, is that students probably need more opportunities to engage in tasks that 221 develop higher order skills such as persuasive argumentation and dialectical reasoning. It is important for students to be able to ?think like a judge,? evaluate a problem from multiple angles, and develop defensible solutions. This has important implications for the future of democracy. Limitations Despite the incredible generosity of the school system and the willingness of the social studies faculty to invite me into their classrooms, some limitations still existed in this study. First, I lacked the resources to evaluate student work. The use of all three AIW rubrics makes it easier to form judgments regarding the level of intellectual challenge in the pedagogy students experienced as part of their coursework. However, other rigorous studies of this nature have been conducted without student work. My design was particularly strong in that it linked tasks with classroom observations. This enabled me to more accurately ascertain the teacher?s intent and to see how the teacher?s instructional approach might either add to or detract from the intellectual challenge associated with a particular task. Another limitation was the presence of interns in some of the study classrooms. Interns were required to teach a minimum of twenty days with the full load of classes. Some exceeded this amount based on guidance from their cooperating teacher. This study did not include an evaluation of intern instruction to determine its level of authenticity. It was therefore difficult to determine the impact of their instruction on student learning. However, I had student data for each of the cooperating teachers from semesters where no intern was involved in instruction. Also, one assumes that cooperating teachers supervised interns closely to ensure the standards they?ve 222 established for their course were met. A teacher is likely to intervene if an intern is not teaching the things he/she believes are important. However, in most cases one would not expect a novice teacher to provide the same level of authentic instruction as a skilled veteran. My association with the study participants was another limitation. I knew most of the teachers through professional development seminars and contacts associated with my assistantship (i.e. supervision of interns) at Auburn University. The potential certainly existed for bias in rating. Ideally, the second rater used for inter-rater reliability (IRR) would have no relationship with the teachers involved in the study. This was the case when it came to achieving inter-rater reliability for tasks in this study since they were often evaluated by other SSIRC researchers. However, I did not have the resources to train an outside researcher or provide compensation for travel to the observations. Instead, my advisor, Dr. Saye served as the second observer. Dr. Saye and I came to a shared understanding of how to apply the rubrics to instruction as a result of the training associated with the SSIRC project. 
I believe that this understanding significantly reduced the likelihood that a teacher would systematically be rated lower due to personal bias. My association with the study teachers had another effect on this study. It seemed at times that some teachers might have been trying to ?game the system? by turning in tasks they thought I?d like. This is understandable, but an ideal scenario would involve teachers forming an independent judgment of what to submit based on a professional sense of what constitutes instructional quality. While the study teachers were not familiar with authentic intellectual work per se, they did know of my association with curriculum development projects that adhered to a problem-based historical inquiry model. Since 223 this model closely relates to AIW, some teachers were able to guess that I wanted tasks that challenged students to apply historical knowledge and think critically. To the extent that teachers had inquiry tasks on hand, this might have inflated some task scores. However, it is hard to fake the standards associated with the instruction rubric. Some teachers might have been better served (in terms of their authentic pedagogy score) by submitting tasks that better fit their comfort level to execute. I was also limited by the number of blocks that I could reasonably observe. I tried to not only vary the three observations across the course of a semester, but also the blocks that I observed. If a teacher taught AP and regular courses, I tried to see lessons associated with each. However, this was not always possible. An assumption of this research is that a teacher does not vary his/her instruction significantly from block to block (or from year to year). More extensive interviews (pre-post) would have possibly helped to determine if this assumption was valid for each participant in the study. It is difficult to make wide ranging generalizations from this study since it included a very limited sample of teachers and used outcome measures not found in other states. The scoring of the higher order editorials was also very challenging given that the range in student performance was not always great. Subtle distinctions and judgments had to sometimes be made to arrive at scores. The rubric could probably still be improved to enhance its reliability. Implications and Areas for Further Study The results of this study raise a number of questions and areas for further research. First of all, it is perhaps troubling that authentic pedagogy is not more 224 prevalent in history classrooms. Most of the teachers in this study were classified as providing minimal or limited authentic pedagogy. This is disconcerting since the study schools represented one of the best possible areas in the state to look for inquiry-based instruction (in terms of resources, reputation, professional development, etc.). The results of this study are consistent with the broader SSIRC study that also documented relatively low levels of authentic pedagogy of most school settings (Social Studies Inquiry Research Collaborative, 2011). The implication of this from a policy standpoint is that considerable time, effort, and resources are likely needed to cultivate meaningful changes in teacher practice. Various professional development initiatives have been implemented in the past to improve the capacity of teachers to provide authentic pedagogy. 
More studies, like those conducted by Avery in the late 1990s, are needed to determine the most effective ways to help teachers not only conceptualize challenging tasks, but also provide students with the support they need to be successful (Avery, Kouneski, & Odendahl, 2001; Avery & Palmer, 1999). The AIW scoring rubrics are a powerful tool that should be used more widely by districts and schools to improve instruction. Policy-makers and school officials may be unwilling to promote authentic pedagogy without greater evidence of its impact on student learning. My study was too limited in scale to determine conclusively how authentic pedagogy influences student performance on standardized history tests. I can tentatively conclude that it doesn?t hurt student achievement. Further research is needed to confirm the relationship between authentic pedagogy and standardized tests in social studies. The research needs to include a larger sample of teachers and a greater variety of standardized assessments. It is possible that certain types of standardized social studies assessments are more likely to 225 reveal performance benefits from higher levels of authentic pedagogy. Would the results of this study be different if the lower order measure was an end-of-course test instead of an assessment that included eleventh grade material the students hadn?t covered yet? Would it be different if the U.S. History National Assessment of Educational Progress (NAEP) was used as the achievement measure or a test from another subject area like Economics? The strongest claim made for adopting authentic pedagogy has been its potential for securing higher order learning outcomes. Policy-makers would probably be willing to make a stronger commitment to this model if it could be tied to gains in 21st century skills. The evidence of improved learning is not as strong in social studies as compared to other subject areas. Researchers often don?t know what students in the ?control? classes are capable of doing, because they aren?t given the opportunity to complete authentic tasks in their more traditional classroom settings. There are not many studies, like this one, that compare students who experience varying degrees of authentic pedagogy on a common task that requires higher order thinking. Therefore, it is hard to say with certainty that inquiry-based instruction, as defined by the authentic pedagogy model, is more effective than instruction completely dominated by lecture or some other approach. The results of the higher order portion of this study are just as tentative as those that dealt with the graduation exam. Analysis of the editorial assessments did not reveal a large performance benefit from higher levels of authentic pedagogy (the effect size was very small even if the results were statistically significant). Students in all classes in the sample generally struggled with the task, whether they were regular or advanced. This 226 seems to support the view of critics who contend that students lack the foundational knowledge or developmental characteristics needed to complete higher level challenges. However, this finding can be misleading. What is perhaps not fully captured, because I was not able to use audio and video equipment during classroom observations, is the difference from a qualitative standpoint in what I observed in the study classrooms. 
When asked to complete tasks that required significant higher order thinking, many students in the moderate authentic pedagogy classes were able to rise to the occasion (a fact consistently noted in Newmann?s own research). Students in the minimal or limited classes either were not afforded the same opportunities or they did not respond as favorably for a number of reasons. The assessment instruments that I created for this study were not as effective as I would have liked in measuring the range of higher order outcomes associated with authentic pedagogy. The results might have been different had the assessment format allowed for the soft scaffolding and peer support students normally receive in a class setting. More research is needed to develop common higher order assessments that can be reliably scored under similar study conditions to evaluate the impact of different levels of authentic pedagogy on student performance. They need to move beyond paper and pencil tasks to include scored discussions and other alternative projects likely to get at the sort of outcomes authentic pedagogy is designed to elicit. A big problem is the overloaded testing schedule at most schools. Researchers should continue to focus on states, like Washington, that already use classroom based assessments (CBAs) as part of their accountability program. Ideally, partnerships between teachers, researchers, and other stakeholders would result in the development of innovative and authentic CBAs. Then 227 researchers could analyze student performance over a larger geographic area in relation to the instruction they received. The work on ?rich tasks? in Queensland, Australia provides a good model for this sort of endeavor. Another area related to this study that deserves follow-up related to this study is the compounding effect of authentic pedagogy. This study really doesn?t provide a definitive answer to the question of whether student achievement improves with multiple courses at the moderate authentic pedagogy level. The performance trend was positive, but it could have been due to chance or other factors discussed in the last chapter. Longitudinal studies, with a larger sample of teachers and students, are needed to further investigate this question. This represents a significant research challenge since the difficulty associated with achieving top scores on the AIW rubrics is well documented. It will likely take some effort to locate a suitable setting where a substantial sample of students experiences a succession of courses at the moderate authentic pedagogy level or higher. Several additional areas are also worthy of additional study. The Gates Foundation studies separated their analysis of the tasks provided by teachers to allow for two variables: rigor and relevance. They examined whether a task?s rigor or relevance played a bigger role in influencing student achievement (AIR/SRI, 2007). The rigor variable was similar to Newmann?s construction of knowledge standard. The relevance variable measured the extent to which students could have a voice and influence in what they were being asked to do, whether the task connected to the real world, and if it involved something adults might be plausibly asked to do (AIR/SRI, 2006). While the Gate?s researchers found a task?s rigor to be more directly correlated to quality study 228 work in English and math, this might be different in a study focused on history learning outcomes. 
I believe social studies researchers should more closely examine the relevance (?connectedness to students? lives) portion of the authentic intellectual work model. The connectedness scores in my study were noticeably low. Most teachers did not attempt to make explicit connections between the historical topic they were studying and contemporary issues or events. To what extent would student achievement have been greater in this study if modifications were made to improve the scores associated with just this one particular standard? What is the unique impact of making a task more relevant to a student?s life when it comes to history? Finally, future studies would probably benefit from collecting tasks, observing instruction, and analyzing student work. While this can be challenging to accomplish, it would likely provide the best picture of what students experienced in their social studies class and it would allow for the most precise classification of teachers along the authentic pedagogy continuum. The classification of teachers is important since it forms the basis for analyzing student learning outcomes and really determining the impact of authentic pedagogy. Conclusion This study aimed to better understand the learning outcomes associated with authentic pedagogy. Numerous studies dealing with the authentic intellectual work construct have suggested that teachers who assign more challenging work to their students receive products of higher quality when compared to teachers who don?t offer their students the same types of opportunities. These studies, however, have often dealt with subjects other than social studies. Very few of them investigate how authentic 229 pedagogy influences student performance on standardized tests. This study attempted to address a need in the field by examining the impact of authentic intellectual work on student achievement in history. The results of the study suggest that authentic pedagogy does have a positive influence on student learning, but not to the extent demonstrated by most of Newmann?s studies. However, there is room for cautious optimism. This study suggests that student performance on high-stakes tests is not compromised when teachers utilize more in-depth, inquiry oriented instructional approaches. The positive impact of authentic pedagogy may grow for students who experience multiple classes that reach at least the moderate level as defined in this study. The effects of authentic pedagogy are equitably distributed among most significant sub-groups of students within schools (i.e. gender, race, SES). These findings will be revisited as data from this study are analyzed in conjunction with the larger Social Studies Inquiry Research Collaborative project. Hopefully, as the pilot study for this effort, my research will contribute in a small way towards providing teachers with useful information that will help them to improve their practice and better serve students. 230 References Achieve Inc. (2004). Do graduation exams measure up? A closer look at state high school exit exams. Washington, D.C.: Achieve, Inc. Aikin, W. M. (1942). The story of the Eight-Year Study. New York: Harper. AIR/SRI. (2004). Exploring assignments, student work, and teacher feedback in reforming high schools: 2002-03 data from Washington State. Retrieved Mar. 8, 2008, from http://www.air.org/expertise/index/?fa=viewContent&content_id=300 AIR/SRI. (2006). Evaluation of the Bill & Melinda Gates Foundation's high school grants initiative: 2001-2005 final report. 
Retrieved March 8, 2008, from http://www.gatesfoundation.org/learning/Documents/Year4EvaluationAIRSRI.pd f AIR/SRI. (2007). Changes in rigor, relevance, and student learning in redesigned high schools: An evaluation for the Bill and Melinda Gates Foundation. Retrieved March 8, 2008, from http://www.air.org/reports- products/index.cfm?fa=viewContent&content_id=295 231 Alabama Department of Education. (2009a). Chief State School Officer's Report for Alabama High School Graduation Exam. Retrieved August 14, 2009, from http://www.alsde.edu/Accountability/2009Reports/CSSO/CSSOAHSGE.2009.pdf ?lstSchoolYear=7&lstReport=2009Reports%2FCSSO%2FCSSOAHSGE.2009.pd f Alabama Department of Education. (2009b). Process used to determine cut scores for the Alabama High School Graduation Exam. Retrieved Sept. 20, 2009, from http://www.alsde.edu/text/sections/documents.asp?section=91&sort=1&footer=se ctions Amosa, W., Ladwig, J., Griffiths, T., & Gore, J. (2007). Equity effects of quality teaching: Closing the gap. Paper presented at the Australian Association for Research in Education Conference, Fremantle. Armstrong, N. (1970). The effect of two instructional inquiry strategies on critical thinking and achievement in eighth-grade social studies. (Unpublished doctoral dissertation): Indiana University, Bloomington, IN. Avery, P. G. (1999). Authentic instruction and assessment. Social Education, 65(6), 368- 373. Avery, P. G., Bird, K., Johnstone, S., Sullivan, J. L., & Thalhammer, K. (1992). Exploring political tolerance with adolescents. Theory and Research in Social Education, 20(4), 386-420. 232 Avery, P. G., Kouneski, N. P., & Odendahl, T. (2001). Authentic pedagogy seminars: Renewing our commitment to teaching and learning. The Social Studies (May/June), 97-101. Avery, P. G., & Palmer, E. (1999). Professional development for authentic pedagogy in the social studies: An evaluation. Minneapolis: The Center for Applied Research in Educational Improvement. Bain, R. (2000). Into the breach: Using research and theory to shape history instruction. In P. Stearns, P. Seixas & S. Wineburg (Eds.), Knowing, teaching, and learning history: National and international perspectives (pp. 331-353). New York, NY: University Press. Baldi, S., Perie, M., Skidmore, D., Greenberg, E., & Hahn, C. (2001). What democracy means to ninth graders: U.S. results from the International IEA civic education study. Washington, D.C.: U.S. Department of Education, National Center for Education Statistics. Barratt, T. K. (1964). A comparison of effects upon selected areas of pupil learning of two methods of teaching United States history to eleventh grade students. (Unpublished doctoral dissertation). Bartlett, F. C. (1932). Remembering. Cambridge, MA: Harvard University Press. Barton, K. (1997). "I just kinda know": Elementary students' ideas about historical evidence. Theory and Research in Social Education, 25(4), 407-430. 233 Barton, K. C. (2008). Research on students' ideas about history. In L. S. Levstik & C. A. Tyson (Eds.), Handbook of Research in Social Studies Education (pp. 239-258). New York: Routledge. Barton, K. C., & Levstik, L. S. (2004). Teaching History for the Common Good. Mahwah: Lawrence Erlbaum Associates. Bayles, E. E. (1956). Experiments with reflective teaching. In Kansas studies in education (pp. 32). Lawrence, KA: University of Kansas Publications. Bennett, W. J. (1992). The de-valuing of America: The fight for our culture and our children. New York, NY: Simon and Schuster. Berlak, H., Newmann, F. 
M., Adams, E., Archbald, D. A., Burgess, T., Raven, J., et al. (1992). Toward a new science of educational testing and assessment. Albany, NY: SUNY Press. Boote, D. N., & Beile, P. (2005). Scholars before researchers: On the centrality of the dissertation literature review in research preparation. Educational Researcher, 34(6), 3-15. Bransford, J. D., Brown, A.L., Cocking, R.R. (Ed.). (2000). How people learn: Brain, mind, experience, and school. Washington, D.C.: National Academy Press. Braun, H. I. (2005). Using student progress to evaluate teachers: A primer on value- added models. Princeton, NJ: Educational Testing Service. 234 Brooks, J. G., & Brooks, M. G. (1993). In search of understanding: The case for constructivist classrooms. Alexandria, VA: Association for Supervision and Curriculum Development. Brown, J. S., Collins, A., & Duguid, P. (1989). Situated cognition and the culture of learning. Educational Researcher, 18(1), 32-42. Bruner, J. S. (1960). The process of education. Cambridge: Harvard University Press. Byungro, S. (1991). The comparative effects of problem-solving instruction and conventional expository instruction on students' acquisition, retention, and structuring of knowledge in high school social studies. (Unpublished doctoral dissertation). University of Georgia, Athens, GA. Cheney, L. (1994). The end of history. Wall Street Journal. Chenoweth, R. W. (1953). The development of certain habits of reflective thinking. (Unpublished doctoral dissertation). University of Illinois, Urbana, IL. Cizek, G. J. (1991a). Effusion confusion: A re-joinder to Wiggins. Phi Delta Kappan, 73, 150-153. Cizek, G. J. (1991b). Innovation or ennervation? Performance assessment in perspective. Phi Delta Kappan, 72(9), 695-699. Cognition and Technology Group at Vanderbilt. (1990). Anchored instruction and its relationship to situated cognition. Educational Researcher, 19(5), 2-10. Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155-159. 235 Conley, D. (2003). Mixed messages: What state high school tests communicate about student readiness for college. Eugene, OR: Center for Educational Policy Research, University of Oregon. Cornbleth, C. (1985). Critical thinking and cognitive processes. In W. B. Stanley (Ed.), Review of research in social studies education: 1976-1983. Bulletin No. 75 (pp. 11-63). Washington, D.C.: National Council for the Social Studies. Cousins, J. E. (1962). The development of reflective thinking in an eighth grade social studies class. (Unpublished doctoral dissertation). Indiana University, Bloomington, IN. Cox, B. C. (1961). A description and appraisal of a reflective method of teaching United States history. (Unpublished doctoral dissertation). Indiana University, Bloomington, IN. Cronin, J., Dahlin, M., Adkins, D., & Kingsbury, G. G. (2007). The proficiency illusion. Washington, D.C.: Thomas B. Fordham Institute. Curtis, C. K., & Shaver, J. P. (1980). Slow learners and the study of contemporary problems. Social Education, 44, 302-309. D'Agostino, J. V. (1996). Authentic instruction and academic achievement in compensatory education classrooms. Studies in Educational Evaluation, 22(2), 139-155. 236 Daugherty, R. (2004). Getting high school graduation test policies right in SREB states. Atlanta, GA: Southern Regional Education Board. De La Paz, S. (2005). Effects of historical reasoning instruction and writing strategy mastery in culturally and academically diverse middle school classrooms. Journal of Educational Psychology, 97(2), 139-156. Delpit, L. 
(1995). Other people's children: Cultural conflict in the classroom. New York: The New Press. Dewey, J. (1910). How we think. New York: D.C. Heath. Dewey, J. (1938). Experience & education. New York: Simon & Schuster. Dimond, S. E. (1948). The Detroit citizenship study. Social Education, 12, 356-358. Dodge, O. N. (1966). Generalization and concept development as an instructional method for eighth grade social studies. (Unpublished doctoral dissertation). Montana State University, Bozeman, MT. Doyle, W. (1983). Academic work. Review of Educational Research, 53(2), 159-199. Education Queensland. (2004). The new basics research report. Retrieved Jan. 15, 2008, from http://education.qld.gov.au/corporate/newbasics/html/library.html#resreport Elias, G. S. (1958). An experimental study of teaching methods in ninth grade social studies classes (civics). (Unpublished doctoral dissertation). Boston University, Boston, MA. 237 Elsmere, R. T. (1961). An experimental study utilizing the problem-solving approach in teaching United States history (Unpublished doctoral dissertation). Indiana University, Bloomington, IN. Elsmere, R. T. (1963). An experimental study utilizing the problem-solving approach in teaching United States History. Bulletin of the School of Education Indiana University, 39(3), 114-139. Engle, S. H. (1960). Decision making: The heart of social studies instruction. Social Education, 24(6), 301-304, 306. Evans, R. W. (2004). The social studies wars: What should we teach the children? . New York: Teachers College Press. Feilzer, M. Y. (2010). Doing mixed methods research pragmatically: Implications for the rediscovery of pragmatism as a research paradigm. Journal of Mixed Methods Research, 4(1), 6-16. Fenton, E. (1967). The new social studies. New York: Holt, Rinehart and Winston, Inc. Ferretti, R. P., MacArthur, C. D., & Okolo, C. M. (2001). Teaching for historical understanding in inclusive classrooms. Learning Disability Quarterly, 24, 59-71. Finn, C. E., Jr. (2003). Foreward in S.M. Stern, M. Chesson, M.B. Klee, & L. Spoehr (Eds.), Effective state standards for U.S. history: A 2003 report card (pp. 5-8): Thomas B. Fordham Institute. 238 Foster, S. J., & Yeager, E. A. (1999). "You've got to put together the pieces": English 12-year-olds encounter and learn from historical evidence. Journal of Curriculum and Supervision, 14, 286-317. Frankville, D. D. (1969). An evaluation of two methods of teaching American history in grade eleven. (Unpublished doctoral dissertation). United States International University, San Diego, CA. Frazee, B., & Ayers, S. (2003). Garbage in, garbage out: Expanding environments, constructivism, and content knowledge in social studies. In J. Leming, L. Ellington & K. Porter (Eds.), Where did social studies go wrong? Washington, D.C.: Thomas B. Fordham Foundation. Gabella, M. S. (1994). Beyond the looking glass: Bringing students into the conversation of historical inquiry. Theory and Research in Social Education, XXII(3), 340-363. Gallagher, S. A., & Stepien, W. J. (1996). Content acquisition in problem-based learning: Depth versus breadth in American studies. Journal for the Education of the Gifted, 19(3), 257-275. Gaudelli, W. (2006). The future of high-stakes history assessment: Possible scenarios, potential outcomes. In S. G. Grant (Ed.), Measuring history: Cases of high-stakes testing across the United States. Greenwich, Conn: Information Age Publishing. Glaser, E. M. (1941). An experiment in the development of critical thinking (Vol. 843). 
New York: Teachers College, Columbia University. 239 Goodlad, J. (1984). A place called school: Prospects for the future. New York: McGraw- Hill. Gore, J. M., Ladwig, J. G., Lingard, R., & Luke, A. (2001). Final report of the Queensland school reform longitudinal study. Full report available from Education Queensland. Grant, S. G. (2001a). It's just the facts, or is it? The relationship between teacher's practices and students' understandings of history. Theory and Research in Social Education, 29(1), 65-108. Grant, S. G. (2001b). An uncertain lever: Exploring the influence of state-level testing in New York state on teaching social studies. Teachers College Record, 103(3), 398- 426. Grant, S. G. (2005). More journey than end: A case study of ambitious teaching. In O. L. Davis & E. Yeager (Eds.), Wise social studies in an age of high-stakes testing (pp. 117-130). Greenwich, CT: Information Age. Grant, S. G., Derme-Insinna, A., Gradwell, J. M., Lauricella, A. M., Pullano, L., & Tzetzo, K. (2002). Juggling two sets of books: A teacher responds to the new global history exam. Journal of Curriculum and Supervision, 17(3), 232-255. Grant, S. G., & Horn, C. (2006). The state of state-level history testing. In S. G. Grant (Ed.), Measuring cases: Cases of high-stakes testing across the United States. Greenwich, Conn.: Information Age Publishing. 240 Greene, J. C. (2008). Is mixed methods social inquiry a distinctive methodology? Journal of Mixed Methods Research, 2(1), 7-22. Greeno, J. G., Collins, A. M., & Resnick, L. B. (1996). Cognition and learning. In D. C. Berliner & R. C. Calfee (Eds.), Handbook of educational psychology (pp. 1071). New York: Simon & Schuster Macmillan. Gross, R. E., & McDonald, F. (1958). The problem-solving approach. Phi Delta Kappan(March), 259-265. Hahn, C. L. (1991). Controversial issues in social studies. In J. P. Shaver (Ed.), Handbook of research on social studies teaching and learning (pp. 470-479). New York: MacMillan Hahn, C. L., & Tocci, C. M. (1990). Classroom climate and controversial issues discussions: A five nation study. Theory and Research in Social Education, XVIII(4), 344-362. Harmon, L. G. (2006). The effects of an inquiry-based American history program on the achievement of middle school and high school students. (Unpublished doctoral dissertation). University of North Texas, Denton, TX. Hartzler-Miller, C. (2001). Making sense of "best practice" in teaching history. Theory and Research in Social Education, 29(4), 672-695. Henderson, K. B. (1958). The teaching of critical thinking. Phi Delta Kappan, 39, 280- 282. 241 Hess, D. (2008a). Controversial issues and democratic discourse. In L. S. Levstik & C. A. Tyson (Eds.), Handbook of research in social studies education (pp. 124-136). New York: Routledge. Hess, D., & Posselt, J. (2002). How high school students experience and learn from the discussion of controversial issues. Journal of Curriculum and Supervision, 17(4), 283-314. Hess, F. M. (2008b). Still at risk: What student's don't know, even now. Washington, DC: Common Core. Hirsch, E. D., Jr. (1988). Cultural literacy: What every American needs to know. Boston: Houghton Mifflin. Hirsch, E. D., Jr. (2009-2010). The anti-curriculum movement: Tragically and unintentionally, it's really an anti-equality movement. American Educator, 33(4), 10-11. Hunkin, F. P. (1967). Influence of analysis and evaluation questions on critical thinking and achievement in sixth grade social studies. (Unpublished doctoral dissertation). Hyram, G. A. (1957). 
Experiment in developing critical thinking in children. Journal of Experimental Education, 26. Johnson, F. A. (1961). Depth vs. breadth in teaching American history. (Unpublished doctoral dissertation). University of Minnesota, Minneapolis, MN. 242 Johnson, R. B., Onwuegbuzie, A. J., & Turner, L. A. (2007). Toward a definition of mixed methods research. Journal of Mixed Methods Research, 1(2), 112-133. Johnston, J., Anderman, E., Milne, L., Klenk, L., & Harris, D. (1994). Improving civic discourse in the classroom: Taking the measure of Channel One (Research Report 4). Ann Arbor, MI: University of Michigan. Kahne, J., Rodriguez, M., Smith, B. A., & Thiede, K. (2000). Developing citizens for democracy? Assessing opportunities to learn in Chicago's social studies classrooms. Theory and Research in Social Education, 28, 318-330. Kahne, J. E., & Sporte, S. E. (2008). Developing citizens: The impact of civic learning opportunities on students' commitment to civic participation. American Educational Research Journal, 45(3), 738-766. Kantrowitz, B., & Wingert, P. (2006, May 8). America's best high schools, 2006. Newsweek, 147, 50-54. Kight, S. S., & Mickelson, J. M. (1949). Problems vs. subject. The Clearing House, 24, 3-7. King, M. B., Schroeder, J., & Chawszczewski, D. (2001). Authentic assessment and student performance in inclusive schools, Brief #5, Research Institute on Secondary Education Reform (RISER) for Youth with Disabilities Brief. Madison, WI: University of Wisconsin-Madison. 243 King, P. M., & Kitchener, K. S. (1994). Developing reflective judgment: Understanding and promoting intellectual growth and critical thinking in adolescents and adults. San Francisco, CA: Jossey-Bass Publishers. Kirschner, P. A., Sweller, J., & Clark, R. E. (2006). Why minimal guidance during instruction does not work: An analysis of the failure of constructivist, discovery, problem-based, experiential, and inquiry-based teaching. Educational Psychologist, 41, 75-86. Klentschy, M., Garrison, L., & Amaral, O. (2001). Valle Imperial Project in Science (VIPS): Four-year comparison of student achievement data, 1995-1999. El Centro, CA: El Centro School District. Koedel, C., & Betts, J. (2009). Does student sorting invalidate value-added models of teacher effectiveness? An extended analysis of the Rothstein critique: Department of Economics, University of Missouri. Koh, K., Kim, & Luke, A. (2009). Authentic and conventional assessment in Singapore schools: An empirical study of teacher assignments and student work. Assessment in Education: Principles, Policy, and Practice, 16(3), 291-318. Koh, K., Lee, A. N., Tan, W., Wong, H. M., Guo, L., Lim, T. M., et al. (2005). Looking collaboratively at the quality of teachers' assessment tasks and student work in Singapore schools. Paper presented at the International Association for Educational Assessment Conference, Singapore. 244 Kohlmeier, J. (2005). The impact of having 9th graders "do history". The History Teacher, 38(4), 499-524. Kohlmeier, J. (2006). "Couldn't she just leave?": The relationship between consistently using class discussions and the development of historical empathy in a 9th grade world history course. Theory and Research in Social Education, 34(1), 34-57. Kornhaber, M. L. (2004). Appropriate and inappropriate forms of testing, assessment, and accountability. Educational Policy, 18(1), 45-70. Kozma, R. B. (2008). 21st Century Skills, education, and competitiveness: A resource and policy guide. 
Retrieved May 15, 2009, from http://www.txccrs.org/downloads/Partnership_21stCenturySkills.pdf Ladwig, J. G., Smith, M., Gore, J., Amosa, W., & Griffiths, T. (2007). Quality of pedagogy and student achievement: Multi-level replication of authentic pedagogy. Paper presented at the Australian Association for Research in Education Conference, Fremantle. Lake Corporate Consulting. (2006). Standardised literacy and numeracy scores and 'doing' the New Basics. Retrieved Feb. 18, 2008, from http://education.qld.gov.au/corporate/newbasics/pdfs/litnum_rpt.pdf Lambert, R. A. (1980). Effects of moral education strategies on increased subject matter content of secondary school social studies students. (Unpublished doctoral dissertation). Catholic University of America, Washington, D.C.. 245 Larson, B. E. (2003). Comparing face-to-face discussion and electronic discussion: A case study from high school social studies. Theory and Research in Social Education, 31(3), 347-365. Laws of Florida. (2006). Ch. 2006-74 (House Bill 7087), item 1003.42.2.f, signed June 5, 2006. from http://laws.flrules.org/files/Ch_2006-074.pdf Lee, M. A. (1967). Development of inquiry skills in ungraded social studies classes in a junior high school (Unpublished doctoral dissertation). Indiana University, Bloomington, IN. Lee, P., & Ashby, R. (2000). Progression in historical understanding among students ages 7-14. In P. N. Stearns, P. Seixas & S. Wineburg (Eds.), Knowing, teaching, and learning history: National and international perspectives (pp. 199-222). New York: New York University Press. Lee, V. E., & Smith, J. B. (1995). Effects of high school restructuring and size on early gains in achievement and engagement. Sociology of Education, 68(4), 241-270. Lee, V. E., Smith, J. B., & Croninger, R. G. (1997). How high school organization influences the equitable distribution of learning in science and mathematics. Sociology of Education, 70(April), 128-150. Lee, V. E., Smith, J. B., & Newmann, F. M. (2001). Instruction and achievement in Chicago elementary schools. Chicago, IL: Consortium on Chicago School Research. 246 Leming, J. S. (2003). Ignorant Activists: Social change, "higher order thinking," and the failure of social studies. In J. Leming, L. Ellington & K. Porter (Eds.), Where Did Social Studies Go Wrong? (pp. 124-142). Washington, D.C.: Thomas B. Fordham Foundation. Levin, M., Newmann, F. M., & Oliver, D. (1969). A law and social science curriculum based on the analysis of public issues (No. Final Report project no. HS 058. Grant no. OE 310142). Washington, D.C.: Department of Health, Education, and Welfare. Levstik, L. S. (2008). What happens in social studies classrooms? Research on K-12 social studies practice. In L. S. Levstik & C. A. Tyson (Eds.), Handbook of research in social studies education. New York, NY: Taylor and Francis. Lipka, R. P., Lounsbury, J. H., Toepfer, C. F., Jr., Vars, G. F., Alessi, S. P., & Kridel, C. (1998). The Eight-Year Study revisited: Lessons from the past for the present. Columbus, OH: National Middle School Association. Mackenzie, A. W., & White, R. T. (1982). Fieldwork in geography and long-term memory structures. American Educational Research Journal, 19, 623-632. Madden, J. R. (1970). The relationship between the use of an inquiry teaching technique in a social studies classroom and the attitude of students toward the social studies course. (Unpublished doctoral dissertation). Syracuse University, Syracuse, NY. 247 Massialas, B. G. (1961). 
Description and analysis of teaching a high school course in World History (Unpublished doctoral dissertation). Indiana University, Bloomington, IN. Massialas, B. G. (1963). The Indiana experiments in inquiry: Social studies. Bulletin of the School of Education, Indiana University, 39(3). Massialas, B. G., & Cox, C. B. (1966). Inquiry in social studies. New York: McGraw- Hill Book Company. McDevitt, M., & Kiousis, S. (2006). Experiments in political socialization: Kids Voting USA as a model for civic education reform. Circle Working Paper 49. McNeil, L. (1986). Contradictions of control: School structure and school knowledge. New York: Routledge and Kegan Paul. McNeil, L., & Valenzuela, A. (2001). The harmful impact of the TAAS system of testing in Texas: Beneath the accountability rhetoric. In G. Orfield & M. L. Kornhaber (Eds.), Raising Standards or Raising Barriers? Inequality and High-Stakes Testing in Public Education (pp. 127-150). New York: Century Foundation Press. Metcalf, L. E. (1963). Research on teaching the social studies. In N. L. Gage (Ed.), Handbook of Research on Teaching. Chicago, IL: Rand McNally & Company. Monte-Sano, C. (2008). Qualities of historical writing instruction: A comparative case study of two teachers' practices. American Educational Research Journal, 45(4), 1045-1079. 248 Morton, J. B. (2004). Alabama course of study: Social studies (Bulletin 2004, No. 18): Alabama Department of Education. Morton, J. B. (2009). The handbook of administrative procedures for the Alabama High School Graduation Exam. Retrieved May 17, 2010, from https://docs.alsde.edu/documents/91/Handbook%20of%20Administrative%20Pro cedures%20for%20the%20AHSGE%202009.pdf Nash, G. (1995). The history children should study. Chronicle of Higher Education, XLI(32), A60. National Council for the Social Studies. (1994). Expectations of excellence: Curriculum standards for social studies. Silver Spring, MD: National Council for the Social Studies. Newmann, F. M. (1990). A test of higher order thinking in social studies: Persuasive writing on constitutional issues using the NAEP approach. Social Education, 54, 369-373. Newmann, F. M. (1991a). Classroom thoughtfulness and students' higher order thinking: Common indicators and diverse social studies courses. Theory and Research in Social Education, XIX(4), 410-433. Newmann, F. M. (1991b). Higher order thinking in the teaching of social studies: Connections between theory and practice. In J. F. Voss, D. N. Perkins & J. W. Segal (Eds.), Informal reasoning and education. Mahwah, NJ: Lawrence Erlbaum. 249 Newmann, F. M. (1991c). Promoting higher order thinking in social studies: Overview of a study of 16 high school departments. Theory and Research in Social Education, XIX(4), 324-340. Newmann, F. M., & Archbald, D. A. (1988). The functions of assessment and the nature of authentic academic achievement. In A. Berlak (Ed.), Assessing achievement: Toward the development of a new science of educational testing. Buffalo, NY: SUNY. Newmann, F. M., & Associates. (1996). Authentic achievement: Restructuring schools for intellectual quality. San Francisco: Jossey-Bass. Newmann, F. M., Bryk, A. S., & Nagaoka, J. K. (2001). Authentic intellectual work and standardized tests: Conflict or coexistence? Chicago: Consortium on Chicago School Research. Newmann, F. M., King, M. B., & Carmichael, D. L. (2007). Authentic instruction and assessment: Common strategies for rigor and relevance in teaching academic subjects: Prepared for the Iowa Department of Education. 
Newmann, F. M., Lopez, G., & Bryk, A. S. (1998). The quality of intellectual work in Chicago Schools: A baseline report. Chicago: Consortium on Chicago School Research. Newmann, F. M., Marks, H. M., & Gamoran, A. (1996). Authentic pedagogy and student performance. American Journal of Education, 104(4), 280-312. 250 Newmann, F. M., & Oliver, D. (1970). Clarifying public controversy: An approach to social studies. Boston: Little, Brown. Newmann, F. M., Secada, W. G., & Wehlage, G. G. (1995). A guide to authentic instruction and assessment: Vision, standards, and scoring. Madison: Center on Organization and Restructuring of Schools, Wisconsin Center for Education Research, University of Wisconsin. Newmann, F. M., Wehlage, G. G., & Lamborn, S. D. (1992). The significance and sources of student engagement. In F.Newmann (Ed.), Student Engagement and Achievement in American Secondary Schools (pp. 11-39). New York: Teachers College Press. No Child Left Behind (NCLB) Act of 2001, 20 U.S.C.A. & 6301 et seq. (West 2003) Noel, R. C. (1996). The "authentic pedagogy" study. Review No. One. Retrieved May 14, 2006, from http://www.mathematically.correct.com/qed.htm Nuthall, G., & Alton-Lee, A. (1995). Assessing classroom learning: How students use their knowledge and experience to answer classroom achievement test questions in science and social studies. American Educational Research Journal, 32(1), 185-223. Nystrand, M., & Gamoran, A. (1990). Student engagement: When recitation becomes conversation (No. ED 323 581). Madison, WI: National Center on Effective Secondary Schools. 251 Oakes, J. (2005). Keeping track: How schools structure inequality (2nd ed.). New Haven, CT: Yale University Press. Oliver, D., & Shaver, J. P. (1966). Teaching public issues in the high school. Boston: Houghton Mifflin. Onosko, J. J. (1991). Barriers to the promotion of higher order thinking in social studies. Theory and Research in Social Education, XIX(4), 341-366. Osborne, J., & Waters, E. (2002). Four assumptions of multiple regression that researchers should always test [Electronic Version]. Practical Assessment, Research, and Evaluation, 8. Retrieved Sept. 7, 2008 from http://pareonline.net/getvn.asp?v=8&n=2 Parker, W. C. (1991). Achieving thinking and decision-making objectives in social studies. In J. P. Shaver (Ed.), Handbook of research on social studies teaching and learning. New York: Macmillan. Parker, W. C. (1996). Introduction: Schools as laboratories of democracy. In W. Parker (Ed.), Educating the Democratic Mind (pp. 1-22). New York: State University of New York Press. Parker, W. C., Mueller, M., & Wendling, L. (1989). Critical reasoning on civic issues. Theory and Research in Social Education, 17(1), 7-32. Partnership for 21st Century Skills. (2007). Beyond the three Rs: Voter attitudes toward 21st Century Skills. Tucson, AZ: Partnership for 21st Century Skills. 252 Patton, M. Q. (1987). How to use qualitative methods in evaluation. Newbury Park, CA: Sage Publications. Paul Gagnon and the Bradley Commission on History in Schools (Ed.). (1989). Historical literacy: The case for history in American education. New York: Macmillan. Peters, C. C. (1948). Teaching high school history and social studies for citizenship training: The Miami experiment in democratic, action-centered education. Coral Gables, FL: University of Miami bookstore. Piaget, J. (1952). The origins of intelligence in children (M. Cook, Trans.). New York: International Universities Press, Inc. Pink, D. (2008). 
Tom Friedman on education in the 'flat world': A discussion with author Daniel Pink on curiosity, passion and the politics of school reform in the global marketplace (Interview). School Administrator, 65(2), 12. Quillen, I. J., & Hanna, L. A. (1948). Education for social competence. Chicago, IL: Scott Foresman. Ravitch, D., & Finn, C. E. (1987). What do our 17-year-olds know? New York: Harper & Row Publishers. Rehage, K. J. (1951). A comparison of pupil-teacher and teacher-directed procedures in eighth grade social studies classes. Journal of Educational Research, 45, 111-115. Resnick, L. B. (1987). Learning in school and out. Educational Researcher, 16(9), 13-20. 253 Richardson, E. (2000). Social studies items specifications for the Alabama High School Graduation Exam (Bulletin 2000, No. 49): Alabama Department of Education. RISER. (2000). Authentic instruction scoring manual. Madison, WI: Wisconsin Center for Education Research. Roelofs, E., & Terwel, J. (1999). Constructivism and authentic pedagogy: State of the art and recent developments in the Dutch national curriculum in secondary education. Journal of Curriculum Studies, 31(2), 201-227. Rogers, C., & Freiburg, H. J. (1994). Freedom to learn. New York: Macmillan College Publishing Company. Rose, H. P. (1970). The relationship between methods used to teach American history and changes in attitude and achievement. (Unpublished doctoral dissertation). United International University. Rossi, J. A. (1995). In-depth study in an issues-oriented social studies classroom. Theory and Research in Social Education, XXIII(2), 88-120. Rossi, J. A. (1998). Issues-centered instruction with low-achieving high school students: The dilemmas of two teachers. Theory and Research in Social Education, 26(3), 380-409. Rothstein, A. (1960). An experiment in developing critical thinking through the teaching of American history. (Unpublished doctoral dissertation). New York University, New York. 254 Rothstein, R. (2004). We are not ready to assess history performance [Electronic Version]. The Journal of American History, 90. Retrieved Dec. 16, 2008 from http://www.historycooperative.org. Rothstein, R. (2009). Replacing No Child Left Behind [Electronic Version]. Education Week, 28, 28-29. Retrieved August 11, 2009. Rumelhart, D. E. (1980). Schemata: The building blocks of cognition. In R. J. Spiro, B. C. Bruce & W. F. Brewer (Eds.), Theoretical issues in reading comprehension. Hillsdale, NJ: Lawrence Erlhaum. Saxe, D. W. (2003). Patriotism versus multiculturalism in times of war. Social Education, 67(2), 107-109. Saye, J. W., & Brush, T. (1999a). Student engagement with social issues in a multimedia- supported learning environment. Theory and Research in Social Education, 27(4), 472-504. Saye, J. W., & Brush, T. (1999b). Student reasoning about ill-structured social problems in a multimedia-supported learning environment. Paper presented at the Annual Meeting of the National Council for the Social Studies, Orlando, FL. Saye, J. W., & Brush, T. (2002). Scaffolding critical reasoning about history and social issues in multimedia-supported learning environments [Electronic Version]. Educational Technology Research and Development, 50, 77-96. 255 Saye, J. W., & Brush, T. (2004). Promoting civic competence through problem-based history learning environments. In G. E. Hamot, J. J. Patrick & R. S. Leming (Eds.), Civic learning in teacher education: International perspectives on education for democracy in the preparation of teachers (Vol. 3, pp. 123-145). 
Bloomington, Indiana: ERIC Clearinghouse for Social Studies/ Social Science Education. Saye, J. W., & Brush, T. (2007). Using technology-enhanced learning environments to support problem-based historical inquiry in secondary school classrooms. Theory and Research in Social Education, 35(2), 196-230. Scheurman, G. (1998). From behaviorist to constructivist teaching. Social Education, 62, 6-9. Schroeder, J. L., Braden, J. P., & King, B. (2001). Standards and scoring criteria for assessment tasks and student performance. Madison: Research Institute on Secondary Education Reform for Youth with Disabilities. Schug, M. C. (2003). Teacher-centered instruction: The Rodney Dangerfield of social studies. In J. S. Leming, L. Ellington & K. Porter (Eds.), Where did social studies go wrong? Washington, D.C.: Thomas B. Fordham Foundation. Seixas, P. (2001). Review of research on social studies. In V. Richardson (Ed.), Handbook of research on teaching. Washington, D.C.: American Educational Research Association. Sizer, T. R. (1984). Horace's compromise. Boston: Houghton Mifflin. 256 Smith, J., & Niemi, R. G. (2001). Learning history in school: The impact of course work and instructional practices on achievement. Theory and Research in Social Education, 29(1), 18-42. Social Studies Inquiry Research Collaborative. (2011). Authentic pedagogy: Examining intellectual challenge in a national sample of social studies classrooms. Paper presented at the Annual Meeting of the American Education Research Association Conference, New Orleans, LA. Stecher, B. M. (2002). Consequences of large-scale, high-stakes testing on school and classroom practice. In L. S. Hamilton, B. M. Stecher & S. P. Klein (Eds.), Making sense of test-based accountability in education. Santa Monica, CA: RAND. Stern, S. M., Chesson, M., Klee, M. B., & Spoehr, L. (2003). Effective state standards for U.S. History: A 2003 report card: Thomas B. Fordham Institute. Stevens, J. (2002). Applied multivariate statistics for the social sciences (4th ed.). Mahwah, NJ: Erlbaum. Stewart, B. E. (2006). Value added modeling: The challenge of measuring educational outcomes. New York: Carnegie Corporation of New York. Stewart, R. A., & Brendefur, J. L. (2005). Fusing lesson study and authentic achievement: A model for teacher collaboration. Phi Delta Kappan, 681-687. Stiggins, R. J., & Conklin, N. F. (1992). In teachers' hands: Investigating the practices of classroom assessment. New York: State University of New York Press. 257 Symcox, L. (2002). Whose history? The struggle for national standards in American classrooms. New York, NY: Teacher's College Press. Taba, H. (1966). Teaching strategies and cognitive functioning in elementary school children (Cooperative Research Project No. 2404). San Francisco: San Francisco State College. Taba, H., Levine, S., & Elzey, F. F. (1964). Thinking in elementary school children (Cooperative Research Project No. 1574). San Francisco, CA: San Francisco State College. Terwilliger, J. S. (1997). Semantics, psychometrics, and assessment reform: A close look at "authentic" assessments. Educational Researcher, 26(8), 24-27. Terwilliger, J. S. (1998). Rejoinder: Response to Wiggins and Newmann. Educational Researcher (August-September), 22-23. Thompson, S. (2001). The authentic standards movement and its evil twin. Phi Delta Kappan, 82(5). Thornton, S. J. (1991). Teacher as curricular-instructional gatekeeper in social studies. In J. P. Shaver (Ed.), Handbook of research on social studies teaching and learning (pp. 237-248). 
New York: Macmillan. Torney-Purta, J. (2002). The school's role in developing civic engagement: A study of adolescents in twenty-eight countries. Applied Developmental Science, 6(4), 203- 212. 258 Trochim, W. M. (2006). The research methods knowledge base. 2nd edition. from (version current as of October 20, 2006). U.S. Census Bureau. (2000a). Profile of selected economic characteristics. Retrieved Sept. 15, 2009, from http://censtats.census.gov/data/AL/1600103076.pdf U.S. Census Bureau. (2000b). State and county quickfacts. Retrieved September 15, 2009, from http://quickfacts.census.gov/qfd/states/01/0103076.html VanSickle, R. L., & Hoge, J. D. (1991). Higher cognitive thinking skills in social studies: Concepts and critiques. Theory and Research in Social Education, 19(2), 152- 172. VanSledright, B. A. (2002). Fifth graders investigating history in the classroom: Results from a researcher-practioner design experiment. Elementary School Journal, 102, 131-160. Vygotsky, L. S. (1978). Mind in society: The development of higher psychological processes. Cambridge, MA: Harvard University Press. Wallen, N. E., & Travers, R. M. W. (1963). Analysis and investigation of teaching methods. In N. L. Gage (Ed.), Handbook of research on teaching. Chicago, IL: Rand McNally & Company. Wallis, C., & Steptoe, S. (2006, December 18). How to bring our schools out of the 20th century. Time, 50-56. 259 Wenzel, S., Nagaoka, J. K., Morris, L., Billings, S., & Fendt, C. (2002). Documentation of the 1996-2002 Chicago Annenberg research strand on authentic intellectual demand exhibited in assignments and student work: A technical process manual. Chicago: Consortium of Chicago School Research. Whitehead, A. N. (1929). The aims of education and other essays. New York: Macmillan. Wiggins, G. (1989). A true test: Toward more authentic and equitable assessment. Phi Delta Kappan, 70(9), 703-713. Wiggins, G. (1993a). Assessing student performance. San Francisco, CA: Jossey-Bass. Wiggins, G. (1993b). Assessment to improve performance, not just monitor it: Assessment reform in the social sciences. Social Science Record(Fall), 5-12. Williams, J. M. (1981). A comparison study of the effects of inquiry and traditional teaching procedures on student attitude, achievement, and critical-thinking ability in eleventh grade United States history. (Unpublished doctoral dissertation). Auburn University, Auburn, AL. Williamson, J. L. (1966). The effectiveness of two approaches to the teaching of high school American history. (Unpublished doctoral dissertation). University of North Texas, Denton, TX. 260 Windschitl, M. (2002). Framing constructivism in practice as the negotiation of dilemmas: An analysis of the conceptual, pedagogical, cultural, and political challenges facing teachers. Review of Educational Research, 72(2), 131-175. Wineburg, S. (1991). On the reading of historical texts: Notes on the breach between school and academy. American Educational Research Journal, 28, 495-519. Wineburg, S. (2001). Historical thinking and other unnatural acts: Charting the future of teaching the past. Philadelphia: Temple University Press. Womack, J. A. (1969). An analysis of inquiry-oriented high school geography project urban materials. (Unpublished doctoral dissertation), United States International University, San Diego, CA. Yost, D. E. (1972). The effect of two instructional methods on achievement, critical thinking, and study habits and attitudes in tenth grade American government classes. (Unpublished doctoral dissertation). 
University of Northern Colorado, Greeley, CO. Young, K. M., & Leinhardt, G. (1998). Writing from primary documents: A way of knowing history. Written Communication, 15, 25-68.

APPENDICES

Appendix A: Teacher Interview Script

Teacher Background Data:
(1) What is your gender? ___ Male ___ Female
(2) What is your age? ___ 25 or less ___ between 26 and 35 ___ between 36 and 45 ___ between 46 and 55 ___ greater than 55
(3) What is your ethnicity? ___ African American ___ Asian American ___ Latino/Hispanic American ___ Native American ___ White (other than Latino) ___ Other/No Response
(4) How many years have you been a teacher? ___ 2 years or less ___ between 3 and 5 years ___ between 6 and 10 years ___ between 11 and 15 years ___ greater than 15 years
(5) How many years have you been teaching at this particular school?
(6) How many years have you been teaching a course that is offered at the grade level in which the state-mandated exam is administered?
(7) Are you National Board certified? Highly Qualified?
(8) What is your schedule of classes?
(9) What is the highest degree you attained?
(10) What was your major in college? Did you have a concentration in one or more of the social science disciplines? Did you have some type of alternative certification (e.g., 5th year program, Teach for America, etc.)?

Challenging Tasks
(1) Why do you view these tasks as the most challenging for students?
(2) Can you provide a description of what you want students to do for these tasks?
Task 1: Goal(s) for student learning: Where does the assignment fall in the semester (context)? Is this assignment modified in any way for other blocks? Do all blocks do the assignment?
Same questions for Tasks 2 and 3.

Graduation Exam
(1) Do you incorporate activities that are explicitly focused on preparation for a high-stakes exam?
(2) What types of materials do you use to prepare students for the graduation exam (e.g., practice test booklets, computer drill & practice programs, transparencies or PowerPoints with specific questions, etc.)?
(3) Provide an estimate of the total amount of time you spend each semester on graduation exam preparation. ___ No more than 1 day ___ 2 to 4 days ___ 1 week ___ Over 1 to 3 weeks ___ 1 to 2 months ___ Over 2 to 3 months ___ Over 3 to 6 months ___ Over 6 months
(4) Does your graduation exam preparation vary from class to class? Which classes receive more explicit preparation? Why?

Appendix B: Teacher Recruitment Script

Hello, my name is Lamont Maddox, and I am a doctoral student from Auburn University. I would like to invite you to participate in a study that is focused on the following question: How does the kind of learning that students experience in their social studies classes affect their performance on both lower order and higher order assessments? I am trying to gain a better understanding of what works to improve performance for different groups of students (e.g., low SES, male/female, ethnicity, etc.). You are being asked to participate because you teach a social studies course at either the 9th or 10th grade level. As part of this study, I would like to analyze the nature of the instructional experiences tenth grade students encounter during the social studies classes they take just prior to their first attempt on the Alabama High School Graduation Exam (AHSGE). Each teacher takes a unique approach to his/her instruction.
I want to use three classroom observations and an analysis of three challenging assignments in each class as a means to better understand the social studies experiences that Auburn students have. Additional information regarding the context of these assignments will be gained through a brief interview. I would like to emphasize that I am not developing a rating that judges one form of instruction as better than another. I am cataloging the type of work students are asked to do in their classes. A one-size-fits-all approach may very well not work with all students. I want to see which instructional experiences produce results with which groups of students.

Student achievement data for this study will come from two sources. One source is the AHSGE. The other source is an essay being piloted by the school system. This essay is being used as the higher order assessment of what students are able to retain and do following a previous unit of instruction. Participating teachers will teach and assess the designated focus unit as they always do. The essay will be administered at a later date, and then the school system will provide the essays to me for independent scoring. Student names will be removed from all student data and replaced with code numbers prior to its delivery to me.

This study will be conducted during the Spring 08, Fall 08, and Spring 09 semesters. However, the test of higher order thinking will only be administered to one group of tenth graders during the Spring 08 semester. Therefore, all teacher participation should be complete prior to the end of this school year. I estimate that the study will only require a couple of hours of your time beyond the scope of your normal day-to-day responsibilities. If you agree to participate, the study will have a minimal impact on the regular instruction you provide your classes.

Throughout this process, teacher confidentiality will be protected. I will not link teacher names to data, but will describe how various classroom experiences are correlated to student outcomes. I will also track student experiences over multiple years (9th & 10th grade) rather than with a single teacher. This will provide further support in maintaining teacher confidentiality. The results of this study could help you to make decisions about how to meet the needs of all of your students. I hope you will join me in this study. Any questions?

Appendix C: Scoring Criteria for Classroom Instruction

Scoring instructions: To determine scores for the four standards, follow the technical scoring criteria as outlined in the tips below. Consider the descriptions for scores 1-5 on each standard to constitute the minimum criteria for that score. If you find yourself between scores, make the decision by asking whether the minimum conditions of the higher score have been met. If not, use the lower score. In determining scores for each standard, the observer should consider only the evidence observed during the lesson observation. "Many" students refers to at least 1/3 of the students in a class; "most" refers to more than half; "almost all" is not specified numerically, but should be interpreted as "all but a few."

Date: ____________ Class Observed: ______________________ Observer: _______________

HOTS: To what extent do students use lower order thinking processes? To what extent do students use higher order thinking processes? (1 = lower order thinking only; 5 = higher order thinking only)
Deep Knowledge: To what extent is knowledge deep? To what extent is knowledge shallow and superficial? (1 = knowledge is shallow; 5 = knowledge is deep)

Score 5. HOTS: Almost all students, almost all of the time, are performing HOT. Deep Knowledge: Knowledge is very deep because the teacher successfully structures the lesson so that almost all students sustain a focus on a significant topic and do at least one of the following: demonstrate their understanding of the problematic nature of information and/or ideas; demonstrate complex understanding by arriving at a reasoned, supported conclusion; or explain how they solved a complex problem. In general, students' reasoning, explanations and arguments demonstrate fullness and complexity of understanding.

Score 4. HOTS: Students are engaged in at least one major activity during the lesson in which they perform HOT operations; this activity occupies a substantial portion (at least 1/3) of the lesson, and many students are performing HOT. Deep Knowledge: Knowledge is relatively deep because either the teacher or the students provide information, arguments or reasoning that demonstrate the complexity of an important idea. The teacher structures the lesson so that many students sustain a focus on a significant topic for a period of time and do at least one of the following: demonstrate their understanding of the problematic nature of information and/or ideas; demonstrate understanding by arriving at a reasoned, supported conclusion; or explain how they solved a relatively complex problem.

Score 3. HOTS: Students are primarily engaged in routine LOT operations a good share of the lesson. There is at least one significant question or activity in which some students perform some HOT operations. Deep Knowledge: Knowledge is treated unevenly during instruction; i.e., deep understanding of something is countered by superficial understanding of other ideas. At least one significant idea may be presented in depth and its significance grasped, but in general the focus is not sustained.

Score 2. HOTS: Students are primarily engaged in LOT, but at some point they perform HOT as a minor diversion within the lesson. Deep Knowledge: Knowledge remains superficial and fragmented; while some key concepts and ideas are mentioned or covered, only a superficial acquaintance or trivialized understanding of these complex ideas is evident.

Score 1. HOTS: Students are engaged only in LOT operations; i.e., they either receive, or recite, or participate in routine practice, and in no activities during the lesson do students go beyond LOT. Deep Knowledge: Knowledge is very thin because it does not deal with significant topics or ideas; teacher and students are involved in the coverage of simple information which they are to remember.

Scoring Criteria for Classroom Instruction (cont.)

Substantive Conversation: To what extent is classroom discourse devoted to creating or negotiating understandings of subject matter? (1 = no substantive, high-level conversation; 5 = substantive conversation)
Connectedness to the Real World: To what extent is the lesson, activity, or task connected to competencies or concerns beyond the classroom? (1 = no connection; 5 = connected)

Score 5. Substantive Conversation: All features of substantive conversation occur, with at least one example of sustained conversation, and almost all students participate. Connectedness: Students study or work on a topic, problem or issue that the teacher and students see as connected to their personal experiences or actual contemporary or persistent public issues. Students recognize the connection between classroom knowledge and situations outside the classroom. They explore these connections in ways that create personal meaning and significance for the knowledge. This meaning and significance is strong enough to lead students to become involved in an effort to affect or influence a larger audience beyond their classroom in one of the following ways: by communicating knowledge to others (including within the school), advocating solutions to social problems, providing assistance to people, or creating performances or products with utilitarian or aesthetic value.

Score 4. Substantive Conversation: All features of substantive conversation occur, with at least one example of sustained conversation, and many students participate in some substantive conversation (even if not part of the sustained conversation). Connectedness: Students study or work on a topic, problem or issue that the teacher and students see as connected to their personal experiences or actual contemporary or persistent public issues. Students recognize the connection between classroom knowledge and situations outside the classroom. They explore these connections in ways that create personal meaning and significance for the knowledge. However, there is no effort to use the knowledge in ways that go beyond the classroom to actually influence a larger audience.

Score 3. Substantive Conversation: Substantive Conversation Feature #2 (sharing) and/or #3 (coherent promotion of collective understanding) occur and involve at least one example of sustained conversation (i.e., at least 3 consecutive interchanges). Connectedness: Students study a topic, problem or issue that the teacher succeeds in connecting to students' actual experiences or to actual contemporary or persistent public issues. Students recognize some connection between classroom knowledge and situations outside the classroom, but they do not explore the implications of these connections, which remain abstract or hypothetical. There is no effort to actually influence a larger audience.

Score 2. Substantive Conversation: Substantive Conversation Feature #2 (sharing) and/or #3 (coherent promotion of collective understanding) occur briefly and involve at least one example of two consecutive interchanges. Connectedness: Students encounter a topic, problem or issue that the teacher tries to connect to students' experiences or to actual contemporary or persistent public issues; i.e., the teacher informs students that there is potential value in the knowledge being studied because it relates to the world beyond the classroom. For example, students are told that understanding Middle East history is important for politicians trying to bring peace to the region; however, the connection is weak and there is no evidence that students make the connection.

Score 1. Substantive Conversation: Virtually no features of substantive conversation occur during the lesson. Connectedness: Lesson topic and activities have no clear connection to anything beyond themselves; the teacher offers no justification beyond the need to perform well in class.

Appendix D: Scoring Tips for Instruction Rubric

Tips for Scoring HOTS
• Lower order thinking (LOT) occurs when students are asked to receive or recite factual information or to employ rules and algorithms through repetitive routines. As information receivers, students are given pre-specified knowledge ranging from simple facts and information to more complex concepts. Such knowledge is conveyed to students through a reading, worksheet, lecture or other direct instructional medium. Students are not required to do much intellectual work since the purpose of the instructional process is to simply transmit knowledge or to practice procedural routines.
Students are in a similar role when they are reciting previously acquired knowledge, i.e., responding to test-type questions that require recall of pre-specified knowledge. More complex activities may still involve LOT when students only need to follow pre-specified steps and routines or employ algorithms in a rote fashion.
• Higher order thinking (HOT) requires students to manipulate information and ideas in ways that transform their meaning and implications. This transformation occurs when students combine facts and ideas in order to synthesize, generalize, explain, hypothesize or arrive at some conclusion or interpretation. Manipulating information and ideas through these processes allows students to solve problems and discover new (for them) meanings and understandings.
• When students engage in HOT, an element of uncertainty is introduced into the instructional process and makes instructional outcomes not always predictable; i.e., the teacher is not certain what will be produced by students. In helping students become producers of knowledge, the teacher's main instructional task is to create activities or environments that allow them opportunities to engage in HOT.

Tips for Scoring Deep Knowledge
• Knowledge is shallow, thin or superficial when it does not deal with significant concepts or central ideas of a topic or discipline. Knowledge is also shallow when important, central ideas have been trivialized, or when it is presented as non-problematic. Knowledge is thin when students' understanding of important concepts or issues is superficial, such as when ideas are covered in a way that gives them only a surface acquaintance with their meaning. This superficiality can be due, in part, to instructional strategies such as when teachers cover large quantities of fragmented ideas and bits of information that are unconnected to other knowledge.
• Evidence of shallow understanding by students exists when they do not or cannot use knowledge to make clear distinctions, construct arguments, solve problems, and develop more complex understanding of other related phenomena.
• Knowledge is deep or thick when it concerns the central ideas of a topic or discipline that are judged to be crucial to that topic or discipline.
• For students, knowledge is deep when they develop relatively complex understandings of these central concepts. Instead of being able to recite only fragmented pieces of information, students develop relatively systematic, integrated or holistic understanding. Mastery is demonstrated by their success in producing new knowledge by discovering relationships, solving problems, constructing explanations, and drawing conclusions.
• In scoring this item, observers should note that depth of knowledge and understanding refers to the substantive character of the ideas that the teacher presents in the lesson, or to the level of understanding that students demonstrate as they consider these ideas. It is possible to have a lesson that contains substantively important, deep knowledge where students do not become engaged or fail to show understanding of the complexity or the significance of the ideas. Observers' ratings can reflect either the depth of the teacher's knowledge or the depth of understanding that students develop of that content.

Tips for Scoring Substantive Conversation
• This scale measures the extent of talking to learn and to understand in the classroom.
There are two dimensions to this construct: one is the substance of subject matter, and the other is the character of dialogue.
• In classes where there is little or no substantive conversation, teacher-student interaction typically consists of a lecture with recitation where the teacher deviates very little from delivering a preplanned body of information and set of questions; students typically give very short answers. Because the teacher's questions are motivated principally by a preplanned checklist of questions, facts, and concepts, the discourse is frequently choppy rather than coherent; there is often little or no follow-up of student responses. Such discourse is the oral equivalent of fill-in-the-blank or short-answer study questions.
• In classes characterized by high levels of substantive conversation there is considerable teacher-student and student-student interaction about the ideas of a topic; the interaction is reciprocal, and it promotes coherent shared understanding. (1) The talk is about subject matter in the discipline and includes higher order thinking such as making distinctions, applying ideas, forming generalizations, and raising questions, not just reporting of experiences, facts, definitions, or procedures. (2) The conversation involves sharing of ideas and is not completely scripted or controlled by one party (as in teacher-led recitation). Sharing is best illustrated when participants explain themselves or ask questions in complete sentences, and when they respond directly to comments of previous speakers. (3) The dialogue builds coherently on participants' ideas to promote improved collective understanding of a theme or topic (which does not necessarily require an explicit summary statement). In short, substantive conversation resembles the kind of sustained exploration of content characteristic of a good seminar where student contributions lead to shared understandings.
• To recognize sustained conversations, we define an interchange as a statement by one person and a response by another. Interchanges can occur between teacher and student or student and student. Sustained conversation is defined as at least three consecutive interchanges. The interchanges need not be between the same two people, but they must be linked substantively as consecutive responses. Consecutive responses should demonstrate sensitivity either by responding directly to the ideas of another speaker or by making an explicit transition that shows the speaker is aware he/she is shifting the conversation. Substantive conversation includes the three features described above. Each of the features requires interchange between two or more people; none can be illustrated through monologue by one person.

Tips for Scoring Value Beyond School
• This scale measures the extent to which the class has value and meaning beyond the instructional context. In a class with little or no value beyond school, activities are deemed important for success only in school (now or later), but for no other aspects of life. Student work has no impact on others and serves only to certify students' level of competence or compliance with the norms and routines of formal schooling.
• A lesson gains in authenticity the more there is a connection to the larger social context within which students live.
Two areas in which student work can exhibit some degree of connectedness are: (a) a real-world public problem, i.e., students confront an actual contemporary or persistent issue or problem, such as applying statistical analysis in preparing a report to the city council on the homeless; and (b) students' personal experiences, i.e., the lesson focuses directly on or builds upon students' actual experiences or situations. High scores can be achieved when the lesson entails one or both of these.

Appendix E: Scoring Criteria for Tasks

General Rules
The main point here is to estimate the extent to which successful completion of the task requires the kind of cognitive work indicated by each of the three standards: Construction of Knowledge, Elaborated Communication, and Connection to Students' Lives. Each standard will be scored according to different rules, but the following apply to all three standards.
• If a task has different parts that imply different expectations (e.g., worksheet/short-answer questions and a question asking for explanations of some conclusions), the score should reflect the teacher's apparent dominant or overall expectations. Overall expectations are indicated by the proportion of time or effort spent on different parts of the task and criteria for evaluation, if stated by the teacher.
• Take into account what students can reasonably be expected to do at the grade level.
• When it is difficult to decide between two scores, give the higher score only when a persuasive case can be made that the task meets minimal criteria for the higher score.
• If the specific wording of the criteria is not helpful in making judgments, base the score on the general intent or spirit of the standard described in the tips for scoring a particular AIW standard.

Score 4. Construction of Knowledge: N/A. Elaborated Communication: Analysis / Persuasion / Theory. Explicit call for generalization AND support. The task requires explanations of generalizations, classifications and relationships relevant to a situation, problem, or theme, AND requires the student to substantiate them with examples, summaries, illustrations, details, or reasons. Examples include attempts to argue, convince or persuade and to develop and test hypotheses. Connection to Students' Lives: N/A.

Score 3. Construction of Knowledge: The task's dominant expectation is for students to interpret, analyze, synthesize, or evaluate information, rather than merely to reproduce information. To score high, the task should call for interpretation of nuances of a topic that go deeper than surface exposure or familiarity. Elaborated Communication: Report / Summary. Call for generalization OR support. The task asks students either to draw conclusions or make generalizations or arguments, OR to offer examples, summaries, illustrations, details, or reasons, but not both. Connection to Students' Lives: The question, issue, or problem clearly resembles one that students have encountered or might encounter in their lives. The task explicitly asks students to connect the topic to experiences, observations, feelings, or situations significant in their lives.

Score 2. Construction of Knowledge: There is some expectation for students to interpret, analyze, synthesize, or evaluate information, rather than merely to reproduce information. Elaborated Communication: Short-answer exercises. The task or its parts can be answered with only one or two sentences, clauses, or phrasal fragments that complete a thought. Connection to Students' Lives: The question, issue, or problem bears some resemblance to one that students have encountered or might encounter in their lives, but the connections are not immediately apparent. The task offers the opportunity for students to connect the topic to experiences, observations, feelings, or situations significant in their lives, but does not explicitly call for them to do so.

Score 1. Construction of Knowledge: There is very little or no expectation for students to interpret, analyze, synthesize, or evaluate information. The dominant expectation is that students will merely reproduce information gained by reading, listening, or observing. Elaborated Communication: Fill-in-the-blank or multiple choice exercises. Connection to Students' Lives: The problem has virtually no resemblance to questions, issues, or problems that students have encountered or might encounter in their lives. The task offers very minimal or no opportunity for students to connect the topic to experiences, observations, feelings, or situations significant in their lives.

Appendix F: Scoring Tips for Task Rubric

Tips for Scoring Construction of Knowledge
• The task asks students to organize and interpret information in addressing a concept, problem, or issue.
• Consider the extent to which the task asks the student to organize, interpret, evaluate, or synthesize complex information, rather than to retrieve or to reproduce isolated fragments of knowledge or to repeatedly apply previously learned procedures. To score high, the task should call for interpretation of nuances of a topic that go deeper than surface exposure or familiarity. Nuanced interpretation often requires students to read for subtext and make inferences. Possible indicators of interpretation may include (but are not limited to) tasks that ask students to consider alternative solutions, strategies, perspectives and points of view.
• These indicators can be inferred either through explicit instructions from the teacher or through a task that cannot be successfully completed without students doing these things.

Tips for Scoring Elaborated Communication
• The task asks students to elaborate on their understanding, explanations, or conclusions about important social studies concepts.
• Consider the extent to which the task requires students to elaborate on their ideas and conclusions.

Tips for Scoring Connection to Students' Lives
• The task asks students to address a concept, problem or issue that is similar to one that they have encountered or are likely to encounter in life outside of school.
• Consider the extent to which the task presents students with a question, issue, or problem that they have actually encountered or are likely to encounter in their lives. Defending one's position on compulsory community service for students could qualify as a real-world problem, but describing the origins of World War II generally would not.
• Certain kinds of school knowledge may be considered valuable in social, civic, or vocational situations beyond the classroom (e.g., knowing how a bill becomes a law). However, task demands for "basic" knowledge will not be counted here unless the task requires applying such knowledge to a specific problem likely to be encountered beyond the classroom.

Appendix G: Email Correspondence

Request for Tasks

Attn: 9th-10th grade SS faculty

I recently had the opportunity to meet with each of you regarding a research study to determine what works to improve student learning outcomes in social studies. I appreciate your willingness to listen to my presentation and take part in this study. I have listed below the data that I'd like to collect from you this semester.
Part I: Please send me copies of three student assignments or assessments that you feel best indicate how well students understand your subject at a high level. So that I am clear on what you are asking your students to do, please include any materials necessary to help me understand the tasks and how they fit into the rest of your course. Please select tasks that relate to an instructional unit or a single lesson rather than midterm or final exams. For each assignment, provide a general indication of when it will take place in your classroom this semester. This information can be sent via email. Contact me if you have hard copies that you'd like me to pick-up. Part II: I would like to observe a class that is associated with each task that you provide. I would like to observe the class that gives me the most insight into what students have done to prepare for each task. Once I get your tasks, I'll contact each of you to set up an observation schedule. Please try to have the three student assignments or assessments to me no later than February 15. Coordinate with me if this is an unrealistic deadline for some reason. It is important to get this process started as soon as possible since we will undoubtedly be running into scheduling conflicts for observations (graduation exam, etc.). For those of you with interns, go ahead and provide the tasks this semester. Depending on your intern's teaching schedule, we may need to schedule the observations in the Fall. Let me know if you have any questions. Thanks again for your participation in this study. Lamont E. Maddox 274 Appendix H: U.S. History Higher Order Assessment Resources During the 1800s, the United States greatly expanded its territory through treaties, land purchases, and the use of force. Many Americans justified this expansion by saying that the United States had a ?Manifest Destiny? to control all the land from the Atlantic to the Pacific. One of the main periods when the United States considered the idea of Manifest Destiny was during the Mexican-American War. The timeline below includes some important events in the relationship between the United States and Mexico during this time. Use it as a resource to assist you in thinking about the causes of the Mexican- American War and whether the U.S. was justified or wrong to declare war on Mexico. 1820s 1821 -Mexico gains independence from Spain 1823 -American citizens migrate to the Mexican territory of Texas to become Mexican citizens and obtain cheap land 1823 -Over the next few decades, Mexico?s government is unstable and weak as different groups fight for control 1830s 1835 -The Mexican central government attempts to exert more control over its territories (including Texas) 1835 -Texas declares independence from Mexico 1836 -Battle of the Alamo -Mexican forces led by President Santa Anna are defeated at San Jacinto. Santa Anna signs a treaty granting Texas independence. -Mexican Congress refuses to recognize the treaty 1836-1845 - U.S and some European nations recognize Texas independence 1840s 1845 -Texas becomes part of the United States. Mexican Ambassador leaves the United States in protest -Mexico rejects Texas? independence and its annexation by U.S. -President Polk sends John Slidell to Mexico to try to purchase New Mexico and California and to address problems between the two nations. Mexican authorities refuse to meet with him Mar. 1846 -U.S. military forces enter territory claimed by both the United States and Mexico along the Rio Grande Apr. 
1846 -Mexican forces cross the Rio Grande and enter the disputed territory. U.S. and Mexican forces clash May 1846 -U.S. declares war on Mexico 1846-1848 -Mexican-American War 1848 -U.S. wins the war. U. S. gains California, New Mexico, and other territories from Mexico as part of peace treaty 275 Source Documents The following documents offer opposing views about Manifest Destiny and the Mexican-American war. Use the information from these documents and the timeline to decide whether the United States was justified in its war with Mexico. Document 1: (Boston Times, October 22, 1847) The ?conquest? [of Mexico] which carries peace into a land where military force is the usual basis for resolving conflict between competing groups, which establishes the reign of law where lawlessness has existed for a generation; which provides for the education and elevation of the great mass of the people, who have, for a period of 300 years been the slaves of an overbearing foreign race [the Spanish], and which causes religious liberty, and full freedom of mind to prevail where a [Catholic] priesthood has long been enabled to prevent all [other] religion, - such a ?conquest? should be characterized as work worthy of a great people, of a people who are about to regenerate the world by asserting the supremacy of humans to decide their own fate [as opposed to their fate being decided by kings or dictators]. Document 2: Albert Gallatin, Peace with Mexico (New York, 1847, pp. 12-14.) Gallatin, a Swiss immigrant, served in a number of government positions including the House of Representatives and Secretary of the Treasury. The people of the United States have been placed by God in a position never before enjoyed by any other nation. They are possessed of a most extensive territory, with a very fertile soil, a variety of climates, and a capacity of sustaining a population greater . . . than any other territory of the same size on the face of the globe?. America?s mission is, to improve the state of the world, to be the ?Model Republic,? to show that men are capable of governing themselves, and that this simple and natural form of government is the one that also makes the most people happy, is productive of the greatest development of the intellectual faculties, above all, the one that develops the highest standard of private and political virtue and morality. In their foreign relations the United States, before this unfortunate war, always acted with justice? The use of military force was always in self-defense?. The allegation that the conquering of Mexico would be the means of enlightening the Mexicans, of improving their social state, and of increasing their happiness, is but the shallow attempt to disguise unbounded greed and ambition. Truth never was or can be spread by fire and sword, or by any other than purely moral means. Documents excerpted and paraphrased from: Rappaport, A. (1964). The War with Mexico: Why did it Happen? Berkeley: Rand McNally & Company. 276 Appendix I: U.S. History Higher Order Assessment Instructions Part I: Assume the role of a concerned citizen in 1847. The war with Mexico is nearing the end of its first year. U. S. newspapers are full of commentaries about the war. You have decided to write an editorial for one of these newspapers. Using information from the timeline, the source documents, and your knowledge of the time period, write a persuasive essay that takes a position on whether Manifest Destiny adequately justifies going to war with Mexico. 
Specifically, is using Manifest Destiny to justify war [in Mexico] a violation of American ideals [and therefore wrong] or does pursuing Manifest Destiny in Mexico ultimately promote the greater good? Your editorial should meet the following guidelines: Requirements: 1. Your editorial must include a minimum of 4 paragraphs as described below. 2. Your editorial should use persuasive language and should be written to the readers of the newspaper. 3. Your editorial should be written in the 1st person, plural tense ? ?we?. For example: I believe the war is just because.... Or We took the wrong approach because?.. 4. Note: The format provided below is meant to be used as an outline for writing your editorial. Your final paper should be written as one coherent, continuous essay. However, to assist in grading, please identify each part of your editorial by using the section headers provided below (i.e. Section I, Section II, etc.). Editorial Format: Section I: Introduction Briefly describe the situation between the United States and Mexico. Discuss the most important events that contributed to the war. Use this information to lead to a final statement that clearly describes your position on the war as it relates to Manifest Destiny (For example: The United States is wrong to use Manifest Destiny to go to war with Mexico Or The United States has every right to pursue its Manifest Destiny by conquering Mexico). Section II: Support your argument Provide at least two or three distinct reasons to defend your position. Your reasons should be supported by evidence from the timeline, the source documents, and your knowledge of the time period. Make sure your arguments in this paragraph clearly relate to Manifest Destiny. Section III: Address the arguments of those who disagree with you. Acknowledge the arguments of those who might take an opposing position on this issue. In doing so, provide two or three distinct reasons your opponents might use to disagree with your point of view. Cite information from the timeline, the 277 source documents, and your knowledge of the time period to support this perspective. Section IV: Conclusion Respond to the arguments of your opponents and summarize your most persuasive points. Part II: Step out of the role of being a citizen of the 1840s and answer the following question based on your own opinion. Consider the role of the United States in world affairs today. Does America still have a special destiny or mission in the world? If so, what is it and how should it be accomplished? If not, explain why you think it does not. 278 Appendix J: Advanced Placement Higher Order Assessment Student Resources During the 19th century, many liberal nationalists in Germany sought to organize the separate German states into one nation-state based on principles of representative government. The ?German Question? of how to define the boundaries of the new Reich was one of many problems that made unity and freedom for the German people difficult to accomplish. The timeline below includes some important events that will help answer the question: Should the unification of all Germanic peoples within one nation be endorsed (supported) by the German people and encouraged by other nations in 1870? Early 1800s 1814/5 -German Confederation born at Congress of Vienna 1834 -Custom Union called the Zollverein established 1840s Feb. 1848 -Revolution in France; overthrow of the monarchy of King Louis-Philippe; proclamation of the creation of the French Second Republic Mar. 
1848 -Uprisings in some German states; granting of constitutional reforms in Prussia 1848/1849 -Revolutions in Italy, Vienna, Budapest, and Prague May 1848 -Frankfurt Assembly meets and proposes a plan for the unification of Germany; Prussian king refuses to take the crown 1860s 1862 -Bismarck becomes prime minister of Prussia 1862 -Bismarck gives the ?Blood and Iron? speech to the Budget Committee of Prussia?s lower parliamentary house 1864 -Danish-Prussian War 1866 -Austro-Prussian War 1867 -North German Federation formed. -The constitution of the North German Confederation serves as a model for that of the German Empire, with which it merged in 1871 1870s June 1870 -Controversy involving the Hohenzollern candidacy for the Spanish thrown July 1870 -Bismarck publishes the edited Ems dispatch July 1870 -Franco-Prussian War Jan. 1871 -Proclamation of the German Empire at Versailles May 1871 -Treaty of Frankfurt ratified between France and Germany Germany annexes Alsace and Lorraine 279 Source Documents The following documents offer additional information that will help you address the assessment question. Document 1: August Bebel Criticizes the Franco-Prussian War and the Annexation of Alsace-Lorraine in a Speech before the North German Reichstag (November 26, 1870) ?.In my opinion, the principle of nationality is a thoroughly reactionary principle. You will admit that if we were to apply the principle of nationality in its pure form in Europe, there would be no end in sight to war; the peoples? mission would always and exclusively be to make war, to work only to make war possible. On the basis of the principle of nationality we would have to cede [give away] Poland, return northern Schleswig, get rid of South Tyrol and Trento, and relinquish many Slavic-speaking regions; on the other hand, we would have to annex [make part of Germany] parts of Switzerland, the Netherlands, and Belgium. As I have already mentioned, according to the principle of nationality, we would not be able to get out of war. The peoples would tear each other apart until the end of time. Nationality means but little; in my view, it has merely a secondary importance for the political life of a state. The highest and most fundamental idea in the political life of a state must be the internal satisfaction of peoples through their institutions, their right to self-determination. Translation: Erwin Fink ?August Bebel. Sein Leben in Dokumenten, Reden und Schriften?, a document by Helmut Hirsch. In Forging an Empire: Bismarckian Germany, 1866-1890, edited by James Retallack, volume 4, German History in Documents and Images, German Historical Institute, Washington, DC (www.germanhistorydocs.ghi-dc.org). Document 2: From the Debates in the German National Assembly on the territories to be included as part of a German nation-state ? Little Germany or Greater Germany? ? 1848-9 Context: The German National Assembly is debating the following alternatives for unification: "Lesser German Solution": A united Germany, led by Prussia, without Austria. "Greater German Solution": A German state that includes most of the German speaking population of Europe. Some wanted the German speaking territories of the Austrian Empire. Others favored including all of Austria as part of the German nation-state. Venedy (Representative for Cologne): ??We have come here, Gentlemen, to constitute Germany?s Unity, and we are met with the proposal that we throw a part of Germany out of Germany. 
On that day when we only discuss this proposition, we will be discussing the division of Germany. The German nation, Gentlemen, has already suffered enough, but she has finally prevailed and has sent us here to constitute Germany, and they want us to sell off a part of Germany. I have come here?with the firm decision to stand or to fall with the assembly. 280 But I do not want to sit here a moment longer if Austria is not here too [as a member of the new German empire]. Moritz Mohl (Representative for Stuttgart): ??.We are 40 million Germans; we do not need to fear these scattered little nations. There are perhaps five million Czechs: there are not five million Magyars, still fewer Croatians and even fewer Wallachians etc?.All these nations [within Austria] can do no disadvantage to German nationality; it is however of the very greatest importance that they combine with Germany, and that with Germany they form a Reich of seventy million persons. Gentlemen! I ask you, when these seventy million people are represented in a German parliament, when this parliament through its influence nominates the ministers of this great Reich, and when nothing occurs to the disadvantage of this great Reich of seventy million people; I ask you, which power in Europe, even Russia with her sixty-six millions, or France with her thirty-six millions, which power in Europe will be powerful enough to challenge this great Reich? I ask, whether this German Reich is then not in a condition to dictate war and peace to the whole world; I ask you, to consider this?. Gentlemen! This thought about the entry of the whole of Austria within the German Federal State; I beg you to fix your eyes on this thought, telling yourselves that it removes every difficulty?. 281 Appendix K: Advanced Placement Higher Order Assessment Instructions Part I: In 1870, Germany successfully defeated France in the Franco-Prussian War. One outcome of this war was a renewed effort to create a unified nation-state for all Germans. The topic of unification was discussed in parliamentary proceedings, newspapers, and official correspondence between statesmen. The debate over German unification raised broader issues of how a nation?s boundaries should be drawn. Assume the role of a German citizen in 1870. You have decided to write an editorial for a German newspaper on the issue of unification. Remember, it is 1870 and the ultimate solution of 1871 has not yet been decided. The editorial should reflect your judgment of what the best solution should be. Using information from the timeline, the source documents, and your knowledge of the time period, write a persuasive essay that takes a position on the following question: Should the unification of all Germanic peoples within one nation be endorsed (supported) by the German people? Would other nations likely support it? In framing your response, consider a number of factors such as the potential military, political, social, and economic consequences of unification. Your editorial should meet the following guidelines: Requirements: 1. Your editorial must include a minimum of 4 paragraphs as described below. 2. Your editorial should use persuasive language and should be written to the readers of the newspaper. 3. Your editorial should be written in the 1st person, plural tense ? ?we?. For example: I believe other nations should encourage unification because? or We should not support unification because? 4. Note: The format provided below is meant to be used as an outline for writing your editorial. 
Your final paper should be written as one coherent, continuous essay. However, to assist in grading, please identify each part of your editorial by using the section headers provided below (i.e. Section I, Section II, etc.). Editorial Format: Section I: Introduction Briefly describe the most significant events in the road to German unification up to this point (1870). In doing so, remember to assume the perspective of a citizen of this time period who is not aware of the actual events to come regarding the unification of Germany. Use this information to lead to a final statement that clearly describes your position on German unification (For example: The unification of all Germanic peoples within one nation should be endorsed by the German people because?.or Germans should oppose unification of all Germanic peoples within one nation because ?.). 282 Section II: Support your argument Provide at least two or three distinct reasons to defend your position. Your reasons should be supported by evidence from the timeline, the source documents, and your knowledge of the time period. Section III: Address the arguments of those who disagree with you Acknowledge the arguments of those who might take an opposing position on this issue. In doing so, provide two or three distinct reasons your opponents might use to disagree with your point of view. Cite information from the timeline, the source documents, and your knowledge of the time period to support this perspective. Section IV: Conclusion Respond to the arguments of your opponents and summarize your most persuasive points. Part II: Step out of the role of being a citizen of the 1870s and answer the following question based on your own opinion. Some policy makers today support the formation of nation-states based on common ethnic, cultural, or religious identities as a way to stop violence in regions around the world (i.e. a Palestinian state; separate Kurd, Sunni, & Shiite states instead of a united Iraq). To what extent, if any, should the U.S. support the ambitions of ethnic, cultural, or religious groups seeking to secure their own nation-states today? In your response, consider the pros and cons of supporting these sorts of new nation-states and discuss why the course of action you recommend is preferable to the position taken by those opposing your view. 283 Appendix L: Proctor Instructions Ensure your students have several blank sheets of paper available and a writing utensil. Step 1: You should have a set of notecards (by block) that includes the names of your students and their corresponding student numbers. Pass the notecards out to your students and allow them to transfer their student number to their answer sheets. Students should write their number on each page they intend to turn in. They should not write their name on the assessment ? just their student number. Step 2: Read to students: Today you will be writing an essay that measures your ability to think critically about an issue of historic and contemporary importance. This assessment is being tested by _____ _____ Schools as an additional way for students to demonstrate what they?ve learned in their social studies courses. Do your very best since the assessment will be used as one indication of how well you can apply your knowledge of European history. 
[insert how the assessment will be graded for your individual class] This assessment is primarily a test of your ability to reason and make persuasive arguments related to the forming of nation-states and, in particular, German unification. It includes two main parts. The first part requires you to write an editorial. A timeline and two historical documents are provided to help you with this task. The second part asks you to state and support your opinion about American support for new nation-states today. Partial credit is awarded, so it is in your best interest to attempt to answer each part of the assessment. You can still earn points even if you do not have a great deal of prior knowledge about the topics included on this test. You may underline passages or take notes on the materials provided to you for this assessment. However, your final response should be provided on separate sheets of paper. All testing materials will be turned in by the end of the testing period. When you finish the assessment, turn over your work and wait for your teacher to come by and pick it up. Please remain quiet throughout the testing period so your peers can concentrate. As the proctor for this essay, I cannot provide any hints, answers, or suggestions to you as you take this exam. I can restate the directions if you don?t understand what you are being asked to do. You have 1 hour to complete this essay. What are your questions? 284 Appendix M: Scoring Rubric for Advanced Placement Higher Order Editorial Part I 1. Position Statement. Does the student take a clear position on the question? (Y=1, N=0). This 1-2 sentence statement can be found anywhere in the essay. In order to take a clear position, the student?s statement must specifically indicate what the German people would be endorsing (i.e. a unified state for all German-speaking people, no unification at all, a limited German state with Prussian leadership, etc.). 2. Historical Context of Problem. How well is the problem defined in paragraph one before the student provides arguments related to the focus question of the editorial? Does the student appear to understand the events and/or historic forces (i.e. liberalism, nationalism) related to unification described in the opening paragraph? 0= No background context provided. Assign a 0 when: ? There is a position statement with no other information ? The introduction includes vague statements with no real factual information from the timeline ? The introduction just includes persuasive arguments instead of background information ? The paragraph includes more inaccurate statements than valid contextual information 1= Some background context provided. The student provides a brief mention of some historical events over the course of 1-2 sentences as part of the introductory paragraph. Scoring notes: ? Information that is copied from the top of the timeline does not count. ? A level one score is characterized by very limited information in paragraph 1 that closely follows the timeline. However, a student might also get a 1 score if he/she copies virtually every event from the timeline (almost verbatim). ? Do not assign a ?1? if the student only mentions the war Prussia just won against France. 2= Historical context is well defined. The student provides a clear and coherent introduction to the editorial. Historical events are introduced strategically to build up to a thesis statement. 
The paragraph includes at least two sentences of relevant historical information or a particularly strong description of the problem. Scoring notes: ? The paragraph must stick to the appropriate time period (1870) ? Inaccurate statements will drop the score to a 1. ? Look for originality in how the information is introduced to the reader. 285 ? A ?2? should be strongly considered when students incorporate appropriate ideas/topics/events not listed on the timeline. 3. Persuasiveness. To what extent does the essay demonstrate persuasive reasoning? Read the entire essay to evaluate the persuasiveness standard. The underlined portion of the standard is the main factor to consider in assigning a score. 0= Unsatisfactory. The student has failed to take a stand on the question, or has taken a stand, but has failed to provide a single persuasive reason. The response may indicate that the student didn?t fully understand the question. Overall, the response has no chance of persuading the reader. 1 or 2= Minimal. The student has taken a stand on the question (which may be flawed) and provided at least one persuasive reason to back up this stance. Faulty assumptions, undermining, or irrelevant reasons could result in an unsatisfactory score if they reduce the persuasiveness of the argument. Overall, however, the response is unlikely to persuade the reader. 1= The student provided a single persuasive reason to support his/her argument. The reason may have no clear connection to the question of how a unified Germany should be created (focusing instead on the desirability of unification). For example, the student might argue that unification is a good thing without ever describing what type of unified German state the people should endorse. 2= This score is assigned when a student provides multiple arguments that focus entirely on the pros or cons of unification (demonstrating a flawed understanding of the question). It might also be awarded when a student provides a single persuasive argument that is well stated (typically requiring more than one sentence), but not described at the level of detail needed for a level 3 score. The argument, by itself, remains unlikely to persuade the reader. 3= Adequate. The student has taken a stand on the question and has provided two or more persuasive reasons. The arguments in the essay have a clear relation to the question. Elaboration of reasons is not necessary here. The presentation of only one persuasive reason can result in a score of ?adequate? if useful elaboration is included. Undermining reasons, faulty assumptions, or irrelevant reasons can possibly reduce the score to a 2. Overall, the response has a chance of persuading the reader. *When trying to determine if a single persuasive reason is thorough enough for an adequate score, consider the main criteria for this standard. Did the student?s elaboration result in an overall argument that has a chance of persuading the reader? 286 4=Elaborated. The student has taken a stand, provided two or more persuasive reasons, and has provided elaboration on at least one of those reasons (i.e. accurately referencing documents, providing examples, etc.). Presentation of many persuasive reasons (at least three) can also produce this score. Overall, the response is likely to persuade the reader. *The student must address in some way the potential reaction of other countries to German unification (the 2nd part of the focus question) to get a 4. 5= Exemplary. 
The student?s response meets criteria for ?elaborated?, and demonstrates (a) at least two elaborated persuasive reasons, and (b) an argument so clear and coherent (i.e. no significant undermining reasons, faulty assumptions, or irrelevant reasons) and grammatically correct as to merit public display as an outstanding accomplishment for a high school student. Overall, the response is more likely to persuade the reader than the elaborated response. 4. Low Level Dialectical Reasoning. To what extent are opposing arguments recognized and developed? 0 = opposing arguments are not addressed in the editorial or they are not described fully enough to make sense. A student might also receive this score if the opposing arguments are not accurate. *Scoring tip: ?Described fully enough to make sense? ? take this literally to mean that you can?t understand what the student is saying. If you can reasonably understand the point being made by the student, and it represents an accurate opposing view, assign a 1. 1 = includes one argument that accurately represents an opposing viewpoint on the issue. The argument is described in minimal detail with little or no use of historical evidence. Strong opposing perspectives may be ignored or greatly simplified so they can be easily refuted later in the essay. 2= includes multiple arguments of the sort described for a level 1 score. 3 = includes at least one well developed argument that accurately represents an opposing viewpoint on the issue. The student seems to understand the opposing argument(s) he/she is representing. The degree of development in the paragraph suggests the student gave more than cursory consideration to the opposing perspective. Scoring tip: *The likelihood of a level 3 score increases when a student dedicates an entire paragraph to explaining opposing views instead of following the pattern of presenting an argument only to immediately shoot it 287 down *Look for use of the documents to back up the opposing perspective *Look for an explanation of the opposing view that covers at least a couple of consecutive sentences. 5. Quality of Final Position. How well does the student synthesize opposing viewpoints and offer persuasive counter-arguments to arrive at a well supported final position? 0=Unsatisfactory. The student doesn?t provide a conclusion (although some opposing arguments might be addressed in section 3) or the conclusion is very brief. Assign a 0 when: ? No concluding paragraph is provided ? The student doesn?t mention/restate any key points ? The concluding paragraph mostly includes inaccurate/vague statements ? The paragraph mainly quotes (perhaps without citing) directly from a source document with no elaboration on the part of the student ? The concluding paragraph mostly includes arguments based on a future Germany that doesn?t exist in 1870 ? The concluding paragraph actually reduces the overall persuasiveness of the editorial based on the presence of random, unintelligible, or inaccurate statements. 1= Adequate. The student provides a concluding paragraph that incorporates a response to critics (perhaps at the end of the previous paragraph) and brief mention of at least 1 key point made in the editorial. The final position generally does not add to the persuasiveness of the essay (perhaps because it is overly brief, vague, ignores major holes in argumentation, etc.). Some significant questions may be left unresolved for the reader. Scoring tip: *Simply responding to critics is not enough. 
The student must conclude the paragraph by listing or mentioning 1-2 key points - perhaps in conjunction with a restatement of the thesis (this could be 1 sentence). 2= Approaching Satisfactory. The basic standards for a level 1 score are met. A level 2 paragraph features a stronger summary of the key points made in the essay or a more persuasive response to the views of opponents. Overall, the paragraph adds to the persuasiveness of the essay, but there is little evidence the student genuinely weighed the views of critics when crafting his/her final 288 position (no higher level dialectical reasoning). Undermining reasons, faulty assumptions, or irrelevant reasons can possibly reduce the score to ?adequate?. Scoring tip: ?stronger summary of the key points? ? A good solid paragraph (3 sentence minimum) that clearly articulates the students? point of view. 3= Satisfactory. The student synthesizes the views of opponents (perhaps at the end of section III) and takes these arguments into account when developing a persuasive final position. The final position includes at least 2-3 key points. Scoring tips: *Look for tight argumentation (not many unanswered questions), passionate language, a thoughtful response to critics, and reinforcement of key points/ideas. *Look for language that suggests the student really considered the opposing view (i.e. my opponents make a good point when they say _______, but I feel they are overlooking?.; I concede that German unification might cause _____, but I wonder if my critics have considered? .) Part II 1. Decision-making. Does the student take a clear position regarding whether the U.S. should support the formation of new nation-states based on common traits? (Y=1; N=0). 2. Persuasiveness. To what extent does the response demonstrate persuasive reasoning? Read the entire essay to evaluate the persuasiveness standard. The underlined portion of the standard is the main factor to consider in assigning a score. 0=Unsatisfactory. The student has failed to take a stand on the question, or has taken a stand, but has failed to provide a single persuasive reason. The response may indicate that the student didn?t fully understand the question. Overall, the response has no chance of persuading the reader. 1=Minimal. The student has taken a stand on the question and provided at least one persuasive reason to back up this stance. Faulty assumptions, undermining, or irrelevant reasons could result in an unsatisfactory score if they reduce the persuasiveness of the argument. Overall, however, the response is unlikely to persuade the reader. 2= Adequate. The student has taken a stand on the question and has provided two or more persuasive reasons. Elaboration of reasons is not necessary here. The 289 presentation of only one persuasive reason can result in a score of ?adequate? if useful elaboration is included. Undermining reasons, faulty assumptions, or irrelevant reasons can possibly reduce the score to a 2. Overall, the response has a chance of persuading the reader. 3=Elaborated. The student has taken a stand, provided two or more persuasive reasons, and has provided elaboration on at least one of those reasons (i.e. providing examples, etc.). Presentation of many persuasive reasons (at least three) can also produce this score. Overall, the response is likely to persuade the reader. 4= Exemplary. The student?s response meets criteria for ?elaborated?, and demonstrates (a) at least two elaborated persuasive reasons, and (b) an argument so clear and coherent (i.e. 
no significant undermining reasons, faulty assumptions, or irrelevant reasons) and grammatically correct as to merit public display as an outstanding accomplishment for a high school student. Overall, the response is more likely to persuade the reader than the elaborated response. 290 Appendix N: Scoring Rubric for Manifest Destiny Higher Order Assignment Part I 1. Position Statement. Does the student take a clear position on the question? (Y=1, N=0). This statement can be found anywhere in the essay. The position must relate to Manifest Destiny. 2. Historical Context of Problem. Does the student appear to understand the events that contributed to the Mexican-American War? How well is the problem defined in paragraph one before the student engages in arguments for or against America?s actions? 0= No background context provided. Assign a 0 when: ? There is a thesis with no other information ? The introduction includes vague statements with no real factual information ? The introduction just includes persuasive arguments instead of background information ? The paragraph includes more inaccuracies or seemingly random statements than valid contextual information 1=Some background context provided. The student provides some historical context for the essay (at least one historical event). The event should serve as context, not as part of an argument. ? Information that is copied from the top of the timeline sheet (i.e. the first sentence, the definition of Manifest Destiny) does not count. ? Events may be listed (perhaps verbatim) from the timeline. It may be unclear whether the student truly understands the problem ? especially if some of the events are inaccurately stated. 2= Historical context is well defined. The student demonstrates an understanding of the problem and uses at least some language that differs from the source documents. Key indicators: ? The paragraph sticks to the appropriate time period (1847). The student should not state that the U.S. gained New Mexico, California, and Texas as a result of the war. ? The paragraph suggests some elaborated understanding beyond what is on the timeline (see scoring tips). ? The paragraph is generally free of inaccurate statements. ? The paragraph should include or at least reference key events that immediately led to the Mexican-American War (border dispute, annexation of Texas). The historical context is not well defined if the 291 student exclusively talks about the war for Texas? independence that happened over ten years before the decision point of this essay. 3. Persuasiveness. To what extent does the essay demonstrate persuasive reasoning? Does the student relate his/her response to Manifest Destiny? Read the entire essay to evaluate the persuasiveness standard. The underlined portion of the standard is the main factor to consider in assigning a score. 0= Unsatisfactory. The student has failed to take a stand on the question, or has taken a stand, but has failed to provide a single persuasive reason. The response may indicate that the student didn?t fully understand the question. Overall, the response has no chance of persuading the reader. 1 or 2= Minimal. The student has taken a stand on the question and provided at least one persuasive reason to back up this stance. The ?stand? may be focused entirely on whether the war was right or wrong with no reference to Manifest Destiny. Faulty assumptions, undermining, or irrelevant reasons could possibly reduce the score from a 2 to a 1 or from minimal to unsatisfactory. 
Overall, however, the response is unlikely to persuade the reader. 1=The student provided a single persuasive reason to support his/her argument. The reason may have no clear connection to Manifest Destiny. Examples of arguments without a connection to Manifest Destiny: The U.S. was acting in self-defense because Mexico attacked first The war was justified because Mexico refused to meet with Slidell The U.S. needed the land for a growing population Stealing land is wrong (assuming the student didn?t connect this statement to Manifest Destiny) 2=The student?s essay mainly focused on whether the war was right or wrong. In this context, the student provided multiple persuasive reasons, or a single persuasive reason described in greater depth (several sentences), to support his/her position. *Note: A single persuasive argument with a clear connection to manifest destiny that does not contain enough elaboration for a ?3? would also receive this score. 3=Adequate. The student has taken a stand on the question and has provided two or more persuasive reasons. The arguments in the essay have a clear relation to Manifest Destiny. Elaboration of reasons is not necessary here. The presentation of only one persuasive reason can result in a score of ?adequate? if useful elaboration is included. Undermining reasons, faulty assumptions, or irrelevant reasons can possibly reduce the score to ?minimal.? Overall, the response has a chance of persuading the reader. 292 *When trying to determine if a single persuasive reason is thorough enough for an adequate score, consider the main criteria for this standard. Did the student?s elaboration result in an overall argument that has a chance of persuading the reader? 4=Elaborated. The student has taken a stand, provided two or more persuasive reasons, and has provided elaboration on at least one of those reasons (i.e. accurately referencing documents, providing examples, etc.). Presentation of many persuasive reasons (at least three) can also produce this score. The arguments presented by the student have a clear connection to Manifest Destiny. Overall, the response is likely to persuade the reader. 5=Exemplary. The student?s response meets criteria for ?elaborated?, and demonstrates (a) at least two elaborated persuasive reasons, and (b) an argument so clear and coherent (i.e. no significant undermining reasons, faulty assumptions, or irrelevant reasons) and grammatically correct as to merit public display as an outstanding accomplishment for a high school student. Overall, the response is more likely to persuade the reader than the elaborated response. 4. Low Level Dialectical Reasoning. To what extent are opposing arguments developed? 0= opposing arguments are not addressed in the editorial or they are not described fully enough to make sense. A student might also receive this score if the opposing arguments are not accurate. 1= includes one argument that accurately represents an opposing viewpoint on the issue. The argument is described in minimal detail with little or no use of historical evidence. Strong opposing perspectives may be ignored or greatly simplified so they can be easily refuted later in the essay. 2= includes multiple arguments of the type described for a level 1 score. 3= includes at least one well developed argument that accurately represents an opposing viewpoint on the issue. The student seems to understand the opposing argument(s) he/she is representing. 
The degree of development in the paragraph suggests the student gave more than cursory consideration to the opposing perspective. 5. Quality of Final Position. How well does the student synthesize opposing viewpoints and offer persuasive counter-arguments to arrive at a well supported final position? 0=Unsatisfactory. The student doesn?t provide a conclusion or the conclusion consists of a single sentence that restates the student?s opinion. Assign a 0 when: ? No concluding paragraph is provided 293 ? The student doesn?t make/restate any arguments ? The concluding paragraph mostly includes inaccurate/vague statements ? The paragraph mainly quotes (perhaps without citing) directly from a source document with no elaboration on the part of the student ? The concluding paragraph mostly includes arguments based on a future America that doesn?t exist in 1847 ? The concluding paragraph actually reduces the overall persuasiveness of the editorial based on the presence of random, unintelligible, or inaccurate statements. 1=Adequate. The student lists or mentions 1-2 key points made in the essay. This may be in conjunction with a restatement of the thesis. The conclusion generally does not add to the persuasiveness of the essay and the arguments of critics are given little to no consideration. Some significant questions may be left unresolved for the reader. 2=Approaching Satisfactory. The student summarizes 1-2 key points made in the essay or adds a final persuasive reason that is new. Overall, the paragraph adds to the persuasiveness of the essay, but there is little evidence the student genuinely weighed the views of critics when crafting his/her final position. Undermining reasons, faulty assumptions, or irrelevant reasons can possibly reduce the score to ?adequate?. 3=Satisfactory. The student synthesizes the views of opponents (perhaps at the end of section III) and takes these arguments into account when developing a persuasive final position. The final position should include at least 2-3 key points. ?takes these arguments into account? = mentions or references them in final paragraph Part II assesses the connectedness to the real world standard. Essays are evaluated based on the extent to which the student connects the disciplinary topic of Manifest Destiny to contemporary issues or events that have personal relevance in their own life. 2= Explicit Connection. There is an explicit connection being made between classroom knowledge (Manifest Destiny) and contemporary situations outside the classroom. 1=Possible Connection. The student?s response hits on themes associated with Manifest Destiny/American Exceptionalism, but this may not have been intentional on the part of the student. The student might also receive this score if he/she makes valid historical references demonstrating how a particular mission might be exemplified across time. 294 0= No connection. It isn?t clear whether the student recognizes any parallels between Manifest Destiny in the 1800s and U.S. actions today. Indicators: a totally off topic response, vague responses (world peace), etc. Scoring Tips: Position Statement: ? Does the student clearly weigh in on one side of the issue (no waffling)? ? Does the student?s position clearly relate to Manifest Destiny? Simply stating that the war was wrong or right does not count. ? Look for an actual statement. In some cases, you will be able to infer the student?s position based on arguments throughout the essay. 
However, a 1 is only assigned for a concise statement (1-2 sentences) that clearly indicates the student's position on the idea of Manifest Destiny as it pertains to the Mexican-American War.
- It is possible for a student to take a stand on the question without having a clear position statement. The persuasiveness score can still be high assuming the student's position on Manifest Destiny can reasonably be inferred.

Historical Context:
- A great deal of variation can exist at level 2. A student with one background event and a student with an entire page of events can potentially get the same score.
- A paragraph that is not set within the United States (i.e., the student assumes the perspective of a Mexican citizen) can receive a 2 if background events are still explained accurately.
- Close adherence to the timeline is an indicator that the student might not possess much depth of knowledge on the topic (i.e., inclusion of virtually every event on the timeline in paragraph 1, or misreading the timeline to suggest that Texas gained its independence as a result of the Alamo, etc.).
- Indicators of more in-depth understanding that would support a score of 2: purposively selecting key events rather than trying to cover every topic listed on the timeline; properly using the phrase "Manifest Destiny"; incorporating appropriate ideas/topics/events not listed on the timeline; using style and language that differs substantially from the timeline.

Persuasiveness: Any score above the minimal level requires a connection to be made to Manifest Destiny. The connection doesn't necessarily have to be explicit if you can reasonably infer that the student understands Manifest Destiny and that his/her arguments closely fit the question. A particularly strong conclusion can add to the persuasiveness score.

Quality of Final Position: In judging between a 1 or a 2, consider how developed the paragraph is (summarize vs. mention) and its persuasiveness. A weak concluding argument (in terms of logic) would likely receive a 1.

Appendix O: Authentic Pedagogy Scores

Each teacher's three tasks were scored on the task rubric (range 3-10) and the instruction rubric (range 4-20); the final authentic pedagogy score (range 7-30) is the sum of the teacher's average task score and average instruction score.

Minimal Authentic Pedagogy

Roy's authentic pedagogy scores
  Political Cartoon: task 4, instruction 5
  Industrial Revolution Illustrated Timeline: task 6, instruction 6
  Teach a Lesson: task 4, instruction 4
  Averages: task 4.6, instruction 5; final authentic pedagogy score 9.6

Andy's authentic pedagogy scores
  U.S. History Project: task 5, instruction 5
  Reformers of the 1800s: task 5, instruction 7
  Manifest Destiny Questions: task 4, instruction 7
  Averages: task 4.6, instruction 6.3; final authentic pedagogy score 10.9

Jason's authentic pedagogy scores
  Presidential Research: task 8, instruction 4
  Declaration Activity: task 7, instruction 5
  Daily Life of Civil War Soldiers: task 5, instruction 6
  Averages: task 6.6, instruction 5; final authentic pedagogy score 11.6

Limited Authentic Pedagogy

Amy's authentic pedagogy scores
  Absolute Monarchy of your Own: task 4, instruction 8
  Ideal Form of Government Debate: task 9, instruction 10
  Renaissance Ball: task 4, instruction 4
  Averages: task 5.6, instruction 7.3; final authentic pedagogy score 12.9

Phillip's authentic pedagogy scores
  Washington's Farewell Address: task 8, instruction 8
  Reformers Lesson: task 7, instruction 4
  Manifest Destiny Painting Analysis: task 6, instruction 7
  Averages: task 7, instruction 6.3; final authentic pedagogy score 13.3

Moderate Authentic Pedagogy

Lauren's authentic pedagogy scores
  Industrial Revolution Documentary: task 8, instruction 8
  PR Campaign Billboard Assignment: task 8, instruction 14
  French Revolution Storybook: task 5, instruction N/A
  Averages: task 7, instruction 11; final authentic pedagogy score 18

Ryan's authentic pedagogy scores
  Czar Nicholas Think Aloud: task 8, instruction 14
  Political Cartoon Analysis: task 8, instruction 12
  Me Card: task 7, instruction 14
  Averages: task 7.6, instruction 13.3; final authentic pedagogy score 20.9

Lee's authentic pedagogy scores
  Industrial Revolution Editorial: task 8, instruction 15
  Truman Think Aloud: task 8, instruction 11
  Industrial Revolution Illustrated Timeline: task 7, instruction 15
  Averages: task 7.6, instruction 13.6; final authentic pedagogy score 21.2

Appendix P: Manifest Destiny Painting

American Progress by John Gast

Painting Analysis: "American Progress," a painting by John Gast, 1872
A. Look at the painting for at least one minute without writing anything. Look at every portion of it without excluding anything.
B. Use the chart below to categorize what is going on in the painting (People / Objects / Activities).
C. How does the artist use color? Light/dark scenes?
D. Does this painting have a negative or positive connotation regarding Manifest Destiny? Why?
E. What is this woman carrying in her right arm? What does it mean?
F. What does this painting tell you about Manifest Destiny?
G. Think back to "Washington Crossing the Delaware" by Emanuel Leutze. Did that painting provoke a positive or negative connotation regarding the Revolution?
H. How do those two paintings compare in their connotations of their era of American history?

Appendix Q: WWII Political Cartoon

Appendix R: Moderate Authentic Pedagogy Task

Truman Considers the Berlin Crisis: Is the U.S. justified in imposing its will in Europe?

Instructions for the Truman decision-making groups: In a meeting on this crisis, you will hear from George Marshall (Sec. of State), George Kennan (ambassador to the U.S.S.R.), Henry Wallace (former Sec. of Commerce), and Walter Lippman (well-known journalist). Listen carefully to each of their positions and recommendations. Record these and any concerns you have below. After hearing from each person, discuss the options available to President Truman. Brainstorm the strengths and weaknesses, benefits and dangers of each position. (The handout provides a blank chart with columns for Advisor, Recommendation, Concerns, Strength, and Weakness and rows for Marshall, Kennan, Wallace, and Lippman.)

In coming to your decision consider: 1. What are the strongest arguments to be made for each option? 2.
What are the strongest arguments against each option? 3. Is the U.S. justified in imposing its will in Europe? If so, how? 4. Is the U.S. justified in withdrawing from the conflict? 5. Does our moral responsibility to a people cut off by an outside force outweigh all other political/practical alternatives? 6. What decision will bring about the best solution for the U.S.? Europe? The world? You will justify your decision to the American people in a speech ? be sure to address each of these considerations as you plan your thoughts (collectively). 303 What course of action did your Truman group choose? Justify your choice! (use other paper as needed) 304 Appendix S: Content Analysis Explanation and Examples Several publications produced by the Alabama Department of Education provide information about the social studies graduation exam (Alabama Department of Education, 2009b; Morton, 2009; Richardson, 2000). Alabama began minimum competency testing in 1977. Three editions of the graduation exam have been created since, with the latest implemented in 1998 (Morton, 2009, p. 11). The third edition was the first to include a social studies subtest. The social studies graduation exam has 100 questions. Students must answer 54 correctly to pass. The social studies graduation exam associated with this study went into effect with the class of 2003. In order to receive a diploma, students must pass all of the graduation exams. However, students can get an alternative diploma called the Alabama High School Diploma with Credit-Based Endorsement if they pass reading, math, and one other graduation exam. Students initially take the social studies exam in the tenth grade for practice. This test counts if it is passed. Students get four additional attempts to pass the exam before the end of their senior year. After graduation, exited students can take the exam as many times as they want during regularly scheduled times (Morton, 2009, p. 4). No previous versions of the social studies graduation exam have been released and little information was available regarding the process for determining cut scores. The best source of information on the test items came from Bulletin 2000, No. 49: a publication provided to the general public by the Alabama Department of Education. This publication was designed to enable students to understand the general format of the test and the weight provided to different historical time periods. The questions were not 305 intended to necessarily be representative of the difficulty level of the social studies subtest. A content analysis using Bulletin 2000 was not ideal, but it was the only option available. According to a personal email communication from Dr. Gloria Turner who served as the Director of Assessment for the Alabama State Department of Education, the content standards are ?considered to be minimum, required, fundamental, and specific? (G. Turner, personal communication, February 11, 2008). Teachers familiar with the exam (through unsolicited student comments) also confirmed that the test is similar to the item specifications in the bulletin. The content analysis was conducted using Bloom?s taxonomy and the first two authentic pedagogy standards associated with the task rubric (Construction of Knowledge and Elaborated Communication). The table at the end of this appendix provides a breakdown of how the 84 items were rated. All of the items were coded by three raters: Lamont Maddox, Dr. John Saye, and a graduate student trained with the AIW rubrics. 
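The agreement figures reported below can be reproduced with a simple tally once each rater's codes are recorded. The following sketch (in Python, with hypothetical rating lists and a hypothetical function name; the actual coding and reconciliation were done by hand) shows one way to compute the proportion of items on which at least two of the three raters assigned the same category.

from itertools import combinations

def majority_agreement(ratings):
    # ratings: one (rater1, rater2, rater3) tuple of codes per test item
    agree = sum(1 for item in ratings
                if any(a == b for a, b in combinations(item, 2)))
    return agree / len(ratings)

# Hypothetical Bloom's codes for three of the 84 items
blooms = [("low", "low", "low"), ("low", "high", "low"), ("high", "low", "high")]
print(f"At least two of three raters agreed on {majority_agreement(blooms):.0%} of items")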
At least two of the three raters agreed 94% of the time when applying Bloom?s taxonomy to categorize questions as having either low or high levels of cognitive difficulty. There was complete agreement among raters on the Elaborated Communication (EC) standard on the AIW rubric since the test is exclusively multiple-choice (and therefore does not require EC above a 1). Two of the three raters agreed 98% of the time when scoring based on the Construction of Knowledge standard on the AIW rubric. When disagreement occurred it was within one level on the rubric (1 or 2). In most instances the three raters agreed on the rating regardless of the method of analysis. 306 Results of Content Analysis Method of Analysis N % Bloom?s Taxonomy High 3 4% Low 81 96% Construction of Knowledge (AIW) Level 3 Level 2 Level 1 0 11 73 0 13% 87% Note. Low = knowledge and comprehension; high = some application or analysis. Numbers reflect agreement among at least two of the three raters on each item. Sample Items: 1. The first fort in America built by the Spanish was located in A. El Paso, Texas B. St. Augustine, Florida C. Natchez, Mississippi D. New Orleans, Louisiana Scoring Notes: This task requires recall of factual information. It reflects a ?low? knowledge level ranking on Bloom?s taxonomy since no higher order processes (i.e. synthesis, application, analysis, etc.) are required. On the construction of knowledge authentic intellectual work scale, this question most closely approximates a level 1 score whereby the dominate expectation is ?that students will merely reproduce information gained by reading, listening, or observing.? The question also reflects a level 1 score for elaborated communication since it is multiple-choice. 2. The Missouri Compromise of 1820 A. ended the slave trade in the United States. B. maintained a balance between slave and free states. C. granted political rights to slaves escaping to free states. D. allowed the expansion of slavery in all United States territories. Scoring Notes: This question also requires recall of factual knowledge. In this case, students must remember what the Missouri Compromise accomplished. The question received a ?low? knowledge level designation on Bloom?s taxonomy. On the construction of knowledge authentic intellectual work scale, this question most closely approximates a level 1 score whereby the dominate expectation is ?that students will merely reproduce information gained by reading, listening, or 307 observing.? The question also reflects a level 1 score for elaborated communication since it is multiple-choice. 3. Use the passage below and your own knowledge to answer Number 5. Removal of Southern Indians to Indian Territory, 1835 The plan of removing the aboriginal people who yet remain within the settled portions of the United States?approaches its consummation?an extensive region?has been assigned for their permanent residence. It has been divided into districts and allotted among them. Many have already removed and others are preparing to go? The pledge of the United States has been given by Congress that the [region] destined for the residence of this people shall be forever ?secured and guaranteed to them.? A [region] ?has been assigned to them, into which the white settlements are not to be pushed?A barrier has thus been raised for their protection against the encroachment of our citizens? The action described in the passage was a direct result of the A. growth of social reform movements. B. westward expansion of the United States. C. 
   C. movement of people from rural to urban areas.
   D. acquisition of territories overseas by the United States.

Scoring Notes: This question also falls at the lower end of Bloom's taxonomy, but it measures comprehension of the material in the paragraph instead of just recall. It scored at the "2" level on the construction of knowledge scale. This indicates that there "was some expectation for students to interpret, analyze, synthesize, or evaluate information, rather than merely to reproduce information." The question also reflects a level 1 score for elaborated communication since it is multiple-choice.

Appendix T: Notes on the Student Sample

Every attempt was made to make the student sample as inclusive as possible. Thirty students in the sample had multiple tenth grade social studies teachers because they took more than one social studies course during their tenth grade year. These students were excluded from the analysis for a number of reasons. In many cases, authentic pedagogy scores weren't available for both teachers. It was also difficult to isolate the effects of each teacher's instruction on student performance. With these students removed, the resulting N for the study was 805. When factoring out students who did not have data listed for the social studies graduation exam variable, that number was reduced to 747.

The study schools have a strong reputation for academic excellence. As a result, they experienced a relatively high number of transfer students. Data were collected from the system at different points during the study as test results, student grades, and other information became available. The data collection schedule and changing student population at the schools produced some discrepancies in student documentation. For example, students were listed on class rosters (by ID #), but corresponding demographic or achievement data were not always available on the other spreadsheets. Whenever possible, I worked through the school system to resolve these differences. However, the final data set still had some missing data. The statistics for the various analyses included in chapter five are based on cases with no missing values for the variables used.

Appendix U: Technical Description of Multiple Regression Analysis

Research Question Two: Do students that have been taught by teachers demonstrating higher levels of authentic pedagogy score higher on the Alabama High School Graduation Exam (AHSGE) than students taught by teachers with lower levels of authentic pedagogy?

The initial step in analyzing this question was to clearly define the predictor variables, other than authentic pedagogy, that most influenced students' graduation exam scores. I ran a series of regression analyses designed to filter out highly correlated variables that would overlap in explaining the variance of graduation exam scores. I first conducted a backward entry regression analysis in SPSS using demographic and course-related variables. This procedure resulted in the removal of the course type predictor variable (courses that were a year or just a semester in length, Fall or Spring) because it was highly correlated with the course name variable (.895). A second regression was conducted sequentially using the forced entry method. Results of this analysis indicated that the course name variable (AP European History or Regular U.S. History) was highly correlated with authentic instruction (.802).
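The analyses in this appendix were run in SPSS. For readers who want to see the general shape of the screening step and the reduced model, the following is only a minimal sketch: the data file, column names, and 0/1 dummy coding are hypothetical stand-ins for the coded student variables, and the model anticipates the removal of the AP European History students described in the remainder of this appendix.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical file and column names; categorical predictors are assumed
# to be dummy-coded 0/1 for the correlation screening.
df = pd.read_csv("student_data.csv")

# Screen for predictors that overlap heavily in explaining AHSGE variance
# (e.g., course type vs. course name, course name vs. authentic pedagogy).
predictors = ["authentic_pedagogy", "course_name_ap", "course_type_full_year"]
print(df[predictors].corr())

# Drop the AP European History students and fit the reduced model on the
# remaining regular U.S. history students (as described next).
regular = df[df["course_name_ap"] == 0]
model = smf.ols(
    "ahsge_social_studies ~ authentic_pedagogy + gender_male + race_white + free_reduced_lunch",
    data=regular,
).fit()
print(model.summary())
```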
This correlation was not very surprising since the sample of teachers was small, and the teacher with the highest authentic pedagogy score also taught the majority of the Advanced Placement courses. In order to better ascertain the impact of authentic instruction on the graduation exam results, the AP European History students needed to be removed from the analysis. I ran a final sequential regression analysis and filtered out all of the students who took Advanced Placement European History. The resulting analysis included 427 students who took regular 10th grade United States history over the course of the two years covered by the study. In evaluating the multiple regression models, I made sure to test the common assumptions. Ultimately, the assumptions needed for valid regression results were met.

Appendix V: Technical Description of One-way ANOVA Procedures

In order to effectively run the ANOVA tests associated with research question two, the classes being compared needed to be as similar as possible. The process that was used to pair like classes is described in this appendix. I first paired a class from the minimal authentic pedagogy category (Andy) with one from the limited authentic pedagogy category (Phillip). The classes included in the analysis were ones that I actually observed, although not in the same school year. Both classes were taught during the spring semester and were regular U.S. history courses. I compared the classes on specific demographic variables using the Pearson chi-square test. No statistically significant differences were found between the two classes in terms of gender, race, or socio-economic status. A subsequent t-test indicated that the limited authentic pedagogy class had higher mean social studies grades (87.55 vs. 82.50), but this difference was not significant (t = 1.429, p = .160).

Comparison of Minimal and Limited Authentic Pedagogy Classes

                            Number of Students
Variable                    Minimal    Limited    Chi-Square
Race                                              1.422
  White                     12         14
  African-American          11         6
Gender                                            .023
  Male                      13         12
  Female                    13         11
SES a                                             1.243
  Free/Reduced Lunch        3          6
  Paid                      20         17

Note. a Fisher's Exact Test (2-sided) consulted since two cells had an expected count of less than 5. The result still did not reach significance (.459).

I also compared Andy's minimal authentic pedagogy class with a class taught by the highest scoring tenth grade teacher (Ryan). Ryan's class had higher mean social studies grades (85.45 vs. 80.92). The difference in means was not statistically significant (t = .924, p = .374). The table below indicates that the classes did not differ significantly on gender or socio-economic status. However, they did differ based on race, with the minimal class having significantly more African-Americans. As a result, I focused the subsequent ANOVA analysis on white students only.

Comparison of Minimal and Moderate Authentic Pedagogy Classes

                            Number of Students
Variable                    Minimal    Moderate   Chi-Square
Race                                              6.571**
  White                     12         20
  African-American          11         3
Gender                                            .087
  Male                      13         13
  Female                    13         11
SES a                                             .505
  Free/Reduced Lunch        3          5
  Paid                      20         19

Note. a Fisher's Exact Test (2-sided) consulted since two cells had an expected count of less than 5. The result still did not reach significance (.701). **p < .01.
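The class-comparability checks in this appendix were run in SPSS. The sketch below is only an illustration of the same checks in Python: the contingency-table counts come from the first table above, but the grade vectors are placeholders, and default options may differ slightly from those used in the original analysis.

```python
import numpy as np
from scipy import stats

# Race counts from the minimal vs. limited class table above.
race = np.array([[12, 14],    # White:            minimal, limited
                 [11,  6]])   # African-American: minimal, limited
chi2, p, dof, expected = stats.chi2_contingency(race, correction=False)
print(f"Race: chi-square = {chi2:.3f}, p = {p:.3f}")

# When an expected cell count falls below 5 (as with the SES comparison),
# Fisher's Exact Test is consulted instead.
ses = np.array([[3, 6], [20, 17]])
odds, p_ses = stats.fisher_exact(ses)
print(f"SES: Fisher's exact p = {p_ses:.3f}")

# Independent-samples t-test on mean social studies grades; these grade
# vectors are hypothetical placeholders, not the actual student data.
minimal_grades = np.array([82, 85, 79, 84, 80])
limited_grades = np.array([88, 86, 90, 85, 89])
print(stats.ttest_ind(limited_grades, minimal_grades))
```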
Appendix W: Technical Description of Factorial MANOVA Procedures

This section provides additional information regarding the analysis of data associated with the higher order editorials (research question three). I will first discuss the process used to analyze the Manifest Destiny editorials before turning to the advanced placement German Unification writing task.

In order to conduct the MANOVA analysis associated with this research question, the three groups (minimal, limited, and moderate authentic pedagogy) needed to be as similar as possible. I wanted to control for variables other than authentic pedagogy that could impact student performance. I compared the groups on specific demographic variables using the Pearson chi-square test. The results indicated that the groups did not differ significantly for gender or SES. However, the difference in race was significant: the three groups differed in their racial composition (black vs. white) beyond what would be anticipated by chance. The follow-up to this finding was to determine whether race played a significant role in influencing student performance on the higher order assessment. If it didn't, then the differences between the groups on this variable were irrelevant. I ran a MANOVA and found that race did have a significant impact on student performance (Hotelling's Trace p = .039), so I decided to incorporate this variable into my final factorial MANOVA model, which included the authentic pedagogy teacher groupings (minimal, limited, moderate) as the other independent variables. This model revealed that there was not a statistically significant impact for race on student achievement on the designated dependent variables (Hotelling's Trace p = .107).

In addition to trying to control for some of the demographic characteristics that could influence achievement on the Manifest Destiny higher order assessment, I also conducted a one-way ANOVA to determine if the groups were significantly different in terms of students' grades in history. The assumption of homogeneity of variance was violated; therefore, the Welch F-ratio is reported. The teacher group (minimal, limited, or moderate) did not have a significant effect on student grades, F(2, 70) = 2.047, p = .137. Put differently, the differences in mean 10th grade history averages between the groups (82, 81, and 84) may be due to chance.

Comparison of Minimal, Limited, and Moderate Authentic Pedagogy Groups for Manifest Destiny Editorial

                            Number of Students
Variable                    Minimal    Limited    Moderate   Chi-Square
Race                                                         9.320**
  White                     36         14         48
  African-American          23         13         11
Gender                                                       .969
  Male                      35         14         30
  Female                    27         15         32
SES                                                          5.461
  Free/Reduced Lunch        13         10         9
  Paid                      46         17         52

Note. **p < .01.

A similar process was used to compare groups for the advanced placement editorial. In this case, there were only two groups: limited and moderate authentic pedagogy. The groups were similar in terms of gender, SES, and race. A t-test indicated that the limited authentic pedagogy class had lower mean social studies grades (83.40 vs. 84.29), but this difference was not significant (t = -.550, p = .584). The results of these analyses suggested that a fair comparison could be made between the two authentic pedagogy groups because no significant differences existed on the variables I had chosen to examine.

Comparison of Limited and Moderate Authentic Pedagogy Groups for Advanced Placement German Unification Editorial

                            Number of Students
Variable                    Limited    Moderate   Chi-Square
Race a                                            3.134
  White                     18         40
  African-American          6          4
Gender                                            2.742
  Male                      10         27
  Female                    23         29
SES b                                             .048
  Free/Reduced Lunch        2          4
  Paid                      31         51

Note. a Fisher's Exact Test (2-sided) consulted since one cell had an expected count of less than 5; the result still did not reach significance (.148). b Fisher's Exact Test (2-sided) consulted since two cells had an expected count of less than 5; the result still did not reach significance (1.000).
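As with the other analyses, the MANOVA models described above were run in SPSS. The sketch below only illustrates the same general factorial design in Python; the data file and column names for the editorial sub-scores, teacher grouping, race, and history grades are hypothetical assumptions, not the original model specification.

```python
import pandas as pd
from scipy import stats
from statsmodels.multivariate.manova import MANOVA

# Hypothetical data frame of editorial scores; all column names are
# illustrative stand-ins for the coded variables described in this appendix.
df = pd.read_csv("manifest_destiny_scores.csv")

# Factorial MANOVA with authentic pedagogy group and race as factors and the
# editorial sub-scores as dependent variables; mv_test() reports Hotelling's
# Trace alongside the other multivariate statistics.
manova = MANOVA.from_formula(
    "persuasiveness + dialectical_reasoning + final_position ~ C(ap_group) * C(race)",
    data=df,
)
print(manova.mv_test())

# Check homogeneity of variance for the history-grade comparison; when
# Levene's test is significant, a Welch-adjusted F (as reported above) is
# preferable to the standard one-way ANOVA F.
grade_groups = [g["history_grade"].values for _, g in df.groupby("ap_group")]
print(stats.levene(*grade_groups))
```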
Appendix X: Higher Order Editorial Examples

Advanced Placement Example #1

Part I

Paragraph 1: Introduction
Germany has gone through uprisings/riots in 1848, won many wars including the Danish-Prussian and Austro-Prussian war; and has united northern Germany in 1867. German people should unify into one nation because the German people have shown their strength through winning many wars and proved that unification is possible when Northern Germany was united. Other nations and especially conservatives would oppose this because one German nation-state would be a threat to their country.

Scoring Notes: Position = 1. See underlined sentence in paragraph 2. Context = 1. The student provides one sentence of historical background that sticks pretty close to the timeline.

Paragraph 2: Supporting Arguments
The unification of all Germanic people should be endorsed by the German people because we have proven to be a strong country united by winning many wars. The defeat of the Danish in 1864 shows that the German military is strong and can be very successful and victorious in battle creating a stable nation. To add to this, the unification of Northern Germany in 1867 shows that unification can be done. It would be in our best interest as German people to unite because the Constitution of the North German Confederation allows for more rights for citizens and there is hope for a constitution for all of Germany if we unify.

Scoring Notes: Persuasiveness = 3, "adequate." The student never defines what a unified Germany would include (i.e., Austria?). Three main points are made. Unification will result in a stronger, more stable country. It can be accomplished (practical) and it would potentially bring more rights for citizens. The student doesn't elaborate enough on these points for a four.

Paragraph 3: Opposing Views
I think Conservatives in other countries would oppose the unification of Germany. Some arguments that any opposer would make could be that nationalism causes war in that countries/nations want to grow (Document 1). Along with this argument opposers might say that nationality is secondary as in document 1.

Scoring Notes: Dialectical Reasoning = 0. The student did not adequately describe the opposing view that nationalism causes war. Why would Germany "want to grow" and how would this lead to conflict? The other parts of this paragraph are also too brief to gauge student understanding.

Paragraph 4: Final Conclusion
Although becoming unified may bring about wars to extend boarders, with nationalism comes a sense of pride and the need to make your country strong from outside enemies. Take for example Northern Germany. Before 1867 they were scattered nations and when the Danish-Prussian War came along, they had no power to resist and were defeated. It is also a false accusation to say that nationality is secondary to other matters. Being one nation connect us all and make us feel obligated to make the country stronger. In conclusion Unification of all Germanic peoples is a great idea because it would make us part of a strong, stable nation, that is able to protect itself.

Scoring Notes: Quality of Final Position = 1. The student responds to critics (even though the critic's views aren't very well described in paragraph 3) and offers a brief conclusion.
The paragraph does not add to the persuasiveness of the editorial, especially since the example is not very clear and possibly inaccurate.

Part II
I think nation-states based on common ethnic groups, culture, or religion is a negative idea. Although it could bring relations among people of the same groups to grow stronger, it would support segregation and hatred among groups. If people can't interact with all sorts of diverse people, then there is no way of beginning to understand them. Not understanding people often brings the feeling of group superiority and anger towards others. I believe that experiencing different things and interacting with different people brings a more cultured and well rounded society that promotes peace and understanding not segregation and hate.

Scoring Notes: Decision-Making = 0. Never mentions anything about U.S. policy, although the student's view can be inferred. Persuasiveness = 2, "Adequate." The response has a chance of persuading the reader. The student argues that new nation-states would promote more hatred and increase the likelihood that groups will not understand each other. The ideas are not supported with any elaboration.

Advanced Placement Example #2

Part I

Paragraph 1: Introduction
The unification of Germany would only escort a plethora of problems and end up dumping upon us strife. Germany has been embedded in a perpetual state of war. [We've endured the Danish-Prussian War, Austro-Prussian War, and the Franco-Prussian War, and we German people are drained like our economy]. It will be impossible to unite peacefully and successfully the currently fragmented Germany. German[s] should oppose the unification of Germany because of our economic stature of Germany as well as the impossibility of uniting all these prideful German nationalities.

Scoring Notes: Position = 1. The student clearly is opposed to German unification. Context = 1. Some historical context provided in the bracketed sentence.

Paragraph 2: Supporting Arguments
If Germany unifies, we will never exit a state of war. Germany is composed of numerous nationalities who own a sense of entitlement. No nationality will want to engage the compromise that will be required by unification (1). Not only that, but we would have to annex parts of Switzerland, the Netherlands, and Belgium. These countries certainly will not be happy and will most likely pursue war (2). Another concern plaguing my mind is our economic situation. Germany has been engaged in war after war, and we all know war drains economies (3). Trying to unify in a time of economic difficulty is certainly not a smart move. Because we are weary of war and our economy is drained, it is not a good idea that Germany should unify.

Scoring Notes: Persuasiveness = 3, "Adequate." The student provided three arguments with one of them closely following the source document. None of the arguments include much in the way of support. The economic argument may be correct, but the student provides no information to substantiate the claim. The editorial would be more persuasive if the student acknowledged the possibility of more limited forms of unification.

Paragraph 3: Opposing Views
Faulty arguments are flung at my opposition to German unification. My critics claim that all Germany has a sense of unity after fighting together to defeat France, and thus should unify. Although this is true, everyone must keep in mind that this sense of unity was during a time of war where we were all seeking to defeat France.
This sense of unity will soon fade and be replaced by arguing nationalities each fighting for their own good. The conflict with France is done and soon Germany will not have a common interest to fight for. Other critics say that together, Germany could rise and be a great power who eventually controls the world. This is an absolutely absurd notion. Because of the aforementioned nationality conflict, Germany will have too much internal conflict to focus on international affairs. And do not forget the troublesome question of whether Austria would be able to join a unified Germany. My critics arguments are superficial and need serious reevaluation.

Scoring Notes: Dialectical Reasoning = 2. The student provided two feasible opposing views. Both of them are not described in much detail. The student also didn't address some significant points that would hurt his/her argument, for example, the possibility that unification could stimulate economic growth.

Paragraph 4: Final Conclusion
Obviously, German unification is a bad idea. If Germany attempts to unify all these different nationalities during our dismal economic condition, we will never exit a state of war. The only reason we Germans should unify is against the very idea of German unification.

Scoring Notes: Quality of Final Position = 1. The student addressed the views of critics in the preceding paragraph. He/she offers a short conclusion that restates the economic argument. It does not add (or detract) from the persuasiveness of the editorial.

Part II
The U.S. is faced with a difficult policy decision when it comes to the support of nation-states based on cultural identity. In the real world, everyone cannot always be happy. There must be compromise. Nobody can always have their way. I believe that the U.S. should respect each and every culture's rights and safety and if these basic rights and safety can only be obtained by creating a new nation-state, then I think the U.S. should support it. But, a far better route for the U.S. to take is supporting compromise and peace in an existing nation. Every nation is going to encounter problems, even the nations that break off because of basic rights and safety. Simply amputating a cultural group from a mother nation is not going to solve all the problems. So, if the U.S. supports peace and compromise in existing nations, people will learn to live peacefully with other people as opposed to speedily and selfishly forming a cultural bubble. Altogether, I think the U.S. should support the unity of a nation as opposed to many separate nation-states.

Scoring Notes: 1, 1. No concrete examples. All countries experience problems; people will learn to live peacefully with each other.

U.S. History Higher Order Editorial Example 1

Part I

Paragraph 1: Introduction
The war with Mexico has been going on for a year now, and many people still have differing opinions on the matter. Texas was full of settlers from our country, and the Mexican government tried forcing many unfair laws upon them. They fought for their independence and won it, but the Mexican government refused to aknowledge their clear victory. When the new, independent Texas tried to join us, the Mexican government ignores that and us. I think we are perfectly justified in going to war with Mexico for many reasons.

Scoring Notes: Position = 1. The student clearly supports Manifest Destiny based on the last sentence of the editorial. Context = 2. The student accurately describes some of the events leading to the decision-point and does so without simply copying the timeline.
Paragraph 2: Supporting Arguments
If we win the war with Mexico, the people we liberate from their controlling government will only benefit. We didn't decide to start violently pursuing our Manifest Destiny either. Mexico forced us into it by not acknowledging Texas independence and freedom to choose who they wish to follow. The Mexican government is too stubborn to see that everyone could benefit from us pursuing our Manifest Destiny. We were trying to buy California and New Mexico from them.

Scoring Notes: Persuasiveness = 3, "Adequate." The editorial has a chance of persuading the reader. The student provides at least two reasons why U.S. actions were justified. However, the reasons have little to do with Manifest Destiny, and they are not supported with enough evidence to warrant a higher score.

Paragraph 3: Opposing Views
Some say that the only reason we are at war with Mexico is because we are greedy. Others think we are only at war because we didn't consider the rights of other countries. We would be gaining much land, and at the expense of Mexico, if we win this war.

Scoring Notes: Dialectical Reasoning = 2. The student dedicated a paragraph to opposing views without responding (which was rare). Two points are made in minimal detail.

Paragraph 4: Final Conclusion
To those that think to United States is being greedy, consider the facts. We were trying to buy territory from Mexico before this war started. Texas no longer belongs to Mexico, so we weren't violating any of their rights, and Texas wanted to become a part of our country. Pursuing Manifest Destiny is something we should do, but only if it is done without violating any rights. Because Texas is not a part of Mexico, because Mexico is overly controlling, and because it is our destiny, war with Mexico is completely justified.

Scoring Notes: Quality of Final Position = 2, "Approaching Satisfactory." The student did a decent job of reiterating points made earlier in the editorial. The paragraph as a whole added to the persuasiveness of the editorial, but it did not represent advanced dialectical reasoning. The student didn't appear to thoughtfully consider the validity of the opposing views introduced in paragraph three. They were basically dismissed out of hand.

Part II
I think the U.S.'s mission is to be the "police." We tend to take care of people that are having their rights taken away. I'm not sure how this could be accomplished for the whole world, but I think that some day, a long time from now, it could be accomplished.

Scoring Notes: 0.

U.S. History Higher Order Editorial Example 2

Part I

Paragraph 1: Introduction
In my opinion using the "saying" Manifest Destiny does not justify the war. In the case that it does, that means that anything wrong someone can just say that some how they know that God wanted them to do it. We took the wrong approach in getting Texas. We should have paid for it if we wanted it and not killed hundreds of people. The situation between the United States and Mexico was that the United States wanted Mexico and Santa Anna did not want to give it up. Some of the important events were the four battles that finally won over Texas to the United States. The United States is wrong to use Manifest Destiny to go to war with Mexico.

Scoring Notes: Position = 1. Clearly stated at the end of the introduction. Context = 1. The student provides some context, but it is not at all clear whether he/she really understands the events that preceded the Mexican-American War. This was borderline "0."

Paragraph 2: Supporting Arguments
Mexican problems are Mexicos problems.
The United States should first worry about the wrong going on in the United States. Just because we want something doesn't mean that we can just take it.

Scoring Notes: Persuasiveness = 2, "Minimal." The editorial is not likely to persuade the reader. The main points can essentially be summarized as might doesn't make right and "we" (the U.S.) can't always have what we want. The explanation is not clear enough to warrant a higher score.

Paragraph 3: Opposing Views
Some people might disagree with me because they think that something is going wrong over there, but there are things that are going wrong in the United States. Slavery is one of the biggest issues. If you think about it we are doing the same thing as them. Also that means that any country in the world can just come over and decide that they want a part of our country and if they are better fighters then us then they just win our country? Its not right.

Scoring Notes: Dialectical Reasoning = 0. The student doesn't accurately provide an opposing view. However, this was one of the few editorials to mention slavery.

Paragraph 4: Final Conclusion
So there are my decisions to what I think about the Manifest Destiny and why I think what I think.

Scoring Notes: Quality of Final Position = 0. The student doesn't restate any arguments.

Part II
My opinion on the war is that what those people over there do is their business. America already has so many problems without getting in everyone elses. There are millions of kids without homes or food. Millions of kids dropping out of school, loss in jobs, people killing people, but we are too busy in everyone elses problems. America is a free country and I think that was our mission and destiny.

Scoring Notes: 0. The student gets sidetracked in focusing on the war (Iraq?). The last sentence ties back into the question some, but I really couldn't determine if the student saw any connections between America's historic sense of Manifest Destiny and its place in the world today. The overall isolationist, America-first stance was a common one expressed by students in this part of the assessment.