SCHOOL LEADERS: CHALLENGING ROLES AND IMPACT ON

Likewise, are hiring and compensation policies that reward certain qualifications the equivalent of investing in teacher quality? Many principals are themselves unprepared to evaluate the teachers they supervise. A highly effective teacher, therefore, is one whose students show the most gains from one year to the next. A number of factors have been found to have strong influences on student learning gains, aside from the teachers to whom their scores would be attached. Henry Braun, then of the Educational Testing Service, concluded in his review of VAM research: VAM results should not serve as the sole or principal basis for making consequential decisions about teachers. Survey data reveal that accountability pressures are associated with higher attrition and reduced morale, especially among teachers in high-need schools. For a variety of reasons, analyses of VAM results have led researchers to doubt whether the methodology can accurately identify more and less effective teachers. The usefulness of value-added modeling requires the assumption that teachers whose performance is being compared have classrooms with students of similar ability (or that the analyst has been able to control statistically for all the relevant characteristics of students that differ across classrooms). Another study found that the achievement gains from having a highly effective teacher could be almost three times as large for African American students as for white students, even when comparing students who start with similar achievement levels (Sanders and Rivers 1996). School districts often fall short in efforts to improve the performance of less effective teachers and, failing that, in removing them. If test scores subsequently improve, should a specific teacher or the tutoring service be given the credit?
As a result, increasing scores on students’ mathematics exams may reflect, in part, greater skill by their teachers in predicting the topics and types of questions, if not necessarily the precise questions, likely to be covered by the exam. In addition to changes in the characteristics of students assigned to teachers, this is also partly due to the small number of students whose scores are relevant for particular teachers. Some teachers might be relatively stronger in teaching probability, and others in teaching algebra. Narrowing of the curriculum to increase time on what is tested is another negative consequence of high-stakes uses of value-added measures for evaluating teachers. Thus, a teacher who appears to be very ineffective in one year might have a dramatically different result the following year. The work by Hedges, Laine, and Greenwald demonstrated that the use of more sophisticated meta-analytical techniques to analyze the same set of studies included in Hanushek’s review produced far more consistent and compelling findings regarding the effect of educational resources (including variables related to the quality and quantity of teachers) on student achievement. If used for high-stakes purposes, such as individual personnel decisions or merit pay, extensive use of test-based metrics could create disincentives for teachers to take on the neediest students, to collaborate with one another, or even to stay in the profession. Within a school, teachers will have incentives to avoid working with students likely to pull down their teacher effectiveness scores. Teacher rewards based on comparative student test results can also create disincentives for teacher collaboration.
Although the various reasons to be skeptical about the use of student test scores to evaluate teachers, along with the many conceptual and practical limitations of empirical value-added measures, might suffice by themselves to make one wary of the move to test-based evaluation of teachers, they take on even greater significance in light of the potential for large negative effects of such an approach. Academic ability: teachers with stronger academic skills perform better, whether these skills are measured by teachers’ SAT or ACT scores, grade point average, or selectivity of the college they attended. The research suggests that investing in teachers can make a difference in student achievement. Yet in practice, American public schools generally do a poor job of systematically developing and evaluating teachers. A study of teachers in Alabama by Ferguson and Ladd (1996) found a correlation between a teacher’s higher ACT scores and higher reading scores for her students. As noted above, even in a more stable community, the number of students in a given teacher’s class is often too small to support reliable conclusions about teacher effectiveness. For these and other reasons, even when methods are used to adjust statistically for student demographic factors and school differences, teachers have been found to receive lower “effectiveness” scores when they teach new English learners, special education students, and low-income students than when they teach more affluent and educationally advantaged students. Surprisingly, it finds that students’ fifth grade teachers appear to be good predictors of students’ fourth grade test scores. A commonplace objection to a group incentive system is that it permits free riding: teachers who share in rewards without contributing additional effort. No single teacher accounts for all of a student’s achievement.
For example, fifth-grade math students who had three consecutive highly effective teachers scored between 52 and 54 percentile points ahead of students who had three consecutive teachers who were least effective, even though the math achievement of both groups of students was the same prior to entering second grade (Sanders and Rivers 1996). Another recent study found that students achieve more in mathematics and reading when they attend schools characterized by higher levels of teacher collaboration for school improvement. However, these standards are not generally tested, and teachers evaluated by student scores on standardized tests have little incentive to develop student skills in these areas. Indeed, recommendations for reforming the preparation of teachers have become commonplace in reports aimed at improving public education (Bush 1987). Research consistently shows that teacher quality, whether measured by content knowledge, experience, training and credentials, or general intellectual skills, is strongly related to student achievement: simply put, skilled teachers produce better student results. In sum, teachers’ value-added effects can be compared only where teachers have the same mix of struggling and successful students, something that almost never occurs, or when statistical measures of effectiveness fully adjust for the differing mix of students, something that is exceedingly hard to do. Quite often, these approaches incorporate several ways of looking at student learning over time in relation to a teacher’s instruction. This approach, however, limits the usefulness of the results because teachers can then be compared only to other teachers in the same school and not to other teachers throughout the district. Does hiring and retaining qualified teachers lead to improvements in student achievement?
Schools that have adopted pull-out, team teaching, or block scheduling practices will have additional difficulties in isolating individual teacher “effects” for pay or disciplinary purposes. Greenwald, Hedges, and Laine’s (1996) analysis showed an overall positive relationship between a teacher’s verbal ability and student performance. One factor relates to the high proportion of educational dollars devoted to teacher compensation. It is relatively easy for teachers to prepare students for such tests by drilling them in the mechanics of reading, but this behavior does not necessarily make them good readers. As a result, standardized annual exams, if usable for high-stakes teacher or school evaluation purposes, typically include no or very few extended-writing or problem-solving items, and therefore do not measure conceptual understanding, communication, scientific investigation, technology and real-world applications, or a host of other critically important skills. Some schools expect, and train, teachers of all subjects to integrate reading and writing instruction into their curricula. They use systematic observation protocols with well-developed, research-based criteria to examine teaching, including observations or videotapes of classroom practice, teacher interviews, and artifacts such as lesson plans, assignments, and samples of student work. Nonrandom sorting of students to teachers within schools: a comparable statistical problem arises for teachers within schools, in that teachers’ value-added scores are affected by differences in the types of students who happen to be in their classrooms. One study examining two consecutive years of data showed, for example, that across five large urban districts, among teachers who were ranked in the bottom 20% of effectiveness in the first year, fewer than a third were in that bottom group the next year, and another third moved all the way up to the top 40%.
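The year-to-year churn in rankings described above is roughly what one would expect when classroom-sized samples are noisy. The following is a minimal simulation sketch, not the method of any study cited here: the function name, the number of teachers, and the spread of "true" teacher effects and student-level noise are all invented assumptions chosen only for illustration. It estimates each teacher's effect as the average gain of one class in each of two years, then reports what share of first-year bottom-quintile teachers land in the bottom quintile again.

```python
import random


def simulate_bottom_quintile_persistence(n_teachers=500, class_size=25,
                                         teacher_sd=0.15, student_sd=1.0,
                                         seed=0):
    """Share of year-1 bottom-20% teachers who are bottom-20% again in year 2.

    All parameter values are hypothetical; they are not taken from the text.
    """
    rng = random.Random(seed)
    # Each teacher has a fixed "true" effect that does not change across years.
    true_effects = [rng.gauss(0, teacher_sd) for _ in range(n_teachers)]

    def yearly_estimates():
        # Estimated effect = true effect + mean of student-level noise
        # for one class of `class_size` students.
        estimates = []
        for effect in true_effects:
            noise = sum(rng.gauss(0, student_sd) for _ in range(class_size))
            estimates.append(effect + noise / class_size)
        return estimates

    def bottom_quintile(estimates):
        order = sorted(range(n_teachers), key=lambda i: estimates[i])
        return set(order[: n_teachers // 5])

    year1 = bottom_quintile(yearly_estimates())
    year2 = bottom_quintile(yearly_estimates())
    return len(year1 & year2) / len(year1)


if __name__ == "__main__":
    for size in (25, 400):
        share = simulate_bottom_quintile_persistence(class_size=size)
        print(f"class size {size}: {share:.0%} stay in the bottom quintile")
```

Under these assumptions the true effects never change, so any movement in and out of the bottom group comes from sampling noise alone. Raising the hypothetical class size makes the rankings more persistent, which is one way to see why small classes undermine single-year estimates.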
Such pressures to narrow the curriculum will certainly increase if sanctions for low test scores are toughened to include the loss of pay or employment for individual teachers. (If teachers are found wanting, administrators should know this before designing staff development programs or renewing teacher contracts for the following school year.) Initiated in 1990, this system provides extensive data on state achievement tests for all students in grades 2-8 in Tennessee and allows for comparisons of teacher effects on students’ learning. If new laws or policies specifically require that teachers be fired if their students’ test scores do not rise by a certain amount, then more teachers might well be terminated than is now the case. The U.S. Department of Education’s National Schools and Staffing Survey (SASS) showed that students in high-poverty secondary schools were 77 percent more likely to be taught by teachers without degrees in the subject they were teaching than were their affluent counterparts. For instance, the Board on Testing and Assessment of the National Research Council of the National Academy of Sciences stated: “…VAM estimates of teacher effectiveness should not be used to make operational decisions because such estimates are far too unstable to be considered fair or reliable.” Simply put, researchers looked for the change in students’ test scores according to the teacher they were assigned to. The 2004 estimates put the number of teachers who have not yet met the highly qualified standard at 20 percent in elementary schools and 25 percent in secondary schools (U. Reform cannot succeed unless it focuses on creating the conditions under which teachers can teach and teach well.
Once teachers in schools or classrooms with more transient student populations understand that their VAM estimates will be based only on the subset of students for whom complete data are available and usable, they will have incentives to spend disproportionately more time with students who have prior-year data or who pass a longevity threshold, and less time with students who arrive mid-year and who may be more in need of individualized instruction.

Problems with the use of student test scores to evaluate teachers

Policy makers and administrators would be well served by recognizing the complexity of the issue and adopting multiple measures along many dimensions to support existing teachers and to attract and hire new, highly qualified teachers. While the studies on the field experience component of teacher education are not designed to reveal causal relationships, they suggest positive effects in terms of opportunity to learn the profession and reduced anxiety among new teachers. Once in the classroom, teachers should be evaluated on a regular basis in a fair and systematic way. For instance, almost two decades ago in its call for improved teacher preparation, the National Commission on Excellence in Education (1983) stated that “teacher preparation programs are too heavily weighted with courses in educational methods at the expense of courses in subjects to be taught.” The impending teacher shortage, estimated at more than two million teachers by 2007 (Ingersoll 2003), could exacerbate the inequitable distribution of teacher quality in the coming decades unless policymakers and educational leaders find ways of increasing the supply of skilled teachers and ensuring that the lowest performing students are enrolled in their classes. Pedagogical coursework seems to contribute to teacher effectiveness at all grade levels, particularly when coupled with content knowledge. This may partially be a reflection of the cognitive ability of the teacher. She also found a significant negative association between achievement and the presence of a high proportion of new or uncertified teachers in the school. There was similar movement for teachers who were highly ranked in the first year.
Others expect that the apparent objectivity of test-based measures of teacher performance will permit the expeditious removal of ineffective teachers from the profession and will encourage less effective teachers to resign if their pay stagnates. It uses a VAM to assign effects to teachers after controlling for other factors, but applies the model backwards to see if credible results obtain. Of course, it could also be that affluent schools or districts are able to recruit the best teachers. They are now very focused on phonics of the words and the mechanics of the words, even the very bright kids are… teachers feel isolated. We began by noting that some advocates of using student test scores for teacher evaluation believe that doing so will make it easier to dismiss ineffective teachers. However, because of the broad agreement by technical experts that student test scores alone are not a sufficiently reliable or valid indicator of teacher effectiveness, any school district that bases a teacher’s dismissal on her students’ test scores is likely to face the prospect of drawn-out and expensive arbitration and/or litigation in which experts will be called to testify, making the district unlikely to prevail. The statistical concerns we have described are accompanied by a number of practical problems of evaluating teachers based on student test scores on state tests. A comprehensive analysis by Greenwald, Hedges, and Laine (1996) examined data from 60 studies and found a positive relationship between years of teacher experience and student test scores. And finally, if teachers are so important to student learning, how can we make sure all students receive the benefit of good teachers?
Although value-added methods can support stronger inferences about the influences of schools and programs on student growth than less sophisticated approaches, the research reports cited above have consistently cautioned that the contributions of VAM are not sufficient to support high-stakes inferences about individual teachers. These negative effects can result both from the statistical and practical difficulties of evaluating teachers by their students’ test scores. However, questions persist about what exactly a quality teacher is. By using this approach, researchers are able to isolate the effect of the teacher from other factors related to student performance, for example, students’ prior academic record or the school they attend. Researchers and policy makers agree that teacher quality is a pivotal policy issue in education reform, particularly given the proportion of education dollars devoted to teacher compensation coupled with the evidence that teachers are the most important school-related factor affecting student achievement. And minority students are about twice as likely to have teachers with less than three years of teaching experience (National Center for Education Statistics 2000). Similarly, if teachers know they will be evaluated by their students’ scores on a test that predictably asks questions about triangles and rectangles, teachers skilled in preparing students for calculations involving these shapes may fail to devote much time to polygons, an equally important but somewhat more difficult topic in the overall math curriculum. Given that many dimensions of teacher characteristics matter (preparation in both pedagogic and subject content, credentials, experience, and test scores), the findings from the literature imply that there is no merit in large-scale elimination of all credentialing requirements.
In contrast, some less conscientious principals may purposefully assign students with the greatest difficulties to teachers who are inexperienced, perhaps to avoid conflict with senior staff who resist such assignments. Because certification standards differ significantly between states, several researchers have sought to evaluate the effects of the teacher training that certification indicates. Training and credentials: certified teachers are more effective than uncertified teachers, particularly in mathematics. Principals typically have too broad a span of control (frequently supervising as many as 30 teachers), and too little time and training to do an adequate job of assessing and supporting teachers. Teachers on emergency certificates don’t perform as well as fully certified teachers. These systems for observing teachers’ classroom practice are based on professional teaching standards grounded in research on teaching and learning. One study of Teach for America (TFA) teachers in Houston found that TFA teachers had a positive effect on student achievement scores when compared with other new teachers (Raymond, Fletcher, and Luque 2001). Recruiters could assess the rigor of teacher preparation programs by closely examining transcripts and other records that identify and describe the actual courses that teacher candidates have taken in order to be certified. VAM’s instability can result from differences in the characteristics of students assigned to particular teachers in a particular year, from small samples of students (made even less representative in schools serving disadvantaged students by high rates of student mobility), from other influences on student learning both inside and outside school, and from tests that are poorly aligned with the curriculum teachers are expected to cover, or that do not measure the full range of achievement of students in the class. And scores are also needed quickly if test results are to be used for timely teacher evaluation.
Indeed, it is just as reasonable to expect that “learning begets learning”: students at the top of the distribution could find it easier to make gains, because they have more knowledge and skills they can utilize to acquire additional knowledge and skills and, because they are independent learners, they may be able to learn as easily from less effective teachers as from more effective ones. One factor that sets certified teachers apart from other teachers is usually their training in teaching methods and in child and adolescent development, in addition to content knowledge. Schools that have adopted pull-out, team teaching, or block scheduling practices will be able to isolate individual teacher “effects” for evaluation, pay, or disciplinary purposes only inaccurately. As in the Tennessee findings, Jordan, Mendro, and Weerasinghe (1997) found that the difference between students who had three consecutive highly effective teachers (again defined as those whose students showed the most improvement) and those who had three consecutive low-effect teachers (those with the least improvement) in the Dallas schools was 34 percentile points in reading achievement and 49 percentile points in math. Most secondary school teachers, all teachers in kindergarten, first, and second grades, and some teachers in grades three through eight do not teach courses in which students are subject to external tests of the type needed to evaluate test score gains. Status measures primarily reflect the higher or lower achievement with which students entered a teacher’s classroom at the beginning of the year rather than the contribution of the teacher in the current year.
If the quality, coverage, and design of standardized tests were to improve, some concerns would be addressed, but the serious problems of attribution and nonrandom assignment of students, as well as the practical problems described above, would still argue for serious limits on the use of test scores for teacher evaluation. Teachers’ value-added scores and rankings are most unstable at the upper and lower ends of the scale, where they are most likely to be used to allocate performance pay or to dismiss teachers believed to be ineffective. Due process requirements in state law and union contracts are sometimes so cumbersome that terminating ineffective teachers can be quite difficult, except in the most extreme cases. Some policy makers, acknowledging the inability to identify effective or ineffective teachers fairly by their students’ test scores, have suggested that low test scores (or value-added estimates) should be a “trigger” that invites further investigation. That’s not happening… teachers are as frustrated as I’ve ever seen them. And such a response to incentives is not unprecedented: an unintended incentive created by NCLB caused many schools and teachers to focus greater effort on children whose test scores were just below proficiency cutoffs and whose small improvements would have great consequences for describing a school’s progress, while paying less attention to children who were either far above or far below those cutoffs. Adopting an invalid teacher evaluation system and tying it to rewards and sanctions is likely to lead to inaccurate personnel decisions and to demoralize teachers, causing talented teachers to avoid high-needs students and schools, or to leave the profession entirely, and discouraging potentially effective teachers from entering it.
Evaluation by competent supervisors and peers, employing such approaches, should form the foundation of teacher evaluation systems, with a supplemental role played by multiple measures of student learning gains that, where appropriate, could include test scores. Greater clarity on the empirical evidence can inform the wisdom of current practice, guide state efforts as they struggle with No Child Left Behind compliance regarding teacher quality, and provide direction for future teacher policy decisions. It implies there are only two options for evaluating teachers: the ineffectual current system or the deeply flawed test-based system. These studies not only provide insight into the characteristics of good teachers, they also reveal how these contribute to student learning and closing achievement gaps. A third reason for skepticism is that in practice, and especially in the current tight fiscal environment, performance rewards are likely to come mostly from the redistribution of already-appropriated teacher compensation funds, and thus are not likely to be accompanied by a significant increase in average teacher salaries (unless public funds are supplemented by substantial new money from foundations, as is currently the situation in Washington, D.C.).

Approaches to evaluating teacher effectiveness: A research synthesis

But there is not strong evidence to indicate either that the departing teachers would actually be the weakest teachers, or that the departing teachers would be replaced by more effective ones. Nor is there empirical verification for the claim that teachers will improve student learning if teachers are evaluated based on test score gains or are monetarily rewarded for raising scores. Most of the research does not seek to capture interactions among the multiple dimensions of teacher quality, and as a result, there are major gaps in the research that still need to be explored. There are many pitfalls to making causal attributions of teacher effectiveness on the basis of the kinds of data available from typical school districts. In one recent study, economists found that peer learning among small groups of teachers was the most powerful predictor of improved student achievement over time. Such instability from year to year renders single-year estimates unsuitable for high-stakes decisions about teachers, and is likely to erode confidence both among teachers and among the public in the validity of the approach. In addition, if probability is tested only in eighth grade, a student’s success may be attributed to the eighth grade teacher even if it is largely a function of instruction received from his seventh grade teacher. The positive effects associated with being taught by a highly effective teacher, defined as a teacher whose average student score gain is in the top 25 percent, were stronger for poor and minority students than for their white and affluent counterparts. Poor and minority students are much less likely to get well-qualified teachers than students who are better off.
Among these, teachers stand out as a key to realizing the high standards that are increasingly emphasized in schools and school systems across the country. These and other problems can undermine teacher morale, as well as provide disincentives for teachers to take on the neediest students. captured by relative value-added metrics, and the use of VAM to evaluate such teachers could exacerbate disincentives to teach students with high levels of need. That is, students who were enrolled in a succession of classes taught by effective teachers demonstrated greater learning gains than did students who had the least effective teachers one after another. The difficulty arises largely because of the nonrandom sorting of teachers to students across schools, as well as the nonrandom sorting of students to teachers within schools. In others, comprehensive systems have been developed for examining teacher performance in concert with evidence about outcomes for purposes of personnel decision making and compensation. These include the influences of students’ other teachers (both previous teachers and, in secondary schools, current teachers of other subjects) as well as tutors or instructional specialists, who have been found often to have very large influences on achievement gains.
Some believe that the prospect of higher pay for better performance will attract more effective teachers to the profession and that a flexible pay scale, based in part on test-based measures of effectiveness, will reduce the attrition of more qualified teachers whose commitment to teaching will be strengthened by the prospect of greater financial rewards for success. While there are good reasons for concern about the current system of teacher evaluation, there are also good reasons to be concerned about claims that measuring teachers’ effectiveness largely by student test scores will lead to improved student achievement. The teacher quality provisions of the No Child Left Behind Act (NCLB) underscore the importance of these premises. Surveys have found that teacher attrition and demoralization have been associated with test-based accountability efforts, particularly in high-need schools. For example, if teachers perceive the system to be generating incorrect or arbitrary evaluations, perhaps because the evaluation of a specific teacher varies widely from year to year for no explicable reason, teachers could well be demoralized, with adverse effects on their teaching and increased desire to leave the profession. Schools that are predominantly poor or minority were considerably more likely to employ uncertified teachers (Darling-Hammond 1999). Schools and their communities have always sought out the best teachers they could get in the belief that their students’ success depends on it.
Except at the very bottom of the teacher quality distribution where test-based evaluation could result in termination, individual incentives will have little impact on teachers who are aware they are less effective (and who therefore expect they will have little chance of getting a bonus) or teachers who are aware they are stronger (and who therefore expect to get a bonus without additional effort). The tests most likely to be used in any test-based teacher evaluation program are those that are currently required under NCLB, or that will be required under its reauthorized version. In some cases, students may be pulled out of classes for special programs or instruction, thereby altering the influence of classroom teachers. This information could prompt K–16 discussions between districts and institutions of higher education regarding ways to ensure that teacher preparation programs explicitly address the districts’ needs. Nonrandom assignment of students to teachers can be a function of either good or bad educational policy.
We noted above that an individual incentive system that rewards teachers for their students’ mathematics and reading scores can result in narrowing the curriculum, both by reducing attention paid to non-tested curricular areas, and by focusing attention on the specific math and reading topics and skills most likely to be tested. There are several studies providing evidence that the students of certified teachers perform better than students of uncertified teachers. Despite the large magnitude of these error rates, the Mathematica researchers are careful to point out that the resulting misclassification of teachers that would emerge from value-added models is still most likely understated because their analysis focuses on imprecision error alone. However, considerable disagreement surrounds what specific teacher attributes indicate quality and how to better invest resources to provide quality teachers for all students. By comparing the achievement of similar students within the same schools but assigned to different teachers, researchers were able to isolate the effects of the teacher on student achievement. Similarly, the UTD Texas Schools Project data showed that students of experienced teachers attained significantly higher levels of achievement than did students of new teachers (those with one to three years of experience) (Rivkin, Hanushek, and Kain 2005). Mobility is a much greater problem for poor and minority students; teachers are much more likely to move from urban to suburban schools than vice versa (Hanushek, Kain, and Rivkin 2004). Effective teachers should be retained, and those with remediable shortcomings should be guided and trained further. Surprisingly, it found that students’ fifth grade teachers were good predictors of their fourth grade test scores. To the extent that this policy results in the incorrect categorization of particular teachers, it can harm teacher morale and fail in its goal of changing behavior in desired directions.
Yet even when the incentives were substantial, teachers have not always been willing to go to or to stay in difficult schools. Rather, they are directly relevant to policy makers and to the desirability of efforts to evaluate teachers by their students’ scores. It is notable that findings for these characteristics frequently differ for teachers at the elementary school level and teachers at the high school level, and that the body of research on the subject of teacher quality suggests that the context of teaching matters. For example, students who do well in fourth grade may tend to be assigned to one fifth grade teacher while those who do poorly are assigned to another. However commonplace it might be under current systems for teachers to respond rationally to incentives by artificially inflating end-of-year scores by drill, test preparation activities, or teaching to the test, it would be so much easier for teachers to inflate their value-added ratings by discouraging students’ high performance on a September test, if only by not making the same extraordinary efforts to boost scores in the fall that they make in the spring. This is a fundamental question that must precede policy discussions concerning what kinds of teacher qualities and qualifications to promote in aspiring teachers, whom to recruit and hire, what factors to use in setting salary schedules, and how to distribute teachers across different types of schools and classrooms to achieve equity and adequacy goals. Teachers with more than five years in the classroom seem to be the most effective. Beyond concerns about statistical methodology, other practical and policy considerations weigh against heavy reliance on student test scores to evaluate teachers. The Bush administration’s proposal, which specifies what defines a “highly qualified” teacher, is based on the premise that teacher excellence is vital to realizing improved student achievement.
Districts can also explore value-added methods for monitoring teacher effectiveness, such as those used in Texas, North Carolina, and other states. The most frequently proposed solution to this problem is to limit VAM to teachers who have been teaching for many years, so that their performance can be estimated using multiple years of data and instability in VAM measures over time can be averaged out.

Teacher quality and student achievement: Research review

Recruiters could assess the rigor of teacher preparation programs by closely examining transcripts and other records that identify and describe the actual courses that teacher candidates have taken in order to be certified. Studies have sought to evaluate the effects of teacher training by comparing teachers who take alternative routes to teaching with those who complete a traditional teacher preparation program. In general, teachers with emergency certificates don’t perform as well as those with traditional certification. These and other problems can undermine teacher morale, as well as provide disincentives for teachers to take on the neediest students. Better schools are collaborative institutions where teachers work across classroom and grade-level boundaries toward the common goal of educating all children to their maximum potential. For example, a teacher in a school with exceptionally talented teachers may not appear to add as much value to her students as others in the school, but if compared to all the teachers in the district, she might fall well above average. The Center for Public Education will continue to monitor state and district efforts to provide each child with a highly qualified, effective teacher. Although some reasoning and other advanced skills can be tested with multiple-choice questions, most cannot be, so teachers who are evaluated by students’ scores on multiple-choice exams have incentives to teach only lower-level, procedural skills that can easily be tested.
What is now necessary is a comprehensive system that gives teachers the guidance and feedback, supportive leadership, and working conditions to improve their performance, and that permits schools to remove persistently ineffective teachers without distorting the entire instructional program by imposing a flawed system of standardized quantification of teacher quality. A school will be more effective if its teachers are more knowledgeable about all students and can coordinate efforts to meet students’ needs. Even if state data systems permit tracking of students who change schools, measured growth for these students will be distorted, and attributing their progress (or lack of progress) to different schools and teachers will be problematic. Most of the effective teacher studies, for example, have focused on elementary school. Because of the inability of value-added methods to fully account for the differences in student characteristics and in school supports, as well as the effects of summer learning loss, teachers who teach students with the greatest educational needs will appear to be less effective than they are. Researchers (p. 67) likewise conclude that “student characteristics are likely to confound estimated teacher effects when schools serve distinctly different populations.” Students in high-minority schools were 40 percent more likely to be taught by out-of-field teachers. Many researchers and analysts argue that the fact that poor and minority students are the least likely to have qualified teachers is itself a major contributor to the achievement gap.
The statistical concerns we have described are accompanied by a number of practical problems of evaluating teachers based on student test scores on state tests. In a letter to the Department of Education, commenting on the department’s proposal to use student achievement to evaluate teachers, the Board on Testing and Assessment of the National Research Council of the National Academy of Sciences wrote: “…VAM estimates of teacher effectiveness should not be used to make operational decisions because such estimates are far too unstable to be considered fair or reliable.” Districts seeking to remove ineffective teachers must invest the time and resources in a comprehensive approach to evaluation that incorporates concrete steps for the improvement of teacher performance based on professional standards of instructional practice, and unambiguous evidence for dismissal if improvements do not occur. Some states are now considering plans that would give as much as 50% of the weight in teacher evaluation and compensation decisions to scores on existing tests of basic skills in math and reading. Researchers studying year-to-year fluctuations in teacher and school averages have also noted sources of variation that affect the entire group of students, especially the effects of particularly cooperative or particularly disruptive class members. Thus, teachers working in affluent suburban districts would almost always look more effective than teachers in urban districts if the achievement scores of their students were interpreted directly as a measure of effectiveness. But there is no current evidence to indicate either that the departing teachers would actually be the weakest teachers, or that the departing teachers would be replaced by more effective ones. For example, low-performing schools often have weak organizational supports for teachers. Quite often, these approaches incorporate several ways of looking at student learning over time in relation to the teacher’s instruction.
A 1996 study found that over 41% of the total expenditures in the district it examined were devoted to the salaries and benefits of instructional teachers. If policy makers persist in attempting to use VAM to evaluate teachers serving highly mobile student populations, perverse consequences can result. Some policy makers seek to minimize these realities by citing teachers or schools who achieve exceptional results with disadvantaged students. Statisticians, psychometricians, and economists who have studied the use of test scores for high-stakes teacher evaluation, including its most sophisticated form, value-added modeling (VAM), mostly concur that such use should be pursued only with great caution. The measurement of average achievement for all of a school’s students is, though still not perfectly reliable, more stable than measurement of achievement of students attributable to a specific teacher. TVAAS was the first data-tracking system in the country to measure individual teacher performance according to annual gains in student test scores. Sorting of teachers to students across schools: some schools and districts have students who are more socioeconomically disadvantaged than others. Surprisingly, one study finds that students’ fifth grade teachers appear to be good predictors of students’ fourth grade test scores.
Basing teacher evaluation primarily on student test scores does not accurately distinguish more from less effective teachers, because even relatively sophisticated approaches cannot adequately address the full range of statistical problems that arise in estimating a teacher’s effectiveness. Value-added approaches are a clear improvement over status test-score comparisons (that simply compare the average student scores of one teacher to the average student scores of another); over change measures (that simply compare the average student scores of a teacher in one year to her average student scores in the previous year); and over growth measures (that simply compare the average student scores of a teacher in one year to the same students’ scores when they were in an earlier grade the previous year). For now, suffice it to say that teachers who teach large numbers of low-income students will be noticeably disadvantaged in spring-to-spring test gain analyses, because their students will start the fall further behind than more affluent students who were scoring at the same level in the previous spring. Although such survey data are limited, anecdotes abound regarding the demoralization of apparently dedicated and talented teachers as test-based accountability intensifies. Value-added estimates are too unstable (i.e., vary too widely) across time, across the classes that teachers teach, and across tests that are used to evaluate instruction, to be used for the high-stakes purposes of evaluating teachers.
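The status, change, and growth measures contrasted above can be made concrete with a small sketch. The records, teacher labels, and function names below are hypothetical and chosen only for illustration; they are not drawn from any study cited here. A value-added model would additionally adjust for student background characteristics, which this sketch does not attempt.

```python
from statistics import mean

# Hypothetical toy records: (student, year, teacher, score).
# All values are illustrative assumptions, not real data.
RECORDS = [
    ("s1", 2009, "T_prev", 40), ("s2", 2009, "T_prev", 50),
    ("s3", 2009, "A", 70), ("s4", 2009, "A", 80),
    ("s1", 2010, "A", 55), ("s2", 2010, "A", 65),
    ("s5", 2010, "B", 85), ("s6", 2010, "B", 95),
]

def class_scores(teacher, year):
    """Scores of the students taught by `teacher` in `year`, keyed by student."""
    return {s: sc for s, y, t, sc in RECORDS if t == teacher and y == year}

def status(teacher, year):
    """Status measure: the teacher's average student score in one year."""
    return mean(class_scores(teacher, year).values())

def change(teacher, year):
    """Change measure: this year's class average minus last year's
    class average for the same teacher (different students)."""
    return status(teacher, year) - status(teacher, year - 1)

def growth(teacher, year):
    """Growth measure: average gain of the teacher's current students
    over their own scores from the previous year."""
    prior = {s: sc for s, y, t, sc in RECORDS if y == year - 1}
    current = class_scores(teacher, year)
    return mean(sc - prior[s] for s, sc in current.items() if s in prior)

print("status A vs B:", status("A", 2010), status("B", 2010))
print("change for A:", change("A", 2010))
print("growth for A:", growth("A", 2010))
```

In this toy data, teacher A trails teacher B on the status measure and appears to decline on the change measure, yet A's own students gained ground on the growth measure. The three measures can therefore rank the same teacher quite differently, and none of them accounts for differences in the students assigned.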
Growth measures implicitly assume, without justification, that students who begin at different achievement levels should be expected to gain at the same rate, and that all gains are due solely to the individual teacher to whom student scores are attached; growth measures do not control for students’ socioeconomic advantages or disadvantages that may affect not only their initial levels but their learning rates. To be sure, if new laws or district policies specifically require that teachers be fired if their students’ test scores do not rise by a certain amount or reach a certain threshold, then more teachers might well be terminated than is now the case. “…That’s not happening… Teachers are as frustrated as I’ve ever seen them.” Using test scores to evaluate teachers unfairly disadvantages teachers of the neediest students. Although there are many reasons for concern about the current system of teacher evaluation, there are also reasons to be skeptical of claims that measuring teachers’ effectiveness by student test scores will lead to the desired outcomes. There is research showing that students of teachers who have greater academic ability (be it measured through SAT or ACT scores, GPA, IQ, tests of verbal ability, or selectivity of the college attended) perform better. Error also renders the estimates of teacher quality that emerge from value-added models highly unstable. That is, students who were enrolled in a succession of classes taught by effective teachers demonstrated greater learning gains than did students who had the least effective teachers one after another. Rivkin, Hanushek, and Kain (1998) identify teachers as a major determinant of student performance, but do not describe teacher quality in terms of specific qualifications and characteristics.
Teacher qualities related to higher student achievement include content knowledge: effective teachers have a solid background in the subject area they teach, as measured by a college major or minor in the field. What teachers know and can do is the most important influence on what students learn. To reduce the misclassification error rate to 12% would require 10 years of data for each teacher. The failure of policy makers to address some of the validity issues discussed above, such as those associated with the nonrandom sorting of students across schools, would lead to even greater misclassification of teachers. Some districts have found ways to identify, improve, and as necessary, dismiss teachers using strategies like peer assistance and evaluation that offer intensive mentoring and review panels. In some districts, peer assistance and review programs (using standards-based evaluations that incorporate evidence of student learning, supported by expert teachers who can offer intensive assistance, and panels of administrators and teachers that oversee personnel decisions) have been successful in coaching teachers, identifying teachers for intervention, providing them assistance, and efficiently counseling out those who do not improve.

A Proposal for Measuring and Recognizing Teacher Effectiveness

As a result, reliance on student test scores for evaluating teachers is likely to misidentify many teachers as either poor or successful. The enhancement of teacher quality is likely to be quite costly. To the extent that teachers are given incentives to pursue individual monetary rewards by posting greater test score gains than their peers, teachers may also have incentives to cease collaborating. A study that examined the math achievement of elementary students also found that students taught by new, uncertified teachers did significantly worse on achievement tests than did those taught by new, certified teachers (Laczko-Kerr and Berliner 2002). A comprehensive analysis by Greenwald, Hedges, and Laine (1996) examined data from 60 studies and found a positive relationship between years of teacher experience and student test scores. Fifth grade teachers being evaluated by their students’ test scores might have a greater interest in pressing fourth grade teachers to better prepare their students for fifth grade. That a teacher who appears to be very effective (or ineffective) in one year might have a dramatically different result the following year runs counter to most people’s notions that the true quality of a teacher is likely to change very little over time. VAM estimates have proven to be unstable across statistical models, years, and classes that teachers teach. It is important to note that many personal characteristics important for a good teacher are not measured in the studies reviewed. The distribution of teachers with these qualities has grown more inequitable in recent years. Moreover, teacher compensation represents a significant public investment: in 2002 alone, the United States invested 2 billion in teacher pay and benefits.
…Louis and another from a Los Angeles teacher: “No Child Left Behind has completely destroyed everything I ever worked for… We now have an enforced 90-minute reading block.” Among the concerns raised by researchers are the prospects that value-added methods can misidentify both successful and unsuccessful teachers and, because of their instability and failure to disentangle other influences on learning, can create confusion about the relative sources of influence on student achievement. Narrowing of the curriculum to increase time on what is tested is another negative consequence of high-stakes uses of value-added measures for evaluating teachers. A number of researchers have argued that teacher quality is a powerful predictor of student performance. Schools that are predominantly poor or minority were considerably more likely to employ uncertified teachers (Darling-Hammond 1999). Teachers’ value-added scores and rankings are most unstable at the upper and lower ends of the scale, where they are most likely to be used to allocate performance pay or to dismiss teachers believed to be ineffective. These techniques measure the gains that students make and then compare these gains to those of students whose measured background characteristics and initial test scores were similar, concluding that those who made greater gains must have had more effective teachers. For example, with VAM, the essay-writing a student learns from his history teacher may be credited to his English teacher, even if the English teacher assigns no writing; the mathematics a student learns in her physics class may be credited to her math teacher.
For example, if teachers perceive the system to be generating incorrect or arbitrary evaluations, perhaps because the evaluation of a specific teacher varies widely from year to year for no explicable reason, teachers could well be demoralized, with adverse effects on their teaching and increased desire to leave the profession. A second reason to be wary of evaluating teachers by their students’ test scores is that so much of the promotion of such approaches is based on a faulty analogy: the notion that this is how the private sector evaluates professional employees. Studies show little clear impact of emergency or alternative-route certification on student performance in either mathematics or science, as compared to standard certification. Recruiting, preparing, and retaining good teachers is the central strategy for improving our schools. For elementary (and some middle-school) teachers who are responsible for all (or most) curricular areas, evaluation by student test scores creates incentives to diminish instruction in history, the sciences, the arts, music, foreign language, health and physical education, civics, ethics and character, all of which we expect children to learn. Tying teacher evaluation and sanctions to test score results can discourage teachers from wanting to work in schools with the neediest students, while the large, unpredictable variation in the results and their perceived unfairness can undermine teacher morale. Analyses of TVAAS and STAR data indicated that teachers had a substantial effect on student achievement. This effect is stronger for poor and/or minority students than for their more affluent and/or white peers, although all groups benefit from effective teachers. However, opinions conflict about the effectiveness of Teach for America (TFA) teachers, who enter classrooms with alternate certificates.
As is the case in every profession that requires complex practice and judgments, precision and perfection in the evaluation of teachers will never be possible. Some states are now considering plans that would give as much as 50% of the weight in teacher evaluation and compensation decisions to scores on existing poor-quality tests of basic skills in math and reading. In addition, the conflicting findings on the effectiveness of alternate route teachers need to be resolved, especially since many districts rely on such non-traditional candidates to deal with teacher shortages. Researchers have found that teachers’ effectiveness ratings differ from class to class, from year to year, and from test to test, even when these are within the same content area. Another study confirmed that big changes from one year to the next are quite likely, with low year-to-year correlations of estimated teacher quality. Used with caution, value-added modeling can add useful information to comprehensive analyses of student progress and can help support stronger inferences about the influences of teachers, schools, and programs on student growth. If these anecdotes reflect the feelings of good teachers, then analysis of student test scores may distinguish teachers who are more able to raise test scores, but encourage teachers who are truly more effective to leave the profession.
A third reason for skepticism is that in practice, and especially in the current tight fiscal environment, performance rewards are likely to come mostly from the redistribution of already-appropriated teacher compensation funds, and thus are not likely to be accompanied by a significant increase in average teacher salaries (unless public funds are supplemented by substantial new money from foundations, as is currently the situation in Washington, D.C.). Criticisms of teacher training and licensing procedures stem largely from a belief that the requirements for certification do not encompass all the characteristics that should be sought in teachers and thus should be reformed to require more content knowledge and displays of teaching competency (Walsh and Snyder 2004). There is simply no shortcut to the identification and removal of ineffective teachers. Some policy makers, acknowledging the inability to fairly identify effective or ineffective teachers by their students’ test scores, have suggested that low test scores (or value-added estimates) should be a “trigger” that invites further investigation. Advocates of evaluating teachers by students’ fall-to-spring growth have not explained how, within reasonable budgetary constraints, all spring testing can be moved close to the end of the school year.
VAM’s instability can result from differences in the characteristics of students assigned to particular teachers in a particular year, from small samples of students (made even less representative in schools serving disadvantaged students by high rates of student mobility), from other influences on student learning both inside and outside school, and from tests that are poorly aligned with the curriculum teachers are expected to cover, or that do not measure the full range of achievement of students in the class. In the following sections, we review research findings on teacher characteristics that are commonly recognized measures of quality: content knowledge, teaching experience, training and credentials, and overall academic ability. Highlights of the empirical evidence include the following: several studies have found a positive effect of experience on teacher effectiveness; specifically, the “learning by doing” effect is most obvious in the early years of teaching. The most compelling evidence for the importance of teaching came initially from economists who adapted value-added models from business to measure the effect of teachers on student learning.

How Leadership Influences Student Learning

Guthrie and Rothstein (1998) assert that teacher salaries account for at least 50% of typical school district expenditures. There is no way, however, to adjust statistically for a teacher’s ability to pressure other instructors in estimating the teacher’s effectiveness in raising her own students’ test scores. The teacher quality provisions of the No Child Left Behind Act (NCLB) underscore the importance of these premises. For that to happen, school systems should recruit, prepare, and retain teachers who are qualified to do the job. In contrast to many of the policy recommendations for stricter teacher qualifications, the Abell Foundation has recently released a report calling for the elimination of statewide coursework and certification requirements for teachers in favor of more flexible professional requirements (Abell Foundation 2001). An analysis that synthesized findings from a group of studies showed that teachers with pedagogical training performed better than those who entered teaching without such training (Greenwald, Hedges, and Laine 1996). Pressure to raise student test scores, to the exclusion of other important goals, can demoralize good teachers and, in some cases, provoke them to leave the profession entirely. This research typically uses teachers’ college degree to represent content knowledge. It implies there are only two options for evaluating teachers: the ineffectual current system or the deeply flawed test-based system.
The focus is on aspects of teacher background that can be translated into policy recommendations and incorporated into teaching practice. In any event, teacher effectiveness measures continue to be highly unstable, whether or not they are estimated using school fixed effects. Because of the range of influences on student learning, many studies have confirmed that estimates of teacher effectiveness are highly unstable. The evidence indicates that neither an extreme centralized bureaucratization nor a complete deregulation of teacher requirements is a wise approach for improving teacher quality. The U.S. Department of Education’s national Schools and Staffing Survey (SASS) showed that students in high-poverty secondary schools were 77 percent more likely to be taught by teachers without degrees in the subject they were teaching than were their affluent counterparts. The challenge for districts is to ensure that every classroom is staffed by a skilled, qualified teacher.
While some advocates argue that admittedly flawed value-added measures are preferred to existing cumbersome measures for identifying, remediating, or dismissing ineffective teachers, this argument creates a false dichotomy. There is broad agreement among statisticians, psychometricians, and economists that student test scores alone are not sufficiently reliable and valid indicators of teacher effectiveness to be used in high-stakes personnel decisions, even when the most sophisticated statistical applications such as value-added modeling are employed. For newly hired teachers, districts can establish and maintain intensive, long-term induction programs that focus on helping new teachers meet challenging professional performance standards. However, the literature on teacher quality and qualifications has typically been viewed as inconsistent and inconclusive. In highly mobile communities, if two years of data are unavailable for many students, or if teachers are not to be held accountable for students who have been present for less than the full year, the sample is even smaller than the already small samples for a single typical teacher, and the problem of misestimation is exacerbated. Nor is there empirical verification for the claim that teachers will improve student learning if teachers are evaluated based on test score gains or are monetarily rewarded for raising scores. A recent New York Times report, for example, described how teachers prepare students for state high school history exams: “As at many schools… teachers and administrators… prepare students for the tests.” A number of factors have been found to have strong influences on student learning gains, aside from the teachers to whom their scores would be attached. What these results mean is that incentives to work in hard-to-staff schools should also take into account the working conditions they provide for teachers.
NCLB has used student test scores to evaluate schools, with clear negative sanctions for schools (and, sometimes, their teachers) whose students fail to meet expected performance standards. Although the various reasons to be skeptical about the use of student test scores to evaluate teachers, along with the many conceptual and practical limitations of empirical value-added measures, might suffice by themselves to make one wary of the move to test-based evaluation of teachers, they take on even greater significance in light of the potential for large negative effects of such an approach. Evaluators may find it useful to take student test score information into account in their evaluations of teachers, provided such information is embedded in a more comprehensive approach. One study has identified a teacher quality “tipping point” when the proportion of underqualified teachers is about 20 percent of the total school faculty. Collaborative work among teachers with different levels and areas of skill and different types of experience can capitalize on the strengths of some, compensate for the weaknesses of others, increase shared knowledge and skill, and thus increase their school’s overall professional capacity. Because education is both a cumulative and a complex process, it is impossible to fully distinguish the influences of students’ other teachers as well as school conditions on their apparent learning, let alone their out-of-school learning experiences at home, with peers, at museums and libraries, in summer programs, on-line, and in the community. The study found that the students taught by certified teachers scored better on the state math achievement test. A recent study documents the consequences of students (in this case, apparently purposefully) not being randomly assigned to teachers within a school.
One study examining two consecutive years of data showed, for example, that across five large urban districts, among teachers who were ranked in the bottom 20% of effectiveness in the first year, fewer than a third were in that bottom group the next year, and another third moved all the way up to the top 40%. A study designed to test this question used VAM methods to assign effects to teachers after controlling for other factors, but applied the model backwards to see if credible results were obtained. In addition, the conflicting findings on the effectiveness of alternate route teachers need to be resolved, especially since many districts rely on such non-traditional candidates to deal with teacher shortages. The most compelling evidence for the importance of teaching came initially from economists who adapted value-added models from business to measure the effect of teachers on student learning. An emphasis on test results for individual teachers exacerbates the well-documented incentives for teachers to focus on narrow test-taking skills, repetitive drill, and other undesirable instructional practices. But they are silent on the question of what characterizes an “effective teacher.” A commonplace objection to a group incentive system is that it permits free riding: teachers share in rewards without contributing additional effort. However, these standards are not generally tested, and teachers evaluated by student scores on standardized tests have little incentive to develop student skills in these areas. While the Tennessee data from STAR showed achievement gains associated with smaller class sizes, a stronger achievement gain is associated with teacher quality (Nye, Konstantopoulos, and Hedges 2004).
The research base is currently insufficient to support the use of VAM for high-stakes decisions about individual teachers or schools. A report for the U.S. Department of Education concludes that the errors are sufficiently large to lead to the misclassification of many teachers. VAM methods have also contributed to stronger analyses of school progress, program influences, and the validity of evaluation methods than were previously possible. Teachers who teach a greater share of lower-income students are disadvantaged by summer learning loss in estimates of their effectiveness that are calculated in terms of gains in their students’ test scores from the previous year. The resource-intensive nature of teachers, coupled with the empirical evidence documenting the critical role of teacher quality in realizing student achievement, implies that teacher policy is a promising avenue toward better realizing goals of efficiency, equity, and adequacy in public education. The single largest category of educational spending is devoted to the purchase of teacher time. In contrast, some less conscientious principals may purposefully assign students with the greatest difficulties to teachers who are inexperienced, perhaps to avoid conflict with senior staff who resist such assignments. What is now necessary is a comprehensive system that gives teachers the guidance and feedback, supportive leadership, and working conditions to improve their performance, and that permits schools to remove persistently ineffective teachers without distorting the entire instructional program by imposing a flawed system of standardized quantification of teacher quality.
The nonrandom assignment of students to classrooms and schools, and the wide variation in students’ experiences at home and at school, mean that teachers cannot be accurately judged against one another by their students’ test scores, even when efforts are made to control for student characteristics in statistical models.

The Power of an Effective Teacher and Why We Should Assess It

Regardless of how it’s measured, teacher quality is not distributed equitably across schools and districts. These challenges arise because of the influence of student socioeconomic advantage or disadvantage on learning, measurement error and instability, the nonrandom sorting of teachers across schools and of students to teachers in classrooms within schools, and the difficulty of disentangling the contributions of multiple teachers over time to students’ learning. To enhance productive collaboration among all of a school’s staff for the purpose of raising overall student scores, group (school-wide) incentives are preferred to incentives that attempt to distinguish among teachers. Teacher coursework in both the subject area taught and pedagogy contributes to positive education outcomes. For example, studies have found that the effects of one-on-one or small-group tutoring, generally conducted in pull-out sessions or after school by someone other than the classroom teacher, can be quite substantial. This means that only about 4% to 16% of the variation in a teacher’s value-added ranking in one year can be predicted from his or her rating in the previous year. Central to NCLB’s goal of closing the achievement gap by 2014 is the requirement that all teachers be highly qualified by the end of the 2005-06 school year. There are a number of actions to take. Districts can step up recruitment efforts to hire teacher candidates who have strong academic credentials and who have completed a rigorous teacher preparation program. Some argue that the qualifications identified in the NCLB legislation are more reflective of a “minimally qualified teacher” than a “highly qualified teacher.”
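The 4% to 16% range quoted above can be read as a squared correlation: if one year's rating predicts a share of the variation in the next year's rating, the year-to-year correlation is the square root of that share. The short sketch below works through that arithmetic. It assumes the quoted percentages are shares of explained variance; the function names are illustrative and do not come from any study cited here.

```python
import math


def implied_correlation(variance_share: float) -> float:
    """Return the correlation whose square equals the given share of variance."""
    if not 0.0 <= variance_share <= 1.0:
        raise ValueError("variance_share must lie between 0 and 1")
    return math.sqrt(variance_share)


def unpredicted_share(variance_share: float) -> float:
    """Return the share of variation NOT predicted by the prior year's rating."""
    return 1.0 - variance_share


if __name__ == "__main__":
    # 0.04 and 0.16 are the 4% and 16% bounds quoted in the text.
    for share in (0.04, 0.16):
        r = implied_correlation(share)
        print(f"{share:.0%} predicted -> correlation of about {r:.2f}, "
              f"{unpredicted_share(share):.0%} left unpredicted")
```

Read this way, even the upper bound leaves most of the variation in a teacher's ranking unaccounted for by the previous year's rating.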
The difficulty arises largely because of the nonrandom sorting of teachers to students across schools, as well as the nonrandom sorting of students to teachers within schools. Research suggests that the selectivity or prestige of the institution a teacher attended has a positive effect on student achievement, particularly at the secondary level. These findings provide little support for the view that test-based incentives for schools or individual teachers are likely to improve achievement, or for the expectation that such incentives for individual teachers will suffice to produce gains in student learning. Any sound evaluation will necessarily involve a balancing of many factors that provide a more accurate view of what teachers in fact do in the classroom and how that contributes to student learning. Also, principals often attempt to make assignments that match students’ particular learning needs to the instructional strengths of individual teachers. There are many pitfalls to making causal attributions of teacher effectiveness on the basis of the kinds of data available from typical school districts. A review of VAM research from the Educational Testing Service’s Policy Information Center concluded that VAM results should not serve as the sole or principal basis for making consequential decisions about teachers. For elementary (and some middle-school) teachers who are responsible for all (or most) curricular areas, evaluation by student test scores creates incentives to diminish instruction in history, the sciences, the arts, music, foreign language, health and physical education, civics, ethics and character, all of which we expect children to learn. Although those who evaluate teachers could take student test scores over time into account, they should be fully aware of their limitations, and such scores should be only one element among many considered in teacher profiles.
In any event, teacher effectiveness measures continue to be highly unstable, whether or not they are estimated using school fixed effects. Some teachers’ effectiveness may not be captured by relative value-added metrics, and the use of VAM to evaluate such teachers could exacerbate disincentives to teach students with high levels of need. Measurement error also renders the estimates of teacher quality that emerge from value-added models highly unstable. Denver’s ProComp system, Arizona’s Career Ladder, and the Teacher Advancement Program are illustrative. In some districts, peer assistance and review programs (using standards-based evaluations that incorporate evidence of student learning, supported by expert teachers who can offer intensive assistance, and panels of administrators and teachers that oversee personnel decisions) have been successful in coaching teachers, identifying teachers for intervention, providing them assistance, and efficiently counseling out those who do not improve. Evidence suggests that teachers who have earned advanced degrees have a positive impact on high school mathematics and science achievement when the degrees earned were in these subjects. For instance, numerous measures of what a teacher knows and can do have been routinely assumed to be important (at least as indicated through hiring strategies, salary schedules, and teacher reform agendas). Nonrandom sorting of students to teachers within schools: a comparable statistical problem arises for teachers within schools, in that teachers’ value-added scores are affected by differences in the types of students who happen to be in their classrooms.
Even where these accounts are true, they only demonstrate that more effective teachers and schools achieve better results, on average, with disadvantaged students than less effective teachers and schools achieve; they do not demonstrate that more effective teachers and schools achieve average results for disadvantaged students that are typical for advantaged students. They haven’t stopped doing what children do, but the teachers don’t have time to deal with it. The challenge for districts is to ensure that every classroom is staffed by a skilled, qualified teacher. Models cannot fully adjust for the fact that some teachers will have a disproportionate number of students who may be exceptionally difficult to teach (students with poorer attendance, who have become homeless, who have severe problems at home, who come into or leave the classroom during the year due to family moves, etc.). Making matters worse, because most VAM techniques rely on growth calculations from one year to the next, each teacher’s value-added score is affected by the measurement error in two different tests. In the context of this intense activity surrounding teacher policy, it makes sense to turn to the existing evidence on which teacher attributes are related to teacher effectiveness in order to guide policy decisions about hiring, compensation, and distribution with respect to teachers. When attached to individual merit pay plans, such approaches may also create disincentives for teacher collaboration. Even the most sophisticated analyses of student test score gains generate estimates of teacher quality that vary considerably from one year to the next. Researchers have found that teachers’ effectiveness ratings differ from class to class, from year to year, and from test to test, even when these are within the same content area. In comparison, class size, teacher education, and teacher experience play a small role.
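The point above about growth calculations drawing on the measurement error of two tests can be illustrated with a small sketch. It assumes each observed score is a true score plus an additive error and that the errors on the two tests are independent, a simplification the text does not itself state. The error sizes used here (3 and 4 score points) are hypothetical, and the function names are illustrative only.

```python
import math
import random
import statistics


def gain_error_sd(sd_pre: float, sd_post: float) -> float:
    """SD of the error in (post - pre), assuming independent test errors.

    Variances of independent errors add, so the gain's error SD is the
    square root of the sum of the two squared SDs.
    """
    return math.sqrt(sd_pre ** 2 + sd_post ** 2)


def simulate_gain_error_sd(sd_pre: float, sd_post: float,
                           n: int = 100_000, seed: int = 0) -> float:
    """Estimate the same quantity by simulating n independent error pairs."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    errors = [rng.gauss(0.0, sd_post) - rng.gauss(0.0, sd_pre)
              for _ in range(n)]
    return statistics.pstdev(errors)


if __name__ == "__main__":
    analytic = gain_error_sd(3.0, 4.0)
    simulated = simulate_gain_error_sd(3.0, 4.0)
    print(f"analytic SD of gain error:  {analytic:.2f}")
    print(f"simulated SD of gain error: {simulated:.2f}")
```

Under these assumptions the error in a gain score is larger than the error in either test alone, which is one reason single-year estimates for a small class can move considerably from year to year.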
Specifically, the authors find that if the goal is to distinguish relatively high or relatively low performing teachers from those with average performance within a district, the error rate is about 26% when three years of data are used for each teacher. For new teachers, this means that they must meet existing state certification requirements and demonstrate mastery of the content area in which they teach, either by passing a content knowledge test or by having majored in the subject in an undergraduate or graduate program. While there are good reasons for concern about the current system of teacher evaluation, there are also good reasons to be concerned about claims that measuring teachers’ effectiveness largely by student test scores will lead to improved student achievement. Nonrandom sorting of teachers to students across schools: some schools and districts have students who are more socioeconomically disadvantaged than others. There is no way, however, to adjust statistically for a teacher’s ability to pressure other instructors in estimating the teacher’s effectiveness in raising her own students’ test scores. The Tennessee studies revealed that African American students were almost twice as likely to be taught by the least effective teachers (Sanders and Rivers 1996). Some districts have found ways to identify, improve, and as necessary, dismiss teachers using strategies like peer assistance and evaluation that offer intensive mentoring and review panels.
In her analysis of teacher preparation and student achievement across states, Darling-Hammond (2000) reports that “measures of teacher preparation and certification are by far the strongest correlates of student achievement in reading and mathematics, both before and after controlling for student poverty and language status.” Teachers who have chosen to teach in schools serving more affluent students may appear to be more effective simply because they have students with more home and school supports for their prior and current learning, and not because they are better teachers. This possibility cannot be ruled out entirely, but some studies control for cross-school variability and at least one study has examined the same teachers with different populations of students, showing that these teachers consistently appeared to be more effective when they taught more academically advanced students, fewer English language learners, and fewer low-income students. Some teachers are more effective with students with particular characteristics, and principals with experience come to identify these variations and consider them in making classroom assignments. The Department of Education should actively encourage states to experiment with a range of approaches that differ in the ways in which they evaluate teacher practice and examine teachers’ contributions to student learning. Every classroom should have a well-educated, professional teacher, and school systems should recruit, prepare, and retain teachers who are qualified to do the job.
Many principals are themselves unprepared to evaluate the teachers they supervise. Even if state data systems permit tracking of students who change schools, measured growth for these students will be distorted, and attributing their progress (or lack of progress) to different schools and teachers will be problematic.

A case study of student and teacher relationships and the effect

As we show in what follows, research and experience indicate that approaches to teacher evaluation that rely heavily on test scores can lead to narrowing and over-simplifying the curriculum, and to misidentifying both successful and unsuccessful teachers. Given the range of measures currently available for teacher evaluation, and the need for research about their effective implementation and consequences, legislatures should avoid imposing mandated solutions to the complex problem of identifying more and less effective teachers. There is also little or no evidence for the claim that teachers will be more motivated to improve student learning if teachers are evaluated or monetarily rewarded for student test score gains. More than two decades of research findings are unequivocal about the connection between teacher quality and student learning. The framework for this study includes five broad categories of measurable and policy-relevant indicators to organize the teacher characteristics assumed to reflect teacher quality. Because certification standards differ significantly between states, several researchers have sought to evaluate the effects of the teacher training that certification indicates. This means that in a typical performance measurement system, more than one in four teachers who are in fact of average quality would be misclassified as either outstanding or poor teachers, and more than one in four teachers who should be singled out for special treatment would be misclassified as teachers of average quality. Fuller and Alexander’s (2004) analysis identified similar students who were taught by Texas math teachers who were also similar except that some were certified and others were not. Analyses of TVAAS and STAR data indicated that teachers had a substantial effect on student achievement.
The next chapter describes the methodology used to review the literature on the relationship between teacher characteristics and their performance, and the chapter that follows presents the findings from this literature review. Basing teacher evaluation primarily on student test scores does not accurately distinguish more from less effective teachers because even relatively sophisticated approaches cannot adequately address the full range of statistical problems that arise in estimating a teacher’s effectiveness. Taken together, these multiple sources of evidence, however different in nature, all conclude that quality teachers are critical. Although value-added approaches improve over these other methods, the claim that they can “level the playing field” and provide reliable, valid, and fair comparisons of individual teachers is overstated. Fifth grade teachers being evaluated by their students’ test scores might have a greater interest in pressing fourth grade teachers to better prepare their students for fifth grade.
In contrast to the work of Hanushek and others who have looked at specific subgroups of studies (see, for example, Mayer, Mullens, Moore, and Ralph 2000; Wayne and Youngs 2003; Whitehurst 2002), the literature review presented here represents an analysis of a wide variety of empirical studies examining the impact of teacher attributes on teacher performance. It is often quite difficult to match particular students to individual teachers, even if data systems eventually permit such matching, and to unerringly attribute student achievement to a specific teacher. Encouragement from the administration and pressure from advocates have already led some states to adopt laws that require greater reliance on student test scores in the evaluation, discipline, and compensation of teachers. A school will be more effective if its teachers are more knowledgeable about all students and can coordinate efforts to meet students’ needs. Students in high-minority schools were 40 percent more likely to be taught by out-of-field teachers. Of course, it could also be that affluent schools or districts are able to recruit the best teachers. A second important finding from this work was that the positive effects of teacher quality appear to accumulate over the years. Hanushek (1992) estimates that the difference between having a good teacher and having a bad teacher can exceed one grade-level equivalent in annual achievement growth. Of course, to the degree that reduced class sizes, overall educational spending, and teacher salaries are related to teacher quality, these can be viewed as investments in teacher quality, albeit indirect.
Research also suggests that scattering a handful of good teachers around the district is not going to produce wide-ranging results. Status measures primarily reflect the higher or lower achievement with which students entered a teacher’s classroom at the beginning of the year rather than the contribution of the teacher in the current year. Goldhaber and Brewer (1996) found that the presence of teachers with at least a major in their subject area was the most reliable predictor of student achievement scores in math and science. Problems of measurement error and other sources of year-to-year variability are especially serious because many policy makers are particularly concerned with removing ineffective teachers in schools serving the lowest-performing, disadvantaged students. After the first few years of an exam’s use, teachers can anticipate which of these topics are more likely to appear, and focus their instruction on these likely-to-be-tested topics, to be learned in the format of common test questions. The intense interest in teacher policy is motivated by several compelling factors.
Data from three initiatives provide good starting points for understanding how much of an effect teachers have on student outcomes: the Tennessee Value-Added Assessment System (TVAAS) and the Student Teacher Achievement Ratio (STAR) project in Tennessee, and the University of Texas at Dallas Texas Schools Project in Texas. In a related finding, an analysis of math achievement and dropout rates in a sample of California high schools (Fetler 2001) found that schools whose dropout rates were in the highest 10 percent had 50 percent more new teachers than did schools in the lowest 10 percent. Another study found that teachers’ effectiveness ratings in one year could only predict from 4% to 16% of the variation in such ratings in the following year. More critically, the solution does not solve the problem of nonrandom assignment, and it necessarily excludes beginning teachers with insufficient historical data and teachers serving the most disadvantaged (and most mobile) populations, thus undermining the ability of the system to address the goals policy makers seek. Regardless of whether the distribution of students among classrooms is motivated by good or bad educational policy, it has the same effect on the integrity of VAM analyses: the nonrandom pattern makes it extremely difficult to make valid comparisons of the value-added of the various teachers within a school. What holds a great deal more promise is refining the policies and practices employed to build a qualified body of teachers in elementary schools, middle schools, and high schools; for disadvantaged, special needs, and advantaged students; and for math, science, languages, English, social studies, and the arts.
A group incentive system can exacerbate this narrowing, if teachers press their colleagues to concentrate effort on those activities most likely to result in higher test scores and thus in group bonuses. Inasmuch as a student’s later fifth grade teacher cannot possibly have influenced that student’s fourth grade performance, this curious result can only mean that VAM results are based on factors other than teachers’ actual effectiveness. This legislation, along with typical hiring and compensation systems, assumes that years of teaching experience, teacher certification, engagement in certain types of coursework, and performance on standardized assessments are indicators of high-quality teachers. In general, teachers with emergency certificates don’t perform as well as those with traditional certification. For example, one study of the Tennessee data found that low-income students were more likely to benefit from instruction by a highly effective teacher than were their more advantaged peers (Nye, Konstantopoulos, and Hedges 2004). A refined understanding of how teacher attributes affect their performance across these different teaching contexts can be helpful in determining the range of potentially effective policy options. In mathematics, a brief exam can only sample a few of the many topics that teachers are expected to cover in the course of a year. Change measures are flawed because they may reflect differences from one year to the next in the various characteristics of students in a teacher’s classroom, as well as other school or classroom-related variables.

Improving Student Learning by Supporting Quality Teaching: Key

Evaluation by competent supervisors and peers, employing such approaches, should form the foundation of teacher evaluation systems, with a supplemental role played by multiple measures of student learning gains that, where appropriate, should include test scores. Some states are now considering plans that would give as much as 50% of the weight in teacher evaluation and compensation decisions to scores on existing tests of basic skills in math and reading. The positive effects associated with being taught by a highly effective teacher, defined as a teacher whose average student score gain is in the top 25 percent, were stronger for poor and minority students than for their white and affluent counterparts. Likewise, Darling-Hammond (1999) found a significant positive association between achievement and teacher certification. Such instability from year to year renders single-year estimates unsuitable for high-stakes decisions about teachers, and is likely to erode confidence both among teachers and among the public in the validity of the approach. A teacher who prepares students for questions about the causes of the two world wars may not adequately be teaching students to understand the consequences of these wars, although both are important. This information could prompt K–16 discussions between districts and institutions of higher education regarding ways to ensure that teacher preparation programs explicitly address the districts’ needs. Many factors influence the student test score gains attributed to individual teachers. Nor are improvements in teacher quality likely to be realized through the status quo.
Poor and minority students are much less likely to get well-qualified teachers than students who are better off. For example, with VAM, the essay-writing a student learns from his history teacher may be credited to his English teacher, even if the English teacher assigns no writing; the mathematics a student learns in her physics class may be credited to her math teacher. Growth measures implicitly assume, without justification, that students who begin at different achievement levels should be expected to gain at the same rate, and that all gains are due solely to the individual teacher to whom student scores are attached; growth measures do not control for students’ socioeconomic advantages or disadvantages that may affect not only their initial levels but their learning rates. In addition, if teachers see little or no relationship between what they are doing in the classroom and how they are evaluated, their incentives to improve their teaching will be weakened. Studies have sought to evaluate the effects of teacher training by comparing teachers who take alternative routes to teaching with those who complete a traditional teacher preparation program. Likewise, Sanders (1998) and Sanders and Rivers (1996) argue that the single most important factor affecting student achievement is teachers, and the effects of teachers on student achievement are both additive and cumulative. Their research identifies teacher quality as the most important school-related factor influencing student achievement. For a variety of reasons, analyses of VAM results have led researchers to doubt whether the methodology can accurately identify more and less effective teachers.
To the extent that this policy results in the incorrect categorization of particular teachers, it can harm teacher morale and fail in its goal of changing behavior in desired directions. This analysis examines the existing empirical literature on the relationship between teacher attributes and their effectiveness with the goal of informing policy on investing in teacher quality. In practice, therefore, evaluating teachers by their students’ test scores means evaluating teachers only by students’ basic math and/or reading skills, to the detriment of other knowledge, skills, and experiences that young people need to become effective participants in a democratic society and contributors to a productive economy. The impending teacher shortage, estimated at more than two million teachers by 2007 (Ingersoll 2003), could exacerbate the inequitable distribution of teacher quality in the coming decades unless policymakers and educational leaders find ways of increasing the supply of skilled teachers and ensuring that the lowest performing students are enrolled in their classes. For example, fifth-grade math students who had three consecutive highly effective teachers scored between 52 and 54 percentile points ahead of students who had three consecutive teachers who were least effective, even though the math achievement of both groups of students was the same prior to entering second grade (Sanders and Rivers 1996). Research consistently shows that teacher quality, whether measured by content knowledge, experience, training and credentials, or general intellectual skills, is strongly related to student achievement: simply, skilled teachers produce better student results. There is simply no shortcut to the identification and removal of ineffective teachers.
In response to these perceived failures of current teacher policies, the Obama administration encourages states to make greater use of students’ test results to determine a teacher’s pay and job tenure. They haven’t stopped doing what children do, but the teachers don’t have time to deal with it. Empirical studies that conform to a variety of accepted methodological approaches and use a range of measures of teacher effectiveness are used to ascertain what existing evidence says about the relationship between teacher attributes and their performance. This possibility cannot be ruled out entirely, but some studies control for cross-school variability, and at least one study has examined the same teachers with different populations of students, showing that these teachers consistently appeared to be more effective when they taught more academically advanced students, fewer English language learners, and fewer low-income students. The tests most likely to be used in any test-based teacher evaluation program are those that are currently required under NCLB, or that will be required under its reauthorized version. Principals typically have too broad a span of control (frequently supervising as many as 30 teachers), and too little time and training to do an adequate job of assessing and supporting teachers. Teachers on emergency certificates don’t perform as well as fully certified teachers. Jerald and Ingersoll (2002) showed that the problem of out-of-field teachers actually got worse for disadvantaged students during the 1990s. The nonrandom assignment of students to classrooms and schools, and the wide variation in students’ experiences at home and at school, mean that teachers cannot be accurately judged against one another by their students’ test scores, even when efforts are made to control for student characteristics in statistical models.
in highly mobile communities, if two years of data are unavailable for many students, or if teachers are not to be held accountable for students who have been present for less than the full year, the sample is even smaller than the already small samples for a single typical teacher, and the problem of misestimation is exacerbated. other approaches, with less reliance on test scores, have been found to improve teachers’ practice while identifying differences in teachers’ effectiveness. they use systematic observation protocols with well-developed, research-based criteria to examine teaching, including observations or videotapes of classroom practice, teacher interviews, and artifacts such as lesson plans, assignments, and samples of student work. once teachers in schools or classrooms with more transient student populations understand that their vam estimates will be based only on the subset of students for whom complete data are available and usable, they will have incentives to spend disproportionately more time with students who have prior-year data or who pass a longevity threshold, and less time with students who arrive mid-year and who may be more in need of individualized instruction. policy makers have recently come to believe that this failure can be remedied by calculating the improvement in students’ scores on standardized tests in mathematics and reading, and then relying heavily on these calculations to evaluate, reward, and remove the teachers of these tested students. a theoretical and empirical investigation of teacher collaboration for school improvement and student achievement in public elementary schools. better schools are collaborative institutions where teachers work across classroom and grade-level boundaries toward the common goal of educating all children to their maximum potential. 
evaluators may find it useful to take student test score information into account in their evaluations of teachers, provided such information is embedded in a more comprehensive approach. greater clarity on the empirical evidence regarding teacher quality can inform the wisdom of current practice, guide state efforts in the struggle with nclb compliance regarding teachers, and provide direction for future teacher policy. a study of teachers in alabama by ferguson and ladd (1996) found a correlation between a teacher’s higher act scores and higher reading scores for her students. findings provide little support for the view that test-based incentives for schools or individual teachers are likely to improve achievement, or for the expectation that such incentives for individual teachers will suffice to produce gains in student learning. and scores are also needed quickly if test results are to be used for timely teacher evaluation. academic ability: teachers with stronger academic skills perform better, whether these skills are measured by teachers’ sat or act scores, grade point average or selectivity of the college they attended. of teacher training and licensing procedures stem largely from a belief that the requirements for certification do not encompass all the characteristics that should be sought in teachers and thus should be reformed to require more content knowledge and displays of teaching competency (walsh and snyder 2004). they conclude from their analysis of 400,000 students in 3,000 schools that, while school quality is an important determinant of student achievement, the most important predictor is teacher quality. this means that in a typical performance measurement system, more than one in four teachers who are in fact teachers of average quality would be misclassified as either outstanding or poor teachers, and more than one in four teachers who should be singled out for special treatment would be misclassified as teachers of average quality. 
NCLB has used student test scores to evaluate schools, with clear negative sanctions for schools (and, sometimes, their teachers) whose students fail to meet expected performance standards. In practice, American public schools generally do a poor job of systematically developing and evaluating teachers. It is commonplace for teachers to report that this year they had a “better” or “worse” class than last, even if prior achievement or superficial socioeconomic characteristics are similar.

It follows that assigning experienced, qualified teachers to low-performing schools and students is likely to pay off in better performance and narrowing gaps. STAR’s reliance on randomized samples, combined with the data-tracking capacity of TVAAS, offered an important and unique opportunity to examine variations in student achievement where the only difference between classes was the teacher. And finally, if teachers are so important to student learning, how can we make sure all students receive the benefit of good teachers? It is relatively easy for teachers to prepare students for such tests by drilling them in the mechanics of reading, but this behavior does not necessarily make them good readers. In addition to changes in the characteristics of students assigned to teachers, this is also partly due to the small number of students whose scores are relevant for particular teachers. Another study confirmed that big changes from one year to the next are quite likely, with year-to-year correlations of estimated teacher quality ranging from only 0. This review examines empirical evidence on the relationship between teacher attributes and teacher effectiveness with the goal of informing federal, state, and local teacher policy. Even if they show that monetary incentives for teachers lead to higher scores in reading and math, we will still not know whether the higher scores were achieved by superior instruction or by more drill and test preparation, and whether the students of these teachers would perform equally well on tests for which they did not have specific preparation. A review of VAM research from the Educational Testing Service’s Policy Information Center concluded, “VAM results should not serve as the sole or principal basis for making consequential decisions about teachers.”
the sensitivity of value-added teacher effect estimates to different mathematics achievement measures.” it is a rational response to incentives and is not unlawful, provided teachers do not gain illicit access to specific forthcoming test questions and prepare students for them. while this approach would be preferable in some ways to attempting to measure value-added from one year to the next, fall and spring testing would force schools to devote even more time to testing for accountability purposes, and would set up incentives for teachers to game the value-added measures. even if they show that monetary incentives for teachers lead to higher scores in reading and math, we will still not know whether the higher scores were achieved by superior instruction or by more drill and test preparation, and whether the students of these teachers would perform equally well on tests for which they did not have specific preparation. although some reasoning and other advanced skills can be tested with multiple-choice questions, most cannot be, so teachers who are evaluated by students’ scores on multiple-choice exams have incentives to teach only lower level, procedural skills that can easily be tested. one study of teach for america (tfa)1 teachers in houston found that tfa teachers had a positive effect on student achievement scores when compared with other new teachers (raymond, fletcher, and luque 2001). these studies not only provide insight into the characteristics of good teachers, they reveal how these contribute to student learning and closing achievement gaps. because no school can anticipate far in advance that it will be asked to participate in the naep sample, nor which students in the school will be tested, and because no consequences for the school or teachers follow from high or low naep scores, teachers have neither the ability nor the incentive to teach narrowly to expected test topics. 
similarly, if teachers know they will be evaluated by their students’ scores on a test that predictably asks questions about triangles and rectangles, teachers skilled in preparing students for calculations involving these shapes may fail to devote much time to polygons, an equally important but somewhat more difficult topic in the overall math curriculum. and minority students are about twice as likely to have teachers with less than three years of teaching experience (national center for education statistics 2000). this runs counter to most people’s notions that the true quality of a teacher is likely to change very little over time and raises questions about whether what is measured is largely a “teacher effect” or the effect of a wide variety of other factors.” the carnegie foundation for the advancement of teaching recommended that teacher education programs require a 3. they use systematic observation protocols with well-developed, research-based criteria to examine teaching, including observations or videotapes of classroom practice, teacher interviews, and artifacts such as lesson plans, assignments, and samples of student work. as we show in what follows, research and experience indicate that approaches to teacher evaluation that rely heavily on test scores can lead to narrowing and over-simplifying the curriculum, and to misidentifying both successful and unsuccessful teachers. research base is currently insufficient to support the use of vam for high-stakes decisions about individual teachers or schools. legislatures should not mandate a test-based approach to teacher evaluation that is unproven and likely to harm not only teachers, but also the children they instruct. fuller and alexander’s (2004) analysis identified similar students who were taught by texas math teachers who were also similar except that some were certified and others were not. 
this could lead to the inappropriate dismissal of teachers of low-income and minority students, as well as of students with special educational needs. the 2004 estimates put the number of teachers who have not yet met the highly qualified standard at 20 percent in elementary schools and 25 percent in secondary schools (u. in schools with certain kinds of block schedules, courses are taught for only a semester, or even in nine or 10 week rotations, giving students two to four teachers over the course of a year in a given class period, even without considering unplanned teacher turnover. there are many reasons for concern about the current system of teacher evaluation, there are also reasons to be skeptical of claims that measuring teachers’ effectiveness by student test scores will lead to the desired outcomes. we also need to know more about the incentives and working conditions that will attract highly effective teachers to traditionally hard-to-staff schools. some states are now considering plans that would give as much as 50% of the weight in teacher evaluation and compensation decisions to scores on existing poor-quality tests of basic skills in math and reading. to raise student test scores, to the exclusion of other important goals, can demoralize good teachers and, in some cases, provoke them to leave the profession entirely. these negative effects can result both from the statistical and practical difficulties of evaluating teachers by their students’ test scores. teachers are also likely to be aware of personal circumstances (a move, an illness, a divorce) that are likely to affect individual students’ learning gains but are not captured by value-added models. a teacher who works in a well-resourced school with specialist supports may appear to be more effective than one whose students do not receive these supports. 
schools that have adopted pull-out, team teaching, or block scheduling practices will only inaccurately be able to isolate individual teacher “effects” for evaluation, pay, or disciplinary purposes. this approach, however, limits the usefulness of the results because teachers can then be compared only to other teachers in the same school and not to other teachers throughout the district. she also found a significant negative association between achievement and the presence of a high proportion of new or uncertified teachers in the school. in others, comprehensive systems have been developed for examining teacher performance in concert with evidence about outcomes for purposes of personnel decision making and compensation. however, progress has been made over the last two decades in developing standards-based evaluations of teaching practice, and research has found that the use of such evaluations by some districts has not only provided more useful evidence about teaching practice, but has also been associated with student achievement gains and has helped teachers improve their practice and effectiveness. to reduce it to 12% would require 10 years of data for each teacher. structured performance assessments of teachers like those offered by the national board for professional teaching standards and the beginning teacher assessment systems in connecticut and california have also been found to predict teacher’s effectiveness on value-added measures and to support teacher learning. patterns, which held true in every district and state under study, suggest that there is not a stable construct measured by value-added measures that can readily be called “teacher effectiveness. 
most of the research suggests that teachers who have had pedagogical training and who have received certification produce better student achievement scores than those who have not, although some studies dispute this finding (goldhaber and brewer 2000)., an emphasis on test results for individual teachers exacerbates the well-documented incentives for teachers to focus on narrow test-taking skills, repetitive drill, and other undesirable instructional practices. often they do not have a culture of high expectations for students and teachers or that values teacher learning, collegiality, and cooperation. in addition, some states’ efforts to reduce class size—and in so doing creating a need to increase the teacher workforce—have led to the hiring of more unqualified and untrained teachers, thus minimizing the possible benefits of lower class sizes (jepsen and rivkin 2002). despite the hopes of many, even the most highly developed value-added models fall short of their goal of adequately adjusting for the backgrounds of students and the context of teachers’ classrooms. if test scores subsequently improve, should a specific teacher or the tutoring service be given the credit? in this respect vam results are even less reliable indicators of teacher contributions to learning than a single test score. the usefulness of value-added modeling requires the assumption that teachers whose performance is being compared have classrooms with students of similar ability (or that the analyst has been able to control statistically for all the relevant characteristics of students that differ across classrooms). a result, standardized annual exams, if usable for high-stakes teacher or school evaluation purposes, typically include no or very few extended-writing or problem-solving items, and therefore do not measure conceptual understanding, communication, scientific investigation, technology and real-world applications, or a host of other critically important skills. 
Most teachers, particularly those teaching elementary or middle school students, do not teach enough students in any year for average test scores to be highly reliable. She contends that measures of teacher quality are more strongly related to student achievement than other kinds of educational investments such as reduced class size, overall spending on education, and teacher salaries. But in practice, teachers’ estimated value-added effect necessarily reflects in part the nonrandom differences between the students they are assigned and not just their own effectiveness. This means that only about 4% to 16% of the variation in a teacher’s value-added ranking in one year can be predicted from his or her rating in the previous year.
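The step from a year-to-year correlation to a share of predictable variation is the usual squaring of a correlation coefficient. A minimal Python sketch, assuming correlations of 0.2 and 0.4 (chosen here only because their squares match the 4% and 16% figures quoted above; the function name is illustrative and not drawn from any study):

```python
def share_of_variation_explained(correlation: float) -> float:
    """Return r squared: the fraction of variance in one year's ranking
    that can be predicted from the previous year's ranking."""
    if not -1.0 <= correlation <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return correlation ** 2


# Assumed endpoints for illustration; not taken from any data set.
for r in (0.2, 0.4):
    share = share_of_variation_explained(r)
    print(f"r = {r:.1f} -> about {share:.0%} of variation predicted")
```

Under these assumed values, even the stronger correlation leaves more than four-fifths of the variation in a teacher's ranking unexplained by the prior year's rating.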

teacher rewards based on comparative student test results can also create disincentives for teacher collaboration. the large magnitude of these error rates, the mathematica researchers are careful to point out that the resulting misclassification of teachers that would emerge from value-added models is still most likely understated because their analysis focuses on imprecision error alone. the same dramatic fluctuations were found for teachers ranked at the bottom in the first year of analysis. they are:What teachers know and can do is the most important influence on what students learn..In contrast to the approach used by darling-hammond, which equates teacher quality with specific qualifications, rivkin, hanushek, and kain (1998) identify teacher quality in terms of student performance outcomes. an analysis that synthesized findings from a group of studies showed that teachers with pedagogical training performed better than those who entered teaching without such training (greenwald, hedges, and laine 1996). million teachers to educate more than 46 million public elementary and secondary students (national center for education statistics 2000). approach taken here is similar to that used by wilson, floden, and ferrini-mundy (2001) in their review of the research on teacher preparation conducted for the u. often they do not have a culture of high expectations for students and teachers or that values teacher learning, collegiality, and cooperation.. some policy makers seek to minimize these realities by citing teachers or schools who achieve exceptional results with disadvantaged students. if performance rewards do not raise average teacher salaries, the potential for them to improve the average effectiveness of recruited teachers is limited and will result only if the more talented of prospective teachers are more likely than the less talented to accept the risks that come with an uncertain salary. 
Evaluation by competent supervisors and peers, employing such approaches, should form the foundation of teacher evaluation systems, with a supplemental role played by multiple measures of student learning gains that, where appropriate, should include test scores. In sum, teachers’ value-added effects can be compared only where teachers have the same mix of struggling and successful students, something that almost never occurs, or when statistical measures of effectiveness fully adjust for the differing mix of students, something that is exceedingly hard to do. These include the influences of students’ other teachers (both previous teachers and, in secondary schools, current teachers of other subjects) as well as tutors or instructional specialists, who have been found often to have very large influences on achievement gains. More critically, the solution does not solve the problem of nonrandom assignment, and it necessarily excludes beginning teachers with insufficient historical data and teachers serving the most disadvantaged (and most mobile) populations, thus undermining the ability of the system to address the goals policy makers seek. But in practice, teachers’ estimated value-added effect necessarily reflects in part the nonrandom differences between the students they are assigned and not just their own effectiveness. Although some advocates argue that admittedly flawed value-added measures are preferred to existing cumbersome measures for identifying, remediating, or dismissing ineffective teachers, this argument creates a false dichotomy. Yet in practice, American public schools generally do a poor job of systematically developing and evaluating teachers. These systems for observing teachers’ classroom practice are based on professional teaching standards grounded in research on teaching and learning. Overall, such teachers might be equally effective, but VAM would arbitrarily identify the former teacher as more effective, and the latter as less so.
Yet even when the incentives were substantial, teachers have not always been willing to go to or to stay in difficult schools. Experiments are underway to determine if offers to teachers of higher pay, conditional on their students having higher test scores in math and reading, actually lead to higher student test scores in these subjects. For example, developing an approach to policy that values different and multiple teacher characteristics based on the research evidence may prove promising. A second important finding from this work was that the positive effects of teacher quality appear to accumulate over the years. Data from two initiatives in Tennessee, the Tennessee Value Added Assessment System (TVAAS) and the Student Teacher Achievement Ratio (STAR) project, and one in Texas, the University of Texas at Dallas Texas Schools Project, provide good starting points for understanding how much of an effect teachers have on student outcomes. Clearly, the context of teaching is important and may affect the impact of the teacher attributes considered in this analysis. A study that examined the math achievement of elementary students also found that students taught by new, uncertified teachers did significantly worse on achievement tests than did those taught by new, certified teachers (Laczko-Kerr and Berliner 2002).
this runs counter to most people’s notions that the true quality of a teacher is likely to change very little over time and raises questions about whether what is measured is largely a “teacher effect” or the effect of a wide variety of other factors. following teacher qualities are related to higher student achievement are:Content knowledge: effective teachers have a solid background in the subject area they teach as measured by a college major or minor in the field. effect is stronger for poor and/or minority students than for their more affluent and/or white peers, although all groups benefit from effective teachers. because of the inability of value-added methods to fully account for the differences in student characteristics and in school supports, as well as the effects of summer learning loss, teachers who teach students with the greatest educational needs will appear to be less effective than they are. prior teachers have lasting effects, for good or ill, on students’ later learning, and several current teachers can also interact to produce students’ knowledge and skills. because no school can anticipate far in advance that it will be asked to participate in the naep sample, nor which students in the school will be tested, and because no consequences for the school or teachers follow from high or low naep scores, teachers have neither the ability nor the incentive to teach narrowly to expected test topics..A teacher who prepares students for questions about the causes of the two world wars may not adequately be teaching students to understand the consequences of these wars, although both are important parts of a. this could lead to the inappropriate dismissal of teachers of low-income and minority students, as well as of students with special educational needs. 
specifically, the authors find that if the goal is to distinguish relatively high or relatively low performing teachers from those with average performance within a district, the error rate is about 26% when three years of data are used for each teacher. policy makers are left with questions surrounding what counts as a quality teacher—information that could be valuable in guiding policies regarding whom to hire, whom to reward, and how best to distribute teachers across schools and classrooms. teacher evaluation and sanctions to test score results can discourage teachers from wanting to work in schools with the neediest students, while the large, unpredictable variation in the results and their perceived unfairness can undermine teacher morale. increases in teacher salaries, incentives such as loan-forgiveness programs, heightened teacher preparation requirements, and other efforts to prepare, recruit, and retain high-quality teachers are all associated with substantial costs. darling-hammond (1999) found that, although other factors had a stronger association with achievement, the presence of a teacher who did not have at least a minor in the subject matter that he or she taught accounted for about 20 percent of the variation in naep scores. structured performance assessments of teachers like those offered by the national board for professional teaching standards and the beginning teacher assessment systems in connecticut and california have also been found to predict teacher’s effectiveness on value-added measures and to support teacher learning. if new laws or policies specifically require that teachers be fired if their students’ test scores do not rise by a certain amount, then more teachers might well be terminated than is now the case. in this respect vam results are even less reliable indicators of teacher contributions to learning than a single test score. 
In their analysis of these data, Rivkin, Hanushek, and Kain (2005) found that teacher quality differences explained the largest portion of the variation in reading and math achievement. In addition, some critics believe that typical teacher compensation systems provide teachers with insufficient incentives to improve their performance. Krueger’s re-analysis of the studies that Hanushek included on the effect of pupil-teacher ratio and the effect of per-pupil expenditures demonstrates that other approaches to weighting the studies lead to a more consistent and positive story about the effect of these resources on student achievement. A highly effective teacher, therefore, is one whose students show the most gains from one year to the next. In addition, differences in student performance were more heavily influenced by the teacher than by student ethnicity or class, or by the school attended by the student. The measurement of average achievement for all of a school’s students is, though still not perfectly reliable, more stable than measurement of achievement of students attributable to a specific teacher. In order to implement needed policies associated with staffing every classroom, even the most challenging ones, with high-quality teachers, substantial and targeted investments must first be made in both teacher quality and education research.
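Why a school-wide average is steadier than the average for one teacher's class follows from how sampling error shrinks as the number of students grows. A rough Python sketch, assuming independent student scores with a common standard deviation (a simplification) and purely hypothetical group sizes of 25 and 500:

```python
import math


def standard_error_of_mean(score_sd: float, n_students: int) -> float:
    """Standard error of an average score over n independent students."""
    if n_students <= 0:
        raise ValueError("n_students must be positive")
    return score_sd / math.sqrt(n_students)


# Hypothetical values for illustration only.
SCORE_SD = 10.0     # spread of individual student scores
CLASS_SIZE = 25     # tested students attached to one teacher
SCHOOL_SIZE = 500   # tested students in the whole school

class_se = standard_error_of_mean(SCORE_SD, CLASS_SIZE)
school_se = standard_error_of_mean(SCORE_SD, SCHOOL_SIZE)
print(f"class average standard error:  {class_se:.2f}")
print(f"school average standard error: {school_se:.2f}")
```

Under these assumptions the class average is several times noisier than the school average. The sketch covers sampling error only; it ignores nonrandom assignment of students and the other sources of error discussed in the text.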
Another analysis of the same data confirmed that students of TFA teachers did outperform those taught by other untrained teachers, especially in math; however, they did not perform as well as students of new teachers who had pedagogical training and certification (Darling-Hammond, Holtzman, Gatlin, and Heilig 2005). After the first few years of an exam’s use, teachers can anticipate which of these topics are more likely to appear, and focus their instruction on these likely-to-be-tested topics, to be learned in the format of common test questions.

Research proposal influences of teacher effectiveness

there is also little or no evidence for the claim that teachers will be more motivated to improve student learning if teachers are evaluated or monetarily rewarded for student test score gains. these results mean is that incentives to work in hard-to-staff schools should also take into account the working conditions they provide for teachers. the most common strategy has been to offer pay increases or signing bonuses for teachers to come to high-need areas or to teach high-need subjects. goldhaber and brewer (1996) found that the presence of teachers with at least a major in their subject area was the most reliable predictor of student achievement scores in math and science. the technical and practical limitations of what test scores can accurately reflect, we conclude that changes in test scores should be used only as a modest part of a broader set of evidence about teacher practice. jerald and ingersoll (2002) showed that the problem of out-of-field teachers actually got worse for disadvantaged students during the 1990s. given the importance of teachers’ collective efforts to improve overall student achievement in a school, an additional component of documenting practice and outcomes should focus on the effectiveness of teacher participation in teams and the contributions they make to school-wide improvement, through work in curriculum development, sharing practices and materials, peer coaching and reciprocal observation, and collegial work with students.., vary widely) across time, across the classes that teachers teach, and across tests that are used to evaluate instruction, to be used for the high-stakes purposes of evaluating teachers.. denver’s pro-comp system, arizona’s career ladder, and the teacher advancement program are illustrative. 
While this approach would be preferable in some ways to attempting to measure value-added from one year to the next, fall and spring testing would force schools to devote even more time to testing for accountability purposes, and would set up incentives for teachers to game the value-added measures. In addition, some critics believe that typical teacher compensation systems provide teachers with insufficient incentives to improve their performance. Major drawbacks to these efforts were (1) not enough attention to what was needed to retain teachers, and (2) too much attention to individuals and too little to schools (Liu, Johnson, and Peske 2003). It is commonplace for teachers to report that this year they had a “better” or “worse” class than last, even if prior achievement or superficial socioeconomic characteristics are similar. Teachers with more than five years in the classroom seem to be the most effective. Other research helps pinpoint the dimensions of teacher quality. Mobility is a much greater problem for poor and minority students; teachers are much more likely to move from urban to suburban schools than vice versa (Hanushek, Kain, and Rivkin 2004). The study found that the students taught by certified teachers scored better on the state math achievement test. STAR’s reliance on randomized samples, combined with the data-tracking capacity of TVAAS, offered an important and unique opportunity to examine variations in student achievement where the only difference between classes was the teacher. On the other hand, the production function literature could be contested as too exclusive, in the sense that other methodological approaches, particularly those that allow the researcher to focus on more refined measures of what teachers know and can do, can also make valuable contributions to what we know about the value of educational resources.
There are several studies providing evidence that the students of certified teachers perform better than students of uncertified teachers. A review of the technical evidence leads us to conclude that, although standardized test scores of students are one piece of information for school leaders to use to make judgments about teacher effectiveness, such scores should be only a part of an overall comprehensive evaluation. Inasmuch as a student’s later fifth grade teacher cannot possibly have influenced that student’s fourth grade performance, this curious result can only mean that students are systematically grouped into fifth grade classrooms based on their fourth grade performance. Finally, of the 207 studies that estimate the effect of teacher experience, 29% of the estimates were statistically significant and positive, 5% were statistically significant and negative, and 66% were not statistically significant. Using test scores to evaluate teachers unfairly disadvantages teachers of the neediest students. The purpose of this analysis is to review existing empirical evidence to draw conclusions about the specific characteristics that are linked with teacher performance. To enhance productive collaboration among all of a school’s staff for the purpose of raising overall student scores, group (school-wide) incentives are preferred to incentives that attempt to distinguish among teachers. Rather, teacher policies need to reflect the reality that teaching is a complex activity that is influenced by the many elements of teacher quality. In mathematics, a brief exam can only sample a few of the many topics that teachers are expected to cover in the course of a year.
For example, a teacher in a school with exceptionally talented teachers may not appear to add as much value to her students as others in the school, but if compared to all the teachers in the district, she might fall well above average. Encouragement from the administration and pressure from advocates have already led some states to adopt laws that require greater reliance on student test scores in the evaluation, discipline, and compensation of teachers. Teachers who teach a greater share of lower-income students are disadvantaged by summer learning loss in estimates of their effectiveness that are calculated in terms of gains in their students’ test scores from the previous year. There is research that has shown that students of teachers who have greater academic ability (be it measured through SAT or ACT scores, GPA, IQ, tests of verbal ability, or selectivity of the college attended) perform better. Central to NCLB’s goal of closing the achievement gap by 2014 is the requirement that all teachers be highly qualified by the end of the 2005-06 school year. This data helps inform decisions about where to assign teachers, how to staff schools, and what supports and professional development opportunities are needed in order to maximize the benefits of the most valuable academic resource, teachers. Regardless of whether the distribution of students among classrooms is motivated by good or bad educational policy, it has the same effect on the integrity of VAM analyses: the nonrandom pattern makes it extremely difficult to make valid comparisons of the value-added of the various teachers within a school. These techniques measure the gains that students make and then compare these gains to those of students whose measured background characteristics and initial test scores were similar, concluding that those who made greater gains must have had more effective teachers. Likewise, Darling-Hammond (1999) found a significant positive association between achievement and teacher certification.
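The comparison of gains described above can be illustrated with a deliberately simplified sketch. It is not the model used by TVAAS or by any study cited here: the records, the two prior-score bands, and the function name value_added are all hypothetical, and a real value-added model would control for many more student characteristics. The sketch only shows the basic logic of comparing each student's gain with the average gain of similar students and averaging the difference by teacher.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (teacher, prior-score band, prior score, current score).
records = [
    ("A", "low", 40, 52), ("A", "low", 42, 50), ("A", "high", 70, 76),
    ("B", "low", 41, 47), ("B", "high", 72, 80), ("B", "high", 68, 74),
]

def value_added(rows):
    """Mean of (actual gain - expected gain) per teacher.

    The expected gain is the average gain of all students in the same
    prior-score band, a crude stand-in for "similar students".
    """
    gains_by_band = defaultdict(list)
    for _, band, prior, current in rows:
        gains_by_band[band].append(current - prior)
    expected = {band: mean(gains) for band, gains in gains_by_band.items()}

    residuals = defaultdict(list)
    for teacher, band, prior, current in rows:
        residuals[teacher].append((current - prior) - expected[band])
    return {teacher: mean(r) for teacher, r in residuals.items()}

if __name__ == "__main__":
    for teacher, estimate in sorted(value_added(records).items()):
        print(teacher, round(estimate, 2))
```

With only three students per teacher, moving a single student from one classroom to the other changes both estimates noticeably, which is one way to see why nonrandom assignment and small classes undermine such comparisons.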
Despite general agreement about the importance of high-quality teachers, researchers, practitioners, policy makers, and the public have been unable to reach a consensus about what specific qualities and characteristics make a good teacher. If performance rewards do not raise average teacher salaries, the potential for them to improve the average effectiveness of recruited teachers is limited and will result only if the more talented of prospective teachers are more likely than the less talented to accept the risks that come with an uncertain salary. The recent federal education legislation, No Child Left Behind (NCLB), further underlines the importance of having a high-quality teacher in every classroom in every school. Given all the factors related to student performance, how much impact can we expect from teachers? Secondary school teachers, all teachers in kindergarten, first, and second grades, and some teachers in grades three through eight do not teach courses in which students are subject to external tests of the type needed to evaluate test score gains. This research typically uses teachers’ college degree to represent content knowledge. Because of the range of influences on student learning, many studies have confirmed that estimates of teacher effectiveness are highly unstable. In some cases, students may be pulled out of classes for special programs or instruction, thereby altering the influence of classroom teachers. Fetler (1999) found that teachers with emergency teaching certificates did not perform as well as teachers who were fully certified, even when controlling for the amount of teaching experience. For instance, the Board on Testing and Assessment of the National Research Council of the National Academy of Sciences stated, “…VAM estimates of teacher effectiveness should not be used to make operational decisions because such estimates are far too unstable to be considered fair or reliable.”
Advocates of evaluating teachers by students’ fall-to-spring growth have not explained how, within reasonable budgetary constraints, all spring testing can be moved close to the end of the school year. As noted above, even in a more stable community, the number of students in a given teacher’s class is often too small to support reliable conclusions about teacher effectiveness. Policy makers have recently come to believe that this failure can be remedied by calculating the improvement in students’ scores on standardized tests in mathematics and reading, and then relying heavily on these calculations to evaluate, reward, and remove the teachers of these tested students. In one recent study, economists found that peer learning among small groups of teachers was the most powerful predictor of improved student achievement over time. Except at the very bottom of the teacher quality distribution, where test-based evaluation could result in termination, individual incentives will have little impact on teachers who are aware they are less effective (and who therefore expect they will have little chance of getting a bonus) or teachers who are aware they are stronger (and who therefore expect to get a bonus without additional effort). Some advocates of this approach expect the provision of performance-based financial rewards to induce teachers to work harder and thereby increase their effectiveness in raising student achievement. Some schools expect, and train, teachers of all subjects to integrate reading and writing instruction into their curricula.
These challenges arise because of the influence of student socioeconomic advantage or disadvantage on learning, measurement error and instability, the nonrandom sorting of teachers across schools and of students to teachers in classrooms within schools, and the difficulty of disentangling the contributions of multiple teachers over time to students’ learning. Some attempts to redistribute good teachers to low-performing schools have not been entirely successful.

In schools with certain kinds of block schedules, courses are taught for only a semester, or even in nine- or ten-week rotations, giving students two to four teachers over the course of a year in a given class period, even without considering unplanned teacher turnover. To begin with, how do you define a good teacher? Others expect that the apparent objectivity of test-based measures of teacher performance will permit the expeditious removal of ineffective teachers from the profession and will encourage less effective teachers to resign if their pay stagnates. We noted above that an individual incentive system that rewards teachers for their students’ mathematics and reading scores can result in narrowing the curriculum, both by reducing attention paid to non-tested curricular areas and by focusing attention on the specific math and reading topics and skills most likely to be tested. Evaluation by competent supervisors and peers, employing such approaches, should form the foundation of teacher evaluation systems, with a supplemental role played by multiple measures of student learning gains that, where appropriate, could include test scores. In addition, Hanushek included 41 estimates of the impact of teacher test scores on student outcomes. Each of these resource differences may have a small impact on a teacher’s apparent effectiveness, but cumulatively they have greater significance.
Research has demonstrated a positive effect of certified teachers on high school mathematics achievement when the certification is in mathematics. It is a rational response to incentives and is not unlawful, provided teachers do not gain illicit access to specific forthcoming test questions and prepare students for them. While those who evaluate teachers could take student test scores over time into account, they should be fully aware of their limitations, and such scores should be only one element among many considered in teacher profiles. Districts seeking to remove ineffective teachers must invest the time and resources in a comprehensive approach to evaluation that incorporates concrete steps for the improvement of teacher performance based on professional standards of instructional practice, and unambiguous evidence for dismissal if improvements do not occur. Skilled principals often try to assign students with the greatest difficulties to teachers they consider more effective. In practice, American public schools generally do a poor job of systematically developing and evaluating teachers. For that to happen, school systems should recruit, prepare, and retain teachers who are qualified to do the job. There are many pitfalls to making causal attributions of teacher effectiveness on the basis of the kinds of data available from typical school districts. Prior teachers have lasting effects, for good or ill, on students’ later learning, and several current teachers can also interact to produce students’ knowledge and skills.
(If teachers are found wanting, administrators should know this before designing staff development programs or renewing teacher contracts for the following school year.) They also found that, although advanced degrees in general were not associated with higher student achievement, an advanced degree that was specific to the subject area that a teacher taught was associated with higher achievement. These costs could be managed by targeting specific areas of need where teacher shortages are most pronounced, such as particular subject areas. Tennessee studies revealed that African American students were almost twice as likely to be taught by the least effective teachers (Sanders and Rivers 1996). For now, suffice it to say that teachers who teach large numbers of low-income students will be noticeably disadvantaged in spring-to-spring test gain analyses, because their students will start the fall further behind than more affluent students who were scoring at the same level in the previous spring. To be sure, if new laws or district policies specifically require that teachers be fired if their students’ test scores do not rise by a certain amount or reach a certain threshold, then more teachers might well be terminated than is now the case. Put simply, researchers looked for the change in students’ test scores according to the teacher they were assigned to. Quite often, these approaches incorporate several ways of looking at student learning over time in relation to a teacher’s instruction. This high level of investment mirrors the general sentiment among policy makers, researchers, and the general public that teachers are perhaps the most valuable resource allocated to student education.
But there is no current evidence to indicate either that the departing teachers would actually be the weakest teachers, or that the departing teachers would be replaced by more effective ones. In a letter to the Department of Education commenting on the department’s proposal to use student achievement to evaluate teachers, the Board on Testing and Assessment of the National Research Council of the National Academy of Sciences wrote: “…VAM estimates of teacher effectiveness should not be used to make operational decisions because such estimates are far too unstable to be considered fair or reliable.” In addition, differences in student performance were more heavily influenced by the teacher than by student ethnicity or class, or by the school attended by the student. In their analysis of these data, Rivkin, Hanushek, and Kain (2005) found that teacher quality differences explained the largest portion of the variation in reading and math achievement. In addition, if teachers see little or no relationship between what they are doing in the classroom and how they are evaluated, their incentives to improve their teaching will be weakened.
One study found that across five large urban districts, among teachers who were ranked in the top 20% of effectiveness in the first year, fewer than a third were in that top group the next year, and another third moved all the way down to the bottom 40%. Given the range of measures currently available for teacher evaluation, and the need for research about their effective implementation and consequences, legislatures should avoid imposing mandated solutions to the complex problem of identifying more and less effective teachers. Likewise, the Holmes Group (1986) advised that all major universities with substantial enrollments of preservice teachers [...]. Recruiting, preparing, and retaining good teachers is the central strategy for improving our schools. Another recent study found that students achieve more in mathematics and reading when they attend schools characterized by higher levels of teacher collaboration for school improvement. VAM estimates have proven to be unstable across statistical models, years, and classes that teachers teach. Rather, they are directly relevant to policy makers and to the desirability of efforts to evaluate teachers by their students’ scores. No single teacher accounts for all of a student’s achievement. A teacher who works in a well-resourced school with specialist supports may appear to be more effective than one whose students do not receive these supports. In opposition to those who propose to eliminate all requirements for entering the teaching profession, this analysis supports a judicious use of the research evidence on teacher characteristics and teacher effectiveness. It follows that assigning experienced, qualified teachers to low-performing schools and students is likely to pay off in better performance and narrowing gaps.
Other approaches, with less reliance on test scores, have been found to improve teachers’ practice while identifying differences in teachers’ effectiveness. As this review has shown, there is already enough evidence to show unequivocally that good teachers are vital to raising student achievement and closing achievement gaps. The Center for Public Education will continue to monitor state and district efforts to provide each child with a highly qualified, effective teacher. The most frequently proposed solution to this problem is to limit VAM to teachers who have been teaching for many years, so that their performance can be estimated using multiple years of data and instability in VAM measures over time can be averaged out. The need, mentioned above, to have test results ready early enough in the year to influence not only instruction but also teacher personnel decisions is inconsistent with fall-to-spring testing, because the two tests must be spaced far enough apart in the year to produce plausibly meaningful information about teacher effects. As is the case in every profession that requires complex practice and judgments, precision and perfection in the evaluation of teachers will never be possible. [...] grade point average for admission, and that teachers complete courses in an academic-core subject in four years before spending a fifth year learning about education (Boyer 1983). Inasmuch as a student’s later fifth grade teacher cannot possibly have influenced that student’s fourth grade performance, this curious result can only mean that VAM results are based on factors other than teachers’ actual effectiveness.
In addition to the size of the sample, a number of other factors also affect the magnitude of the errors that are likely to emerge from value-added models of teacher effectiveness. For example, one study of the Tennessee data found that low-income students were more likely to benefit from instruction by a highly effective teacher than were their more advantaged peers (Nye, Konstantopoulos, and Hedges 2004).

Another study found that teachers’ effectiveness ratings in one year could only predict from 4% to 16% of the variation in such ratings in the following year. Most teachers, particularly those teaching elementary or middle school students, do not teach enough students in any year for average test scores to be highly reliable. But they are silent on the question of what characterizes an “effective teacher.” There was similar movement for teachers who were highly ranked in the first year. Adopting an invalid teacher evaluation system and tying it to rewards and sanctions is likely to lead to inaccurate personnel decisions and to demoralize teachers, causing talented teachers to avoid high-needs students and schools or to leave the profession entirely, and discouraging potentially effective teachers from entering it. For example, students who do well in fourth grade may tend to be assigned to one fifth grade teacher while those who do poorly are assigned to another. In a related finding, an analysis of math achievement and dropout rates in a sample of California high schools (Fetler 2001) found that schools whose dropout rates were in the highest 10 percent had 50 percent more new teachers than did schools in the lowest 10 percent. Teachers also look very different in their measured effectiveness when different statistical methods are used.
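The arithmetic behind two of the figures above can be sketched briefly. Assuming the 4% to 16% range refers to the share of variance explained (R squared) in a simple one-predictor relationship, the implied year-to-year correlation is its square root. The second function illustrates why small classes yield noisy averages, assuming independent students and a student-level standard deviation of 1.0; both assumptions, and the function names, are illustrative rather than taken from the studies cited.

```python
import math

def correlation_from_r2(r_squared: float) -> float:
    """Magnitude of the correlation implied by a share of variance explained."""
    return math.sqrt(r_squared)

def standard_error_of_mean(student_sd: float, class_size: int) -> float:
    """Standard error of a class-average score, assuming independent students."""
    return student_sd / math.sqrt(class_size)

if __name__ == "__main__":
    low = correlation_from_r2(0.04)
    high = correlation_from_r2(0.16)
    print(f"implied year-to-year correlation: {low:.2f} to {high:.2f}")

    # Hypothetical class sizes; the score scale is in student-level SD units.
    for n in (15, 25, 100):
        print(f"class size {n}: standard error {standard_error_of_mean(1.0, n):.3f}")
```

Under these assumptions, a rating that explains 4% to 16% of next year's variation corresponds to a correlation of roughly 0.2 to 0.4, and the noise in a class average shrinks only with the square root of the number of students.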
Most of the research suggests that teachers who have had pedagogical training and who have received certification produce better student achievement scores than those who have not, although some studies dispute this finding (Goldhaber and Brewer 2000). Tennessee and Texas studies provide empirical evidence that teachers make a substantial difference in student achievement. Surveys have found that teacher attrition and demoralization have been associated with test-based accountability efforts, particularly in high-need schools. As in the Tennessee findings, Jordan, Mendro, and Weerasinghe (1997) found that the difference between students who had three consecutive highly effective teachers (again defined as those whose students showed the most improvement) and those who had three consecutive low-effect teachers (those with the least improvement) in the Dallas schools was 34 percentile points in reading achievement and 49 percentile points in math. A recent study documents the consequences of students (in this case, apparently purposefully) not being randomly assigned to teachers within a school. The research base is currently insufficient to support the use of VAM for high-stakes decisions about individual teachers or schools. In addition, some states’ efforts to reduce class size, which created a need to increase the teacher workforce, have led to the hiring of more unqualified and untrained teachers, thus minimizing the possible benefits of lower class sizes (Jepsen and Rivkin 2002). Some comparative studies show larger gains by TFA teachers and others show smaller ones.
Experiments are underway to determine if offers to teachers of higher pay, conditional on their students having higher test scores in math and reading, actually lead to higher student test scores in these subjects. Most teachers will already have had their contracts renewed and received their classroom assignments by this time. Some believe that the prospect of higher pay for better performance will attract more effective teachers to the profession and that a flexible pay scale, based in part on test-based measures of effectiveness, will reduce the attrition of more qualified teachers whose commitment to teaching will be strengthened by the prospect of greater financial rewards for success. That a teacher who appears to be very effective (or ineffective) in one year might have a dramatically different result the following year runs counter to most people’s notions that the true quality of a teacher is likely to change very little over time. Schools and their communities have always sought out the best teachers they could get in the belief that their students’ success depends on it. Research also suggests that scattering a handful of good teachers around the district is not going to produce wide-ranging results. Researchers studying year-to-year fluctuations in teacher and school averages have also noted sources of variation that affect the entire group of students, especially the effects of particularly cooperative or particularly disruptive class members.
Even the most sophisticated analyses of student test score gains generate estimates of teacher quality that vary considerably from one year to the next. Surprisingly, it found that students’ fifth grade teachers were good predictors of their fourth grade test scores. These patterns, which held true in every district and state under study, suggest that there is not a stable construct measured by value-added measures that can readily be called “teacher effectiveness.” In response to these perceived failures of current teacher policies, the Obama administration encourages states to make greater use of students’ test results to determine a teacher’s pay and job tenure. Districts also need strategies to ensure that these schools have strong and resourceful principals and that teachers have sustained professional learning opportunities, including intensive long-term new-teacher induction programs, in which they can work with colleagues to continually sharpen and upgrade their knowledge and skills. In addition, if probability is tested only in eighth grade, a student’s success may be attributed to the eighth grade teacher even if it is largely a function of instruction received from his seventh grade teacher. Teachers are also likely to be aware of personal circumstances (a move, an illness, a divorce) that are likely to affect individual students’ learning gains but are not captured by value-added models.
However, progress has been made over the last two decades in developing standards-based evaluations of teaching practice, and research has found that the use of such evaluations by some districts has not only provided more useful evidence about teaching practice, but has also been associated with student achievement gains and has helped teachers improve their practice and effectiveness. There is broad agreement among statisticians, psychometricians, and economists that student test scores alone are not sufficiently reliable and valid indicators of teacher effectiveness to be used in high-stakes personnel decisions, even when the most sophisticated statistical applications such as value-added modeling are employed. Even more concerning is the array of policy statements regarding teacher preparation that have been set forth in the face of volumes of inconclusive and inconsistent evidence about what teacher attributes really contribute to desired educational outcomes. In contrast, other studies did not indicate that teachers with graduate-level training in a content area performed better than did teachers having an undergraduate degree in their content area (Rivkin, Hanushek, and Kain 2005; Ferguson and Ladd 1996). There are a number of actions to take: districts can step up recruitment efforts to hire teacher candidates who have strong academic credentials and who have completed a rigorous teacher preparation program. Schools are collaborative institutions where teachers work across classroom and grade-level boundaries towards the common goal of educating all children to their maximum potential. It is often quite difficult to match particular students to individual teachers, even if data systems eventually permit such matching, and to unerringly attribute student achievement to a specific teacher.
Initiated in 1990, this system provides extensive data on state achievement tests for all students in grades 2-8 in Tennessee and allows for comparisons of teacher effects on students’ learning. Nevertheless, a clear sense of which teacher attributes really lead to improved educational outcomes should guide these important investment decisions, particularly given the many competing policy options to enhance teacher quality, as well as other attractive education policy proposals. Thus, a teacher who appears to be very ineffective in one year might have a dramatically different result the following year. In the following sections, we review research findings on teacher characteristics that are commonly recognized measures of quality: content knowledge, teaching experience, training and credentials, and overall academic ability. They are now very focused on phonics of the words and the mechanics of the words, even the very bright kids are… Teachers feel isolated. However commonplace it might be under current systems for teachers to respond rationally to incentives by artificially inflating end-of-year scores by drill, test preparation activities, or teaching to the test, it would be so much easier for teachers to inflate their value-added ratings by discouraging students’ high performance on a September test, if only by not making the same extraordinary efforts to boost scores in the fall that they make in the spring. As a result, increasing scores on students’ mathematics exams may reflect, in part, greater skill by their teachers in predicting the topics and types of questions, if not necessarily the precise questions, likely to be covered by the exam.
These approaches that measure growth using “value-added modeling” (VAM) are fairer comparisons of teachers than judgments based on their students’ test scores at a single point in time or comparisons of student cohorts that involve different students at two points in time. Nor does the research fully address evidence about teacher quality at the elementary and middle school levels, in subjects other than mathematics, or among different populations of students (such as high poverty, English language learners, or special education). School districts often fall short in efforts to improve the performance of less effective teachers and, failing that, in removing them.
Although such survey data are limited, anecdotes abound regarding the demoralization of apparently dedicated and talented teachers as test-based accountability intensifies. By using this approach, researchers can isolate the effect of the teacher from other factors related to student performance, for example, students’ prior academic record or the school they attend. Given the technical and practical limitations of what test scores can accurately reflect, we conclude that changes in test scores should be used only as a modest part of a broader set of evidence about teacher practice. Nonrandom assignment of students to teachers can be a function of either good or bad educational policy. A school will be more effective if its teachers are more knowledgeable about all students and can coordinate efforts to meet students’ needs. Beyond concerns about statistical methodology, other practical and policy considerations weigh against heavy reliance on student test scores to evaluate teachers. The failure of policy makers to address some of the validity issues discussed above, such as those associated with the nonrandom sorting of students across schools, would lead to even greater misclassification of teachers. Statisticians, psychometricians, and economists who have studied the use of test scores for high-stakes teacher evaluation, including its most sophisticated form, value-added modeling (VAM), mostly concur that such use should be pursued only with great caution.
Overall, such teachers might be equally effective, but VAM would arbitrarily identify the former teacher as more effective, and the latter as less so. Most of the effective teacher studies, for example, have focused on elementary school. The Tennessee and Texas studies provide empirical evidence that teachers make a substantial difference in student achievement. Faced with inconclusive evidence, policy makers are side-stepping the research (or relying only on those studies that support their positions) to move forward with teacher policies, often without the benefit of research to guide their efforts. One study found that across five large urban districts, among teachers who were ranked in the top 20% of effectiveness in the first year, fewer than a third were in that top group the next year, and another third moved all the way down to the bottom 40%. Some comparative studies show larger gains by TFA teachers and others show smaller gains. A subject-specific master’s degree may be an important predictor of teacher effectiveness in some contexts. Most teachers will already have had their contracts renewed and received their classroom assignments by this time. Fetler (1999) found that teachers with emergency teaching certificates did not perform as well as teachers who were fully certified, even when controlling for the amount of teaching experience. A decade later, the National Commission on Teaching and America’s Future proposed major changes in teacher preparation and licensure, recommending that authority over these matters be shifted from public officials to professional organizations (NCTAF 1996). Even where these accounts are true, they only demonstrate that more effective teachers and schools achieve better results, on average, with disadvantaged students than less effective teachers and schools achieve; they do not demonstrate that more effective teachers and schools achieve average results for disadvantaged students that are typical for advantaged students.
The Department of Education should actively encourage states to experiment with a range of approaches that differ in the ways in which they evaluate teacher practice and examine teachers’ contributions to student learning. The willingness of policy makers and taxpayers to devote such a large proportion of education dollars to teachers highlights the undisputed importance of teachers in realizing educational goals. Models cannot fully adjust for the fact that some teachers will have a disproportionate number of students who may be exceptionally difficult to teach (students with poorer attendance, who have become homeless, who have severe problems at home, who come into or leave the classroom during the year due to family moves, etc.). The Mathematica models, which apply to teachers in the upper elementary grades, are based on two standard approaches to value-added modeling, with the key elements of each calibrated with data on typical test score gains, class sizes, and the number of teachers in a typical school or district. Schools that have adopted pull-out, team teaching, or block scheduling practices will have additional difficulties in isolating individual teacher “effects” for pay or disciplinary purposes. Due process requirements in state law and union contracts are sometimes so cumbersome that terminating ineffective teachers can be quite difficult, except in the most extreme cases. We began by noting that some advocates of using student test scores for teacher evaluation believe that doing so will make it easier to dismiss ineffective teachers. There are many pitfalls to making causal attributions of teacher effectiveness on the basis of the kinds of data available from typical school districts. As this review has shown, there is already enough evidence to show unequivocally that good teachers are vital to raising student achievement and closing achievement gaps.
The same dramatic fluctuations were found for teachers ranked at the bottom in the first year of analysis. Such a response to incentives is not unprecedented: an unintended incentive created by NCLB caused many schools and teachers to focus greater effort on children whose test scores were just below proficiency cutoffs and whose small improvements would have great consequences for describing a school’s progress, while paying less attention to children who were either far above or far below those cutoffs. Despite the hopes of many, even the most highly developed value-added models fall short of their goal of adequately adjusting for the backgrounds of students and the context of teachers’ classrooms. For example, low-performing schools often have weak organizational supports for teachers. One study has identified a teacher quality “tipping point” when the proportion of underqualified teachers is about 20 percent of the total school faculty. For example, with respect to teacher characteristics, Hanushek (1997) identified 171 estimates related to the impact of “teacher education” on student performance. (... but who are not yet classroom teachers) should adopt the four-year liberal arts baccalaureate as a prerequisite for acceptance into their teacher education programs. Legislatures should not mandate a test-based approach to teacher evaluation that is unproven and likely to harm not only teachers, but also the children they instruct.
However, because of the broad agreement by technical experts that student test scores alone are not a sufficiently reliable or valid indicator of teacher effectiveness, any school district that bases a teacher’s dismissal on her students’ test scores is likely to face the prospect of drawn-out and expensive arbitration and/or litigation in which experts will be called to testify, making the district unlikely to prevail. Although value-added methods can support stronger inferences about the influences of schools and programs on student growth than less sophisticated approaches, the research reports cited above have consistently cautioned that the contributions of VAM are not sufficient to support high-stakes inferences about individual teachers. More than two decades of research findings are unequivocal about the connection between teacher quality and student learning. Districts also need strategies to ensure that these schools have strong and resourceful principals and that teachers have sustained professional learning opportunities, including intensive long-term new teacher-induction programs, in which they can work with colleagues to continually sharpen and upgrade their knowledge and skills. This analysis reviews a wide range of empirical studies that examine the impact of teacher characteristics on teacher effectiveness in order to draw conclusions about the extent to which these characteristics are, in fact, linked with teacher performance. Many factors influence the student test score gains attributed to individual teachers. We also need to know more about the incentives and working conditions that will attract highly effective teachers to traditionally hard-to-staff schools. Tests that assess the literacy levels or verbal abilities of teachers have been shown to be associated with higher levels of student achievement.
Indeed, it is just as reasonable to expect that “learning begets learning”: students at the top of the distribution could find it easier to make gains, because they have more knowledge and skills they can utilize to acquire additional knowledge and skills and, because they are independent learners, they may be able to learn as easily from less effective teachers as from more effective ones. Using a variance-components model, they show strong, systematic differences in expected achievement gains related to different teachers. Studies show that the National Teachers Examination and other state-mandated tests of basic skills and/or teaching abilities are less consistent predictors of teacher performance. Once in the classroom, teachers should be evaluated on a regular basis in a fair and systematic way. VAM methods have also contributed to stronger analyses of school progress, program influences, and the validity of evaluation methods than were previously possible. Given the importance of teachers’ collective efforts to improve overall student achievement in a school, an additional component of documenting practice and outcomes should focus on the effectiveness of teacher participation in teams and the contributions they make to school-wide improvement, through work in curriculum development, sharing practices and materials, peer coaching and reciprocal observation, and collegial work with students. Further, they contend that lower-achieving students are the most likely to benefit from increases in teacher effectiveness. As a result, reliance on student test scores for evaluating teachers is likely to misidentify many teachers as either poor or successful.
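
The year-to-year instability of teacher rankings discussed in this review can be illustrated with a small simulation. The sketch below is not a value-added model and does not reproduce any of the studies cited. It assumes each teacher has a fixed "true" effect and that each year's estimate adds sampling noise that shrinks with class size. The function name and every parameter value (number of teachers, class size, standard deviations) are illustrative assumptions, not figures from the research.

```python
import random

def top_group_persistence(n_teachers=1000, class_size=25, true_sd=0.1,
                          student_sd=1.0, top_share=0.2, seed=0):
    """Share of year-1 top-ranked teachers who are top-ranked again in year 2.

    All default values are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    # Assumed stable "true" effect per teacher, in test-score SD units.
    true_effects = [rng.gauss(0.0, true_sd) for _ in range(n_teachers)]
    # Noise in a class-average gain shrinks with the square root of class size.
    noise_sd = student_sd / class_size ** 0.5

    def top_group():
        # One year's estimate = true effect + that year's sampling noise.
        estimates = [t + rng.gauss(0.0, noise_sd) for t in true_effects]
        k = int(n_teachers * top_share)
        ranked = sorted(range(n_teachers), key=lambda i: estimates[i],
                        reverse=True)
        return set(ranked[:k])

    year1, year2 = top_group(), top_group()
    return len(year1 & year2) / len(year1)

if __name__ == "__main__":
    print("no noise:  ", top_group_persistence(student_sd=0.0))
    print("with noise:", top_group_persistence())
```

With no noise, the same teachers are in the top group in both years. Once noise is added, some of them drop out of it even though no teacher's true effect has changed, which is the misclassification risk the text describes. How many drop out depends entirely on the assumed parameters.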
