High Stakes Standardized Testing in America: The History

Are students being tested too much?
When is enough enough?

This essay explores the history of testing in American education, the introduction of standardized testing in American secondary education, and the philosophical underpinnings, historical perspectives, and ethical standpoints that led to where we are today. It also touches on the ontological, axiological, and epistemological perspectives on testing: what it means for students to KNOW something they were taught, and how we define and measure knowing.

The frequency with which students are assessed for content understanding and general attainment of information has been rising in the American education system (Madaus & Clarke, 2001). Currently in Most County Schools students are tested 8-14 times per semester. Those are just the state- and county-mandated tests, including benchmarks, Students’ Learning Outcomes (SLOs), and high school graduation tests. When teacher-created tests and quizzes are included, an individual student taking a four-class load in a block schedule will have been tested 20-25 times by the end of each semester. It is my belief that this frequency of testing is excessive. Subjecting students to testing this frequent, given the weight some of these tests carry in each testing period, is ridiculous, to say the least. I do not believe that high-stakes tests, or the frequency with which they are administered, improve accountability for teachers, administrators, or school districts. If they did, Finland would not be ranked number 1 in the world for science and mathematics, since Finland tests its secondary students only twice in their secondary education careers (Washington Post, 2012).

In America, policymakers argue that in order to improve students’ performance, teachers, administrators, and school districts need to be held accountable for students’ achievement (Ravitch, 2002). However, the frequency with which these tests are administered has been associated with students no longer taking testing seriously (Ravitch, 2002). Despite the increase in testing frequency, American students’ scores have been declining steadily relative to those of their peers elsewhere in the developed world (Washington Post, 2012).

Testing and methods for measuring students’ understanding of content can be traced to the Socratic era in ancient Greece. During that era, students were asked to respond to questions posed by their instructor, both to gauge their understanding of concepts and to encourage critical thinking. Socrates used dialogue between himself and his students to gauge their understanding and to help them construct their own understanding of concepts. Even before Socrates, conversational dialogue was used to assess students’ understanding and knowing (Frost, 1989).

Testing in the American education system was modeled after education systems in Europe. Colonists brought the idea of testing with them when they founded schools in the newly formed United States (Urban & Wagoner, 2009). Testing can be traced directly to the one-room schools and the church schools of colonial America (Urban & Wagoner, 2009). Even apprenticeship schools used testing to gauge students’ mastery of what they had learned. Testing was never used to evaluate teachers’ effectiveness at that time; students who failed were deemed incapable of learning and were left behind (Madaus & Clarke, 2001).

In the late 1800s, prestigious universities including Harvard, Princeton, Johns Hopkins, and Yale introduced college entrance examinations as a basis for admission. Other universities did not have this requirement. To further complicate the issue, each prestigious university had its own separate entrance exam. The differing admission requirements led school principals and parents to complain that it was difficult to prepare students for the multitude of entrance exams at these universities. To harmonize the process, the College Entrance Examination Board was created to prepare and oversee a single test for college admission (Urban & Wagoner, 2009). This was the beginning of the standardized testing phenomenon that we see in the American education system today.

In the early 1900s, teachers were also required to take entrance exams. But once they had been interviewed by a panel that included a member of the clergy and the local school board members, and had been offered a job, teachers would never again be subjected to testing of their performance, suitability, or capacity to teach. Testing for results-based accountability in American education is a contemporary phenomenon (Ravitch, 2002).

Moreover, the early 1900s was a tumultuous time in education. This was when educational psychology was introduced into the field. Educational psychologists believed that there was a need to justify education as a scientific endeavor. Thus, demonstrating that education could be measured through experimentation and testing was a major aim of educational psychologists at the time. The leading educational psychologist of the early 20th century, Edward L. Thorndike, was determined to demonstrate through educational testing that education is an exact science. Most educational psychologists of the 1920s and 1930s were heavily interested in devising testing instruments to help teachers diagnose students’ understanding of concepts and then develop interventions based on the data. However, the educational psychologists of the time never intended for their tests, or the data accumulated from them, to be used for educational accountability.

The 1930s witnessed the Great Depression. Due to the economic hardship of the period, educational progressives gained huge influence. They wanted schools to be friendly to students who were not interested in traditional schooling, and they cared more about students’ adjustment in school. The emphasis on a child’s social adjustment took precedence over grades, subject mastery, and discipline (Urban & Wagoner, 2009). They started using the testing instruments developed by educational psychologists to identify disinterested students and develop remedial education for them. These educational progressives felt that education was a right for all children and believed that every child can learn. This era was the beginning of social promotion as we know it today. All of this happened at the peak of the Depression, when there were no jobs to be had by high school dropouts, so keeping kids in school was the better option. The testing done during this period was mainly to inform teachers where students stood and how to devise learning goals to help them learn. The data collected had no bearing on students’ promotion, nor was it used as a tool for evaluating the performance of teachers, administrators, or school districts.

The 1950s and early 1960s were a special time in American education. The Sputnik launch, the decision in Brown v. Board of Education, and the release of the book “What Ivan Knows That Johnny Doesn’t” created an atmosphere in which educationists and policymakers tried to find answers to what was perceived to be going wrong with the education system in America (Urban & Wagoner, 2009). Sputnik, the book, and the decisions requiring equal educational opportunities for all Americans led, in one way or another, to the introduction of data-driven accountability in the American secondary education system.

The 1966 report by sociologist James Coleman, entitled “Equality of Educational Opportunity,” was the landmark report that began to pique policymakers’ interest in using achievement data to hold teachers, administrators, and districts accountable for students’ low performance. The report was significant in many ways, including its emphasis on a shift from an input-oriented education system to a results-oriented one. Prior to this report, educationalists believed that many of the low-achievement problems in the school systems would eventually be eliminated through more funding. The Coleman report shifted the emphasis onto accountability. This shift led many policymakers to start examining how school resources affected students’ performance and achievement. The 1960s was a very interesting time in America. Events such as the civil rights movement provided most of the impetus for what was happening in the education system. The drive for educational equality and opportunity for all Americans led to more scrutiny of students’ score data. The gap that existed, and continues to exist, between white Americans and minority groups, especially African Americans, drove the push for accountability in education to improve achievement for racially disadvantaged groups.

The establishment of the National Assessment of Educational Progress and the Department of Education in the 1970s also led to a shift from inputs (resources) to outputs (results). This shift was fueled by readily available testing data, which allowed policymakers to compare student achievement across regions and ethnic groups. International testing in mathematics and science provided even more data on how American secondary school students fared compared with students from other industrialized nations. The fact that American students performed poorly on mathematics and science tests relative to students in other industrialized countries added more pressure on policymakers to tie students’ achievement to teachers, administrators, and districts and to hold them accountable for poor student performance.

The 1960s and 1970s also witnessed a growing tension between professional educators, who believed in the input model (resources will solve the underachievement problems), and policymakers, who believed in the output model (results and accountability will drive instruction). Public pressure from parents, stakeholders, and policymakers to see improvement in the low achievement scores among minority groups has kept the focus on using standardized testing for accountability. In the 2000s, laws and programs such as the No Child Left Behind Act and Race to the Top, along with new evaluation systems such as Teacher Keys, were introduced. These placed renewed emphasis on using standardized testing as a mechanism for accountability.

Currently, there is a war between these two camps, or paradigms, in the American education system. On one hand, accountability and data-driven evaluation have shown some promise in states such as Massachusetts, Virginia, Texas, and North Carolina (Ravitch, 2002). The achievement gap between black and white students in these states has narrowed since the introduction of results-based assessment for teachers, administrators, and school districts. Elsewhere in the country, however, the results are mixed, and in many states and districts the achievement gap between racial groups (whites and Asians on one side, blacks and Latinos on the other) is widening even faster. On the other hand, professional educators argue that more resources are needed to narrow this achievement gap, as educational budgets have been continually slashed over the past decade.

For the foreseeable future, American education will continue to be dominated by these two paradigms: the professional educators’ paradigm, which holds that increased resources will solve the problems, and the policymakers’ paradigm, which holds that public education should follow the business model of incentives and sanctions based on performance. As the war wages on, whichever paradigm wins will determine the direction of the American education system. In my view, it is going to be very difficult to change the current testing culture to include performance-based assessments that measure what students can do. The pressure exerted by the testing companies, businesses, and universities that are profiting handsomely from the current testing environment is too great for policymakers to ignore. I am sure that the testing companies will join hands to fight tooth and nail against anyone who tries to change a system that is benefiting them so greatly (Frederiksen, 1984).

While the battle rages on, both camps need to realize that:

  • Throwing money at education in and of itself rarely produces results. To achieve system-wide improvement, a focused approach and long-term strategies are needed.
  • Good teachers are essential to high-quality education. Treating teachers as valued professionals, including paying them a living wage, will help.
  • The cultural assumptions and values surrounding an education system can do much to support or undermine it.
  • Education systems should strive to keep parents informed and work with them. Parents are neither impediments to nor saviors of education.
  • Education systems need to consider what skills today’s students will need in the future and teach accordingly. Teaching only for present job opportunities is a disservice to our young people, because many of the jobs they will hold may not have been created yet.

There is no argument that knowledge is important. The question, however, is how we assess that knowledge. While standardized testing is the main method used to assess students’ knowledge in today’s school environment, Socratic dialogue and other dialogue techniques are better methods for assessing it. For the Greeks, being able to articulate concepts and to perform the tasks or skills associated with the learning experience was the basis for ensuring that students had adequately grasped the concepts conveyed by the teacher (Frost, 1989). In contrast, standardized tests merely diagnose what students have learned from a prescribed curriculum rather than what they can do or perform (e.g., writing reports, synthesizing information, conducting basic and advanced research). These tests are therefore limited in their ability to truly measure what students have learned during a course (Madaus & Clarke, 2001).

For me, the pendulum has swung too far toward using standardized tests as measures of accountability. I would like to see more performance-based testing in the classroom, which measures what students can do with their knowledge, and less standardized testing, which measures only recall of basic information. Others, however, have argued that performance-based assessment also has limitations, including time constraints, resource constraints, and the training required to assess students’ knowledge effectively with these methods (Linn, 2013). While I acknowledge these potential limitations, I firmly believe, based on my years of teaching, that performance-based assessment is a critical component of a comprehensive assessment of student achievement. The use of standardized testing as the sole method for assessing student performance is inadequate and shortsighted. If we truly want to know whether students have absorbed the material and are able to apply this knowledge in their everyday lives, we need to include performance-based testing as part of a comprehensive assessment strategy.




Best Education in the World: Finland, South Korea Top World Rankings, U.S. Ranked Moderate (2012). Washington Post: Accessed: http://www.huffingtonpost.com/2012/11/27/best-education-in-the-wor_n_2199795.html

Frederiksen, N. (1984). The real test bias: Influences of testing on teaching and learning. American Psychologist, 39(2), 78-81.

Frost, S. E. (1989). Basic teachings of the great philosophers. Garden City, NY: Random House, Inc.

Madaus, G. F., & Clarke, M. (2001). The adverse impact of high stakes testing on minority students: Evidence from 100 years of test data. In G. Orfield and M. Kornhaber (Eds.), Raising standards or raising barriers? Inequality and high stakes testing in public education. New York: The Century Foundation.

Ravitch, D. (2002). A brief history of testing and accountability. Accessed: http://www.hoover.org/publications/hoover-digest/article/7286

Urban, W. J., & Wagoner, J. L. (2009). American education: A history. New York, NY: Taylor & Francis.

Wiggins, G. (1989). A true test: Toward more authentic and equitable assessment. Phi Delta Kappan, 70(9), 703-713.

Wiggins, G. (1988). Rational numbers: Scoring and grading that helps rather than hurts learning. American Educator, 20(25), 45-48.



