Archives For Testing

What came first…the NAEP Chicken or the CCSS Egg?

First, let’s define terms:

The National Assessment of Educational Progress (NAEP) is the “largest nationally representative and continuing assessment of what America’s students know and can do in various subject areas.”

The Common Core State Standards (CCSS) are “a set of goals and expectations for the knowledge and skills students need in English language arts and mathematics at each grade level.”

From 1992 through 2007, the NAEP reading framework centered on three broadly defined genres of assessment content: literary, informational, and document. By 2009, however, the framework had been revised to define eight genres of assessment content, part of a larger shift to separate reading content into distinct categories. Five of the eight genres in the 2009 reading framework were more specific forms of nonfiction: literary nonfiction, informational text, exposition, argumentation and persuasive text, and procedural text and documents. Fiction was still included on the 2009 test, along with selections of poetry, some of which could also be categorized as fiction.

Before 2009, a nonfiction selection might fall into any one of the broadly defined genre categories. After 2009, five of the eight genres on the NAEP reading framework, roughly 63%, were well-defined subsets of nonfiction.

Now consider that while the NAEP was being revised, the Common Core State Standards (CCSS) were in development in 2009. The CCSS literacy designers placed an emphasis on complex informational texts (nonfiction), stating:

“Most of the required reading in college and workforce training programs is informational in structure and challenging in content; postsecondary education programs typically provide students with both a higher volume of such reading than is generally required in K-12 schools and comparatively little scaffolding.”

These designers were pushing to expand reading beyond the fiction and literary analysis that traditionally dominated ELA classes, particularly at the high school level. This was an effort to include reading in other content areas as necessary for the post-secondary experience. As a result, literacy standards were developed for grades 6-12 in History/Social Studies, Science, & Technical Subjects.

By 2010, 42 states had adopted the Common Core standards and had begun revising curriculum to align with the Key Shifts of the CCSS, reducing fiction from 50% of a student’s reading diet in 4th grade to 30% of the reading diet of a graduating senior.

The connection between NAEP and the CCSS was evident, and the recommendations in the literacy standards of the Common Core called attention to this connection.


Six Years Later: The Rise of Nonfiction

Not surprisingly, six years later, one of the anecdotal findings released from the 2015 NAEP is the increase in nonfiction assigned by teachers in both grades 4 & 8. This information came from a voluntary survey where teachers could select the genre they emphasized in class “to a great extent.”

Fourth grade teachers, who in 2011 had reported a 25-percentage-point gap favoring fiction over nonfiction, reduced that gap to 15 points in 2013 and to a single-digit 8 points in 2015.

Similarly, in eighth grade, the 34-point preference for emphasizing fiction declined to 24 points in 2013 and to 16 points in 2015.


The Egg Hatches…and It Looks a Little Different

The truth is, all the emphasis on increasing nonfiction in schools at the expense of fiction has had a positive impact on the genre. An article in the October issue of Publisher’s Weekly, “Moment of Truth: Trends in Nonfiction for Young Readers” by Sophie McNeill, offered comments from bookstore owners and librarians about the increased interest in factual prose:

Suzanna Hermans of Oblong Books & Music in Rhinebeck, New York, says,

“Common Core has raised awareness of kids’ nonfiction. We are seeing parents and teachers talking about it differently in home and at school.”

Sharon Grover, head of youth services at Hedberg Public Library in Janesville, Wisconsin, adds:

“Nonfiction has really improved in recent years. Books are more readable, with more pictures and less straight recitation of facts. Kids really appreciate that, since they have become used to reading websites and apps.”

The article also referred to the 21st Century Children’s Nonfiction Conference (2014) which advertised its aim “to display the verve and capabilities of nonfiction, and to show that it can be just as creative as fiction.”

Creative?
Verve?

All this added attention to increasing nonfiction appears to be having an impact on the genre itself, not only in the quantity produced but also in its characteristics. While the nonfiction genre is generally understood to be based on real events, a statement by the Newbery Award-winning children’s nonfiction author Russell Freedman seems to blur those clear lines that the NAEP and Common Core have tried to separate as distinct. Freedman has stated:

“A nonfiction writer is a storyteller who has sworn an oath to tell the truth.”

Note the word storyteller?
Can truth be that objective?

Sounds a little like nonfiction is borrowing from the fiction genre playbook.

Eggs and Evolution

Whether it began with the NAEP Chicken or the CCSS Egg, the pressure to emphasize nonfiction is like any other evolutionary force in nature. While the Common Core has fallen out of favor with many states, with at least 12 states introducing legislation to repeal the CCSS outright, the nonfiction genre is growing, responding, and adapting under the current favorable conditions.

The reduction of fiction in favor of more readable nonfiction in grades 4 & 8, as evidenced by the NAEP survey, continues. The evolution of the nonfiction genre may increase readership as well, especially if engaging texts increase interest in reading in history, social studies, science, and the technical subjects.

Today’s educators may break a few more fictional eggs, but the end result could be a better omelet.

Notice how I am trying to beat the character limit on headlines?

Here’s the translation:

For your information, Juniors: Connecticut’s Common Core State Standards Smarter Balanced Assessment [Consortium] is Dead on Arrival; Insert Scholastic Achievement Test

Yes, in the State of Connecticut, the test created through the Smarter Balanced Assessment Consortium (SBAC) based on the Common Core State Standards will be canceled for juniors (11th graders) this coming school year (2015-16) and replaced by the Scholastic Achievement Test (SAT).

The first reaction from members of the junior class should be an enormous sigh of relief. There will be one less set of tests to take during the school year. The second sigh will come from other students, faculty members, and the administrative team for two major reasons: the computer labs will now be available year-round, and schedules will not have to be rearranged for testing sessions.

SAT vs. SBAC Brand

In addition, the SAT will most likely receive more buy-in from all stakeholders because of its credibility. Students know what the SAT brand is and what the scores mean; students are already invested in doing well for college applications. Even the shift from the old top score of 1600 (pre-2005) to 2400 with the addition of an essay has been met with general understanding that a top score is 800 in each section (math, English, and essay). A student’s SAT scores are part of a college application, and a student may take the SAT repeatedly in order to submit the highest score.

In contrast, the SBAC brand never reported individual student results. The SBAC was created as an assessment for collecting data for teacher and/or curriculum evaluation. When predictions of the percentage of anticipated failures in math and English were released, there was frustration among teachers and additional disinterest among students. There was no ability to retake, and if the predictions meant no one could pass, why should students even try?

Digital Testing

Moreover, while the SBAC drove the adoption of digital testing in the state in grades 3-8, most of the pre-test skill development was still given in paper-and-pencil format. Unless the school district consistently offered a seamless integration of 1:1 technology, there could be a question as to what was being assessed: a student’s technical skills or the application of background knowledge. Simply put, skills developed with paper and pencil may not translate the same way on digital testing platforms.

As a side note, those who use computer labs or develop student schedules will be happy to know that the SAT is not a digital test… at least not yet.

US Education Department Approved Request 

According to an early report by the Brookings Institution, the SBAC’s full suite of summative and interim assessments and the Digital Library of formative assessments was first estimated to cost $27.30 per student (grades 3-11). The design of the assessment would be economical if many states shared the same test.

Since that initial report, several states have left the Smarter Balanced Assessment Consortium entirely.

In May, the CT legislature voted to halt the SBAC in grade 11 in favor of the SAT. This switch will increase the cost of testing. According to an article (5/28/15) in the CT Mirror, “Debate Swap the SAT for the Smarter Balanced Tests”:

“‘Testing students this year and last cost Connecticut $17 million’, the education department reports. ‘And switching tests will add cost,’ Commissioner of Education Dianna Wentzell said.”

The switch was approved by the U.S. Department of Education for Connecticut schools on Thursday, 8/6/15; the CT Department of Education had asked that the state not be penalized under the No Child Left Behind Act’s rigid requirements. Currently, the switch to the SAT would not change the tests in grades 3-8; SBAC would continue at those grade levels.

Why SBAC at All?

All this raises the question: why was 11th grade selected for the SBAC in the first place? Was the initial cost a factor?

Since the 1990s, the State of Connecticut had given the Connecticut Academic Performance Test (CAPT) in grade 10, and even though the results were reported late, there were still two years to remediate students who needed to develop skills. In contrast, the SBAC was given in the last quarter of grade 11, leaving less time to address the needs of struggling students. I mentioned these concerns in an earlier post: The Once Great Junior Year, Ruined by Testing.

Moving the SBAC to junior year increased the amount of testing for those electing to take the SAT, with some students also taking the ASVAB (Armed Services Vocational Aptitude Battery) or being selected to take the NAEP (National Assessment of Educational Progress).

There have been three years of “trial testing” for the SBAC in CT and there has been limited feedback to teachers and students. In contrast, the results from the SAT have always been available as an assessment to track student progress, with results reported to the school guidance departments.

Before No Child Left Behind, before the Common Core State Standards, before SBAC, the SAT was there. What took them (the legislature, the Department of Education, etc.) so long?

Every Junior Will Take the NEW SAT

Denver Post: Heller

In the past, not every student elected to take the SAT, but many districts did offer the PSAT as an incentive. This coming year, the SAT will be given to every 11th grader in Connecticut.

The big wrinkle in this plan?
The SAT has been revised (again) and will be new in March 2016.

What should we expect with this test?

My next headline?

OMG. HWGA.

Who wants to rewrite curriculum this summer?

(Anyone? Anyone?…..)

Let’s be honest. Writing or rewriting curriculum is an ongoing process that, while necessary, is not always seen as the most positive experience. Moreover, the suggestion of spending summer days writing curriculum (paid or unpaid) may trigger a range of emotions, some strangely akin to the model offered by Swiss psychiatrist Elisabeth Kübler-Ross in her 1969 book, On Death and Dying.

That model is commonly referred to as the “five stages of grief,” and those five stages have been applied to many different fields, from the financial markets (The Five Stage of Bit-Coin Understanding in Fortune Magazine) to sports (The Five Stages of NFL Fan Grief in The Atlantic). The premise of the film Groundhog Day is that the protagonist, Phil, is forced to repeat each day as he fails at each stage. Perhaps it is that kind of failure that makes educators confront curriculum revision with a range of different emotions.

Because the Kübler-Ross model was developed in response to the lack of curriculum on death and dying in medical training, the model contains language that is especially applicable to curriculum in general. The descriptions in each of the five stages chronicle the emotional rollercoaster that educators at any grade level or in any content area may experience in addressing revisions to curriculum.

Five emotional stages of curriculum?

Five emotional stages of curriculum development: Denial, Anger, Bargaining, Depression, and in front, Acceptance.

 

 

1. Denial and Isolation

The first reaction to writing curriculum may be to deny the necessity of the rewrite entirely. According to Kübler-Ross, this first reaction “is a normal reaction to rationalize overwhelming emotions.” Since the overwhelming emotion from teachers at the end of the year (June) is most likely exhaustion, the idea of starting over may not generate a great deal of enthusiasm, prompting protests like these:

“Didn’t we just do this last year?”
“I just finished the whole thing! Why does it need revising?”
“It was the snow day cancellations…I hear there is no polar vortex predicted for next year.”
“Face it…no matter what we write, we are never going to be able to get to World War II.”

2. Anger

While some approach writing curriculum in denial, others may express anger. In the Kübler-Ross model, that anger may be aimed at inanimate objects; in education, frustrations are directed at the PARCC, the CCSS, or any other state testing. The challenge to revise curriculum means confronting the incendiary topic of testing:

“There are not enough school hours to complete everything in that binder [of curriculum] to prepare students for that test.”
“Take away the tests, and I’ll deliver the curriculum.”
“It’s those tests that need to change!”

3. Bargaining

The normal reaction to feelings of helplessness and vulnerability can be seen as a need to regain control, and in education this Kübler-Ross stage might be captured with statements like these:

“Forget the revision… I promise I will be more organized next year. I plan on buying post-it notes.”
“Release me from lunch duty, and I’ll have time to deliver the curriculum as is!”
“All we need to do is practice fidelity to all of the program(s) we have already… I mean all of them… simultaneously…”

4. Depression

In the Kübler-Ross model, sadness and regret dominate this stage of depression. Educators may recognize that they have spent less time on things that matter and, in a series of admissions, agree that something about the events of the past year went terribly wrong:

“I never get to the poetry unit.”
“Someday, I might actually teach about World War II.”
“I confess… I threw out the pile of ungraded papers that had been in the bottom of my desk drawer since April.”

5. Acceptance

This final stage is marked by those who will finally approach the task of curriculum revision with a sense of calm and commitment. Kübler-Ross is careful to point out that this (hopefully final) stage is not a stage of happiness, but rather one of acceptance, marked by the dignity and grace of those who will provide a carefully revised curriculum for the rest of us.

Thank you in advance to all those educators who will remain calm and accept the need to revise curriculum to meet the ever changing demands in education today. They need to get started right away because, good grief…
September is only a few months away!

During the 88th Saturday Reunion Weekend at Teachers College in NYC (3/28/15), author and educator Kylene Beers delivered three professional development sessions based on Notice and Note: Strategies for Close Reading, a book she co-authored with Bob Probst. Each session was overflowing with standing-room-only crowds.

During the afternoon keynote in the Nave of Riverside Church, she delivered her beliefs, and every one of the 2,100 seats was filled.

She opened her address with a historical connection between literacy and power by referring first to the notion that years ago a signature was all that was necessary to prove a person literate. Exploiting this belief were those in power who prepared and wrote contracts, becoming wealthy at the expense of those who could only sign their names with an “X”.

“Literacy in this country has always been tied to wealth,” Beers explained, adding, “With literacy comes power, and with power comes great privilege.”

This was the theme of her keynote: in this age of communication and messaging, literacy equals power and privilege.

Moving to the present and the communication and messaging skills necessary for the 21st century, Beers argued that improving literacy skills on digital platforms is one way to empower students, but she called into question the preventive practices of some school districts.

“When schools say they do not want to have students develop a digital footprint,” she cautioned, “they limit their students’ access to that kind of power.”

Continuing to argue for empowering students, Beers directed the audience’s attention to making learning relevant, remarking, “There is a problem if everything is assigned by me!” By letting students choose what they want to read, she suggested, teachers can make learning relevant for the student. Employing choice to encourage more reading, however, contrasts with the recommendation of the Common Core State Standards that students read fewer texts in order to read “closer.”

“Why fewer?” she asked the crowd of educators, “when the single best predictor of success is volume of reading. One book for six weeks will never be as helpful as six books in six weeks.”

Teachers must let the students choose what they want to read, Beers argued, raising her voice:

“Damn the Lexiles! The best book is the one the kid chooses to read… [a student’s] ‘want-ability’ is more important than readability.”

And what should students read? Beers asked the crowd.

“Literature.”

“In the 21st Century, the most important role that literature plays is in developing student values such as compassion and empathy,” she contended. “Brain research shows we get to that compassion best through the teaching of literature.”

Beers called attention to recent disturbing headline events in which students marginalized others: racist chants made by a fraternity, and a teenager’s suicide due to bullying.

It is the role of literature, she explained, to give the reader the experience of being the outsider, the marginalized. Reading and learning from literature gives students an understanding of others and an opportunity to lead “literate lives measured by decency, civility, respect, compassion, and, at the very least, ethical behavior.”

Coming to the end of the keynote, Beers saved her scorn for the answer-driven test preparation and testing that dominates schools today:

“A curriculum built on test prep might raise scores, but it will fail to raise curiosity, creativity, and compassion.”

Beers castigated the limits of “bubbled” answers by pointing out that deep thinking never begins with an answer. In connecting back to the role of literature in education she added, “Ethics and compassion are not so easily bubbled.”

As a final invocation Beers reiterated her belief in teachers, those who have met the challenges in order to encourage all students and who never needed a mandate to leave no child behind:

“Success is not found in a test; great teachers are our best hope for a better tomorrow!”

The crowd erupted into applause, paying tribute to Kylene Beers, a leader in education whose strong voice reverberated in the cathedral and whose equally strong beliefs resonated with their own.

Testing a Thousand Madelyns


My niece is a beautiful little girl. She is a beautiful girl on the outside, the kind of little girl who cannot take a bad picture. She is also beautiful on the inside. She is her mother’s helper, fiercely loyal to her older brothers, and a wonderful example for her younger brother and sisters. She is the gracious hostess who makes sure you get the nicest decorated cupcake at the birthday party. She has an infectious laugh, a compassionate heart, and an amazing ability “to accessorize” her outfits. For the sake of her privacy, let’s call her Madelyn.

Two years ago, the teachers at her school, like teachers in thousands of elementary schools across the United States, prepared Madelyn and her siblings for the mandated state tests. There were regular notices sent home throughout the school year that discussed the importance of these tests. There was a “pep-test-rally” a week before the test where students made paper dolls which they decorated with their names. A great deal of time was spent getting students enthused about taking the tests.

Several months later, Madelyn received her score on her 4th grade state test. She was handed her paper doll cut-out with her score laminated in big numbers across it.

Madelyn was devastated.

She hated her score because she understood that her score was too low. She hid the paper doll throughout the day, and when she came home, she cried. She could not hang the paper doll on the refrigerator where her brothers’ and sisters’ scores hung. The scores on their paper dolls were higher.

She cried to her mother, and her mother also cried. Her mother remembered that same hurt when she had not done well on tests in school either. As they sobbed together, Madelyn told her mother, “I’m not smart.”

Now, the annual testing season is starting again. This year, there will be other students like Madelyn who will experience the hype of preparation, who will undergo weeks of struggling with tests, and who will then endure a form of humiliation when the results return. The administrators and teachers pressured to increase proficiency results on a state test often forget the damage done to the students who do not achieve a high standard.

That paper doll created during the fervor of test preparation is an example of an unintended consequence; no one in charge considered how easily scores could be compared once they were available to students in so public a manner. Likewise, many stakeholders are unaware that the rallies, ice-cream parties, and award ceremonies do little to comfort those students who, for one reason or another, do not test well.

There is little consolation to offer 10-year-old students who see the results of state tests as the determiner of being “smart,” because 10-year-olds believe tests are a final authority. They do not grasp the principles of test design that award total success to a few at the high end and assign failure to a few at the low end, a design best represented by the bell curve, “the graphic representation showing the relative performance of individuals as measured against each other.” Nor do they understand that their 4th grade test scores are not indicators of later success.

Despite all the advances in computer adaptive testing using algorithms of one sort or another, today’s standardized tests are limited to evaluating a specific skill set; true performance-based tests have not yet been developed because they are too costly and too difficult to standardize.

My niece Madelyn would excel in a true performance based task at any grade level, especially if the task involved her talents of collaboration, cooperation, and presentation. She would be recognized for the skill sets that are highly prized in today’s society: her work ethic, her creativity, her ability to communicate effectively, and her sense of empathy for others. If there were assessments and tests that addressed these particular talents, her paper doll would not bear the Scarlet Letter-like branding of a number she was ashamed to show to those who love her.

Furthermore, there are students who, unlike my niece Madelyn, do not have support from home. How these students cope with a disappointing score on a standardized test without support is unimaginable. Madelyn is fortunate to have a mother and father, along with a network of people, who see all her qualities in total; she is prized for more than test grades.

At the conclusion of that difficult school year, in a moment of unexpected honesty, Madelyn’s teacher pulled my sister aside.
“I wanted to speak to you, because I didn’t want you to be upset about the test scores,” he admitted to her. He continued, “I want you to know that if I could choose a student to be in my classes, I would take Madelyn…I would take a thousand Madelyns.”

It’s testing season again for a thousand Madelyns.
Each one should not be defined by a test score.

Graphic by Christopher King that accompanied the editorial piece “In Defense of Annual Testing”

My Saturday morning coffee was disrupted by the headline in the New York Times opinion piece, In Defense of Annual School Testing (2/7/15) by Chad Aldeman, an associate partner at Bellwether Education Partners, a nonprofit education research and consulting firm. Agitating me more than the caffeine in the coffee was clicking on Aldeman’s resume. Here was another policy analyst in education, without any classroom experience, who served as an adviser to the Department of Education from 2011 to 2012. Here was another policy wonk with connections to the testing industry.

In a piece of fewer than 800 words, Aldeman contended that the “idea of less testing” in our nation’s schools, currently being considered by liberal and conservative groups alike, “would actually roll back progress for America’s students.”

…annual testing has tremendous value. It lets schools follow students’ progress closely, and it allows for measurement of how much students learn and grow over time, not just where they are in a single moment.

Here is the voice of someone who has not seen students take a standardized test when, yes, they are very much in “that single moment.” That “single moment” looks different for each student. An annual test does not consider the social and emotional baggage of that “single moment” (e.g., no dinner the night before; social media or video games until 1 AM; parent separation or divorce; a fight with a friend, a mother, a teacher; or general test anxiety). Educators recognize that students are not always operating at optimum levels on test days. No student likes being tested at any “single moment.”

Aldeman’s editorial advocates for annual testing because, he claims, it prevents the kinds of tests that report only grade-average results from a school. Taking a group average from a test, he notes, allows “the high performers frequently [to] mask what’s happening to low achievers.” He prefers the kinds of new tests that focus on groups of students with a level of analysis possible only with year-to-year measurement. That year-to-year measurement on these expensive new tests is, no doubt, preferred by testing companies as a steady source of income.

His opinion piece comes at a time when the anti-test movement is growing and states are looking at the expenses of such tests. There is bipartisan agreement in the anti-test movement that students are already being assessed enough. There are suggestions that annual testing could be limited to specific grade levels, such as grades 3, 8, and 11, and that there are already enough assessments built into each student’s school day.

Educators engage in ongoing formative assessments (discussions, polls, homework, graphic organizers, exit slips, etc.) used to inform instruction. Interim and summative assessments (quizzes/tests) are used continuously to measure student performance. These multiple kinds of assessments provide teachers the feedback to measure student understanding and to differentiate instruction for all levels of students.

For example, when a teacher uses a reading running record assessment, the data collected can help determine what instruction will improve a child’s reading competency. When a teacher analyzes a math problem with a child, the teacher can assess which computational skills need to be developed or reviewed.

Furthermore, there are important measures that cannot be taken by a standardized test. Engaging students in conversations may provide insight into the social or emotional issues that may be hindering a child’s academic performance.

Of course, the annual tests that Aldeman suggests be used to gain information on performance do not take up as much instructor time as the ongoing individual assessments given daily in classrooms. Testing does use manpower efficiently; one hour of testing can yield 30 student-hours of results, and a teacher need not be present to administer a standardized test. Testing can diagnose each student’s strengths and/or weaknesses at that “single moment” in multiple areas at the same time. But testing alone cannot improve instruction, and improving instruction is what improves student performance.

In a perverse twist of logic, the allocation of funds and class time to pay for these annual tests reduces the funds available to finance teachers and the instructional hours needed to improve and deliver the kind of instruction the tests recommend. Aldeman notes that the Obama administration has invested $360 million in testing, which illustrates its choice to allocate funds to support a testing industry, not schools. The high cost of developing tests and collecting the test data strips funds from state and local education budgets and limits the financial resources for improving the academic achievement of students, many of whom Aldeman claims have “fallen through the cracks.”

His argument to continue annual testing does not refer to the obscene growth of the testing industry, 57% in the past three years to $2.5 billion, according to the Software & Information Industry Association. Testing now consumes the resources of every school district in the nation.

Aldeman concludes that annual testing should not be politicized, and that this time is “exactly the wrong time to accept political solutions leaving too many of our most vulnerable children hidden from view.”

I would counter that our most vulnerable children are not hidden from view by their teachers and their school districts. Sadly their needs cannot be placed “in focus” when the financial resources are reduced or even eliminated in order to fund this national obsession with testing. Aldeman’s defense is indefensible.

Yes, American teachers do work more hours than their international counterparts, but exactly how much more could be a matter of perception versus reality, and testing may be to blame.

A recent study comparing the number of hours worked by American teachers shows the difference in instructional time is not as significant as has been publicized in the past. Researcher Samuel E. Abrams, director of the National Center for the Study of Privatization in Education at Teachers College, Columbia University, has published his findings in a working paper titled “The Mismeasure of Teaching Time.” His research contradicts claims of American teachers working twice or even 73% more hours than their counterparts in other countries, correcting these claims by grade level to 12% (elementary), 14% (middle/intermediate), and 11% (high school).

The reason for the difference, Abrams suggests, was the Schools and Staffing Survey (SASS), the instrument used to collect the data reported to the Paris-based Organization for Economic Cooperation and Development (OECD) on this topic:

The most recent data reported to the OECD is from the 2007-08 survey, which was 44 pages long and contained 75 questions. Teaching time is the 50th question, and it asks teachers to round up the number of hours. As a result, responses were often inflated.

In addition to suggesting that the process of answering 50 questions clouded the responses of teachers taking the survey, Abrams contended that the inflated time also came from a misinterpretation of “teaching time” calculated by the OECD as the “net contact time for instruction.” By definition, excluded from net contact time are activities such as professional development days, student examination days, attendance at conferences, and out of school excursions.

In applying the OECD definition of teaching time, Abrams concluded that one contributing factor to the over-estimation by American teachers was the large number of hours spent assessing students.

Using examples from school districts in Massachusetts, Abrams offered a breakdown of the time teachers spend assessing students in grades 2-8:

  • For students in grade two, 48 hours are lost to interim assessments tied to the state exams;
  • For students in grades three and six, 48 hours are lost to interim assessments and 16 hours are lost to state exams in ELA and math;
  • For students in grades four and seven, 48 hours are lost to interim assessments and 20 hours are lost to state exams in ELA, ELA composition, and math;
  • For students in grades five and eight, 48 hours are lost to interim assessments and 24 hours are lost to state exams in ELA, math, and science. 

Averaging a student school week at a very generalized 35 hours means that students in Massachusetts grades 2-8 could spend approximately 1.5-2 weeks of each school year being assessed. Spreading this time out over the school year may contribute to the perception of a never-ending test season.
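
The arithmetic behind that estimate is simple enough to sketch. Here is a minimal example, assuming the same generalized 35-hour student week and the Massachusetts hours listed above; the grade labels and the printout format are just for illustration.

```python
# Rough arithmetic behind the "1.5-2 weeks" estimate.
# Hours lost per grade come from the Massachusetts breakdown above;
# the 35-hour student week is the same generalized assumption used in the text.

HOURS_PER_SCHOOL_WEEK = 35

hours_lost = {
    "grade 2": 48,            # interim assessments only
    "grades 3 & 6": 48 + 16,  # interim assessments + state exams (ELA, math)
    "grades 4 & 7": 48 + 20,  # interim assessments + state exams (ELA, composition, math)
    "grades 5 & 8": 48 + 24,  # interim assessments + state exams (ELA, math, science)
}

for grade, hours in hours_lost.items():
    weeks = hours / HOURS_PER_SCHOOL_WEEK
    print(f"{grade}: {hours} hours is about {weeks:.1f} school weeks of assessment")
```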

The report considered that the time American educators spend assessing students at every grade level contributed to the misperception of teaching time. More importantly, the study highlighted the disparity in pedagogical practice between the education system in the United States and those in other countries. Like so many other researchers, Abrams contrasted American schools with Finland’s school system. He noted that the difference in teaching time between the two countries was not as great as originally publicized, but that the difference in practice is the “polar opposite.” In Finland, the structure of the school day has 15-minute breaks between classes, or 15 minutes of play for every 45 minutes of instruction, for a total of 75 minutes per day, with no standardized tests. The result is that Finland’s teachers demonstrate little confusion in defining teaching time.

The data provided by Abrams suggests that American teachers do work more than other teachers worldwide. Using the OECD figures to convert the percentage of time into regular 40-hour weeks means that American elementary teachers work 2.4 weeks (12%) more, middle/intermediate teachers 2.75 weeks (14%) more, and high school teachers 2.2 weeks (11%) more than other teachers worldwide.

If the demand for assessment is the reason for the difference,  I am confident that most American teachers could think of other things to do during those weeks other than testing.

I am sure their students feel the same way.

I just finished attending the ICT for Language Learning Conference, where ICT stands for “information communication technologies,” a term that encompasses both methods and technology resources. Here in the United States, the most appropriate synonym would be what we refer to as “IT,” or information technology. (So, if you are in the US and see “ICT,” please read “IT.”)

The winding streets of Florence, Italy

This international conference was held in Florence, Italy, a city of amazing architecture, museums crammed with magnificent art, winding streets and incredibly narrow sidewalks. Finding the right path through the city maze was challenging.

While I was at the conference, I had an opportunity to compare my understanding of the education systems in the United States with several educational systems in the 54 other countries represented. I was fortunate to share a presentation created with fellow educator Amy Nocton, a world language teacher at RHAM High School in Hebron, Connecticut. Our session (Blogging to Share, Exchange, and Collaborate) highlighted how we use blogging in our instruction in grades 6-12.

Because of my own interests, I attended sessions that featured integrating technology in instruction. After a dozen sessions, I came to three important takeaways:

1. Students at every grade level are more motivated when content is integrated with ICTs;

2. Measuring the effectiveness of ICTs poses a challenge for all stakeholders;

3. Educators have limitations in integrating ICTs.

The issues in these three takeaways are the same issues that I see in the education systems in the United States. We educators know that students enjoy using technology as a learning tool, but we are not sure which of these tools are the most effective in meeting the needs of students while delivering instruction. The concern of educators worldwide about assessing or “grading” students when they use ICTs is a major roadblock, a concern aggravated by individual educators’ comfort levels with ICT. An individual educator’s aggravation may increase exponentially against a rapidly changing technology landscape where platforms and devices change but educational systems, with their filters and limitations, appear to crawl toward the end of the 20th century.

In short, we educators are never going to learn all this stuff.

I suppose it is comforting to see that the same problems American educators experience are playing out on a global scale. At least we are not alone.

On the other hand, it is frustrating to see educators from other countries perseverating on the same problems. Everyone seems to recognize the excitement generated when ICTs are used in class, but there are choruses (in many different languages at this conference) of “We still do not have access!” or “Are these ICTs really working?” or even “Many teachers do not know how to use the ICTs!”

When on this narrow path….

After several presentations, I also grew concerned that ICTs are being perceived as limited to assessment measurement. A few presenters offered their research with highly scripted programs where students could be “interactive” by answering predictably scripted responses. While these scripted programs are a step more engaging than a curriculum-prescribed textbook, they are only a small digital step above the pencil-and-Scantron form of response. Such controlled platforms are on the same path as the testing programs (SBAC, PARCC) being developed back in the United States to address the need, or the mandates, of measuring student understanding. Even at this conference, the message about the ability of ICTs to assess and grade may be drowning out the more creative possibilities that ICTs offer.

In contrast, I did hear a reference to student choice: a presenter, Feyza Nur Ekizer of Giza University, offered her students a chance to develop “knowledge envelopes,” or portfolios, gathering as much information on a topic as possible so they would be prepared to answer with a written response on that topic. She gave her students choice in what they found on a broad topic (e.g., love), and reported (not surprisingly) that, in a response weeks later, the students wrote longer and more detailed responses than they ever had before. Her use of technology was minimal, but the students had control over their paths of inquiry in gathering information for their “knowledge envelopes.”

…or on this narrow path…

At this time in digital history, there are many platforms available for students to choose how and what to gather for information in authentic inquiry research. The presenters at this conference had done a great deal of work, and they shared their learning on the platforms they had chosen for their own inquiry. We were, as are our students, the passive recipients of information; we were on each presenter’s narrowed path.

Worldwide, our students (K-12) are far more comfortable working across platforms in gathering information (from websites, social media, blogs, and other visual/audio media) than their educators. Why would we want them to step backwards and use only what we require to prove their understanding? We should not limit the use of ICTs to assessment delivery systems when students can use ICT to create their own multi-media texts individually and collaboratively if they are given the opportunity.

…there may be little choice.

In addition, students (worldwide!) should not have to wait for educators to become experts with ICTs when platforms are growing exponentially. Instead of trying to master the expanding field of ICTs, educators must see how the expertise they already have in a content area can be used to guide students through choice.

The role of teacher should shift to guiding students in developing content and understanding. Teachers who are skilled in a discipline’s content can help students determine the accuracy, relevancy, and legitimacy of information in developing student inquiry on topics.

ICTs must not be the exclusive means of measuring understanding; instead, ICTs should be included in how students develop their understanding of content.

For students, there are many different paths (or platforms) to choose in learning content and there are certainly more paths to come. ICT should not be used exclusively to restrict students to the narrow paths of measurement alone. Based on my discussions with other attendees, there may be other educators from the conference who recognize how much this ICT path of student choice and inquiry may be narrowing unless we act to change it.

The amazing city of Florence, Italy!

Students will encounter challenges in choosing ways to use ICTs, just as I did walking the narrow pathways of Florence’s city streets while witnessing amazing and magnificent sights. Through student choice in ICTs coupled with teacher guidance, students will also gain the freedom to explore the amazing and magnificent topics that interest them.

Since I write to understand what I think, I have decided to focus this particular post on the different categories of assessments. My thinking has been motivated by helping teachers with ongoing education reforms that have increased demands to measure student performance in the classroom. I recently organized a survey asking teachers about a variety of assessments: formative, interim, and summative. In determining which is which, I have witnessed their assessment separation anxieties.

Therefore, I am using this “spectrum of assessment” graphic to help explain:

[Graphic: the spectrum of assessment]

The “bands” between formative and interim assessments and the “bands” between interim and summative blur in measuring student progress.

At one end of the grading spectrum (right) lie the high-stakes summative assessments that are given at the conclusion of a unit, quarter, or semester. In a survey given to teachers in my school this past spring, 100% of teachers understood these assessments to be the final measure of student progress, and the list of examples was much more uniform:

  • a comprehensive test
  • a final project
  • a paper
  • a recital/performance

At the other end lie the low-stakes formative assessments (left) that provide feedback to the teacher to inform instruction. Formative assessments are timely, allowing teachers to modify lessons as they teach. Formative assessments may not be graded, but if they are, they do not contribute many points towards a student’s GPA.

In our survey, 60% of teachers generally understood formative assessments to be those small assessments or “checks for understanding” that let them move on through a lesson or unit. In developing a list of examples, teachers suggested a wide range of formative assessments they used in their daily practice in multiple disciplines, including:

  • draw a concept map
  • determining prior knowledge (K-W-L)
  • pre-test
  • student proposal of project or paper for early feedback
  • homework
  • entrance/exit slips
  • discussion/group work peer ratings
  • behavior rating with rubric
  • task completion
  • notebook checks
  • tweet a response
  • comment on a blog

But there was anxiety in trying to disaggregate the variety of formative assessments from other assessments in the multicolored band in the middle of the grading spectrum, the area given to interim assessments. This school year, the term interim assessment is new, and its introduction has caused the most confusion among members of my faculty. In the survey, teachers were first provided a definition:

An interim assessment is a form of assessment that educators use to (1) evaluate where students are in their learning progress and (2) determine whether they are on track to performing well on future assessments, such as standardized tests or end-of-course exams. (Ed Glossary)

Yet, one teacher responding to this definition on the survey noted, “sounds an awful lot like formative.” Others added small comments in response to the question, “Interim assessments do what?”

  • Interim assessments occur at key points during the marking period.
  • Interim assessments measure when a teacher moves to the next step in the learning sequence.
  • Interim assessments are worth less than a summative assessment.
  • Interim assessments are given after a major concept or skill has been taught and practiced.

Many teachers also noted how interim assessments should be used to measure student progress on standards such as those in the Common Core State Standards (CCSS) or standardized tests. Since our State of Connecticut is a member of the Smarter Balanced Assessment Consortium (SBAC), nearly all teachers placed practice for this assessment clearly in the interim band.

But finding a list of generic or even discipline-specific examples of other interim assessments has proved more elusive. Furthermore, many teachers questioned how many interim assessments were necessary to measure student understanding. While there are multiple formative assessments contrasted with a minimal number of summative assessments, there is little guidance on the frequency of interim assessments. So there was no surprise when 25% of our faculty was still confused in developing the following list of examples of interim assessments:

  • content or skill based quizzes
  • mid-tests or partial tests
  • SBAC practice assessments
  • Common or benchmark assessments for the CCSS

Most teachers believed that the examples blurred on the spectrum of assessment, from formative to interim and from interim to summative. A summative assessment that went horribly wrong could be repurposed as an interim assessment, or a formative assessment that was particularly successful could move up to be an interim assessment. We agreed that the outcome, or the results, determined how the assessment could be used.

Part of the teachers’ consternation was the result of assigning category weights for each assessment so that there would be a common grading procedure using common language for all stakeholders: students, teachers, administrators, and parents. Ultimately, the recommendation was to set category weights to 30% summative, 10% formative, and 60% interim in the PowerSchool grade book for next year.
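
To make those weights concrete, here is a minimal sketch of how a grade book might combine category averages under the recommended 30/10/60 split; the function and the sample scores are hypothetical, not PowerSchool’s own calculation.

```python
# Hypothetical illustration of the recommended category weights
# (30% summative, 10% formative, 60% interim); not PowerSchool's actual code.

CATEGORY_WEIGHTS = {"summative": 0.30, "formative": 0.10, "interim": 0.60}

def weighted_grade(category_averages):
    """Combine per-category averages (0-100) into a single course grade."""
    return sum(CATEGORY_WEIGHTS[category] * average
               for category, average in category_averages.items())

# Example: a student averaging 88 on summative work, 95 on formative checks,
# and 80 on interim assessments earns 26.4 + 9.5 + 48.0 = 83.9.
print(weighted_grade({"summative": 88, "formative": 95, "interim": 80}))
```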

In organizing the discussion, and this post, I did come across several explanations on the rationale, or “why,” for separating out interim assessments. Educator Rick DuFour emphasized how the interim assessment responds to the question, “What will we do when some of them [students] don’t learn it [content]?” He argues that the data gained from interim assessments can help a teacher prevent failure in a summative assessment given later.

Another helpful explanation came from a 2007 study titled “The Role of Interim Assessments in a Comprehensive Assessment System,” by the National Center for the Improvement of Educational Assessment and the Aspen Institute. This study suggested three reasons to use interim assessments: for instruction, for evaluation, and for prediction. The authors did not use a color spectrum as a graphic, but chose instead a right triangle to indicate the frequency of the interim assessment for instructing, evaluating, and predicting student understanding.

I also predict that our teachers will become more comfortable with separating out interim assessments as a means to measure student progress once they see them as part of a larger continuum that can, on occasion, be a little fuzzy. Like the bands on a color spectrum, the separation of assessments may blur, but they are all necessary to give the complete (and colorful) picture of student progress.

At the intersection of data and evaluation, here is a hypothetical scenario:

A young teacher meets an evaluator for a mid-year meeting.

“85% of the students are meeting the goal of 50% or better; in fact, they just scored an average of 62.5%,” the young teacher says.

“That is impressive,” the evaluator responds, noting that the teacher had obviously met his goal. “Perhaps you could also explain how the data illustrates individual student performance and not just the class average?”

“Well,” says the teacher, offering a printout, “according to the (Blank) test, this student went up 741 points, and this student went up…” he continues to read from the spreadsheet, “81 points… and this student went up, um, 431 points, and…”

“So,” replies the evaluator, “these points mean what? Grade levels? Stanine? Standard score?”

“I’m not sure,” says the young teacher, looking a bit embarrassed, “I mean, I know my students have improved, they are moving up, and they are now at a 62.5% average, but…” he pauses.

“You don’t know what these points mean,” answers the evaluator, “why not?”

This teacher, who tracked an upward trajectory of points, was able to illustrate a trend that his students are improving, but the points his students receive are meaningless without data analysis. What doesn’t he know?

“We just were told to do the test. No one has explained anything…yet,” he admits.

There will need to be time for a great deal of explaining as the new standardized tests, Smarter Balanced Assessments (SBAC) and the Partnership for Assessment of Readiness for College and Careers (PARCC), that measure the Common Core State Standards (CCSS) are implemented over the next few years. These digital tests are part of an educational reform mandate that will require teachers at every grade level to become adept at interpreting data for use in instruction. This interpretation will require dedicated professional development at every grade level.

Understanding how to interpret data from these new standardized tests and others must be part of every teacher’s professional development plan. Understanding a test’s metrics is critical because there exists the possibility of misinterpreting results. For example, the data in the above scenario would make it appear that one student (+741 points) is making enormous leaps forward while another student (+81) is lagging behind. But consider how different the data analysis would be if the scale measuring student performance on this particular test were organized in levels of 500-point increments. In that circumstance, one student’s improvement of +741 may not seem so impressive, and a student achieving +431 may be falling short of moving up a level. Or perhaps the data might reveal that a student’s improvement of 81 points is not minimal, because that student had already maxed out towards the top of the scale. In the drive to improve student performance, all teachers must have a clear understanding of how the results are measured, what skills are tested, and how this information can be used to drive instruction.
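
As a rough illustration of that caution, here is a minimal sketch with entirely hypothetical scale values: the 500-point level width comes from the example above, while the starting scores and the 3000-point ceiling are invented for the sake of the demonstration.

```python
# Hypothetical scale for illustration only: levels are assumed to be 500 points
# wide and the test is assumed to top out at 3000 points.

LEVEL_WIDTH = 500
SCALE_CEILING = 3000

def interpret_gain(start_score, gain):
    """Report how many level boundaries a raw point gain actually crosses."""
    levels_gained = (start_score + gain) // LEVEL_WIDTH - start_score // LEVEL_WIDTH
    near_ceiling = start_score + gain >= SCALE_CEILING - LEVEL_WIDTH
    note = " (already near the top of the scale)" if near_ceiling else ""
    return f"+{gain} points = {levels_gained} level(s) gained{note}"

# The point gains from the scenario, paired with assumed starting scores.
for start_score, gain in [(1200, 741), (1550, 431), (2600, 81)]:
    print(interpret_gain(start_score, gain))
```

Under these assumed starting scores, the +741 gain crosses only one level, the +431 gain crosses none, and the +81 gain belongs to a student already near the top of the scale, which is exactly the kind of nuance the raw point totals hide.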

Therefore, professional development must include information on the metrics for how student performance will be measured on each different test. But professional development for data analysis cannot stop at the PowerPoint! Data analysis training cannot come “canned,” especially if the professional development is marketed by a testing company. Too often teachers are given information about testing metrics by those outside the classroom with little opportunity to see how the data can help their practice in their individual classrooms. Professional development must include the conversations and collaborations that allow teachers to share how they could use, or do use, data in the classroom. Such conversations and collaborations with other teachers will provide opportunities for teachers to review these test results to support or contradict data from other assessments.

Such conversations and collaborations will also allow teachers to revise lessons or units and update curriculum to address weaknesses exposed by data from a variety of assessments. Interpreting data must be an ongoing collective practice for teachers at every grade level; teacher competency with data will come with familiarity.

In addition, the collection of data should be on a software platform that is accessible and integrated with other school assessment programs. The collection of data must be both transparent in reporting results and secure in protecting the privacy of each student. The benefit of technology is that digital testing platforms should be able to calculate results in a timely manner, freeing up time for teachers to implement changes suggested by data analysis. Most importantly, teachers should be trained in how to use this software platform.

Student data is critical in evaluating both teacher performance and curriculum effectiveness, and teachers must be trained in how to interpret the rich pool of data that is coming from new standardized tests. Without the professional development steps detailed above, however, evaluation conversations in the future might sound like the response in the opening scenario:

“We just were told to do the test. No one has explained anything…yet.”