Monthly Archives: October 2015

Yes, Mr. President, Tests Must Be “Worth Taking”

I am glad President Obama has admitted that there is too much standardized testing in our public schools. He has produced a “Testing Action Plan” to address this issue. He wants our testing policies to be smarter, and he and his advisors have come up with a set of guiding principles to that end. Every one of the principles is both something I agree with and something the administration’s policies directly contradict, but it would take hundreds of thousands of words to address all that, so I will focus on just the first principle, that tests must be “Worth Taking”.

This, in fact, is the essential issue. The fact that there are too many tests is just a symptom. One sentence in the definition of this principle gets to the heart of the matter: “And assessments should provide timely, actionable feedback to students, parents, and educators that can be used to guide instruction and additional supports for students.” Indeed.

A problem with any large-scale standardized test is that the results are never immediately available. The further the test departs from a pure multiple-choice model, the longer it takes to score, and the greater the time gap becomes. The results of the standardized tests currently in use are generally not available for months. Even someone who has never taught should be able to understand that “feedback” that is not received until the next semester is by definition neither “timely” nor “actionable”.

On top of that, for most of these tests, students and teachers are not allowed to see which questions were answered incorrectly. There is no way to know whether the student actually wasn’t able to answer the question, or just misread it, or filled in the wrong bubble on the answer sheet, or was too tired to concentrate by that point. A test score is just a number. It is of no instructional use without detail and context.

Let us drop the facade that these tests are intended to be of any use to the students who take them or their teachers. If they are of use to anyone, it is policymakers and high-level administrators who want some way to compare the performance of students, teachers, and schools. That is a perfectly understandable goal, and I actually support it. I just don’t think these tests are the best tool for achieving it.

We already have a more reasonable test for this purpose, the NAEP, which uses sampling and has no stakes automatically attached to it, and for which no one does any test prep. It provides information that is as reliable as that from most of the best standardized tests, which is to say, fairly reliable. I am not aware of any standardized test that has ever been proven to provide truly reliable information about anything other than socioeconomic status.

I have purposely used the word “test” and not “assessment”, because these words are too often used interchangeably. There are many kinds of assessment other than tests. Tests and reports and anything else that is essentially paperwork will never give the full picture of what is happening at a school. You have to actually go there and observe and talk to people, and you have to do it for a significant period of time, and more than once, and sometimes without warning. This does not have to be more complicated or more expensive than the current approach, which has poured thousands of hours and billions of dollars into testing companies and consultants in recent years. Similar things have long been successfully done in countries such as the UK.

Mr. President, the problem is not that there are too many tests. It is that the tests, by your own definition, are not worth taking. Why is it so difficult for otherwise intelligent, thoughtful people like you and your advisors to understand that?

Data is not a thing.

Many supposedly highly educated people do not understand data. They use it all the time, rely on it, even revere it to a level that approximates worship, but since they don’t understand it, it becomes merely fuel for confirmation bias. Any study that tells them what they want to hear is “hard data”, and any study that doesn’t has “flawed methodology” or is otherwise suspect.

In education, one of the current buzzwords is “data-driven”. As with most buzzwords, its meaning is unclear. The picture that comes most immediately to my mind is of those drivers who go off a bridge or into a lake because their GPS told them to, but I actually support the concept, if not the name. I just don’t think it is anything new. The tools may be new, but not the concept.

The fundamental thing about data that most people don’t seem to get is that a data point is not actually anything in and of itself. It is a symbolic representation of something, and like all symbols, it can mean a lot of different things. The data-related skill a teacher needs to use is data interpretation, and this is something all good teachers have always done.

If one student fails a multiplication test, it may be because she doesn’t understand multiplication, or because her mother just died, or because the boy behind her kept pulling her hair during the test, or for any number of other reasons. A good teacher who knows her students will have a pretty good hypothesis and then investigate. If a whole class fails a multiplication test, it may be because they don’t understand multiplication, but it may be because the test was confusing, or because there was a big earthquake that morning and they couldn’t concentrate. Again, a good teacher will have a pretty good hypothesis and then investigate.

A single test score is rarely a reliable indicator of anything. Aggregated test scores more frequently indicate something, but what exactly they indicate is not obvious. This is more true the more impersonal and standardized the test. I have written questions for standardized tests. It is an extremely difficult thing to do well, because you can never predict all the possible ways a test taker could misunderstand the question, and no question ever tests only one skill. For example, a teacher who knows his students well can craft math questions that use only vocabulary the students know, phrased in ways the students will understand. The teacher can then be fairly sure that the questions are actually measuring the students’ ability to do the math. This is not true of a test developer unaware of those particular students’ language skills.

Many people discussing education these days use the term “data” to mean “test scores”. They also use “technology” to mean computers and smartphones and the like. Both usages annoy me. A pencil is technology too. So is a book. In the same vein, the fact that a particular student’s father is in jail is data. The fact that another student’s mother has cancer is also data. Neither is a reason not to try to help the students do their best work, but both are possible explanations for a student’s low performance on a particular test, or a particular series of tests. This is one of the many reasons no test should have high stakes attached to it. All tests provide data, but not all of it is of any use, and there are other kinds of data that matter at least as much.

This is why I don’t like the term “data-driven”. What is the alternative? Utter obliviousness? All good teaching is guided by data, but that data may come from many sources. If a teacher needs a computer algorithm to know how his students are doing, then either he is not a very good teacher, or he has too many students. (I will digress here to mention that the studies I have seen that “prove” that class size doesn’t matter tend to look at the difference between having, say, 18 students in a class and having 22. No, at that size, the difference is not significant. Why don’t they look at places like the LAUSD, where class sizes of 35-40 are not unusual? Is there a difference between 20 students and 40? You betcha.)

I’m not saying computer algorithms can’t be useful for teachers. They can, but they are not reliable enough to supersede good judgement based on daily interactions with students and all the other data available to teachers in the course of their work. One of the rallying cries of people who protest high-stakes tests is, “Students are not data points!” The reverse is also true. Data points are not students. Test scores are just representations of particular students’ performance on particular measures at particular times. They may indicate situations to investigate, but they are not sufficient evidence to declare students or their teachers failures.

Here is some data:

Self-described education reformers have used many millions of dollars to successfully influence education policy over the last couple of decades. While claiming to want to make teaching a more attractive career, they have promulgated a narrative that blames the gross inequities in our public education system on lazy, uncaring, unintelligent and/or racist teachers who are too hard to fire, and the unions that protect them at the expense of children. They claim that high-stakes tests are the best way to determine who these bad teachers are.

Experienced, dedicated teachers, even the recipients of prestigious awards, have been leaving the field in droves, some even discouraging young people from entering it. Now we are seeing reports of growing teacher shortages.

Are these events related? There is not enough information here to be sure, but I could make a pretty good hypothesis.

Watch Longmire!

Everyone who appreciates good storytelling should watch Longmire, now on Netflix. It is a magnificent show, one of my absolute favorites. I was devastated when A&E cancelled it after three seasons, and thrilled when Netflix picked it up. The fourth season is currently available on Netflix, along with the first three, and fans are having an event tomorrow to encourage the company to renew it for a fifth. I intend to participate.

I am not sure whom to credit with the great writing on the show. It is based on books by Craig Johnson, which I have not yet read, but intend to, and Johnson is given writing credit along with show creators Hunt Baldwin and John Coveny and a number of other writers on individual episodes. I imagine it is a collaborative effort. However it is done, the result is spectacular.

The actors are, without exception, excellent. Robert Taylor so completely inhabits Sheriff Walt Longmire of Absaroka County, Wyoming, that I had no idea he was Australian until long after I became addicted to the show. I would have guessed he was from no farther from Wyoming than maybe Montana. He also has the great actor’s ability to make clear what the character thinks and feels with minimal dialogue (Walt is a man of extremely few words) and to make me care deeply about what happens to him. Lou Diamond Phillips is equally believable as Walt’s more socially adept best friend, Henry Standing Bear, and Katee Sackhoff makes full use of the potential of perhaps the most layered, complex and unusual female character on television, Deputy Vic Moretti. I almost hate to single anyone out, though, because the whole cast is wonderful.

Although there are funny moments, Longmire is by no means a lighthearted show. It is emotionally intense, but one of the things I love about it is that it has very little of the graphic violence and general grossness that permeate so much of modern entertainment. I seem to be less able than most people to become desensitized to that sort of thing. I admit there have been a few moments in a few episodes when I had to look away, but very few. The intensity comes from the realistic human drama and the powerful performances.

Even if you think you have no interest in the life of a Wyoming sheriff, I urge you to give this show a try. Start from Season 1, Episode 1, because the plot is complex, and there are many layers of subtext. I defy you not to want to see the next episode, and the next, and the next…

