(This post was co-authored with Arnold Packer.)
Reliability and Validity are the Alpha and Omega of testing. A reliable test produces consistent results each time it’s given, while a valid test measures what it is supposed to measure. Tests that meet both criteria are the gold standard of assessment.
For example, making someone swim 100 yards to test whether or not he can swim would be a valid and reliable test. If you sink, you flunk, and that’s true each time the test is given and is independent of who is doing the testing.
However, when teachers are trying to assess ‘soft’ skills, the waters get murky. How can we measure the ability to work with others, process information from disparate sources, communicate persuasively, or work reliably?
Consider the concept of reliability. Is an employee who is late to work one day most weeks reliable? What grade would you assign her, on a scale of 1-5? Suppose you found her explanations plausible (child care problems, for example) and, because you’ve been there yourself, you cut her some slack and gave her a 5. Another employer might give that same employee a 2 or 3 because, after all, late is late.
There’s no set scale for measuring ‘working with others,’ meaning that the rating may vary depending upon who’s doing the rating. And what to one teacher is ‘persuasive communication’ may fall flat with another. There’s just no easy way to measure those all-important ‘soft’ skills.
At first I thought that was a legalistic, hair-splitting response—until I read about a principal who did get involved, was subsequently sued by the angry parent of the offending child, and lost. That’s horrifying, but it’s the reality.
That’s 15 segments, each 8-10 minutes in length, for a total of roughly 2 hours of television. You might be interested to know what went into creating those two hours. Each piece generally entailed three days of shooting, with perhaps 6 hours of videotape each day. That’s 6 (hours) x 3 (days) x 15 (segments) = 270 hours in all.
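For anyone who wants to check the arithmetic, here is a small sketch in Python. It uses only the figures quoted above; the function names are made up for illustration, and the 6 hours of tape per day is an estimate ("perhaps"), so treat the results as ballpark numbers.

```python
# Figures taken from the paragraph above; the names are illustrative only.
SEGMENTS = 15
SHOOT_DAYS_PER_SEGMENT = 3
TAPE_HOURS_PER_DAY = 6        # "perhaps 6 hours", so an estimate
SEGMENT_MINUTES = (8, 10)     # shortest and longest finished segment


def raw_footage_hours(segments=SEGMENTS,
                      days=SHOOT_DAYS_PER_SEGMENT,
                      hours_per_day=TAPE_HOURS_PER_DAY):
    """Total hours of videotape shot across all segments."""
    return segments * days * hours_per_day


def broadcast_hours(segments=SEGMENTS, minutes=SEGMENT_MINUTES):
    """Low and high estimates of finished television, in hours."""
    shortest, longest = minutes
    return segments * shortest / 60, segments * longest / 60


raw = raw_footage_hours()
low, high = broadcast_hours()
print(f"Videotape shot: {raw} hours")
print(f"Finished television: {low} to {high} hours")
```

With these inputs the finished television comes to somewhere between 2 and 2.5 hours, which is why "roughly 2 hours" is a fair description, and the videotape total matches the 270 hours worked out by hand.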
