Forgive me for being contrary: I know I threw a few friends when I wrote last week we shouldn’t assess projects in PBL (though my full argument was far more nuanced than my headline/thesis), and now I know I take the risk of irking more friends by making the argument which follows.
Among the many caveats to my argument, I’ll prioritize these two:
First, I too am appalled by the misuse and abuse of current or future standardized testing, particularly with regard to punishing schools and teachers. What Bill Ferriter wrote recently on this topic is nothing short of brilliant: “It’s time that you start asking your policymakers some difficult questions about their positions on value-added measures of teacher performance. If Jackson is right, those policies — which have rapidly become the norm instead of the exception in most states in America — are wasting our time AND our money.”
I want quality testing to be used for meaningful purposes: advancing student learning, not teacher-bashing.
Second, these important advances in testing are certainly not the end of the line; they don’t represent a complete arrival at a place of testing excellence. They are instead a significant and meaningful advance from the status quo toward that place of excellence, an advance I think we should applaud. For more on the continued advances needed, see this recent Edweek post and the report from the Gordon Commission on the Future of Assessment in Education upon which it is commenting.
But here goes: the Common Core assessments PARCC and SBAC (Smarter Balanced) shouldn’t be made any shorter than they are currently planned to be.
1. Because we shouldn’t be so quick to call this testing time “lost” to teaching and learning. In even a moderately good testing experience, testing time is learning time, sometimes superior learning time.
2. Because these new tests assess in ways far more authentic and meaningful than any previous generation of standardized K-12 educational tests, and because they assess the deeper learning our students greatly need in order to be successful (learning which far too few are actually acquiring), they provide the assessment information we need to improve that “deeper learning.”
But both of these things will be compromised or lost if the tests get any shorter.
The length of these tests is being hotly debated and contested.
Edweek published last week a short article about the duration of the tests, and it is worth reviewing.
New tests being designed for students in nearly half the states in the country will take eight to 10 hours, depending on grade level, and schools will have a testing window of up to 20 days to administer them, according to guidance released today.
The tweets that followed the Edweek piece were not at all positive. The following tweet is entirely representative of the attitude in the feed of responses to the Edweek post, though not entirely representative of their tone, because many were more vulgar.
Let me flesh out my argument:
1. We shouldn’t be so quick to call this testing time “lost” to learning: in even a moderately good testing experience, it is quite the opposite.
I don’t believe that time spent taking a good test is “time away from learning.” It doesn’t even have to be a great test; a good test will do. When I look back at my K-16 education, I am certain that, on average, I learned more, and was more engaged, more challenged, more interested, more analytical and creative, when I was taking a half-decent test than when I was sitting in class watching a teacher talk at the front of the room.
Quite often, though not always, my test-taking times as a student were among the most intellectually exciting and growth-oriented experiences of my education.
As a teacher, I saw this too: I saw my students rise to the opportunity to show me what they knew, to share with me their thinking, to probe deeper and understand more in the course of taking a test than in any other time in their learning.
We call this kind of quality assessment experience “assessment AS learning.”
Please, let’s stop thinking of time spent taking a good test as time “away from learning.”
What about these tests in particular, PARCC and SBAC? Why should we think they will be of even moderate quality when their predecessors have been so poor?
Well, it is a matter of public record that Smarter Balanced testing is being designed in part, specifically its key performance tasks, by the smart people at CWRA, the College Work Readiness Assessment from CAE. (I have reason to think that PARCC testing is being influenced by the CWRA model as well.)
Very few educators, public or private, are very familiar with CWRA, but let me share with you what my students say about the CWRA test-taking experience.
Let me quote one of my students from this video:
I actually learned from my [experience of the] test; I am not a really “science-y” kind of person and mine had a couple of science questions in it, and I had to read through the documents and I learned a lot from the mock interviews and I thought that was really interesting because I don’t find that with normal tests.
If we shorten these tests, what will almost inevitably be cut is the time for the more open-ended, more authentic, more interesting and engaging performance tasks: in other words, the time when students have the opportunity to be more engaged and to actually learn from the experience of the test.
2. These new tests assess in ways far more authentic and meaningful than any previous generation of standardized K-12 educational tests, and assess the deeper learning our students greatly need to learn to be successful (learning which far too few are indeed learning).
How do I know? See this January Edweek report on a fascinating independent UCLA study of deeper learning and PARCC/SBAC. (I’ve embedded the full UCLA report at the bottom of this post.) Note that this study was funded by the Hewlett Foundation, which has established itself as a highly credible and deeply committed supporter of “deeper learning.”
What is meant here by “deeper learning?” The report explains it in the executive summary this way:
Study results indicate that PARCC and Smarter Balanced summative assessments are likely to represent important goals for deeper learning, particularly those related to mastering and being able to apply core academic content and cognitive strategies related to complex thinking, communication, and problem solving.
They then spell out the four levels of deeper learning from the Webb scale, which they used for their analysis:
Webb’s system categorizes DOK (Depth of Knowledge) into the following four levels, essentially:
DOK1: Recall of a fact, term, concept, or procedure; basic comprehension.
DOK2: Application of concepts and/or procedures involving some mental processing.
DOK3: Applications requiring abstract thinking, reasoning, and/or more complex inferences.
DOK4: Extended analysis or investigation that requires synthesis and analysis across multiple contexts and non-routine applications.
I’m stipulating here that these deeper learning skills are of extreme value for our students and for their future success. I think I can stipulate too that these deeper learning skills are, simply stated, not being mastered by nearly enough of our K-12 learners. I could take more space here to share statistics to this effect, but I’m choosing not to, although I will point to one of the reference sources on this topic I cite most frequently, the book Academically Adrift.
If we can agree we want students to master this greater depth of knowledge, and if we can agree that not enough of our students are doing so, then the value of better discerning whether our students are or are not doing so becomes clear. Surely we want to know as educators first, whether our students as a whole are, generally, mastering deeper learning, so as to adjust the course of our programs accordingly, and second, which of our students are and which aren’t, so as to intervene accordingly.
This is no less the case for our most innovative educators. As regular readers here surely recognize, I argue frequently for flipping instruction in more ways than one, and I believe we should have vastly more experimentation, iteration, and learning from failure in our instructional practices.
But, those of us trying hardest to transform learning need to know and need to be able to demonstrate that our instructional innovation is working, and hence need to have the data demonstrating students are indeed learning the deeper knowledge we want and seek for them.
And these new tests will reveal this information far more effectively than any previous generation of testing.
Edweek explains the findings of the research:
The center concludes that the assessments hold a lot of promise for improving teacher practice and student learning.
In examining the potential rigor of the coming tests, [researchers] Herman and Linn were guided by Norman Webb’s “depth of knowledge” classification system, which assigns four levels to learning, from Level 1, which features basic comprehension and recall of facts and terms, to Level 4, which involves extended analysis, investigation, or synthesis.
Herman and Linn examined the work so far of the Partnership for Assessment of Readiness for College and Careers, or PARCC, and the Smarter Balanced Assessment Consortium for signs that they would demand the kinds of learning at Levels 3 and 4 of the so-called “DOK” framework.
The researchers found reason for optimism that the assessments will demand those skills. They singled out, in particular, the more lengthy, complex performance tasks being crafted by the two groups, saying they seemed likely to assess skills at DOK Level 4.
What does this compare to, you might ask? How do we know this represents real progress? Because the UCLA researchers thoroughly compared the new tests (based on released prototype questions, which we have to trust will be representative of the final tests) to the best of current standardized tests.
Herman and Linn noted a RAND study from last year that examined released items from 17 states reputed to have challenging exams and found “depth of knowledge” levels overwhelmingly [only] in the 1s and 2s in mathematics, and those in English/language arts a bit more rigorous.
While the unfinished work of the two consortia can’t be directly compared with existing state tests, they said, the two groups still appear to be on track to creating tests that are more rigorous than what most states currently administer.
The CRESST report itself explains further:
Overall, [currently] 3-10% of US elementary and secondary students were assessed on deeper learning on at least one state assessment.
The situation will be very different if Smarter Balanced assessments reflect its content specifications and PARCC follows its current plans. By RAND’s metric, fully 100% of students in tested grades using consortia tests will be held accountable for deeper learning.
As you look at a question like this, ask yourself: what part of this test-taking experience do I want to cut time from? Why? This is learning and testing.
But we have reason to fear the testing time will be cut, with the result that the deeper learning assessment and deeper learning experience will be cut also. The UCLA report argues, pleads in a sense, against cutting:
The performance tasks themselves represent a potential danger point. Some of the Chief State School Officers in the Smarter Balanced Governing States, for example, pushed back on Smarter Balanced initial plans for multiple performance tasks over several days because of time demands and the cost burdens of scoring.
Let’s not cut the time of these tests, and let’s do what we can to support their use for meaningful assessment as deeper learning and of deeper learning.