Performance Task Assessment, sometimes referred to simply as Performance Assessment, is coming soon in a substantial way to K-12 schooling; 21st century principals and other educational leaders would do well to familiarize themselves with this method and begin to make plans for the successful integration of this alternative assessment format.

[the following 10 or so paragraphs lay out some background for my “10 Things;” scroll down to the section heading if you want to skip over the background discussion]

President Obama and Secretary Duncan have been assuring us for several years that they will take standardized testing “beyond the bubble,” and both PARCC and Smarter Balanced are working hard at developing new Common Core assessments using the format of performance task assessment.

As PARCC explains,

PARCC is… contracting with [other organizations] to develop models of innovative, online-delivered items and rich performance tasks proposed for use in the PARCC assessments. These prototypes will include both assessment and classroom-based tasks.

Smarter Balanced, meanwhile, states that by 2014-15,

Smarter Balanced assessments will go beyond multiple-choice questions to include performance tasks that allow students to demonstrate critical-thinking and problem-solving skills.

Performance tasks challenge students to apply their knowledge and skills to respond to complex real-world problems. They can best be described as collections of questions and activities that are coherently connected to a single theme or scenario.

These activities are meant to measure capacities such as depth of understanding, writing and research skills, and complex analysis, which cannot be adequately assessed with traditional assessment questions.

Samples of the performance tasks being developed for grades K-8 are available here.  One quick example, from a 7th grade assessment, provides students a series of videos and articles with information  about bottled water, and then poses them the following:

Task 5: Argument essay: Should our school ban bottled water or not?

Students will be prompted to write an argument essay, revised from their first draft or completely re-envisioned, in which they craft an argument, and provide reasons and information supporting that argument, on the topic of whether their school should ban bottled water or not.

This new performance task assessment approach is something I’ve developed familiarity with and fondness for over the past three years as a high school principal administering the College and Work Readiness Assessment (CWRA), which has high school freshmen and seniors undertake a roughly two-hour exam composed primarily of a single, open-ended, (relatively) authentic (or authentically situated) essay, written in response to a prompt and a series of documents.

CWRA is an exact parallel (indeed, it is word for word the same test) to the far more prominent and widely used Collegiate Learning Assessment (CLA), employed by more than 300 colleges and universities nationally. CLA and CWRA are administered by the Council for Aid to Education (CAE), which has recently been contracted by OECD to develop assessments of universities across Europe and the OECD membership. CAE is also participating in the consortium Smarter Balanced is assembling to develop these new performance tasks.

CLA, the Collegiate Learning Assessment, received a fair amount of attention about 18 months ago when a book, Academically Adrift, was published, drawing inferences and conclusions about the effectiveness, and more pointedly, lack of effectiveness of many colleges and universities in improving the higher order thinking skills of their students during their college years.  (I wrote about that book here and here).

Because that book reports on differentials between university educational elements which contribute to successful and unsuccessful results on the CLA performance task assessments, it is a valuable resource for K-12 educators seeking to better prepare their students for these types of assessments, even if lessons need to be drawn from across the secondary/postsecondary divide.

Because the CLA is controversial among some university educators who oppose external measurement, its administrators have prepared several reports responding to their critics and advocating for their approach. These reports also function as advocacy for their work to provide their services to European universities (where what we call “higher order thinking skills” are called “generic skills,” presumably because they are not attached to individual academic disciplines but are “generic” to all of them). One such report is a fascinating white paper, “The Case for Generic Skills and Performance Assessment in the United States and International Settings.”

Though this report is written as advocacy for the adoption of performance task assessment in North American and European universities, much of it can be read as advocacy for its adoption and wider implementation in K-12 learning. In the discussion that follows, I am drawing upon that report and other recent publications.

Ten Things for Educators to Think About regarding Performance Task Assessment.

1. Performance Task Assessment is an improvement because it assesses critical higher order thinking skills rather than subject/content knowledge.

Higher order thinking skills matter now more than ever, and these are the skills that effective performance task assessment demands of students and measures in their learning. As the CAE report explains:

Employers now seek individuals who are able to think critically and communicate effectively in order to meet the requirements of the new knowledge economy (Hart Research Associates, 2006; Levy & Murnane, 2004).  Therefore, the skills taught in higher education are changing; less emphasis is placed on content-specific knowledge and more is placed on general higher-order skills, such as: analytic reasoning and evaluation, problem solving, and written communication.

Recent theories of learning that reflect the change in emphasis from specific content domains to a focus on higher-order skills are redefining the concept of knowledge. Herbert Simon (1996) argues that the meaning of “knowing” has changed from being able to recall information to being able to find and use information.

Bransford, Brown, and Cocking (2000) note that the “…sheer magnitude of human knowledge renders its coverage by education an impossibility; rather, the goal is conceived as helping students develop the intellectual tools and learning strategies needed to acquire the knowledge to think productively.”

As Tony Wagner often says, it doesn’t matter nearly so much what you know anymore as it does what you can do with what you know.  Let’s stop measuring what students have memorized and do a better job of measuring what they can do with their understanding and skills.

Some experts argue that higher order thinking skills can only be developed and assessed when embedded in a content area, and if one were to accept that position, it would pose a significant challenge to the value of these performance task assessments.  But I don’t:  as Stephen Downes has written recently, “Critical thinking cuts a wide swatch across all disciplines. Just like with mathematics, the principles of critical thinking do not change from one domain to the next.”

2. Performance Task assessment is an improvement because of its open-ended, essay format.

In his 2008 book, The Global Achievement Gap, Tony Wagner cites Andreas Schleicher:

“US students tend to be rather good in multiple choice tests, when four choices are laid out.  They have a much harder time when they’re given open-ended tasks. ”

[Wagner:] Our students often cannot apply what they have learned to a new problem or context they haven’t seen before.  That’s what an open-ended task requires.  You’re not given a list of answers to guess from but instead have to construct one, using previously acquired skills and knowledge.

What are the hidden costs and consequences to our country of having adopted an accountability system that relies extensively on inexpensive, “objective” multiple choice tests versus more complex and open-ended tests that demand real thinking and a deeper understanding of concepts?

It is high time we changed course and began asking our students not to select an answer from among a few options, but to analyze, fashion, explain and write their own solutions to complicated problems.

3. Performance Task Assessment is, at least relatively and sometimes absolutely, popular with students: they like this test.

I genuinely believe, based on my three years’ experience administering the CWRA performance task assessment, that students generally, though not always and not universally, much prefer the experience of taking performance task assessments.  (How much this pleasure will wane with multiple administrations I cannot say: some, surely, but I don’t think entirely.)

As evidence, I offer this video and the selected quotes from students below.

We all found it extremely intriguing and enriching, and personally what I like about it so much is that sometimes when you are in class you think “when am I ever going to use this information” and “I cannot think of a single life experience when I would need to use this equation or this random fact from history.”

But with this test, I found myself sitting in a room with a computer and have a pool of information to dive into,  which is a really great feeling and you understand and realize that what you are learning is relevant and important.  I think that this realization made it exciting and fun and has given me an almost newfound respect for the information I learn on a day to day basis.

We got a little speech [from the Head of School] before the test about how it wasn’t really a test, and we were like “yeah right ok whatever” but then we got in there and then we realized it really wasn’t a test.  It was a load of fun. We came out of the test and all the underclassmen were laughing at us because we were laughing and we were like “oh, why did you do that?” and  “how did you do that?”  and “that is really smart”  and “I did it completely different.”

4.  Performance Task Assessment performance, in combination with High School GPA, better predicts college success (as measured by senior-year GPA) than does the SAT (alone or in combination with HSGPA).

A recent study by Zahner and Steedle (2011), presented at AERA 2012 (Vancouver), demonstrated that, when combined with GPA, performance task assessment better predicts collegiate academic success than the SAT.

The implications are two-fold. First, this kind of assessment may in time emerge as a new supplement or even alternative to the SAT for college selection, and hence we’d do well to prepare our students for better PTA performance just as we now prepare them for the SAT. Second, supporting students in being successful on performance task assessments seems to better set them on the road to academic success in college, something we all want for them.

5. Performance Task Assessment essays will be graded by computer.  

Like many readers, I don’t love this aspect of the new assessments, which is also true of the CLA/CWRA.  It should be said that a set of essays is first evaluated by humans in order to establish norms and rubrics from which the computer scoring can be generated, and then the computer takes over.   Moderately persuasive research has concluded that “automated essay scoring” programs agree with expert human graders more closely than a typical human grader does, but surely we all find this type of scoring off-putting.

Others have been writing about computer scoring of essay writing lately, especially Justin Reich at Edweek in a very impressive series.    Much of what the analysis boils down to is what one’s goals are, and what one’s alternatives are.  If the goal is for a teacher to work closely with a group of students to improve their essay writing, automated scoring is deeply detrimental.   But if the alternative is to not assign open-ended student essays at all, then maybe this system is preferable.   For these new performance task assessments, which I think are superior to multiple choice, automated scoring makes them economically possible in a way they simply were not before, and for that I am grateful, even though this process will have its gaps and limitations.

6.  Performance Task assessment is a test worth teaching to.

This is the argument made by the founder of the Collegiate Learning Assessment, former college President Dick Hersh:

The CLA and CWRA are intended to be powerful signaling devices—they make  clear that specific higher order learning is valued because that is what the measures require; they allow an institution to gather formative data that informs institutional improvement; they allow for institutional comparisons and thus the ability to benchmark quality; they signal that such outcomes can only be collectively accomplished across the entire curriculum; and by measuring value added they permit both the individual student and institution to measure progress or the lack thereof in a way that allows for correction.

Tony Wagner agrees, succinctly:

The beauty of the CLA is that it tests skills all teachers should be accountable for teaching in every class.

The test of any test is whether employing it will narrow what teachers teach or how they teach it, and if so, whether it will narrow them toward things which are less important for our students’ future.

There is some fear here; this test will not be perfectly a test worth teaching to.  It may be that this assessment, like others, discourages a focus upon the fine and performing arts, on oral and digital video communication, and perhaps on fiction reading and creative writing.  But I think it does continue to value the broad and deep study of history, social studies, and sciences, and it insists students work harder to think more clearly about what they are reading and hearing, to conduct more divergent thinking about problems they encounter, and to express themselves and their own ideas more effectively in writing.   These are all good things for any test to direct teaching toward.

7.  Performance Task Assessment needs to be embedded and employed in schools, in the classroom, at every grade level, because it better assesses higher order thinking skills and better prepares kids for these looming new assessments.

These new types of assessments are being chosen and developed for national standardized testing because they do a better job assessing higher order thinking skills and written expression, and because they give better information back to institutions and educators about how good a job we are doing teaching these essential things.  If this is the case, why would we do so only via once-a-year external tests?

One of the great promises of implementing performance task assessment is that we can use the national testing as a model and exemplar, and then do a better job bringing these practices into internal, on-going, formative assessment in each and every classroom.    It is pragmatic to do so, since it better prepares students for the high stakes assessments, but even if the motivation is pragmatic and externally driven, perhaps the practice can become educationally valuable too.  Why not seek a win-win here?

This has been the intent and effort of the CLA/CWRA team for years: to use the model of their large scale external testing as an influence upon the assessment of professors and teachers.

The CAE report calls for this approach.

The most important step: get published CLA Performance Tasks into the hands of the faculty so that they can:

a. Use them in their classroom where they have greater knowledge of the strengths and weaknesses of their students;
b. Develop Performance Tasks that are based on the scoring guide of the published tasks;
c. Choose case studies and problems for text material that is congruent with the documents in the CLA Performance Tasks rather than the content dominated textbooks extant;
d. Adopt a student-centered approach to teaching that calls for much more analytic-based writing on the part of the students and diagnostic feedback to the student about how they can improve their performance.

In sum, the above steps comprise an early version of what we hope will become a reinforcing system of continuous improvement of teaching and learning.

I am aware of several school districts that are using this CLA strategy, implementing the CWRA at least on a pilot basis, and then immediately working to implement and incorporate performance task assessment broadly in their systems.

In Virginia Beach, Jared Cotton, as Assistant Superintendent, leveraged the CWRA which was being administered at the high school level and worked with teachers to develop parallel, analogous performance task assessments with district students in 4th and 8th grade.  Dr. Cotton has said that the teacher collaborative work to develop and then to score these assessments was one of the most powerful professional development experiences he had ever witnessed with his faculty.

In Albemarle County, Virginia, Superintendent Pam Moran has been working this summer with K-12 teachers in summits to develop performance task assessments, with the intent that every student will have the experience at least once each year of taking such an exam: good preparation and good learning.   She told me that she believes these assessments are very strongly aligned with her district’s Life-long learner standards, and that ensuring that teachers are using these aligned assessments across the board will go far to ensure her teachers are teaching those standards and her students are learning them.  (I wrote more about Albemarle’s initiatives at the bottom of this recent post.)

8.  Performance task assessment will be better prepared for when critical thinking is taught explicitly and emphatically.

I fear that, as important as critical thinking is, too often its teaching is left by teachers as implicit.  “Of course we teach critical thinking,” they say and believe: “it just happens naturally in the way we teach.”

But research suggests this is simply not the case: we do better by students when we call it out and ensure students are explicitly learning it on a regular basis.    When critical thinking instruction is made explicit as a thread or a course, a meta-analysis reports a percentile gain of 33, compared to 4 when “general CT principles are not made explicit” (see Marzano’s Teaching and Assessing 21st Century Skills, citing the Abrami 2008 meta-analysis).

Because critical thinking is so heavily emphasized in performance task assessment, and schools will get these results in significant reports, it will likely go some distance to make critical thinking teaching a larger and more explicit emphasis in our schools.  Principals who want to get a jump on preparing their students for these coming assessments would do well to work with faculties to review and upgrade critical thinking instruction.

9.  Performance task assessments will be better prepared for when students write more and read more.  

One of the strongest findings of the research conducted in the book Academically Adrift is that we can differentiate what does and doesn’t prepare college students for success on the CLA, and from that perhaps we can moderately extrapolate what will and won’t prepare students for success on their soon-to-be arriving performance task assessments.

The key according to the authors?  Students who had more demanding coursework did dramatically better in their learning growth as measured by the performance task assessment, with demanding coursework defined as having at least one course in a semester which requires a minimum of 40 pages of reading a week and 20 pages of writing a semester.   Perhaps this needs to be scaled down from post-secondary to K-12, but in the end, the key is that we ensure students are regularly reading and writing in significant amounts.

10.  Performance task assessment is far from perfect.

I know I am offering myself as quite the champion of these coming assessments, and perhaps I am entirely mistaken.  Perhaps it is something like Churchill on democracy: the worst possible testing/assessment regime imaginable, except for every other testing regime imaginable.  The execution may be awkward; the online internet portals may be clunky.   The tasks might end up seeming more canned than meaningful and authentic; the documents and artifacts may strike students as fake, contrived, or even stupid; the automated scoring might be faulty and ultimately deeply disappointing.  The data educators and parents get back might be hard to interpret and harder to act upon.

But I strive to be optimistic, and I speak from some personal, positive experience from three years administering the CWRA.   I think there is a great opportunity here for educators to make the most of a long-overdue updating of the way we administer standardized testing.   Standardized testing isn’t going away, so let’s make it the best it can be and try to make the most positive applications of it that we can.

Post-script: CLA/CWRA response to Assessment “Red Herrings”

As noted above, CLA/CWRA President Roger Benjamin is well aware of the resistance and reluctance of institutions to adopt new assessment regimes, and understands the controversy that external assessment creates.    Rather than oppose external measurement vehicles for learning because of the flaws which many of these vehicles contain, he and his organization are advocating we make and use better vehicles.

In a recent CLA report, Benjamin addresses what he calls seven “red herrings” which are used in argumentation against assessment.  Three of them are worth sharing in this context:

“Since it is impossible to measure all of what is important in education, it is  impossible to measure anything that is important.

Response. Just because we cannot measure every aspect of education does not mean we  cannot measure important components of it perfectly well. For example, it is possible to benchmark critical thinking, analytical reasoning, problem solving, and written communication, often called higher order skills. These skills are considered crucial in the knowledge economy by most colleges and universities (see mission and general education statements) faculty, many employers, and observers.

What is really important is what goes on in the classroom between the teacher and the student.

Response. Yes, this remains true. However, there is growing consensus about the need for reform of undergraduate education that can be characterized along three dimensions noted above which comprise a shift to: 1) a student-centered approach, 2) a case or problem approach in courses and curriculum, and 3) more open-ended assessment instruments. To achieve such changes faculty need assistance, tools that help them make the shifts in their pedagogy — course design, text selection, and assessments that tell them whether and how much they are improving.

Content is what is important in undergraduate education.

Response. Of course content is important. But in today’s knowledge economy equally, if not more, important is the application of what one knows to new situations. Before the onset of the knowledge economy there was a sense that there was a knowable stock of knowledge. It was the job of lecturers to pour content into students who were passive receptacles to be filled to the brim. We now live in an age where one can google to access “facts”. It is more important to be able to access, structure and use information than merely accrue facts.”


ADDENDUM: Added September 3.

Shortly after publishing the above, I was reading Daniel Koretz’s important book, Measuring Up: What Educational Testing Really Tells Us.   Koretz makes several points about performance task assessment on pages 59-64, in a discussion that is not especially supportive of PTA.

First, he helps us understand that this is not an especially new practice, which, I realize in reviewing what I wrote above, I implied it was.   Koretz:  “Late in the 1980s there was a widespread effort to replace multiple choice format with other forms of tests, many of them which fell under the rubric of performance assessment or, more vaguely yet, authentic assessment.”

The examples of these performance assessments he provides range widely, but some are parallel with what I am describing in this post.

Second, Koretz dives into an issue of performance task assessment which is fascinating, and which I didn’t delve into above: whether these authentic assessments ought to have a single right answer, or whether they should be designed so that there are multiple correct answers.   Idealistically, I prefer the latter; practically, I know that often it is the former: there is a particular “solution,” interpretation or approach which is preferred and given the best score.

Koretz tells a slightly odd, somewhat amusing but also somewhat obnoxiously self-congratulatory anecdote about attending a lecture on performance assessment in which the speaker and advocate implored listeners to design assessments with no single correct answer.   Koretz dismisses this as silly, because in real life, such as when a pilot is flying an airplane, there is only one right answer; the speaker afterwards commented to Koretz that she was unsure which of the nearby hotels was the Hilton, and he told her that this was a question with only a single correct answer.    But, good author and expert Dr. Koretz, we’re not trying to assess student intellectual powers regarding such simple topics as locating a nearby hotel, or even following an airplane cockpit checklist for landing strip approaches; we are trying to evaluate their thinking on much more complicated territory.

To my embarrassment, Dr. Koretz next says that periodically people claim this authentic assessment to be “new, even pathbreaking,” but, he explains with some superiority, this is “anything but new.”

More substantively, he confronts the idea which is a central argument in my post above, that performance task assessments are “better suited to measuring higher order thinking skills.  There is something to this view, but it is overly simple.  Research has shown that the format of the tasks presented to students does not always reliably predict which skills they will bring to bear, and students often fail to apply higher-order skills to the solution of tasks that would seem to call for them.”

Surely he is partially correct, but it seems that this is in large part a design issue: the tasks themselves, and the scoring of the student answers, need to be designed well to elicit and evaluate student higher order thinking skills.  Some tasks and rubrics will perform better than others, and we need to keep improving them, as CLA/CWRA does, accordingly.

The final point Koretz makes exactly corresponds to one of my main points.   Teachers and schools take cues from prominent, high stakes, external measurement tools.   If elementary teachers know that in third grade there will be a high stakes multiple choice test, Koretz says, there are reports that some teachers will begin administering multiple choice tests in Kindergarten.  Hence, he states without endorsement or critique,

Reformers argue that the new tests would encourage instruction not only by testing rich and demanding content but also by modeling tasks [that] would make for good instruction.   That is, the assessment tasks themselves would exemplify types of work that teachers should include in their ongoing instruction.