[graphic from Digital Learning Now]

This post continues a small project here at 21k12 of viewing the coming Common Core Standards through a backwards prism: the assessments that will evaluate how well students are learning, and schools are teaching, the Common Core standards. These new assessments sit at a junction of topics I’m interested in and working on regularly: integrating technology, next-generation and digitally enhanced assessment, computer adaptive assessment, and performance task assessment.

These new Common Core (CCSS) assessments are in part the product of Secretary Arne Duncan’s call for a new generation of assessments, Assessment 2.0 as he calls it, about which I have written before. To advance this vision of moving “beyond the bubble,” the US DOE is spending, via Race to the Top funding, more than $300 million to develop new kinds of tests and testing systems, split between two major programs, PARCC and Smarter Balanced.

As the Ed Leadership article by Nancy Doorey reports,

The assessment consortia are drawing on new advances in technology, cognitive science, and measurement as they develop this improved generation of assessments.

They hope these new systems will address concerns about existing state assessments—that many assessments measure skills too narrowly; return results that are “too little, too late” to be useful; and do not adequately assess whether students can apply their skills to solve complex problems, an ability students need to succeed in college, the workplace, and as citizens.

Both tests will be administered digitally and online, and implementing them will require a massive technological infrastructure improvement in most states and districts. Administering them digitally and online offers many advantages, including the ability to offer adaptive testing (currently intended for SB only, not PARCC) and faster return of results to teachers for instructional purposes.

Eight questions worth asking about the new assessments:

1.  Will they come on time or be delayed, and will the technology be ready for them? Although the test design is funded (enormously), the technological infrastructure upgrades are not externally funded, and it remains a very open question whether this funding will come, and from where. If districts are unable to meet the requirements, will the 2014-15 launch date for these digital and online tests be postponed?

Engaging Ed fears they will.

Digital Learning Now, in a recent report embedded below, pleads with the consortia: Don’t Delay.

Don’t phase in. With two years left to prepare, the combination of a long test window and supporting outdated operating systems allows almost all schools to support online testing now. Going further to support paper-and-pencil testing in and past 2015 is unnecessary, expensive, and reduces comparability.

It is also unwise for districts to seek a compromise by settling for student-to-computer ratios higher than 2:1. Imagine the schools which are trying to use current computer labs to assess their students: it will take 12 disruptive weeks to roll all students through the labs, and the labs themselves won’t be available for any other learning during that time.

As the DLN report says,

Minimum requirements are just that – the bare minimum technical specifications needed for the technology to work. It is critical that districts plan not for the minimum, but for what is needed to deliver a high-quality learning experience for students and teachers.

The 12-week testing window may accommodate student-to-computer ratios as high as 4:1 with multiple shifts in a testing lab, but that would fundamentally disrupt the instructional program for several weeks at the end of the year. Low student-to-computer ratios would also highlight the difference between the no/low-tech instructional environment and the online testing environment.

2.  Even if the digital devices are in place, will the bandwidth be strong enough to support the testing? 

States and districts have a great obligation to ensure their students’ testing experience is efficient, effective, and relatively smooth. Being unable to advance through a test, or losing data already entered, would certainly ruin that experience.

The obligation to enhance bandwidth is also an opportunity of course: an opportunity for our students to enjoy, year-round, the experience of being connected learners, collaborators, and creators.

3.  Will the tail of these tests wag the dog and transform teaching and learning for the digital age? The technological upgrades made for testing can be a tail that wags the dog, influencing everything else: they make it possible to dramatically re-center learning on students and to put a far larger critical mass of students online, which in turn makes the use of online, digital textbooks and educational resources far more feasible. I wrote about this before here, and SETDA has a report on it.

As the Digital Learning Now Report says,

Our nation’s schools stand at an important “inflection point” in the history of education. Taken together, the implementation of CCSS, the shift to online assessments, the availability of affordable devices, and the growing number of high-quality digital instructional tools create an unprecedented opportunity to fundamentally shift the education system to personalize learning around the individual needs of every student.

The 2014-15 implementation of the new tests creates a timeline. With just 21 months, states and districts must act now.

But some school districts might just take a simpler path and deploy the technology for assessment only, paying no attention to the potential this change makes available, which is heartbreaking.

For schools outside of PARCC and SB, the implications are significant. If every public school in your state is massively upgrading its student technology platform and transforming learning to exploit it, how will your school respond? Will your non-public school follow suit and match them, or intentionally choose an opposite path of classical or low-tech educating?

4.  Will these tests provide a dramatic step forward in prioritizing 21st century skills and digital literacies? Because they are high-stakes tests, educators of all stripes will need to re-examine yet again how effectively their students are learning these skills.

From Ed. Leadership:

  •  In language arts, these include executing electronic searches, selecting credible sources, and developing a written argument supported by evidence from those sources. In math, these include solving applied math problems that require using modern tools such as statistical packages and dynamic graphing software.

This sounds great, for these are nothing less than essential capacities for 21st century careers and citizenship.   But the quality of these questions and the richness of the simulation in the test-taking experience will determine whether this learning is meaningful and substantial or just canned and limited.

5.  Will these tests significantly “raise the bar” of academic expectations at every grade level?  

Ed Leadership: 

The assessments will require students to comprehend and analyze texts across all content areas that are at a higher level of complexity than those that many districts now use.  Accordingly, teachers and students should expect to see more challenging reading materials on these assessments.

If so, schools outside of these assessments will also need to consider the implications. Private school educators who have been comfortable and confident that their rigor easily exceeds that of the public schools in their community will need to think again.

5.  Will “automated essay grading” be successful and be respected by educators? The new tests go much further “beyond the bubble” than any previous large-scale tests, and demand student writing at every grade level, including analytical and creative writing tasks. But that writing will be graded by computer, using new software designed for this purpose.

The DLN report labels this new kind of robo-grading of student writing “intelligent scoring,” a choice of terminology which amuses me and may offend others with its tinge of Orwellianism and/or unfounded optimism.

Developments in intelligent scoring have also made it possible to include a significant amount of writing on these new tests, as well as constructed-response items and innovative performance tasks.

I like the “outcomes” of this scoring very much, but the jury is out on its quality, I think. In my experience speaking about assessment to a wide variety of educational audiences, I have found very few topics which spark as much of an uproar as the idea that we are going to grade student writing with software.

DLN points to the work being sponsored by the Hewlett Foundation, a group I respect for its focus on “deeper learning” and its longstanding support of Project-Based Learning.

Hewlett Foundation funded Automated Student Assessment Prize (ASAP) was constructed to support the aims of the state testing consortia – better tests of higher-order skills at a lower price. Meeting these objectives will require automated scoring of constructed-response tasks.

In a February demonstration, nine testing companies showed that “machine scoring engines did as well as or better than the human graders,” as reported by Dr. Mark Shermis, author of the study summarizing the demonstration, Contrasting State-of-the-Art Automated Scoring of Essays. ASAP is planning a math prize, an innovative item prize, and classroom trials of online writing assessment platforms.

Must reading on this topic is the three-part series in EdWeek by Justin Reich. His first post concludes with this summary:

Automated Essay Scoring in Review

So to review:

1) Automated Essay Scoring programs predict how humans would score an essay
2) They require a “training set” of essays scored by human raters, a sample of the full set of essays to be scored
3) They don’t “read” an essay like humans do, rather, they use hundreds of algorithms simultaneously to “faithfully replicate” human grading
4) They place no constraints on what kinds of writing can be scored (except poetry and some other classes of creative writing) or what kinds of questions can be asked. Technology has now reached the point where if humans can reliably evaluate the quality of an essay, then AES programs can reliably predict those quality ratings.
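
Reich’s first three points can be made concrete with a deliberately tiny sketch. It is a toy in Python, not how any real scoring engine works: where Reich describes hundreds of algorithms, this uses a single hypothetical feature (the count of distinct words) and an ordinary least-squares line. The function names and the three-essay training set with its human scores are invented purely for illustration.

```python
def distinct_words(essay):
    """Toy feature (an assumption): number of distinct lowercase words."""
    return float(len(set(essay.lower().split())))


def train(training_set):
    """Fit score = slope * feature + intercept on human-scored essays."""
    xs = [distinct_words(text) for text, _ in training_set]
    ys = [float(score) for _, score in training_set]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("training essays must differ in the feature")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, essay):
    """Predict the score a human rater would likely give an unseen essay."""
    slope, intercept = model
    return slope * distinct_words(essay) + intercept


# Hypothetical training set: (essay text, score assigned by a human rater).
TRAINING_SET = [
    ("dogs are good", 1),
    ("dogs are good because they are loyal and kind", 2),
    ("dogs are good companions because they are loyal kind patient "
     "and protective toward families", 3),
]

model = train(TRAINING_SET)
print(round(predict(model, "cats are calm because they are quiet and clean"), 2))
```

The unseen essay has as many distinct words as the middle training essay, so the toy model predicts a score of about 2. The sketch only shows the shape of the process: human raters score a sample, the program learns to predict those scores, and it never “reads” anything. A one-feature model like this would be trivial to game, which is exactly why real engines combine far more signals.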

In a subsequent post, Reich takes this nuanced position, and the jury is still out.

Lots could go wrong here. Test designers could write dumb essay questions or dumb rubrics. (But they are not forced to do so by the technology; AES programs can predict scores on sophisticated questions or source-based questions as well as they can with simpler questions; the limiting reagent is the capacity of humans to agree on scores with a nuanced rubric, not the limit of the technology.)

Similarly, students can game the rubric, but they can’t game the AES programs. Students might be able to game a human scoring a rubric, but students can’t game a program evaluating the frequency of co-located stemmed words. The key limitation in the system is human training.

But for all that, I think it’s entirely plausible that by leveraging the power of computer programs to instantly and inexpensively predict essay scores, we can create more sophisticated assessments and have those better assessments drive better classroom practices. The worst case scenario that I envision is that even though we make kids do more writing, it’s still stupid writing. In our current policy context, having kids write more is a downside I’m willing to risk.

6.  Will these new assessments impress us with their “authenticity,” and real-world connections?  Will they support assessment as learning? 

Ed Leadership:

[The new tests will include] more complex, real-world tasks in addition to the more traditional selected-response and short-answer questions.

In the DLN report, Linda Darling-Hammond is cited in support of this claim:

Linda Darling-Hammond, Senior Research Advisor for Smarter Balanced, speaks to this point when describing students’ opportunity for performance tasks under the next-generation assessments: “Performance tasks ask students to research and analyze information, weigh evidence, and solve problems relevant to the real world, allowing students to demonstrate their knowledge and skills in an authentic way.”

In PARCC,  Ed. Leadership explains, what I refer to as performance task assessments are called Performance Based Assessments.

 For each grade and course tested, the performance-based assessments will focus on the hard-to-measure standards, such as the grade 11–12 English language arts standard that calls for students to “synthesize information from a range of sources (for example, texts, experiments, simulations) into a coherent understanding of a process, phenomenon, or concept, resolving conflicting information when possible”

Tasks may include short-, medium-, and extended-response items as well as computer-enhanced items. Simulations may also be used when needed to obtain a better measure of a standard, with more sophisticated simulations to be added as the technology infrastructure in member states evolves.

For example, the mathematics standards call for “making inferences and justifying conclusions.” Simulations of a wide variety of experiments could be used to determine whether students can generate a model of the relationship among multiple variables, draw inferences, and justify those inferences with data.

Clearly, preparing students for success on this type of testing will demand that we administer performance task assessment models and modules to our students. See my related post here: Performance Task Assessment: 10 Things for Educators to Think About

7.  Will these new assessments bring us dramatic advances for “formative assessment”?

In contrast to some previous standardized testing regimes, both new tests offer more than just a single, end-of-year, high-stakes summative exam. Both are also committed to implementing interim and formative assessments.

As a recent Ed Week article argues, the time is now for all educators to advance their practice in formative assessment.

First off, it is important to recognize that formative assessment works. That’s right: Ample research evidence is now at hand to indicate emphatically that when the formative-assessment process is used, students learn better—lots better.

Formative assessment is, at bottom, an ends-means process in which teachers and/or students rely on assessment consequences (the ends) to decide whether any adjustments are warranted in what they’re doing (the means). It’s really not surprising that formative assessment works so well.

Interim, aka midyear, assessments are explained as follows in Ed Leadership:

Midyear assessments. Midyear assessments will feature rich performance tasks that mirror the types of tasks included in the summative performance-based assessments. States and districts may choose to administer—even require—a midyear assessment.

Formative, aka diagnostic, assessments are even more interesting to me, because they open up potentially rich new opportunities for teachers, alone or preferably in collaboration with colleagues, to use online tools to design their own comparable and aligned assessments.

Many, if not most or all, of the tools for formative assessment will be lodged in online platforms and portals maintained by the testing consortia; PARCC’s is explained as follows:

Partnership Resource Center, which is expected to launch in 2013. This web-based platform will offer a continually expanding collection of resources for teachers, students, administrators, and parents, such as released test items, formative assessments, model content frameworks, professional development resources, practice tests, and student and teacher tutorials.

7b. For those educators outside of the new tests yet interested in these formative tools, will these online platforms for designing performance assessments be available to all, or only to participating public schools?

Ed Leadership reports that:

“The federal grant requires that all assessment content developed with grant funds be made freely available to all states—even those that don’t belong to a consortium—that request it for administering assessments.”

This statement only partially answers my question about access for non-public educators. It is my hope that these resources will be widely available, and that educators outside the system will use them to design digitally empowered, formative, performance task assessments for their students, in ways which enhance their practice of formative assessment.

8.  Will educators, both teachers and administrators, have access to the substantial professional development resources that will be so essential to making the most of this enormous investment in new assessments?

The DLN report implores states and districts to provide them.

Invest in teaching training. States and districts should sponsor a variety of Common Core and online assessment learning opportunities for teachers. Teachers and school leaders must have meaningful practice – in both formal professional development sessions and informally through regular application in the classroom setting – in order to become comfortable with these shifts before the full implementation of the assessments.

PARCC released an implementation guide and will have Professional Development Modules and Online Professional Learning Modules available in spring 2013.

Smarter Balanced will convene teacher cadres from member states in summer 2013, in addition to launching professional development materials and a full digital library of best practices and professional learning resources.

But with districts still struggling with severe budget limitations, this professional development may be shortchanged, and that would be a great loss.