Chapter 11: Assessment Formats and Quality
Janine Davis
Mx. Jackson is teaching a unit on historical figures of the Harlem Renaissance. They have taught several popular lessons where students have researched, discussed, analyzed, and explored many aspects of writers and artists of the time period. Some of the most engaging discussions were about the effect of population migration on the people and art from this time. Mx. Jackson had high hopes for the final assessment, but was dismayed to see that most students missed several questions; the average score was 65%, with a range of 30-90%. No one earned an A, and most students had a C or below. What happened?
In Chapter 4, we discussed some key principles of assessment, and now we will explore how to write effective assessments. The purpose of assessing should never be to trick or confuse students; assessment should provide a chance for students to show what they have learned, and for the teacher to determine if their teaching has been effective. This chapter outlines some important considerations about assessment formats, timing, and use. One of the first considerations when writing an assessment, after knowing the standards and objectives to which it will be aligned, is what kind of questions you will ask.
Selected Response
The selected response formats most familiar to the general public are multiple choice and True/False questions. You have probably encountered (and maybe hated) the kind of question that has multiple correct answers; that is also a selected response format, as are Yes/No questions. A multiple choice question has a stem, which is the initial question or phrase, and then choices that include both the correct answer and distractors that are not correct. Distractors should be plausible as answers, but should not be selected to deliberately confuse students. All of the answer choices should be about the same length. While grading students’ answers on multiple choice assessments is fast, it can be very time consuming to write good multiple choice questions. Consider the following example:
Which root word has a prefix and a suffix?
a) legend
b) unsupportive
c) themselves
d) celebrate
This question is a released item from the 2010 Virginia Grade 5 Reading SOL assessment.
This test item shows whether students have been exposed to and can apply prior knowledge of prefixes and suffixes. What do you notice about the question and the way it is written? Does it meet all of the above criteria? What would have needed to happen before students attempt this question so that it is an accurate assessment of their learning?
Constructed Response
A constructed response question may ask students to fill in just one word or a short phrase, as with a fill-in-the-blank question. You may have seen a vocabulary test with a word bank–this is an example of a constructed response assessment item. A teacher may decide to provide a word bank based on their knowledge of student preparation and their long-term goals for the content. While a word bank reduces the level of challenge, it may heighten students’ skill at making educated guesses, if those words have word parts that they have learned.
Constructed response questions may also be essay prompts, or prompts that call for a longer response than a few words. When writing questions that call for longer student-constructed responses, it is important to be clear about the length and format of the response. Should students use a certain essay format that they have learned in the course? Is writing in complete sentences a crucial aspect of how they will answer? Could students create a Venn diagram or other graphic to demonstrate their knowledge of two or more areas and how they relate to each other? One way to assess student writing that reduces some of the subjectivity is to use a writing rubric. The following is an example of an effective constructed response question.
Question 3
(Suggested time–40 minutes. This question counts as one-third of the total essay section score.)
In many works of fiction, houses take on symbolic importance. Such houses may be literal houses or unconventional ones (e.g., hotels, hospitals, monasteries, or boats).
Either from your own reading or from the list below, choose a work of fiction in which a literal or unconventional house serves as a significant symbol. Then, in a well-written essay, analyze how this house contributes to an interpretation of the work as a whole. Do not merely summarize the plot.
In your response you should do the following:
- Respond to the prompt with a thesis that presents a defensible interpretation.
- Provide evidence to support your line of reasoning.
- Explain how the evidence supports your line of reasoning.
- Use appropriate grammar and punctuation in communicating your argument.
This item is a released essay prompt from the 2021 AP English Literature exam; a list of books that students may use as examples accompanies the question.
While the question is clear regarding what students should do, it is important to note that this question assumes some prior knowledge; in this case, that knowledge is a component of the curriculum for the course. Here are some topics that a student taking this assessment should have been exposed to during the course in order to succeed on this assessment item:
- What does a well-written essay involve in this context?
- How will the scorers define appropriate grammar and punctuation?
- What is symbolism? What is interpretation of a work? How can symbols affect interpretations?
- What is a thesis?
- How do I support reasoning with evidence?
What other constructed response questions have you encountered in your schooling? How does the way you think about them compare to how you approach multiple choice questions?
Performance Assessment
Contrary to the name, performance assessments do not need to involve some kind of performance. Instead, they are an authentic assessment of student understanding. An article by Patricia Hilliard (2015) explains that performance assessments should be “complex, authentic, process/product-oriented, open-ended, and time-bound” (para. 4). As a student in a teacher education course, when you write a lesson plan, you are completing a performance assessment–it is a task that is completed by professionals in the field. What makes something authentic? Compare the following two scenarios, both of which occurred in a Spanish II course where the students are learning the vocabulary of politics and community:
- Teacher A’s class will walk two blocks to the City Hall building and work together to construct a pamphlet in Spanish to introduce the roles of those who serve in the local government and how they can help Spanish-speaking members of the community.
- Teacher B’s Spanish II class will complete a series of worksheets in Spanish that include sentences about the system of government in Spanish-speaking countries.
The students in Teacher A’s class are completing a project that will be useful to real people in their community; they will refer to people and places that exist in the real world. This kind of task is authentic. Simply completing a worksheet, as Teacher B’s class will do, is not an authentic task. The sentences on the worksheet will have students practice the required vocabulary, but they do not apply to an actual situation and will not be used beyond the worksheet itself.
The following is an example of a performance assessment.
Example 5: Animal Testing
Scenario: You are assistant to the director of operations at Xenocybernetics, a pharmaceutical company that has used animal testing.
Documents:
- Descriptions of squabbles among employees
- Hostile emails from the public
- Newspaper editorials
- Articles on the benefits of animal testing
- Industry guidelines on animal testing
- Statistics on animal testing
Task: Determine whether a negative portrayal of the company in a newspaper article was accurate; also, determine whether the company’s director has been libeled.
As you can see in the above example, students are often given a role from which to craft their position or view a certain situation. A similar kind of writing assessment is called a RAFT, which stands for Role, Audience, Format, and Topic. RAFT assignments can be an opportunity for students to choose a point of view that they would like to explore or a kind of writing that they wish to develop. One local teacher created the option to develop a dating profile for Charlemagne (Campbell, 2023). For more examples of how and why to implement RAFT writing tasks, see ReadWriteThink’s resource.
Portfolios
Often people think of art class when they hear about portfolios, but portfolios can be effective assessments in all content areas. Portfolios consist of purposefully chosen work that demonstrates desired skills or content knowledge. One kind of portfolio might ask students to select their best work in each of seven different writing formats, while another kind may ask students to include a series of materials to demonstrate their knowledge of the world of work and budgeting–a job advertisement, a reference letter, a resume, an artifact relating to the kind of employment they seek–all of these might be relevant components for inclusion. The following site, Authentic Assessment Toolbox, offers a deep exploration of portfolios and their uses. Portfolios can be very effective assessments, but it is important that teachers understand how to set up and score the task. Additionally, portfolios can be time-consuming for students, but typically completing such a task leads to feelings of accomplishment. The ability to select one’s best work–if that is the type of portfolio that students are completing–also helps reduce anxiety and empower learners.
- See this article at NSTA for a detailed description of a student portfolio, examples, and rubrics aligned with the Next Generation Science Standards. This portfolio is designed for middle school students, extends the traditional science fair project, and includes a project proposal, a research paper, various parts of the final paper, and a display board.
Examples
Let’s envision what the assessment and objectives would look like for the unit from our example, where students would be learning about the Harlem Renaissance. Remember that we will only know if an assessment is appropriate and aligned if we know the objectives first. All of the below Do objectives might share a single Understand objective:
Students will understand that the events of a time period have an impact on the creative work produced during that time period. There would also be Know objectives associated with each Do objective, but Table 11.1 will focus on the Do objectives.
Table 11.1: Example Assessments for Do Objectives
Format | Objectives | Question/Prompt |
Selected Response | Do: Students will be able to identify famous writers of the Harlem Renaissance. | Who was a famous poet of the Harlem Renaissance? A. Louis Armstrong |
Constructed Response | Do: Students will be able to explain the causes and effects of the Harlem Renaissance. | Identify at least three historical events that led to the Harlem Renaissance. Explain how these events led to the Harlem Renaissance, and what the long-term effects of this time period were. Write an essay of at least three paragraphs to answer the prompt. |
Performance Assessment | Do: Students will conduct research about a topic and share their results. | Select a historical figure from the Harlem Renaissance and research their work and accomplishments. Develop an individual, two-minute presentation with visual aid such as a slide deck to inform the audience about your chosen figure. |
Portfolio | Do: Students will demonstrate sustained research and writing on a topic related to the Harlem Renaissance. | Construct a digital portfolio with at least ten artifacts chronicling the progression of your understanding about a single topic of interest during the Harlem Renaissance. (A list of possible topics and artifacts and a rubric for evaluation will be shared with students.) |
Key Considerations for Assessments
We have already discussed several key components of assessments, such as when they are given, how they are used, how we can create questions and prompts, and that each kind of assessment has some strengths and weaknesses. There are some other major considerations to be aware of when constructing assessments; these are described in the following section.
Reliability
Macmillan (2017) describes reliability as “the extent to which the scores are free from error” (p. 86); that is, are the scores consistent across students and learning targets? Several factors may complicate scores and affect reliability, such as:
- Can the student read and understand what the question is asking? If you are measuring a student’s ability to multiply two-digit numbers, but the assessment includes a word problem and students have not had instruction in how to solve a word problem, the test has low reliability.
- Have a student’s legally required accommodations been provided? Teachers must deliver accommodations such as providing a read aloud, reduced distraction setting, or extended time as indicated on an IEP or 504 plan.
- Is the student tired, hungry, sick, or distracted? Any of these can affect a student’s performance on an assessment.
- Did the student guess the right answer without actually knowing it? This is more likely with selected response items, especially True/False questions, where a blind guess has an even chance of being correct.
- Has the student encountered the same question or content in the past? If the student just wrote an extended paper on symbolism in a novel and encountered a question on a summative assessment asking about symbolism in that novel, they will likely be prepared to do well on that question.
Your Learning Management System (LMS) or another tool for digital assessment, such as those linked later in this chapter, can help provide a deeper analysis of student performance. One example is an item analysis resource from Canvas. The kinds of measurements that you may analyze to determine reliability can show how likely students who did well on the test were to also get a single question correct, or just the percentage of students in the class who answered a single question correctly.
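The two measurements described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the response data are invented, the function names `item_difficulty` and `item_discrimination` are hypothetical, and the discrimination value uses a simple top-half versus bottom-half comparison. The statistics that a particular LMS reports may be calculated differently.

```python
from statistics import mean

# Hypothetical scored responses (invented for illustration):
# one row per student, one column per question; 1 = correct, 0 = incorrect.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
]


def item_difficulty(rows, item):
    """Proportion of students who answered the given question correctly."""
    return mean(row[item] for row in rows)


def item_discrimination(rows, item):
    """Proportion correct in the top-scoring half of the class minus the
    proportion correct in the bottom-scoring half, ranked by total score.
    Ties in total score are left in their original order (a simplification)."""
    ranked = sorted(rows, key=sum, reverse=True)
    half = len(ranked) // 2
    top, bottom = ranked[:half], ranked[-half:]
    return item_difficulty(top, item) - item_difficulty(bottom, item)


for q in range(len(responses[0])):
    print(
        f"Question {q + 1}: "
        f"difficulty={item_difficulty(responses, q):.2f}, "
        f"discrimination={item_discrimination(responses, q):.2f}"
    )
```

In this invented data set, the last question is answered correctly more often by students with low total scores than by students with high total scores, so its discrimination value is negative. A result like that would be a reason to reread the question for confusing wording or an incorrect answer key.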
Validity
The concept of validity of assessments is complex. Validity relates to the conclusions we can draw about how students perform on an assessment. It can help to think of instructional goals as a package of items that can be measured separately. If the goal of a summative assessment is to measure the degree to which students can read a passage from left to right, top to bottom, it would not be a valid assessment to ask students who the main character in the story was; that kind of question would be a valid assessment for a different set of instructional goals. Instead, a teacher would need to observe a student read a passage, perhaps with the aid of a marker to move along the page as they go.
It would be nearly impossible and definitely unwieldy to assess everything that you teach during an instructional unit. In order to create a valid assessment, teachers must use their professional judgment to select a sample of questions that will show whether students learned the content in all of the sub-areas where they provided instruction (Macmillan, 2017). A test blueprint can provide evidence of how many questions of what types will be on the assessment, or what skills will be assessed. A common assessment for teacher licensure in the state of Virginia, the Virginia Communication and Literacy Assessment (VCLA), provides a blueprint of what kinds of items will appear on the test. The VCLA test blueprint does not delineate a number of questions, but by looking at the “Do” objectives, it is clear that the assessment will involve identifying main idea, structure, and grammatical errors; writing a summary of a passage; and composing an essay with arguments that are supported with evidence. It is important to note that this kind of assessment is a high-stakes assessment (teachers must demonstrate these skills on the assessment to be licensed to teach in Virginia) and is not aligned with the goals of a single class. When you plan for your classroom, you will implement this idea on a smaller scale, and you should have more freedom to craft the kind of assessment that will be appropriate for your students given the instruction you have planned.
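For a classroom-scale version of a blueprint, it may help to see one written out as a simple table of objectives and item counts. The Python sketch below is hypothetical: the objective labels are loosely based on the Harlem Renaissance unit in this chapter, the item counts are invented, and the `summarize` function is an illustrative helper rather than part of any published blueprint format.

```python
# A hypothetical blueprint: each taught objective maps to the number of
# planned items in each format. All counts are invented for illustration.
blueprint = {
    "Identify famous writers of the Harlem Renaissance": {
        "selected_response": 6,
    },
    "Causes and effects of the Harlem Renaissance": {
        "selected_response": 2,
        "constructed_response": 1,
    },
    "Effect of migration on people and art": {},  # taught, but no items planned
}


def summarize(plan):
    """Return (total items per format, objectives that have no items)."""
    totals = {}
    uncovered = []
    for objective, counts in plan.items():
        if sum(counts.values()) == 0:
            uncovered.append(objective)
        for fmt, n in counts.items():
            totals[fmt] = totals.get(fmt, 0) + n
    return totals, uncovered


totals, uncovered = summarize(blueprint)
print("Items per format:", totals)
print("Objectives with no items:", uncovered)
```

Listing the objectives next to the planned items makes gaps visible before the assessment is given. Here, one objective that received instructional time has no items at all, which is the kind of mismatch a blueprint is meant to catch.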
Clarity, Specificity, and Revision
Assessments can be stress-inducing and frustrating. The goal of a teacher should be to reduce these issues. The following is a non-exhaustive list of some ways to eliminate these problems:
- Use terminology and language that your students will understand.
- Clarify what kind of response will constitute a correct answer: One word? A sentence? Three paragraphs? Ten pages?
- Clarify how this item will be weighted as a part of the overall assessment. Are all multiple choice questions worth 2 points, but the essay is worth 50 points, and the entire assessment is worth 100 points? This information may affect how someone approaches an assessment.
- Explain what kinds of help are allowed. Especially for tests that are taken through digital tools, students should understand if they are permitted to use a proofreader, consult colleagues, view their notes, use the internet, etc.
- Develop and share policies (schools, departments, and/or teachers may develop these together) regarding what happens when students do not perform to the level expected on assessments.
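The weighting example in the list above can be worked through as simple arithmetic. The Python sketch below assumes the assessment has only two sections, which implies 25 multiple choice questions at 2 points each alongside the 50-point essay; the student's earned points and the function names are hypothetical.

```python
# Assumed structure: 25 multiple choice questions at 2 points each
# plus one 50-point essay, for 100 points in total.
sections = {
    "multiple_choice": {"items": 25, "points_each": 2},
    "essay": {"items": 1, "points_each": 50},
}


def section_totals(plan):
    """Points possible in each section."""
    return {name: s["items"] * s["points_each"] for name, s in plan.items()}


def percent_score(earned, plan):
    """Overall percentage, given the points a student earned per section."""
    possible = sum(section_totals(plan).values())
    return 100 * sum(earned.values()) / possible


# Hypothetical student: 20 of 25 multiple choice correct (40 points)
# and 35 of 50 points on the essay.
earned = {"multiple_choice": 20 * 2, "essay": 35}
print(section_totals(sections))
print(percent_score(earned, sections))
```

Under these assumptions the single essay carries as much weight as all of the multiple choice questions combined, which is exactly the kind of information students need in order to decide how to spend their time.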
Designing and delivering assessments and analyzing the results are among the most important tasks that a teacher will do. Remember that student needs in the form of special education accommodations are a critical component of this process: some students may require extra time, read-aloud supports, or a scribe to record their answers. These accommodations, if formalized in an IEP or 504 plan, are required by law. If, after reviewing the results of an assessment, you find that students did not perform as you expected, you can and should develop opportunities for students to relearn the material and demonstrate competency. Simply moving on to the next unit without addressing common misconceptions can lead to frustration for all involved.
Cultural Validity in Assessment
Students have different experiences that can contribute to how they perform on assessments, or whether they understand what the question or prompt is asking. For example, even in a standardized test where all students receive the same reading passage, students are bringing varied personal experiences and background knowledge with them as they decode and comprehend the passage. A student who lives in a rural area on a farm is likely to perform better on a reading passage about chicken farming, even if the comprehension questions are designed to be answered just by reading the passage.
When developing assessment questions, it is important to consider whether aspects of the assessment might not be equitable for all students. A teacher should ask themselves whether the question or assessment requires knowledge or skills that they have not yet taught. Is there another way that a student could show the teacher that they have learned this information? This is particularly important for English Learners. When we assess English Learners in their non-dominant language, we are simultaneously assessing their content knowledge and their mastery of English. Therefore, assessments may not accurately represent English Learners’ understanding of the content if they do not have the English skills necessary to comprehend the question or respond in a way that represents their actual understanding of the content.
It is also important to watch for implicit bias that can present itself within assessment design. Consider whether, in writing prompts or word problems, you have represented one group of people as being successful and another group as less successful or skilled. Analyze the names you use in assessments as well: do they always come from White, Anglo-Saxon origins, or do you attribute positive characteristics to names or groups that align with identities that have more power (e.g., male, White, heterosexual)? It can be engaging to work students’ names into assessments, but this should be done intentionally and cautiously. One local teacher used an example of estimating the weight of several football players and used their actual names on an assessment, which led to student embarrassment and distracted them from the purpose of the assessment.
Tools for Digital Assessment
There are many online options to collect and analyze student performance data. The following options are available for teachers:
Most of the above options will be free for both teachers and students, but teachers must explore whether these options are available for their use. Some school districts have policies about whether students can access certain websites or apps. Districts may also have rules about whether students can download apps or create logins for new programs.
Another important consideration is what purpose the digital tool will serve. For example, Kahoot! is commonly used as a review game for a whole class, while Quizlet may be most useful as a set of flashcards for students to use independently or in small groups to prepare for assessments. Google Docs can be useful for collaborative writing, but if you would like to see what each person contributes, it can help to have a system where students identify their contributions (color coding the text they add or adding their initials, for example).
Revisiting an Assessment Quandary
Mx. Jackson was proud of their instruction and the students were engaged, but the problem in this case was one of assessment alignment. First, the summative assessment was mostly multiple choice and focused on dates and specific pieces of literature and art, while the instruction had been deeper than just identifying details of the time period. Second, there was excellent discussion during the unit, but not a lot of formative assessment to show what each student knew that related to the objectives before the summative assessment. Additionally, the teacher’s objectives asked students to consider how people are affected by the time period and place where they live in a general way–an assessment that asked students to construct a response that showed their deep knowledge would have been more effective than a multiple choice assessment.
Key Chapter Takeaways
- Teachers should use a variety of assessment formats to reach all students.
- The major categories of assessment formats are selected response, constructed response, performance assessments, and portfolios.
- Students should be familiar with the formats that they will encounter in the assessments they take.
- Effective assessments involve careful consideration of reliability, validity, fairness, and clarity.
Application Questions
- What are some of the most memorable assessments you recall from your own schooling? In what ways did the format of the assessment affect your experience of them?
- Conduct a brief survey of your friends or family members: What are their feelings about different kinds of assessments, and why?
- After delivering a summative assessment, you find that half of the class scored well below what you expected. When you ask some of the students privately the next day, you learn that the students had just received the new assignments for a local travel soccer team, and several were devastated to learn that they did not make the team. What would your next steps be?
- Select an assessment that you might give in your future classroom and consider what kinds of instruction will be needed to ensure that the assessment is valid, reliable, and fair.
References
Campbell, T. (2023). RAFT assignments and student engagement [Unpublished master’s thesis]. University of Mary Washington.
Hilliard, P. (2015, December 7). Performance-based assessment: Reviewing the basics. Edutopia. https://www.edutopia.org/blog/performance-based-assessment-reviewing-basics-patricia-hilliard
Macmillan, J. H. (2017). Classroom assessment: Principles and practice that enhance student learning and motivation (7th ed.). Pearson.