[MUSIC] Welcome to this third session on human judgement scoring. In this session we want to look at essays. I'm sure you've all written long pieces of writing in exams at university, or in high school, where you had to write essays in history, social studies, geography, or English. I want to talk about essays, which have been an area of interest of mine for quite some time, and how we can work towards more reliable and valid scoring of them. Again, we're focusing on informal work: informal assessment processes that need to be judged or evaluated by a human.

Essays are an old assessment technique. Some 2,000 to 3,000 years ago, essays were developed in Imperial China as a means of selecting people for work in the government civil service. So this is not a new technique; we've been using the long piece of writing as a way to judge student scholarly ability for a long time.

The core of any essay is the task, prompt, or question that you pose. Every question has two components: a cognitive task, that is, what kind of thinking you want the student to do when they write (are they to discuss, compare, contrast, analyse, and so on), and some specified content, which might be the causes of World War I, the impact of a setting on a character's development, or the role of mutation in disease. So there is always content that is specified, and a cognitive task that is specified.

Why do we use such things? Well, fortunately for most of us, creating an essay question, prompt, or task can be done quite easily. A colleague of mine had a wife who was an English teacher; the day before the exam questions were due, she would sit down and write her essay questions for her grade 13 English class, and it didn't take her much time at all. And clearly, when we focus on a cognitive task and content, we're focusing the attention of the learner on something very important in the curriculum.
Now, the downside: if you can create them easily, they take time to mark. Any of you who have been English teachers will know that reading all that writing by students takes time. So the total amount of work, whether you're using objective scoring or subjective scoring, is probably about the same.

If you're using essays for examination purposes, that is, a set period of time where students come in and do this piece of writing in one go, then you're asking them to create a cogent response to a question that they probably haven't seen before. Though I have to say, my university professors used to tell us there would be eight topics, and the topics would generally be these things, without telling us the specific wording. But what we're asking for is production on demand, at a certain time and place, with no opportunity to go back and get information from an encyclopedia, from the internet, or from our notes; we can't revise before we hand it in, we have to deliver and hand it in. That makes it a first-draft piece of writing, which means our standards in scoring it have to take into account that there were no extra resources, no extra time, no feedback; we didn't follow good writing teaching, we just said, sit down and write. So this won't be their best piece of writing, and we have to judge it as a first draft.

And I hope you've wondered, in your career, what it is that drives our scores when we're judging essays. What part of an essay is triggering us to give higher or lower marks? In the 1960s, the researcher Ellis Page did a study comparing scores on 1,000 scripts. My goodness, a lot of work. Six judges marked 1,000 scripts, and then he used a computer program to mark the same scripts. The computer program was taught to use certain features: how accurate was the grammar, how big was the vocabulary, how rare were the words, how long was the essay, what was the ratio of active to passive voice, and so on.
He used these language rules to evaluate the essays, and what he found was that the average correlation between the human judges was lower than the correlation between the machine's scores and the humans'. This persuades us that what we're really doing when we're marking essays, whether they're take-home essays or on-demand essays, is looking at the language, the style, the organisation, how polished the writing is in many ways; we're not actually looking at the content or the thinking, and that's unfortunate. Unless you're teaching language. If you're teaching content, then we should be marking content, not just language.

Assuming that we're teaching content and using an essay to evaluate content knowledge, we need to rethink how we structure the essay task so that we give students more guidance about the organisation and the language they should use. That way, we're not judging how skilled you are with language, but how knowledgeable you are about the content we've been trying to teach during our course.

For example, on screen now you'll see a testlet form made of short paragraphs. There's a task, and the task says: use the following three prompts, A, B, and C, to organise your essay. This tells each student how to organise the task, and it means the marker is not distracted by people who followed a different order. Everyone follows the same order, so it's easier to evaluate the quality of one piece of work against another.

On screen now is a structured essay. This one was provided to us by Professor John Hattie, and he actually lists the order of points that students are supposed to use. He even gives them the words for the first sentence: "The evidence on this topic generally says...". Students who know the material but are perhaps not great writers often get stuck on the beginning, and this gives everybody a beginning, which helps.
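To make the structured-task idea concrete, here is a minimal sketch in Python. Everything in it, the class name, the field names, the example prompts, and the history content, is invented for illustration; it simply shows a task as a cognitive verb plus content plus ordered prompts and a sentence starter, rendered identically for every student.

```python
# A minimal sketch of a structured essay task: a cognitive task, the
# content it applies to, ordered organising prompts, and a sentence
# starter. All example wording here is invented for illustration.
from dataclasses import dataclass

@dataclass
class StructuredEssayTask:
    cognitive_task: str         # e.g. "discuss", "compare", "analyse"
    content: str                # the curriculum content being assessed
    prompts: list               # ordered prompts every student follows
    opening_sentence: str = ""  # optional starter so everyone has a beginning

    def render(self):
        """Produce the student-facing task, with prompts labelled A, B, C...
        so every script follows the same order."""
        lines = [f"Task: {self.cognitive_task.capitalize()} {self.content}."]
        lines.append("Use the following prompts, in order, to organise your essay:")
        for letter, prompt in zip("ABCDE", self.prompts):
            lines.append(f"  {letter}. {prompt}")
        if self.opening_sentence:
            lines.append(f'Begin your essay with: "{self.opening_sentence}"')
        return "\n".join(lines)

task = StructuredEssayTask(
    cognitive_task="analyse",
    content="the causes of World War I",
    prompts=[
        "Describe the alliance system before 1914.",
        "Explain the role of the July Crisis.",
        "Weigh which cause you consider most important.",
    ],
    opening_sentence="The evidence on this topic generally says...",
)
print(task.render())
```

The point of the sketch is the constraint, not the code: because every student receives the same ordered prompts and the same opening words, differences between scripts reflect content knowledge rather than organisational or linguistic flair.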
So, if you structure the response, you're more likely to get better content and better performance from all students. Otherwise, we might be accused of simply reproducing the social order, as Pierre Bourdieu has suggested.

An alternative, developed by Professor Rich Shavelson at Stanford University, is, instead of asking students to write an essay, to give them a concept map. On screen you can see the concept map that I created for the purpose of assessment. Every concept map has nodes, which are the words in the bubbles, and paths between them, with words describing the nature of each path. What Professor Shavelson has shown is that if you give students a concept map with certain key terms removed and ask them, "Don't write the essay; simply fill in the bubbles with the words that are missing," you get a very good understanding of whether students understand the concepts you're trying to teach. Other researchers have said, "Well, let them draw a concept map and then write the essay. Let them bring the concept map that they drew at home to the essay exam, and they'll write better essays." So certainly, whether you use it as an adjunct to a traditional essay or as an alternative to one, a concept map is a good organiser and helps students write better.

In the end, with essays, so much comes down to the marking. We have to remind ourselves that we're very inconsistent as judges, and that writing, especially an essay, is a complex linguistic and cognitive process. It's hard to evaluate the content without being distracted by the language, and it's hard to have good content without good language. But when deciding what we're really after in posing an essay task, we have to mark for what we really want to evaluate. Are you marking the language because you're teaching language skills? Then mark the language skills.
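The concept-map technique can be sketched in a few lines of Python. This is only an illustration under invented content: the example map, the hidden terms, and the function names are all hypothetical, but it shows the structure the technique relies on, nodes joined by labelled paths, with some node labels blanked for the student to fill in and then scored against a key.

```python
# Hedged sketch of concept-map assessment: a map as labelled edges, with
# some node labels blanked out for students to fill in. The map content
# and hidden terms below are invented for illustration.
concept_map = [
    # (from_node, link_label, to_node)
    ("mutation", "can alter", "protein"),
    ("protein", "performs", "cell function"),
    ("mutation", "may cause", "disease"),
]

def blank_nodes(edges, hidden):
    """Replace each hidden node label with a numbered blank; return the
    student-facing map and an answer key mapping blanks to labels."""
    key = {}      # node label -> blank placeholder
    blanked = []
    for a, label, b in edges:
        row = []
        for node in (a, b):
            if node in hidden:
                if node not in key:
                    key[node] = f"__{len(key) + 1}__"
                row.append(key[node])
            else:
                row.append(node)
        blanked.append((row[0], label, row[1]))
    return blanked, {blank: node for node, blank in key.items()}

def score(answers, answer_key):
    """One point per blank filled with the correct label."""
    return sum(answers.get(blank, "").lower() == label
               for blank, label in answer_key.items())

student_map, answer_key = blank_nodes(concept_map,
                                      hidden={"protein", "disease"})
print(student_map)   # blanks __1__ and __2__ stand in for the hidden terms
print(score({"__1__": "protein", "__2__": "disease"}, answer_key))
```

Because the student supplies only the missing terms, marking reduces to comparing short answers against a key, which sidesteps the language-quality distractions that plague essay scoring.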
But if you're teaching content, give the students the language structure and language prompts that they might need and focus on the content instead of language. In our next session, we're going to look at how humans can compare marks and how we can use a technique called moderation to improve the consistency of our scoring. [MUSIC]