[MUSIC] Welcome to this second lecture in the third week of this course on assessment. In this week, we are talking about reporting, and specifically, in this lecture I want to talk about the kinds of scores that we might use to indicate the quality of student work. There are some problems with using these kinds of scores, and unfortunately, it is a little bit technical, so you may need to study the notes quite extensively. In terms of our curriculum map, we're looking at any kind of score that comes out of an assessment for any purpose, and how we communicate that. Raw scores are a problem. I remember one time I came home, and my son, in grade eleven, fifth form, told me that he got 83 in science. And for a moment I was excited - 83 sounds like a very good score. But actually, I wondered - he doesn't normally get 83% - so I asked him what his test was out of. And he smiled and told me it was out of 160, meaning, of course, that he had got around 52-53%, like he normally does. And so, I had to give him credit for being smart enough to try to fool me, and relaxed enough to realise that, you know, he did about what he normally does. So, a raw score doesn't mean anything unless you know something else. It might be a good score, or it might be an inadequate score. Scores depend on how old the student is; what time of year it is - beginning of the year, end of the year; whether they've been taught something about the topic before or not; whether they normally get a score in that range or something higher or lower; whether the test is extremely difficult; whether the test was very inaccurate; and what the standards might be in your system. So, a score of 26 out of 50 might mean nothing on its own. However, we have to report scores, because people understand, or think they understand, scores. Let's talk about, first of all, scores used to compare my performance to that of others. Teachers are actually very good at this. 
Teachers are very good at understanding: this person over here is the best in my class in this subject, that student over there is one of the weakest, and these students here are in the middle. Teachers understand rank order. Teachers are good at it. The problem is, rank order doesn't seem to change very often. Knowing that a student is last in class may lead the teacher to think, "Well, there's nothing we can do about that. You're always going to be last in class." Conversely, if a child is first in class, what can the teacher do to improve that student? The student is likely to stay first. So, rank order in school abilities tends not to change very quickly, which leads to some unfortunate consequences. There's also a problem that most of us have a reference group that's much too narrow. We're used to comparing students in my school to other students in my school. But my school might not be very representative of what total performance in my country looks like. And so we tend to normalise on the population we know. We think, if you're good in my class, you must be good in the country, whereas really, being best in my class might mean you're only average. Conversely, when my son began high school in Wellington, he was put in a low ability group, and we asked for some evidence, and the teachers said, "Well, actually, he performed around the 50th to 60th percentile in most of these tests." And we thought, "Well, that's not bad," but in this school, that put him in the bottom group. So, not all schools or classes are equal to start with, and that changes how we understand performance. The other thing we have to keep in mind is that just because you're best in class doesn't mean the quality of your work is excellent. If everyone is weak, you can be best in a weak group, and no one is doing very well. 
Or conversely, in a high performing school, in a high performing class, you could be the bottom kid in the class, but that class is an excellent class, and your work might actually be of very good quality. So, rank order can be a misleading piece of information. Another way to evaluate students' performance is to compare them, on large-scale standardized tests, to large samples of other students. And we can do this using the normal distribution curve. In a normal distribution curve, when we test enough people, the distribution looks like the classic bell curve shape that you see on this screen. Most scores will be very close to the mean, or average, and it allows us to make comparisons. In fact, when my son was put in the low achieving class, we asked for his position on the normal distribution curve, and it turned out that he was within the first standard deviation above the mean, which meant that he was a good average student. But he was put in the worst performing class in his school. One of the advantages of comparing to the national norm is that it cancels out the local effect. What it means is that we can find top quality students elsewhere in the country, even in places where we normally don't expect to find top students. It also means that we can find weak students even in high performing schools. And what we can do with these scores is make meaningful comparisons to representative samples. Now, the scale that you can see on the screen now, marked in standard deviations, is very difficult to understand, because what does it mean if you're in the second standard deviation below the mean, at minus two? This is not an easy thing to explain to a parent. So, there are mathematical ways to transform this distribution into numbers that are bigger than zero, which are much easier to understand. One of these scores that is commonly used is the percentile score. The percentile score converts everybody's position in the distribution to a number between one and 99. 
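The transformation from a raw score to a percentile can be sketched in a few lines of Python. The test mean and standard deviation below are hypothetical, and the sketch assumes scores are normally distributed; the standard library's `statistics.NormalDist` supplies the bell curve:

```python
from statistics import NormalDist

def percentile_rank(raw_score, mean, sd):
    """Percentile rank (1-99) of a raw score, assuming scores are
    normally distributed with the given mean and standard deviation."""
    z = (raw_score - mean) / sd            # distance from the mean in SD units
    pct = NormalDist().cdf(z) * 100        # share of the population scoring below
    return max(1, min(99, round(pct)))     # percentiles are reported as 1-99

# Hypothetical national test: mean 60, standard deviation 12
print(percentile_rank(60, 60, 12))   # a score at the mean -> 50
print(percentile_rank(72, 60, 12))   # one SD above the mean -> 84
```

Note how a score one standard deviation above the mean lands at roughly the 84th percentile - the kind of position that is hard for a parent to read off a scale marked in standard deviations.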
And what it means is that if you're at the 50th percentile, you scored what the average person in the country scores. Now, this does not mean you got 50%. The test might be easy, and the average score might be 60%; if you scored 60%, half the people scored at or below you, so you would be at the 50th percentile. So, you can see that percentile and percentage are easily confused, and this is one reason to stay away from the percentile score. The other thing that you can see in the diagram on the screen is that going from a score of one to a score of four, an increase of three points, moves you up only from the first percentile to about the tenth percentile. A small gain for an additional three points. But if you gained an extra three points in the middle of the distribution, if you went from 10 out of 20 to 13 out of 20, you would increase from the 50th percentile to somewhere around the 65th percentile - a much larger increase. So, the distances are not equal across the scale. And thus, small differences might look like big differences when they're not. To go from the 50th percentile to the 60th percentile seems to be a big difference. But actually, it's a small gain in test scores, and we shouldn't make a big deal out of this kind of gain. So, the percentile score is somewhat misleading, and we'd certainly recommend staying away from it. A simpler score, the stanine, which stands for 'standard nine', takes the distribution and breaks it up into nine bands. Each stanine is half a standard deviation wide, with the first and ninth catching the tails, and you can see from the diagram on the screen what percentage of students would score in each stanine. What you can see is that about a quarter get a score between one and three, about a quarter get a score between seven and nine, and around half get the middle scores of four, five, and six. This is a much more evenly behaved type of score, so that you can make comparisons between subjects. 
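As a sketch of how stanines sit under the curve, and of why equal raw-score gains buy unequal percentile gains, assuming the conventional half-standard-deviation bands described above (the test mean and standard deviation here are hypothetical):

```python
from statistics import NormalDist

def stanine(raw_score, mean, sd):
    """Stanine band 1-9: band 5 is centred on the mean, each band is
    half a standard deviation wide, and 1 and 9 catch the tails."""
    z = (raw_score - mean) / sd
    return max(1, min(9, round(z * 2 + 5)))

# Hypothetical 20-point test: mean 10, standard deviation 3
print(stanine(10, 10, 3))   # at the mean -> stanine 5
print(stanine(16, 10, 3))   # two SDs above -> stanine 9

# The same 3-point raw gain is worth very different percentile gains
# depending on where in the distribution it happens
nd = NormalDist(mu=10, sigma=3)
for lo, hi in [(2, 5), (10, 13)]:
    gain = (nd.cdf(hi) - nd.cdf(lo)) * 100
    print(f"{lo} -> {hi}: about {gain:.0f} percentile points")
```

Running the loop shows the three-point gain is worth only a handful of percentile points in the tail but dozens in the middle; the exact figures depend on the test's standard deviation, which is assumed here.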
And it's also a little fuzzier, in the sense that it takes account of measurement error. And so, certainly, if you're going to help a parent understand their child's performance, it's really nice to be able to say, "Your child scored in the 6th stanine, which is a high average score for children of this age group on this test." And that's a much healthier and more robust score to use. Unfortunately, for a change in scores to be statistically significant beyond chance, we'd have to see a change of two stanines. Which means it's hard to see improvement. So, if we're going to use norm-referenced scores to monitor learning, there are some pros and cons. Percentiles are problematic; we'd certainly recommend not using them. Stanines are a more robust score, but we need a gain of two stanines to be sure that it's a real change, which makes them a coarse measure. Now, for a lot of learning purposes, that might be enough. When it's not, we need a more precise measure, and standardised tests will often tell you how precise a score is, through a statistic called the standard error of measurement. Using that can help you improve the quality of your reporting, and evaluate whether a change is real, or just an aberration. Reporting is a difficult matter, because parents and administrators and school leaders always want numbers. But the numbers sometimes will mislead you, so be careful with how you use these numbers. [MUSIC]
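The idea of checking a gain against the standard error of measurement can be sketched as follows; the SEM value, the 95% confidence level, and the assumption that the two scores are independent are illustrative choices, not figures from the lecture:

```python
from statistics import NormalDist

def change_is_real(score_then, score_now, sem, confidence=0.95):
    """True if the gain between two testing occasions is larger than
    measurement error alone could plausibly produce. The error of a
    difference between two independent scores is sem * sqrt(2)."""
    z_crit = NormalDist().inv_cdf((1 + confidence) / 2)   # about 1.96 at 95%
    return abs(score_now - score_then) > z_crit * sem * 2 ** 0.5

# Hypothetical reading test with a standard error of measurement of 3
print(change_is_real(50, 55, sem=3))   # 5-point gain: within error -> False
print(change_is_real(50, 60, sem=3))   # 10-point gain: a real change -> True
```

A gain has to clear roughly plus or minus two standard errors of the difference before we can call it improvement rather than noise, which is the same caution the lecture attaches to one-stanine changes.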