Another Case of Teacher Cheating, or Is It Just Altruism?

From the results of the high-school “maturity exam” in Poland (courtesy of reader Artur Janc), comes this histogram showing the distribution of scores for the required Polish language test, which is the only subject that all students are required to take, and pass.

Not quite a normal distribution. The dip and spike that occur at around 21 points just happen to coincide with the cut-off score for passing the exam. Poland employs a fairly elaborate system to avoid bias and grade inflation: removing students’ names from the exams, distributing them to thousands of teachers and graders across the country, and employing a well-defined key to determine grades. But by the looks of these results, there’s clearly some sort of bias going on.
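
For readers who want to see how such a shape could arise, here is a minimal simulation sketch, not a reconstruction of the actual data. It assumes scores are roughly bell-shaped and that graders nudge some just-failing papers up to the pass mark. The maximum score, mean, spread, window width, and nudge probability are all made-up illustrative values; only the cut-off of about 21 points comes from the histogram.

```python
import random
from collections import Counter

PASS_MARK = 21   # approximate cut-off read off the histogram
MAX_SCORE = 70   # assumption: illustrative maximum score


def simulate_scores(n, bump_window=3, bump_prob=0.8, seed=0):
    """Draw n bell-shaped scores, then nudge some near-misses up to the pass mark.

    bump_window and bump_prob are hypothetical parameters: a paper scoring
    within bump_window points below PASS_MARK is raised to PASS_MARK with
    probability bump_prob.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n):
        # Assumed "honest" score: normal with mean 35 and sd 10, clamped to range.
        raw = max(0, min(MAX_SCORE, round(rng.gauss(35, 10))))
        near_miss = PASS_MARK - bump_window <= raw < PASS_MARK
        if near_miss and rng.random() < bump_prob:
            raw = PASS_MARK
        scores.append(raw)
    return scores


def histogram(scores):
    """Count how many papers received each score."""
    return Counter(scores)


if __name__ == "__main__":
    hist = histogram(simulate_scores(100_000))
    # Text histogram around the cut-off: one '#' per 100 papers.
    for s in range(PASS_MARK - 5, PASS_MARK + 5):
        print(f"{s:2d} {'#' * (hist[s] // 100)}")
```

Under these assumptions the bins just below 21 are hollowed out and the bin at 21 towers over its neighbours, which is the same dip-and-spike pattern as in the histogram. It shows that a small amount of leniency near the cut-off is enough to produce the shape; it does not show that this is what happened.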

Compare that to the results of the “advanced” Polish language exam, which is taken in addition to the basic level exam by about 10% of students. It has no influence on whether students pass or fail the exit examination, so there’s no incentive to grade inflate, as evidenced by the clean distribution.

 

Artur writes:

I’m quite sure there is nothing to be gained for the graders/districts if they pass a student with a borderline score (at the basic level), rather than failing him/her. So my take on this is that graders just didn’t want to fail some kids and seriously hurt their college prospects and/or make them re-take the exam when the score was close to the cutoff.

So, is that pure altruism on the part of the teachers? Or do they actually have some bit of national incentive to see students go on to college? One could probably ask the same questions of school officials in Atlanta.

 

COMMENTS: 22

  1. Alex C says:

    Perhaps the minimal passing criterion is just very well defined

  2. YX says:

    It is possible that, since teachers like to see their own students do well for obvious reasons, this feeling often transfers to students in general. This could create a sub-conscious incentive rather than pure altruism.

    • CT says:

      The grading system isn’t explained in detail above, but what I gathered was that the teachers were NOT marking their own students’ papers... or there is a very low chance of that happening, since the papers were distributed to so many teachers and graders. So what would the “sub-conscious incentive” be to inflate the grades for a random student?

  3. jpp says:

    Hello, I’m glad that something from Poland has come to Freakonomics, too :)
    When it comes to the incentives behind such an interesting distribution, some background is needed:

    This is the Polish language exam, consisting of reading comprehension of a newspaper text and a discussion of a specific literature topic (in this case: compare different poems and their views on dreams, or compare the life philosophies of two novel heroes; all texts were attached to the exam). The problem is that in the second task there is plenty of room to find new meanings in the poems or the novel. A student can write a beautiful and very deep analysis, but if it does not match “the key” (a centrally prepared answer base, according to which the student must notice certain things, write about certain others, etc.), there will be no points, and as a result one may not even pass the exam.
    This is really a problem of measuring people’s humanistic skills, which this exam tries to quantify. And since the teacher checking the exams (surely not knowing the students taking them; there is an extensive procedure to maximize anonymity) is also a Polish language teacher, and surely a humanities-minded one, he or she can sympathize with “repressed” humanities-oriented students.

  4. James says:

    Why the high bars at either end of the advanced test score graph, though? I can see a long tail on the high side, if all scores above 40 are lumped into the 40 bar, but at zero?

    • Qrious says:

      It seems that some people are really dedicated and planned to take this exam well in advance (40th bin), some are completely unprepared and just go for it counting on luck (0th bin) and the rest just don’t know what else to do with their life (taking into account that the distribution is almost centred on pass/fail border) ;)

  5. Aaron Goldman says:

    I wonder why the tails on the second graph are so large. Could it be that if the score is really low it’s just easier to call it zero, and if it’s very good some teachers will give a top grade more often than a point or two below top? Interesting that the top graph shows only the faintest hint of this behavior.

    • Joshua Northey says:

      This just displays your ignorance of how tests work (I mean ignorance in the nicest possible way (seriously)).

      The clump at 40 is due more to the way tests place a cap on what can be demonstrated than to people lazily “rounding up” (though there is a tiny bit of that too).

      For example if you gave the American population a basic calculus test you might get a pretty normal distribution among HS graduates who studied liberal arts, a huge clump of zeros among HS dropouts, while other people who need to know it well for their profession will frequently get the maximum score because their ability is off the scale of the test. This is why you should always make tests really hard and why grade inflation is so pernicious. You lose your ability to discriminate between the people.

      For example, on a World History AP test a sophisticated 3 or 4 page essay and a good score on the multiple choice will get you a 5. But the scoring is 1-5, so someone who has a perfect multiple choice score and writes an amazing 20 page essay still gets a 5. This causes there to be a clump at 5, because people who would theoretically get 6s or 7s are limited to a 5.

      The clump of zeros is due to a lot of students who know they will not score well not trying at all. People get in over their heads. Depending on the test, zeros may only include non-responses, or those may be counted separately.
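
A minimal sketch of the capping argument in this comment, under stated assumptions: each student has a bell-shaped "true" ability that can exceed the top of the scale, the reported score is clamped to 0-40, and a small fraction of students do not attempt the test and score zero. The mean, spread, and non-attempt rate are invented for illustration and are not taken from the exam data.

```python
import random
from collections import Counter

MIN_SCORE, MAX_SCORE = 0, 40  # score range of the advanced-exam graph


def capped_scores(n, mean=22.0, sd=10.0, skip_prob=0.05, seed=0):
    """Return a Counter of reported scores for n simulated students.

    mean, sd and skip_prob are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)
    hist = Counter()
    for _ in range(n):
        if rng.random() < skip_prob:
            # Student does not really attempt the test.
            hist[MIN_SCORE] += 1
            continue
        ability = rng.gauss(mean, sd)  # may fall outside 0..40
        # The test cannot report anything beyond its own scale.
        hist[max(MIN_SCORE, min(MAX_SCORE, round(ability)))] += 1
    return hist


if __name__ == "__main__":
    hist = capped_scores(50_000)
    for s in (0, 1, 2, 38, 39, 40):
        print(f"{s:2d} {hist[s]}")
```

With these made-up parameters, everyone whose ability lies above the scale piles into the 40 bin and the non-attempts pile into the 0 bin, while the bins next to each end stay small. That reproduces the tall bars at both ends without any rounding by graders.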

      • James says:

        I guess I must not understand tests, despite having taken rather more than my fair share in my life. Depending on just how easy or difficult the test was, I would have expected either a double hump (lots of people who know it all, plus declining numbers who know almost all or make a few careless errors, then increasing again to a normal distribution of “ordinary” people), or the normal distribution shifted right.

      • conchis says:

        “This is why you should always make tests really hard… [otherwise] you lose your ability to discriminate between the people.”

        Different tests will typically be more accurate at different parts of the distribution.* Generally we’d prefer to be more accurate towards the middle of the ability distribution rather than at the tails. Making all tests really hard would undermine this goal, and probably isn’t advisable.

        *It’s possible to avoid this by using computer-administered tests that route people to harder / easier questions depending on how well they perform early on, and can therefore achieve high discrimination in both the centre and the tails.

      • Joshua Northey says:

        Or you can just have questions of all difficulties. Then you capture the middle and the tails.

      • conchis says:

        Unless you have either (a) an infinitely long test or (b) a conditional routing procedure (of the sort I described above), then you will face trade-offs in how discriminating you can be in different parts of the ability distribution: every more difficult question requires you to drop an easier question and vice versa.

        If you want to argue that we should make tests more accurate in the tails at the expense of accuracy in the middle, then fine. My point was really just that your comments seem to incorrectly assume there are no trade-offs involved in increasing tail accuracy.

    • Joseph P says:

      There is one more thing which can be helpful. In Poland there is a competition that runs from October to March, called an Olympiad, in (almost) every subject. Finalists and laureates of this competition automatically get 100% on the matura exam (in that particular subject, of course). That’s why there are quite a few more “40’s” than expected.

  6. Jon Peltier says:

    The same phenomenon has been described for speeding tickets. For example, see figure 1 in “Speed Discounting and Racial Disparities: Evidence from Speeding Tickets in Boston” (http://ftp.iza.org/dp3903.pdf).

  7. Marekv says:

    These are the results from 2010, not 2011. The Polish Ministry of Education report (http://www.cke.edu.pl/images/stories/001_Matura/WYNIKI/raport_matura_2010.pdf) contains graphs for various subjects. Philosophy looks quite similar to Polish language, but mathematics and chemistry appear not to have been tampered with. Teachers who grade the papers are required to stick to lists of suggested correct answers. These are much more vague for soft subjects, so it is easy to inflate the results.

  8. Sandy says:

    The “normalness” of the curve is not the issue, smoothness is.
