Some of you who took the fun SAP general knowledge quiz in Do you think you are an expert on SAP? commented that the questions were too easy or too hard, so I thought you might find it interesting to learn about question difficulty when designing assessments in the workplace.
A common way to measure a question’s difficulty is by its p-value, a number from 0 to 1 that represents the proportion of people who answer the question correctly. So a question with a p-value of 0.5 means that half the participants get it right and half get it wrong, and a question with a p-value of 0.9 means that 90% of respondents get it right.
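The calculation itself is just a proportion. As a minimal sketch (the function name `p_value` is my own; the figures are the ones quoted later in this post):

```python
def p_value(correct: int, total: int) -> float:
    """Proportion of respondents who answered the question correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total

# The two example questions from the quiz:
print(round(p_value(388, 444), 2))  # 0.87 -- an easy question
print(round(p_value(85, 444), 2))   # 0.19 -- a difficult question
```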
Here are two example questions from the general knowledge quiz (if you want to take the quiz, you can see it at www.questionmark.com/go/sapquiz).
Out of the 444 people who had taken the quiz at the time of writing, 388 knew that the answer was IBM and got it right. The p-value of this question is 0.87, so for the SAP community of users this was a very easy question.
However, in the question below, the p-value was much lower.
Only 85 of the 444 respondents to the quiz got the answer right, which gives a p-value of 0.19, making this a difficult question. Most people thought the building was in Germany or the USA, rather than the highly sustainable SAP Labs building in Brazil.
These questions were just for fun, but the concept of p-value is very valuable when you are building an assessment for real, and it’s important to look at the p-values of questions when piloting them and when reviewing results. Most assessment management systems give you the p-value (sometimes called question difficulty) within their item analysis reports.
The following table gives a general guide for interpreting p-values:
| p-value | Interpretation |
| --- | --- |
| 0 to 0.2 | Fewer than 20% of participants get the question right. Review whether there is confusing language in the question content or a problem in the instruction. The only likely reason to include such questions is if you need the information to identify the very best-performing participants. |
| 0.2 to 0.4 | Between 20% and 40% of participants get the question right. Check that there is no misleading language and identify whether the instruction needs review, but this question does give information about many participants and is usable. |
| 0.4 to 0.7 | This is where most of your questions are likely to be; it gives you good measurement information on your participants. |
| 0.7 to 0.9 | Easy question that 70% to 90% get right. For a compliance exam where you expect high scores, this may be very appropriate, and for any test it can still be useful when combined with other, harder questions. |
| 0.9 to 1.0 | Very easy question that 90% or more of participants get right. It doesn’t give much information about participants. Such questions can be useful for a health and safety or compliance test, for a mastery test where you expect everyone to know the subject, or as easy “warm-up” questions, but they are generally too easy to help you measure skills and knowledge. |
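If you want to flag questions automatically during item review, the bands above can be encoded in a few lines. This is only a sketch; the band labels and boundaries are my shorthand for the guidance in the table:

```python
def difficulty_band(p: float) -> str:
    """Classify a question's p-value into the rough bands from the table above."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p-value must be between 0 and 1")
    if p < 0.2:
        return "very hard - review wording and instruction"
    if p < 0.4:
        return "hard - check for misleading language"
    if p < 0.7:
        return "good measurement range"
    if p < 0.9:
        return "easy - fine for compliance-style exams"
    return "very easy - warm-up or mastery checks only"

print(difficulty_band(0.19))  # the SAP Labs building question
print(difficulty_band(0.87))  # the IBM question
```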
A rule of thumb is that it’s often useful to use questions with p-values reasonably close to the pass score of the assessment. For instance, if your pass score is 60%, then questions with a p-value of around 0.6 will give you good information about your participants. A very high or very low p-value, by contrast, does not tell you much about a person who answers the question. If the purpose of the test is to measure someone’s knowledge or skills, you will get more information from questions with medium p-values.
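That rule of thumb lends itself to a simple filter over an item bank. The item data below is entirely hypothetical, and the tolerance value is an illustrative assumption, not a recommendation from the post:

```python
# Hypothetical item bank: (question id, observed p-value) pairs -- illustrative data only.
items = [("Q1", 0.95), ("Q2", 0.62), ("Q3", 0.58), ("Q4", 0.19), ("Q5", 0.71)]

pass_score = 0.60  # a 60% pass mark, as in the example above
tolerance = 0.15   # assumed: how far from the pass score we allow

# Keep items whose p-value sits reasonably close to the pass score.
selected = [qid for qid, p in items if abs(p - pass_score) <= tolerance]
print(selected)  # ['Q2', 'Q3', 'Q5']
```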
I hope you found this explanation helpful for when you construct your own tests. For more information, my colleague Greg Pope blogs frequently on psychometrics, and his blog post on Should I include really easy or really hard questions on my assessments? is a good place to start.