Lately I’ve been looking at ways to test users’ comprehension of our content on beta.govt.nz.
Traditional facilitated user testing only really tests whether users can navigate a site — “how would you do X?”, “where would you find Y?”. We’re trying to work out how to get beyond that: do people understand X? Can they make decisions from Y?
It doesn’t seem like there’s a lot out there yet — which is pretty fun for me. It gives us space to do some experiments and play around with some ideas. I’ve started out with the cloze test.
The cloze test is a basic readability test that’s been around for a while. It’s fast and simple (and free), so it seemed like the perfect place to start. You:
- take a page of content
- replace every 5th word with a blank space
- ask people to fill in the gaps.
The idea is that if you’re using plain language and simple sentences, users should be able to predict the missing words and still understand the text. A page is thought to be plain English if users can correctly guess over 60% of the missing words.
We’ve tried it three times — each time with half our users completing a page of our content and half completing the original text from an agency’s website. Since we’re only experimenting, our ‘users’ so far have been our developers and other volunteers from around the office.
The results have been interesting, ranging from 40% correct on one agency’s page, to 81% on our rewritten version. The averages so far, after three comparisons, are 75% correct guesses on our content and 55% on original content.
Is it useful?
On their own, the results are a nice measure to have up our sleeve. They say something concrete, and we can use them as a way to collect some metrics on our content. Like the Flesch reading ease score, the cloze gives us a yardstick to measure things against — but, also like the Flesch, it can’t tell us if the content actually makes sense.
Numbers are nice to have, but they never tell the whole story. Some really interesting things have turned up in the details, though, and I think they make this test more useful than a set of figures alone.
I’ve been scoring the tests strictly: a guess only counts as correct if it’s exactly the same word as in the original version. The most interesting finding, to me, is that our pages might only be averaging 75% correct, but what people fill in still makes sense.
For example, “Both parents have to sign the form, unless your child legally only has one parent” has the same meaning as “Both parents have to fill in the form, unless your baby legally only has one parent”.
Even when they used different words, users understood what they read on our pages, and if we’d asked them to take some action afterwards, they’d have gone and done the right thing.
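To show what strict scoring does to an answer like that, here is a minimal sketch. The function name `score_strict` is hypothetical, and ignoring capitalisation and stray spaces is my assumption about what "exactly the same word" should mean in practice. The 60% figure is the plain English threshold mentioned earlier.

```python
PLAIN_ENGLISH_THRESHOLD = 0.60  # "over 60%" of blanks guessed correctly


def score_strict(answers, guesses):
    """Score one completed cloze test strictly.

    Returns (fraction_correct, misses), where misses is a list of
    (expected, guessed) pairs. A blank left empty should be passed
    in as an empty string.
    """
    if not answers:
        raise ValueError("the test has no blanks to score")
    if len(answers) != len(guesses):
        raise ValueError("one guess (possibly empty) is needed per blank")
    misses = [
        (expected, guess)
        for expected, guess in zip(answers, guesses)
        # Assumption: case and surrounding whitespace are ignored.
        if expected.strip().lower() != guess.strip().lower()
    ]
    correct = len(answers) - len(misses)
    return correct / len(answers), misses


# The blanks and guesses from the "both parents" example above.
answers = ["sign", "child", "parent"]
guesses = ["fill in", "baby", "parent"]

score, misses = score_strict(answers, guesses)
passes = score > PLAIN_ENGLISH_THRESHOLD
print(f"{score:.0%} correct, passes threshold: {passes}")
print("to review by hand:", misses)
```

On this one sentence the strict score is only one in three, even though the reader clearly understood it. That is why returning the misses matters: the number goes in the report, and the list of misses is what you read through to see whether people actually got the meaning.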
The original agency pages were different. It’s not just that users couldn’t guess the right words: they didn’t get the meaning of what they were reading. “BLAH, where an employee’s employment BLAH is so intermittent or ????? that it is not BLAH” is completely meaningless. People were guessing any old word, leaving blanks, or littering the pages with question marks. A user might have guessed 60% of the right words from their immediate context, but the page as a whole made no sense when they were done.
That’s really interesting.
How could we use this?
In a few ways.
- It’s a good reporting measure to have in our arsenal, especially when we’re trying to justify why we think content is so important.
- It’s an interesting way to find out what terms people use for things. In a few places, everyone we tested used exactly the same word for something — and it wasn’t the word we’d used. This could be a great way to check our thinking and make some tweaks to how we’re saying things. It’s like crowd-sourcing our terminology.
- It’s also a good sanity check. Are we making sense? Can people get the meaning of a thing even with 20% of the words missing? Are there words in there that would be impossible to guess — and can we change them if there are?
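The terminology idea in particular is easy to automate once the guesses are typed up. The sketch below looks for blanks where most people chose the same word and it wasn't ours. Everything in it is an assumption for illustration: the function name `common_substitutes`, the `min_share` cut-off, and above all the three sets of guesses, which are invented and are not our test results.

```python
from collections import Counter


def common_substitutes(answers, all_guesses, min_share=1.0):
    """Find blanks where users agreed on a word other than the original.

    answers:     the original words, one per blank
    all_guesses: one list of guesses per user, in blank order
    min_share:   fraction of users who must agree (1.0 = everyone)

    Returns {blank_index: (original_word, users_word, share)}.
    """
    results = {}
    for i, expected in enumerate(answers):
        guesses = [user[i].strip().lower() for user in all_guesses]
        word, count = Counter(guesses).most_common(1)[0]
        share = count / len(guesses)
        # Skip empty guesses and blanks where users matched our word.
        if word and word != expected.strip().lower() and share >= min_share:
            results[i] = (expected, word, share)
    return results


answers = ["sign", "child", "parent"]
# Invented guesses from three imaginary users, for illustration only.
all_guesses = [
    ["fill in", "baby", "parent"],
    ["fill in", "child", "parent"],
    ["fill in", "baby", "parent"],
]

unanimous = common_substitutes(answers, all_guesses)
majority = common_substitutes(answers, all_guesses, min_share=0.6)
print(unanimous)
print(majority)
```

With only a handful of testers a unanimous match is a fairly weak signal, so treat anything this turns up as a prompt to look at the wording rather than a verdict on it.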
We’re hoping to have time to try some other comprehension tests in the next couple of months — specifically some task tests that go beyond “find x” to a level more like “do you need a medical certificate to apply for a 9-month student visa if you’re from Iran?”.
We’d love to try some timed, facilitated before-and-after testing like this on our content and original content, to really get some concrete evidence about how people are using both sets of information. We’d also like to try a multiple choice test on some of the trickier concepts we’re writing about.
We’ll keep you informed.
If anyone is doing any comprehension testing, or has any tips or tricks, I would LOVE to hear from you. This stuff is fascinating, and, I think, really important. I’m really keen to get people thinking about it and doing more of it. Anyone who wants to help, please comment or find me on Twitter at @katieajohnston.