Testing comprehension of content on beta.govt.nz

Lately I’ve been looking at ways to test users’ comprehension of our content on beta.govt.nz.

Traditional facilitated user testing only really tests whether users can navigate a site — “how would you do X?”, “where would you find Y?”. We’re trying to work out how to get beyond that: do people understand X? Can they make decisions from Y?

It doesn’t seem like there’s a lot out there yet — which is pretty fun for me. It gives us space to do some experiments and play around with some ideas. I’ve started out with the cloze test.

Cloze test

The cloze test is a basic readability test that’s been around for a while. It’s fast and simple (and free), so it seemed like the perfect place to start. You:

  • take a page of content
  • replace every 5th word with a blank space
  • ask people to fill in the gaps.
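For anyone who wants to try this on their own pages, here’s a minimal Python sketch of those steps. It assumes words are separated by whitespace. It keeps any punctuation next to a blank so that only the word itself is hidden, and it doesn’t count punctuation-only tokens (like a lone dash) as words. `make_cloze` and its parameters are hypothetical names, not part of any existing tool.

```python
import re

def make_cloze(text: str, n: int = 5, blank: str = "_____") -> tuple[str, list[str]]:
    """Blank out every nth word of `text`.

    Returns the gapped text and the answer key (the removed words, in order).
    Hypothetical helper: words are whitespace-separated, and tokens with no
    letters or digits (such as a lone dash) are not counted as words.
    """
    gapped: list[str] = []
    answers: list[str] = []
    word_count = 0
    for token in text.split():
        if not re.search(r"\w", token):
            gapped.append(token)  # punctuation only: keep it, don't count it
            continue
        word_count += 1
        if word_count % n == 0:
            # Split off leading/trailing punctuation so only the word is hidden.
            prefix, word, suffix = re.match(r"^(\W*)(.*?)(\W*)$", token).groups()
            answers.append(word)
            gapped.append(f"{prefix}{blank}{suffix}")
        else:
            gapped.append(token)
    return " ".join(gapped), answers


text = "Both parents have to sign the form, unless your child legally only has one parent."
gapped, answers = make_cloze(text)
print(gapped)
# Both parents have to _____ the form, unless your _____ legally only has one _____.
print(answers)
# ['sign', 'child', 'parent']
```

Half the group gets the gapped text; the answer key stays with whoever does the scoring.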

The idea is that if you’re using plain language and simple sentences, users should be able to predict the missing words and still understand the text. A page is thought to be plain English if users can correctly guess over 60% of the missing words.
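Scoring is just as simple. This sketch marks a gap correct only when the response is exactly the original word, and treats a page as plain English when the score is over 60%. Ignoring case and surrounding spaces is our own assumption. `score_cloze` is a hypothetical name, and the responses in the example are made up for illustration.

```python
def score_cloze(answers: list[str], responses: list[str],
                threshold: float = 0.60) -> tuple[float, bool]:
    """Strictly score one person's cloze responses.

    A gap counts as correct only if the response is exactly the original word;
    case and surrounding whitespace are ignored (an assumption). Blank
    responses simply score zero. Returns (fraction correct, over threshold?).
    """
    if not answers or len(answers) != len(responses):
        raise ValueError("expected one response per gap")
    correct = sum(
        a.strip().lower() == r.strip().lower() for a, r in zip(answers, responses)
    )
    score = correct / len(answers)
    return score, score > threshold


answers = ["sign", "child", "parent"]
score, plain = score_cloze(answers, ["fill in", "baby", "parent"])
print(f"{score:.0%}", plain)  # 33% False
```

A strict scorer like this counts a sensible synonym as wrong, so the score is only half the story; it’s worth reading the actual responses too.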

Our results

We’ve tried it three times — each time with half our users completing a page of our content and half completing the original text from an agency’s website. Since we’re only experimenting, our ‘users’ so far have been our developers and other volunteers from around the office.

The results have been interesting, ranging from 40% correct on one agency’s page, to 81% on our rewritten version. The averages so far, after three comparisons, are 75% correct guesses on our content and 55% on original content.

Is it useful?

On their own, the results are a nice measure to have up our sleeve. They say something concrete, and we can use them to collect some metrics on our content. Like the Flesch Reading Ease score, the cloze test gives us a yardstick to measure things against. But, also like the Flesch score, it can’t tell us whether the content actually makes sense.

Numbers are nice to have, but they never tell the whole story. Some really interesting things have turned up in the details, though, and I think they make this test more useful than just a set of figures.

I’ve been scoring the tests strictly: only exactly the same word as in the original version is marked as correct. The most interesting finding, to me, is that even though our pages are only averaging 75% correct, everything people fill in makes sense.

“Both parents have to sign the form, unless your child legally only has one parent” has the same meaning as “both parents have to fill in the form, unless your baby legally only has one parent”.

Even when they used different words, users understood what they read on our pages, and if we’d asked them to take some action afterwards, they’d have gone and done the right thing.

The original agency pages were different. It’s not just that they couldn’t guess the right words; users didn’t get the meaning of what they were reading. “BLAH, where an employee’s employment BLAH is so intermittent or ????? that it is not BLAH” is completely meaningless. People were guessing any old word, leaving blanks, or littering the pages with question marks. A user might have guessed 60% of the right words from their immediate context, but the page as a whole made no sense when they were done.

That’s really interesting.

How could we use this?

In a few ways.

  1. It’s a good reporting measure to have in our arsenal, especially when we’re trying to justify why we think content is so important.
  2. It’s an interesting way to find out what terms people use for things. In a few places, everyone we tested used exactly the same word for something — and it wasn’t the word we’d used. This could be a great way to check our thinking and make some tweaks to how we’re saying things. It’s like crowd-sourcing our terminology.
  3. It’s also a good sanity check. Are we making sense? Can people get the meaning of a thing even with 20% of the words missing? Are there words in there that would be impossible to guess — and can we change them if there are?
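Crowd-sourcing terminology (point 2) is easy to automate once responses are collected. Here’s a rough sketch that, for each gap, finds the word most testers agreed on and flags it when it isn’t the word we used. The function name, the data layout (one list of responses per tester, in the same order as the answer key) and the `min_share` cut-off are all assumptions, and the responses in the example are made up.

```python
from collections import Counter

def shared_alternatives(answers, responses_by_tester, min_share=1.0):
    """Flag gaps where testers agreed on a word other than the original.

    answers: the original words, one per gap.
    responses_by_tester: one list per tester, aligned with `answers`.
    min_share: fraction of all testers who must give the same word
               (1.0 means everyone).
    Returns (gap index, original word, shared word) tuples.
    """
    findings = []
    testers = len(responses_by_tester)
    for i, original in enumerate(answers):
        words = [r[i].strip().lower() for r in responses_by_tester if r[i].strip()]
        if not words:
            continue  # nobody filled this gap in
        word, count = Counter(words).most_common(1)[0]
        if word != original.lower() and count / testers >= min_share:
            findings.append((i, original, word))
    return findings


answers = ["sign", "child", "parent"]
responses = [
    ["sign", "baby", "parent"],
    ["sign", "baby", "parent"],
    ["fill in", "baby", "parent"],
]
print(shared_alternatives(answers, responses))  # [(1, 'child', 'baby')]
```

With a bigger group, lowering `min_share` (say to 0.8) would surface words that most people, rather than everyone, reached for.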

Next steps

We’re hoping to have time to try some other comprehension tests in the next couple of months — specifically some task tests that go beyond “find x” to a level more like “do you need a medical certificate to apply for a 9-month student visa if you’re from Iran?”.

We’d love to try some timed, facilitated before-and-after testing like this on our content and original content, to really get some concrete evidence about how people are using both sets of information. We’d also like to try a multiple choice test on some of the trickier concepts we’re writing about.

We’ll keep you informed.

Hollaback

If anyone is doing any comprehension testing, or has any tips or tricks, I would LOVE to hear from you. This stuff is fascinating, and, I think, really important. I’m really keen to get people thinking about it and doing more of it. Anyone who wants to help, please comment or find me on Twitter at @katieajohnston.

3 comments

  1. Comment #1. Alison Jack:

    This blog post is ____________

  2. Comment #2. Rachel Rachel:

    “fascinating”

    I’d never heard of the Cloze test – thanks Katie. Especially fascinating is that people guessed the same “wrong” word.

    I wonder if there are some areas where different population groups (e.g. Baby Boomers vs Millennials vs first-generation NZers) would use different vocabulary about government services (the younger generation using more Americanisms, the older generation using service names in use in the 70s and 80s).

    E.g. recently I was surprised to find myself talking about paying my Car Tax (rather than Vehicle Registration), probably a hangover from time spent a few years ago in the UK. But it’s also a lot shorter and easier to remember (2 syllables in total, not 7).

    In government we tend at times to choose unwieldy names, which means people develop a shorthand and older names stick around (it will be a while before people stop talking about OSH inspectors and WINZ).

  3. Comment #3. Katie Johnston

    Hi Rachel,

    Thanks for the comment. I think there are vocabulary differences in age groups for sure, and I even think we’ve seen some “old” terms being used by younger people — maybe people learn about government services from their parents or elders, and then the terms keep spreading. We’ve definitely seen that service “brand” names (and even new department names) don’t stick with people — people are still referring to benefits by names that haven’t been used in years! The net result is that the name of a service and the identity of the department that provides it are largely barriers between people and whatever it is they’re trying to do. Figuring out how to fix that is a tougher job…

    (I love the term “car tax”. Does what it says on the box!)

    Cheers,
    Katie
