Video captioning: it’s not that complicated

Words and pictures can work together to communicate more powerfully than either alone.

William Albert Allard (photographer)

Sometimes doing the right thing can seem like so much work…

But not if we’re talking about getting captions done for any videos that you want to post online: that’s just ridiculously easy, surprisingly fast and quite inexpensive. It can still seem challenging at first, so we want to give you some idea of what’s involved.

I should tell you that on the Web Toolkit, we'll only post videos that meet the New Zealand Government Web Accessibility Standard. (If you’re not familiar with the Standards, we’ve got some guidance for you.) We think it’s pretty important to reach all New Zealanders with information about government services, so this is business-as-usual for us.

Before we start with the ‘how’, let’s go over the ‘why’.

Why add captions?

The most obvious reason for captioning is to improve the accessibility of video content for Deaf and hearing impaired people. Anyone delivering a service to the public, but particularly government, has a responsibility to make information available to a wide range of people, so captioning is a smart way to go.

Captions can also help those with other impairments such as cognitive, learning or reading difficulties, and people who are less comfortable with the language spoken in the video. Plus, captions are a nice feature for people who are in a loud environment where it is difficult to hear a video's audio, or somewhere they would prefer not to make any noise, like a library. They’re also good for people who might prefer to read or scan the text version of the video’s content.

Adding captions can also make the video’s content available for indexing by search engines. Depending on how the captions are implemented, this can improve the site's searchability.

What’s required?

Basically, to meet the Web Accessibility Standard where videos are concerned, you must provide:

  • captions that mirror the video file’s audio content, e.g. spoken words and other meaningful sounds
  • a descriptive text transcript of all audio and visual content, which includes all spoken words and meaningful sounds, as well as what’s happening in the video, e.g. “Joe Blogg pours the cake batter into a pan”.

Since creating the captions results in text of all the spoken words and meaningful sounds in a video, most of the work in creating the descriptive text transcript is already done if your video is pretty simple and straightforward. An example of a ‘simple’ video would be someone speaking directly into the camera or presenting to an audience. Captioning such a video leaves only the work of adding text versions of any visual information communicated in the video, such as “Image projected on screen: Screenshot with logos of various NZ Government agencies. The agencies are ACC, Ministry for the Environment, and the Earthquake Commission.”

If your video is quite complex, with lots of unstaged action and numerous things happening, this can add a lot more work to creating a full descriptive transcript.

What does it cost?

To give you an idea of how much you might spend, here's a breakdown of our recent captioning costs:

Video title                                          Length (min:sec)   Cost (NZD)
“A glimpse of the future”                            4:20               $20.94
Digital Engagement Team Projects: An Introduction    8:08               $24.57
Redevelopment of                                     11:23              $34.36
Domain Integrity Project                             11:06              $33.53
Redevelopment of Domain Name Service                 10:11              $30.77
Government Online Engagement Services (GOES)         13:22              $40.76

I should also tell you that we definitely challenged the transcribers’ ability to listen well: the accents in these videos include Scottish, American, Canadian and Australian in addition to Kiwi. And we were pretty pleased with the results.

How does it work?

Start with the video that you want to caption. In this case, let’s look at the “A glimpse of the future” video. Once we had the completed video, we uploaded it to the captioning vendor’s website. Then we picked the option we wanted, which was a 1-day turnaround, so that we could post the captioned video quickly.

To create the captions, the vendor used:

  • an automatic speech recognition technology to produce a draft text of the words spoken in the video
  • a transcriptionist to review the text, editing wherever necessary
  • a QA process to ensure accuracy.

Then, when the captioning process was finished, we reviewed the captions ourselves. Basically, this involved watching the video and making sure what was said matched what was written. This was our chance to catch any mistakes and correct them, something that the vendor’s online solution made especially easy.

Once the corrections were made, we downloaded the final file in the desired caption format (there are multiple formats available) and added it to the video which we had uploaded to YouTube. We also downloaded a simple text version of the captions, which we used to create the text transcript that needs to accompany the video. In this case, since the video is just of the Government Chief Information Officer (GCIO) talking, there was no meaningful visual information that we needed to include in the descriptive text transcript.
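If your vendor only gives you a timed caption file, turning it into the starting text for a transcript is not much work. Here is a minimal Python sketch, assuming the captions are in the common SubRip (SRT) format; the post above doesn't say which format we downloaded, and the function name and sample captions are made up for illustration.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3}\s+-->\s+\d{2}:\d{2}:\d{2},\d{3}")

# Hypothetical caption file content, for illustration only.
SAMPLE = """1
00:00:01,000 --> 00:00:03,500
[MUSIC PLAYING]

2
00:00:03,500 --> 00:00:07,000
Welcome to this
short video.
"""


def srt_to_text(srt: str) -> str:
    """Return the caption text of an SRT file, one line per caption."""
    captions = []
    # Captions in an SRT file are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        # Drop the caption counter, if present.
        if lines and lines[0].isdigit():
            lines = lines[1:]
        # Drop the timing line, if present.
        if lines and TIMING.match(lines[0]):
            lines = lines[1:]
        if lines:
            captions.append(" ".join(lines))
    return "\n".join(captions)


if __name__ == "__main__":
    print(srt_to_text(SAMPLE))
```

The result is only the spoken words and meaningful sounds. Any visual information still has to be added by hand to make it a descriptive text transcript.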

With the captions ready, we embedded the video in the Government Chief Information Officer’s blog post on this site, and added the descriptive text transcript right below it.

Because we asked for a 1-day turnaround instead of the usual 4-day one, the cost per minute was a bit more. Even so, $20.94 is pretty darn good for such a service.
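To see the difference in per-minute terms, you can divide each cost by the video's length. The short Python sketch below does that using the lengths and costs from the table earlier in this post; the function names are just for illustration. The post doesn't state which turnaround the other five videos used, so treat the comparison as indicative only.

```python
# (length as min:sec, cost in NZD), copied from the cost table in this post.
# The first entry is "A glimpse of the future", which had the 1-day turnaround.
COSTS = [
    ("4:20", 20.94),
    ("8:08", 24.57),
    ("11:23", 34.36),
    ("11:06", 33.53),
    ("10:11", 30.77),
    ("13:22", 40.76),
]


def minutes(length: str) -> float:
    """Convert a 'min:sec' string to minutes as a decimal number."""
    mins, secs = length.split(":")
    return int(mins) + int(secs) / 60


def cost_per_minute(length: str, cost: float) -> float:
    """Return the cost in NZD per minute of video."""
    return cost / minutes(length)


if __name__ == "__main__":
    for length, cost in COSTS:
        rate = cost_per_minute(length, cost)
        print(f"{length:>5}  ${cost:.2f}  ${rate:.2f} per minute")
```

On these figures, the 1-day job works out to about $4.83 per minute, while the other five videos come to a little over $3 per minute.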

While there are other companies that provide similar captioning services, we used 3PlayMedia. For comparison, here are some other options:

Ultimately, whatever company you use, if you’re focused on your users, this is definitely the way to go. You’ll meet multiple users’ needs, increase your site’s visibility in search engines, and increase the value of your content. What’s not to like?


  1. Comment #1. Graham:

    The 3playmedia link is broken in “While there are other companies that provide similar captioning services, we used 3PlayMedia. For comparison”

  2. Comment #2. Susan Carchedi

    Hi Graham, thanks for letting us know. We’ve got it fixed now.
