The World Health Organization estimates 278 million people worldwide have some form of hearing impairment.

A Nielsen study suggests that there has been over a 300 percent increase in online video watching since 2003. Further, most watching is done during work hours. Workplace computers are often muted or have no speakers.

Several billions of videos are watched monthly worldwide, with many of them in different

See for more about this.

I had my first foray into captioning this week for a short video that a colleague Daniel Davis (@ourmaninjapan) did on remote debugging with Opera Dragonfly and Opera Mobile 10 to mark the release of Opera Mobile 10 Beta (go try it, it’s free).

I’m slightly pink-faced to say I’ve not done any captioning before, having always opted to transcribe video and audio, so I had to start from scratch: sourcing the right tool, figuring out how to go about editing and setting up a process. Whilst I set out to caption a video, my purpose was also to see how easy or difficult it was, as captioning is the poor cousin of accessibility, considered to be expensive, time-consuming and only relevant to a handful of people.

Before I launch into my findings, below is the final product, captioned in English, Japanese and Russian using Overstream and hosted on YouTube. Big thank you to Daniel for the translation and original video and Vadim for the Russian. You can also watch the video on Easy YouTube.

There’s also a hidden Easter Egg in there, see if you can spot it.

Captioning benefits

Accessibility – this is the obvious benefit, as you’ll be opening up your content to deaf and hard of hearing users as well as people who find it easier to read rather than listen (or do both together). If you don’t have translated captions, some non-native speakers may also find content easier to consume when reading captions.

Localisation – adding translations to your captions widens your potential audience massively. There are plenty of tools out there such as dotSUB that enable you to crowdsource translations and many hosts such as YouTube which support multiple caption tracks.

Mobile – users with mobile phones who may not have earphones or are in a noisy place also benefit. I do wonder how much can be visible on some small screens but certainly some people will find it useful.

Search – site indexing may also get a boost. For example YouTube supports video searching of caption data which also filters through into Google search.

Getting the right tool

There are more captioning tools out there than I’ve had hot dinners so I thought I’d narrow it down scientifically and just ask over Twitter what people recommended. My only stipulations were that it had to be quick, easy and free (what else!).


I gave Google’s web based tool CaptionTube a go first. It’s super easy to get started as you just use a Gmail login and from there you upload video from your YouTube collection. So far so simple.

What I didn’t find so intuitive was the captioning interface itself. When dropping text into the timeline I wasn’t able to clearly see when text started and ended as the end time was measured in how long the segment was rather than when it stopped in the overall timeline. This just didn’t work for me.

The CaptionTube interface fails to show captions overlaid on the video

In addition to that, I had to flick between the Timeline and Preview screens to see the captions I’d just created overlaid on the video. With pages taking time to load, not to mention breaking the rhythm of what I was doing, this really held me back. Too much buffering for my liking.


Being a newbie to all this I wasn’t sure if I was expecting too much or missing the point but after a chat with Antonia Hyde – who knows a thing or two about accessible multimedia – I decided to switch to Overstream which had originally been recommended by AbilityNet.

This was altogether a lot better plus Overstream support a number of video providers: YouTube, Google Video, MySpace Video, Dailymotion, Veoh and It was pretty easy to upload a YouTube video but equally easy to miss a crucial instruction that you need to have the video in question playing in YouTube when you hit the upload button.

Overstream shows the edit box and video with captions overlaid on the same page.

The interface gave me much more of an integrated toolbox and by now I had an idea of what I wanted which helped. One huge bonus was being able to add text to the timeline, complete with start and end times, adjust time lengths and see in real time the text overlaid on the video on the same page.

I had a few problems trying to play the video once done in a new window with a URL warning popping up but it was easy enough to download the .srt file (with all the captions and timeline in) and upload that in turn to YouTube.
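If you’ve not looked inside an .srt file before, it is just plain text: a running number, a start and end time, then the caption text, with a blank line between captions. Here is a minimal Python sketch of how one could be written out by hand. The function names are my own invention for illustration and this is not what Overstream does internally.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as the HH:MM:SS,mmm stamp used in .srt files."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues):
    """Turn (start_seconds, end_seconds, text) tuples into .srt file text."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        # Each caption is: running number, time range, caption text.
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # A blank line separates one caption from the next.
    return "\n".join(blocks)


if __name__ == "__main__":
    # Made-up example captions, not taken from the actual video.
    print(to_srt([(0, 2.5, "Hello"), (2.5, 5, "and welcome")]))
```

Saving the printed text with a .srt extension should give you a file in the same general shape as the one I downloaded and uploaded to YouTube.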


Next on my list to try is the downloadable tool MAGpie, from the National Center for Accessible Media. I didn’t try it this time as Overstream got the job done, plus MAGpie supposedly doesn’t play nicely with Intel-based Macs. I did have a quick look at it, however, and while it is very clunky and old looking, it does give you the option to style captions, which looks pretty good. I’ll be looking at this in more depth when I next caption something.

Stanford Captioning Service

John Foliot pointed me to Stanford Captioning Service, which looks like an excellent service. All you need to do is upload a video file, which is then converted into multiple formats – FLV, MP4, MP3. These are then transcribed by Stanford contractors for a small fee. When the transcription is done, Stanford do automatic timestamp generation to turn the transcript into various caption formats – this part is free.

For my short video I was happy to transcribe and caption the audio myself, but if I had longer videos to get captioned I’d almost certainly use these guys. Victor Tsaran, head of accessibility at Yahoo!, used the Stanford Captioning Service to caption a video about himself recently.


Lastly I dug out my login to dotSUB, whose main selling point is enabling subtitling of videos on the web into, and from, any language. It’s also a collaborative tool, so you can crowdsource community input and/or work collaboratively with your team to get the captions done. Of the tools tested this was by far the simplest and easiest to use.

Captioning tips

As soon as I got started I realised that I needed to have a process as to how I approached doing the actual work. Here are a couple of things that worked for me – let me know if you have any more worth adding to the list:

  • Transcribe text before you start captioning – you can do this yourself, pay a professional to do it or use voice recognition. Even though the last two options are less labour intensive you will need to edit and double check text – especially with voice recognition.
  • Break it down – once you have your transcript you’ll have a clear idea of the volume of words and quality. You can then break text into short sentences that fit on screen without obscuring too much of the screen real estate. All I did was use a text file and hit return after short sentences or natural breaks in a sentence. Once I started adding text to the timeline this had to be reworked as I went along but having it already drafted was a big help.
  • Editing text – if you have a text that works verbatim then great, but this is unlikely and there’s nothing wrong with removing repetitions or false starts to sentences. The key is to keep it succinct while maintaining the original meaning and flavour of the language as well as the character of the speaker.
  • Punctuation – I found that less is more. Obviously you want a full stop at the end of sentences but Andrew Kirkpatrick, head of accessibility at Adobe, recommends removing commas at the end of lines. We don’t ‘see’ punctuation when we hear people, so visually breaking text down like this makes sense to me.
  • Timing – you can create a bit of drama, suspense and humour by remaining faithful to how people speak and using timing to replace tone. For example, someone getting excited may talk in short sentences so break the transcript down so that it is given in short segments rather than having longer segments.
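I did the “break it down” step by hand in a text file, but a first draft could be roughed out with a few lines of Python. This is only a sketch under my own assumptions: the 40 character limit is a guess rather than a standard, the function name is made up, and the output would still need reworking by eye once it is on the timeline.

```python
import re
import textwrap


def split_into_captions(transcript, max_chars=40):
    """Split a transcript into short caption lines.

    Breaks at sentence ends first, then wraps anything still longer
    than max_chars (an assumed limit, adjust to taste).
    """
    # Split after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", transcript.strip())
    captions = []
    for sentence in sentences:
        if not sentence:
            continue
        for line in textwrap.wrap(sentence, width=max_chars):
            # Drop a comma left dangling at the end of a line.
            captions.append(line.rstrip(","))
    return captions


if __name__ == "__main__":
    # Made-up sample text, not the real transcript.
    sample = "Hello there. Today we will look at remote debugging, step by step."
    for caption in split_into_captions(sample, max_chars=30):
        print(caption)
```

The comma stripping follows the punctuation tip above. Everything else is just a starting point for the manual editing.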

Check out captioning tips from the WGBH Media Access Group, captioning tips and tools from NCDAE and W3C Multimedia FAQ for more.

How long did the whole process take?

Captioning the 4.27 minute video took me the best part of 10 hours BUT this included researching tools, false starts and a bit of reading around the subject. If it’s a long video you definitely want it to be transcribed for you, but if it’s a short one like this you could estimate 1 to 2 hours for the transcript, depending on your typing speed and how audible the sound is.

After that, once you have the hang of adding text to a timeline you should be ok. I added text and allocated times as I went along, but you can add the text first and allocate times second if splitting the two tasks works better for you. This probably took me about 1.5 hours.

All in all I’d average out a 4 minute video at 3 hours – but this will no doubt get better as it becomes more familiar.

It’s a bit fiddly to start with but smooth running once you get the hang of it, and seeing the end result is completely worthwhile. It’s satisfying to know that the captions will help not just deaf users but also non-native English speakers as well as people looking at video on their mobile phone.

Update 20 November 2009

Google have just announced automated captioning of YouTube video which will include automatic time stamping as well as transcripts. This should be available soon and will have a huge impact for many users as well as influence in promoting captioning overall.