We’ve been able to play video in the browser without a plugin for a couple of years now, and whilst there are still some codec annoyances, things appear to have settled down on the video front. The next step is adding resources to the video to make it more accessible and provide more options to the viewer.
We currently have no means to provide information about what’s happening or being said in the video, which means the video isn’t very accessible and the user can’t easily navigate to a particular section of the video. Thankfully, there’s a new format specification in the works called WebVTT (Web Video Text Tracks). As of now, it’s only in the WHATWG spec, but the recently established W3C Web Media Text Tracks Community Group should introduce a WebVTT spec to the W3C soon.
You may recall a similar format called WebSRT (Web Subtitle Resource Tracks) that was recently under discussion. WebSRT was renamed to, and has been replaced by, WebVTT.
A WebVTT (.vtt) file is simply plain text containing several types of information about the video:
- Subtitles
  The transcription or translation of the dialogue.
- Captions
  Similar to subtitles, but may also include sound effects and other audio information.
- Descriptions
  Intended to be a separate text file that describes the video through a screenreader.
- Chapters
  Intended to help the user navigate through the video.
- Metadata
  Information and content about the video, which isn’t intended to be displayed to the viewer by default, though you may wish to do so using JavaScript.
This article will mostly be talking about subtitles and captions, but it will briefly touch on chapters too.
Beyond the scope of this article but worth mentioning is the text track API, which, amongst other things, denotes how many text tracks there are and which ones have loaded and are ready for use. If you have used this API, let us know.
How to Make and Link to a WebVTT File
All you need to make a WebVTT file is a simple text editor. Type WEBVTT as the first line of the file and save it as a .vtt file. In the future, we expect existing captioning tools such as Universal Subtitles to export to WebVTT format.
WEBVTT
That’s all you need to get started. Next, we have to link to the file in the HTML document. We do this via the <track> element, which is a child of the <video> element. The <track> element takes several attributes:
- the source WebVTT file (src),
- the language of the track (srclang),
- a user-readable label, and
- what kind of track it is. The values of the kind attribute come from the list above (i.e., subtitles, captions, etc.).
In the following example, we’re using a <track> element for subtitles:
<video width="640" height="480" controls>
<source src="video.mp4" type="video/mp4" />
<source src="video.webm" type="video/webm" />
<track src="subtitles.vtt" kind="subtitles" srclang="en" label="English" />
<!-- fallback for rubbish browsers -->
</video>
A few notes about the attributes:
- If no kind is specified, the default is subtitles.
- If the kind is subtitles, then srclang is required.
- There should not be two tracks of the same kind with the same label.
In the above example, we use a <video> element with two different <source> elements (for cross-browser loveliness). After the sources comes the <track> element. You can have several <track> elements, as you might have subtitles, captions, and descriptions all in different languages.
The <track> element doesn’t presuppose a particular file format. Microsoft supports the TTML format, but this format will not be implemented by other browser vendors.
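As a rough sketch of how the attribute notes above could be applied in a script, the following builds the markup for a <track> element. The function names buildTrackTag and escapeAttribute are hypothetical helpers invented for this illustration, not part of any browser API.

```javascript
// Hypothetical helper: builds the markup for a <track> element while
// applying the attribute notes from the article.
function buildTrackTag(options) {
  // If no kind is given, fall back to the default, "subtitles".
  const kind = options.kind || "subtitles";

  // srclang is required when the kind is "subtitles".
  if (kind === "subtitles" && !options.srclang) {
    throw new Error('A "subtitles" track needs a srclang');
  }

  const attributes = [["src", options.src], ["kind", kind]];
  if (options.srclang) attributes.push(["srclang", options.srclang]);
  if (options.label) attributes.push(["label", options.label]);

  const rendered = attributes
    .map(([name, value]) => `${name}="${escapeAttribute(value)}"`)
    .join(" ");
  return `<track ${rendered} />`;
}

// Escapes the two characters that would break a double-quoted attribute.
function escapeAttribute(value) {
  return String(value).replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

console.log(buildTrackTag({ src: "subtitles.vtt", srclang: "en", label: "English" }));
// prints: <track src="subtitles.vtt" kind="subtitles" srclang="en" label="English" />
```

Throwing on a missing srclang is just one way to surface the rule; a real page might prefer to log a warning instead.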
WebVTT Contents
We now know how to make a WebVTT file and how to reference it in an HTML document, but what goes inside it? Within the file, we list what are known as “cues”. The WebVTT file might only have one cue, but it can contain as many as you want. Each cue starts with an ID, followed by the time settings, followed by the text. Each cue is separated by a blank line. Here’s an example of captions:
WEBVTT

1
00:00:01.000 --> 00:00:10.000
This is the first line of text, displaying from 1-10 seconds

2
00:00:15.000 --> 00:00:20.000
And the second line of text
separated over two lines
The above example has two cues. Times must be written in hh:mm:ss.mmm format, so the timings in this example occur in the first twenty seconds. The text of the second cue is displayed on two lines, following the line break in the file.
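If your cue data already lives in a script, the file layout described above can be generated rather than typed by hand. The following is a minimal sketch; formatTimestamp and buildWebVTT are hypothetical names, and the cue objects (start and end in seconds, plus text) are an assumed shape chosen for this example.

```javascript
// Converts a number of seconds into the hh:mm:ss.mmm form used by cue timings.
function formatTimestamp(totalSeconds) {
  const totalMs = Math.round(totalSeconds * 1000);
  const ms = totalMs % 1000;
  const seconds = Math.floor(totalMs / 1000) % 60;
  const minutes = Math.floor(totalMs / 60000) % 60;
  const hours = Math.floor(totalMs / 3600000);
  const pad = (n, width) => String(n).padStart(width, "0");
  return `${pad(hours, 2)}:${pad(minutes, 2)}:${pad(seconds, 2)}.${pad(ms, 3)}`;
}

// Builds a whole file: the WEBVTT line first, then each cue as
// ID, timing line, and text, with a blank line between blocks.
function buildWebVTT(cues) {
  const blocks = cues.map((cue, index) =>
    [
      String(index + 1),
      `${formatTimestamp(cue.start)} --> ${formatTimestamp(cue.end)}`,
      cue.text,
    ].join("\n")
  );
  return ["WEBVTT", ...blocks].join("\n\n") + "\n";
}

console.log(
  buildWebVTT([
    { start: 1, end: 10, text: "This is the first line of text" },
    { start: 15, end: 20, text: "And the second line of text\nseparated over two lines" },
  ])
);
```

Numbering the cues from 1 mirrors the example above; any unique ID would serve the same purpose.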
If you have a segment of text that needs to appear in a karaoke/paint-on caption style, then you can place timers inline with text:
1
00:00:01.000 --> 00:00:10.000
Never gonna give you up <00:00:01.000> Never gonna let you down <00:00:05.000> Never gonna run around and desert you
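To see how the inline timers divide the text, here is a small sketch that splits a cue's text into timed segments. The function name splitTimedText and the shape of the returned objects are assumptions made for this illustration; a real player would also need to handle styling tags inside the text.

```javascript
// Splits cue text on inline <hh:mm:ss.mmm> timers. The first segment has no
// timer of its own, so it appears as soon as the cue starts.
function splitTimedText(cueText) {
  // Splitting on a pattern with a capture group keeps the captured
  // timestamps in the result: [text, time, text, time, text, ...].
  const parts = cueText.split(/<(\d{2}:\d{2}:\d{2}\.\d{3})>/);
  const segments = [{ time: null, text: parts[0].trim() }];
  for (let i = 1; i < parts.length; i += 2) {
    segments.push({ time: parts[i], text: parts[i + 1].trim() });
  }
  return segments;
}

const segments = splitTimedText(
  "Never gonna give you up <00:00:01.000> Never gonna let you down <00:00:05.000> Never gonna run around and desert you"
);
console.log(segments.length); // prints: 3
```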
Styling Options
The previous examples specify the minimum configuration you need for subtitling and captioning, but you can style your captions too. Let’s start with the cue settings, which are done on the same line as the time settings:
- D:vertical / D:vertical-lr
  Display the text vertically rather than horizontally. This also specifies whether the text grows to the left (vertical) or to the right (vertical-lr).
- L:X / L:X%
  Either a number or a percentage. If a percentage, then it is the position from the top of the frame. If a number, this represents what line number it will be.
- T:X%
  The position of the text horizontally on the video. T:100% would place the text on the right side of the video.
- A:start / A:middle / A:end
  The alignment of the text within its box – start is left-aligned, middle is centre-aligned, and end is right-aligned.
- S:X%
  The width of the text box as a percentage of the video width.
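As a rough illustration of how a script (a polyfill, say) might read these settings off a cue's timing line, consider the sketch below. The function parseCueSettings and the readable key names such as textPosition are this example's own inventions; only the single-letter setting names come from the list above.

```javascript
// Readable names for the single-letter settings. The names on the right are
// this sketch's own choice, not part of WebVTT.
const SETTING_NAMES = {
  D: "direction",
  L: "line",
  T: "textPosition",
  A: "align",
  S: "size",
};

function parseCueSettings(timingLine) {
  // A timing line is "start --> end" followed by optional settings,
  // so the first three whitespace-separated tokens are skipped.
  const tokens = timingLine.trim().split(/\s+/).slice(3);
  const settings = {};
  for (const token of tokens) {
    const separator = token.indexOf(":");
    if (separator === -1) continue; // not a "name:value" pair
    const name = token.slice(0, separator);
    const value = token.slice(separator + 1);
    // Unknown setting names are ignored rather than treated as errors.
    if (Object.prototype.hasOwnProperty.call(SETTING_NAMES, name)) {
      settings[SETTING_NAMES[name]] = value;
    }
  }
  return settings;
}

console.log(parseCueSettings("00:00:01.000 --> 00:00:10.000 A:middle T:50%"));
```

Values are kept as strings (for example "50%") so that the caller can decide how to interpret numbers versus percentages.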
To make use of these settings, put them alongside the time settings, like this:
00:00:01.000 --> 00:00:10.000 A:middle T:50%
00:00:01.000 --> 00:00:10.000 A:end D:vertical
00:00:01.000 --> 00:00:10.000 A:start T:100% L:0%
which would result in something like the following:
[Image: example cues positioned on the video using the settings above]
Along with the above cue settings, you can use inline styles for text:
- Bold text
  <b>Lorem ipsum</b>
- Italic text
  <i>dolor sit amet</i>
- Underlined text
  <u>consectetuer adipiscing</u>
- Ruby text
  <ruby>見<rt>み</rt></ruby>
You can even apply a CSS class to a section of text using <c.myClass>Lorem ipsum</c>, giving us many more styling options.
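One way a script could make such classes available to ordinary CSS is to translate the class tags into <span> elements before displaying the cue. The sketch below assumes this approach; cueClassesToSpans is a hypothetical name, and the code assumes the cue text comes from a trusted file, since it does no escaping.

```javascript
// Rewrites <c.someClass>...</c> as <span class="someClass">...</span>.
// Several dot-separated classes (e.g. <c.loud.blue>) become a
// space-separated class list.
function cueClassesToSpans(cueText) {
  return cueText
    .replace(/<c((?:\.[\w-]+)+)>/g, (match, classes) => {
      const classList = classes.slice(1).split(".").join(" ");
      return `<span class="${classList}">`;
    })
    .replace(/<\/c>/g, "</span>");
}

console.log(cueClassesToSpans("<c.myClass>Lorem ipsum</c>"));
// prints: <span class="myClass">Lorem ipsum</span>
```

With the cue rendered this way, a rule such as .myClass { color: yellow; } in the page's stylesheet would apply to that section of text.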
Finally, you can add a declaration representing the name of the voice: <v Tom>Hello world</v>. This declaration accomplishes three things:
- The caption will display the voice (Tom) in addition to the caption text.
- The name of the voice can be read by a screenreader, possibly even using a different voice for male or female names.
- It offers a hook for styling so that, for example, all captions for Tom could be in blue.
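To illustrate the styling hook, here is a minimal sketch that pulls the voice name out of a cue and derives a CSS class name from it. Both parseVoice and voiceClassName are hypothetical helpers, and the "voice-" prefix is an arbitrary convention chosen for this example.

```javascript
// Splits cue text such as "<v Tom>Hello world</v>" into the voice name and
// the spoken text. Text without a voice declaration is returned unchanged.
function parseVoice(cueText) {
  const match = /^<v\s+([^>]+)>([\s\S]*?)(?:<\/v>)?$/.exec(cueText.trim());
  if (!match) return { voice: null, text: cueText };
  return { voice: match[1].trim(), text: match[2] };
}

// Hypothetical styling hook: turns a voice name into a CSS class name,
// so that all of Tom's captions could share one rule.
function voiceClassName(voice) {
  return "voice-" + voice.toLowerCase().replace(/[^a-z0-9]+/g, "-");
}

const cue = parseVoice("<v Tom>Hello world</v>");
console.log(cue.voice, "/", cue.text, "/", voiceClassName(cue.voice));
// prints: Tom / Hello world / voice-tom
```

A stylesheet rule like .voice-tom { color: blue; } would then colour all of Tom's captions blue.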
Chapters
You can provide a chapter list for the video the same way you would provide subtitles or captions, using a <track> element whose kind is chapters:
<track src="chapters.vtt" kind="chapters" srclang="en" />
In the chapter file, start with the same WEBVTT declaration, and then for each cue, declare the chapter number, the start and stop times, and the chapter title:
WEBVTT

Chapter 1
00:00:01.000 --> 00:00:10.000
Introduction to HTML5
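To show how a chapter file could drive navigation, the following sketch reads chapter cues from the text of a file and looks up which chapter covers a given playback time. The names parseTimestamp, parseChapters, and chapterAt are hypothetical, the second chapter in the sample is made up for the demonstration, and the parser assumes a well-formed file with blank lines between cues.

```javascript
// Converts "hh:mm:ss.mmm" into a number of seconds.
function parseTimestamp(stamp) {
  const [hours, minutes, seconds] = stamp.split(":");
  return Number(hours) * 3600 + Number(minutes) * 60 + Number(seconds);
}

// Reads chapter cues (ID, timing line, title) from the text of a file.
function parseChapters(vttText) {
  const blocks = vttText.replace(/\r\n?/g, "\n").trim().split(/\n{2,}/);
  return blocks
    .filter((block) => block.includes("-->")) // skips the WEBVTT header
    .map((block) => {
      const lines = block.split("\n");
      const timingIndex = lines.findIndex((line) => line.includes("-->"));
      // Take the first token on each side of the arrow, ignoring any
      // cue settings that follow the end time.
      const [start, end] = lines[timingIndex]
        .split("-->")
        .map((part) => parseTimestamp(part.trim().split(/\s+/)[0]));
      return {
        id: lines.slice(0, timingIndex).join(" "),
        start,
        end,
        title: lines.slice(timingIndex + 1).join("\n"),
      };
    });
}

// Returns the chapter covering the given time in seconds, or null.
function chapterAt(chapters, time) {
  return chapters.find((c) => time >= c.start && time < c.end) || null;
}

const sample = [
  "WEBVTT",
  "",
  "Chapter 1",
  "00:00:01.000 --> 00:00:10.000",
  "Introduction to HTML5",
  "",
  "Chapter 2",
  "00:00:10.000 --> 00:00:20.000",
  "A made-up second chapter",
].join("\n");

console.log(chapterAt(parseChapters(sample), 4).title);
// prints: Introduction to HTML5
```

In a page, the start time of a chosen chapter could be assigned to the video element's currentTime property to jump to that chapter.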
Browser Support
One slight glitch with WebVTT: not a single browser currently supports it. All major browsers have started working on implementations, so we should see some results soon. Thankfully, in the meantime, there are several JavaScript polyfills available:
- js_videosub
- Playr
- MediaElementJS
- LeanBack player (and upcoming new version)
- Captionator
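Once browsers begin to ship native support, a page may want to load a polyfill only when it is needed. The sketch below shows one possible check, based on whether a video element exposes a textTracks property; note that this only indicates that the text track API exists, not that WebVTT rendering is complete. The names supportsTextTracks and ensureCaptions are hypothetical, and loadPolyfill stands in for whatever setup your chosen library requires.

```javascript
// `doc` is expected to behave like the browser's `document` object.
// Passing it in, rather than using the global, keeps the check testable.
function supportsTextTracks(doc) {
  const video = doc.createElement("video");
  return "textTracks" in video;
}

// `loadPolyfill` is a hypothetical callback that would load one of the
// libraries listed above; each library has its own setup steps.
function ensureCaptions(doc, loadPolyfill) {
  if (supportsTextTracks(doc)) return "native";
  loadPolyfill();
  return "polyfill";
}
```

In a page, this might be called as ensureCaptions(document, myLoader), where myLoader is your own function that adds the polyfill's script to the page.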
Demo
We’ve put together a quick demo which uses the Playr polyfill. We started using MediaElementJS, but it doesn’t sport as many features as Playr, such as separate lines of text and CSS classes. In the demo, the subtitles start at 2 seconds and 15 seconds and use bold, underline, and custom styles. Here’s the associated WebVTT file.
Conclusion
This article covers the basics of creating a WebVTT file suitable for subtitles or captions for a video. We know how to add cues and chapters and how to add styles and change how the text appears on the video. Although no browser yet supports it, there’s a lot more to come for accessible video, so stay tuned to the W3C Web Media Text Tracks Community Group.
What are your thoughts on WebVTT? Are any of you using it now? How can it be improved?
Finally, let’s thank @silviapfeiffer for taking the time to answer some questions about WebVTT and for her tremendous work in this field.
Video Subtitling and WebVTT originally appeared on HTML5 Doctor on November 29, 2011.