Channel: Accessibility – HTML5 Doctor

Goodbye time, datetime, and pubdate. Hello data and value.


Please note that since this was written, <time>, datetime and (possibly) pubdate have been reinstated, and made more powerful. Doctor Bruce has the low-down in his blogpost The best of <time>s. We preserve this merely to show our grandchildren that we played a role in the Time Wars.

We’ve come a long way in the HTML5 specification’s steady march towards ratification and implementation. The WHATWG’s energy has recently been focused more on post-HTML5 features that are being added to “HTML The Living Standard”, plus tidying up HTML5 for Last Call. However, we’re still not past losing (or gaining) an element, with last week seeing the removal of <time> and the addition of <data>.

Update: As per the HTML Working Group Chairs’ request, the W3C HTML5 Editor’s Draft spec has been reverted to include <time>. Note this means it no longer includes <data>. The WHATWG HTML: Living Standard spec is currently the opposite, still retaining <data> but with no <time>. According to Anne van Kesteren’s post on the WHATWG Weblog, <time> will return to the WHATWG spec, taking into account new use cases on the WHATWG Wiki.

TL;DR — it looks like <time> will remain, probably with more permissive datetimes, and <data> will also remain, but it’ll take a little while before the dust settles.

<time> was originally added to allow dates and times to be machine readable, via the datetime attribute. This gives us human-readable content (“yesterday”) plus hidden machine-readable content (“2011-11-02”) with no accessibility problems. It allows browsers, for example, to offer to localise dates. The pubdate attribute, which indicates an article’s date of publication, was added for HTML to Atom conversion (also removed from HTML5 in this change), and would make it easy for search engines to sort by date. Having the permitted dates and times specified in HTML5 (a subset of ISO 8601) allows a validator to check that a datetime value is valid.

<time> has been one of the easier elements for authors to understand, as it’s semantically obvious. By comparison, the microformats Value-Class pattern for datetimes is clunky.

HTML5 <time> element:

<time class="dtstart" datetime="2011-10-05T09:00Z">9am on October 5</time>

Microformats Value-Class pattern:

<span class="dtstart"><abbr class="value" title="09:00">9am</abbr> on <abbr class="value" title="2011-10-05">October 5</abbr></span>
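To make the difference concrete, here is a minimal sketch of what a consumer has to do with each snippet above. It uses Python's standard html.parser; the class name MachineValueParser and the helper parse are hypothetical names made up for this illustration, not part of any microformats or HTML5 tooling.

```python
from html.parser import HTMLParser


class MachineValueParser(HTMLParser):
    """Collects machine-readable values from <time> and from class="value" elements."""

    def __init__(self):
        super().__init__()
        self.datetimes = []     # datetime attributes found on <time>
        self.value_titles = []  # title attributes found on class="value" elements

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "time" and attr.get("datetime"):
            self.datetimes.append(attr["datetime"])
        if "value" in (attr.get("class") or "").split() and attr.get("title"):
            self.value_titles.append(attr["title"])


def parse(html):
    """Feed a markup fragment to a fresh parser and return the parser."""
    parser = MachineValueParser()
    parser.feed(html)
    parser.close()
    return parser


time_markup = '<time class="dtstart" datetime="2011-10-05T09:00Z">9am on October 5</time>'
value_class_markup = (
    '<span class="dtstart"><abbr class="value" title="09:00">9am</abbr> on '
    '<abbr class="value" title="2011-10-05">October 5</abbr></span>'
)

print(parse(time_markup).datetimes)            # ['2011-10-05T09:00Z']
print(parse(value_class_markup).value_titles)  # ['09:00', '2011-10-05']
```

With <time> the consumer gets one complete value from one attribute. With the Value-Class markup it gets two fragments, in source order, that it still has to reassemble into a datetime.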

<time> has been pretty widely used for weblog article publication dates, has made it into WordPress and Drupal, and is used by Google for search results.

The issues raised about <time> by authors were mainly that it didn’t do everything: it didn’t cover ancient and vague times, time durations, and there was no “last updated” attribute equivalent to pubdate. The other problem is there are a bunch of other less common but similar kinds of data that would also benefit from being machine readable and validatable, such as weights and prices. Minting a new element for each one would (arguably) be a lot of work, so Ian Hickson has added a generic element for these use cases instead — the <data> element, with a required value attribute.

The data element represents its contents, along with a machine-readable form of those contents in the value attribute.

The value attribute contains the machine-readable equivalent of the element’s content. The <data> element can be used as-is as an element equivalent of data-* for marking up private data for scripts (although without the dataset API). It can also be used in conjunction with microdata vocabularies (and potentially microformats), in which case the format of the value attribute is specified by the vocabulary.
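As a rough illustration of the "private data for scripts" use, the sketch below pairs each <data> element's visible text with its value attribute. It is written in Python with the standard html.parser; the class name DataElementParser and the sample weight markup are assumptions invented for this example, and nested <data> elements are not handled.

```python
from html.parser import HTMLParser


class DataElementParser(HTMLParser):
    """Pairs each <data> element's human-readable text with its value attribute."""

    def __init__(self):
        super().__init__()
        self.pairs = []      # (text, value) tuples, in document order
        self._value = None   # value attribute of the <data> currently open
        self._text = []      # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag == "data":
            # value is required, so a <data> without it is simply skipped here
            self._value = dict(attrs).get("value")
            self._text = []

    def handle_data(self, data):
        if self._value is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "data" and self._value is not None:
            self.pairs.append(("".join(self._text), self._value))
            self._value = None


def data_pairs(html):
    parser = DataElementParser()
    parser.feed(html)
    parser.close()
    return parser.pairs


# Hypothetical markup: a weight, one of the "similar kinds of data" mentioned above.
sample = '<p>The parcel weighs <data value="8.5">eight and a half kilos</data>.</p>'
print(data_pairs(sample))  # [('eight and a half kilos', '8.5')]
```

Nothing in the element says what 8.5 means. Without a vocabulary, the interpretation of value is left entirely to the script reading it.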

This is a welcome addition as it gives us an easy way to duplicate microformats’ Value-Class pattern for more than just the datetimes <time> allowed. However as part of introducing <data>, <time> together with datetime and pubdate have been dropped.

This is controversial, with our own Dr Bruce writing:

I think this is a bad decision

Steve Faulkner has requested this change be reverted, and comments on the Twitters and blogs (Bruce, WebMonkey, Zeldman) have mostly ranged from shock to outrage.

This is because while <time> is semantically obvious, <data> is seen as an equivalent to <div> or <span>. However, “semantics for the sake of it” isn’t enough to justify being in the spec, despite the benefits. Another reason for the dismay is many people have had trouble pushing to use HTML5, and having an element removed gives fuel to anyone arguing HTML5 isn’t suitable for production.

What’s wrong with this picture

In On Semantics in HTML, Jens Meiert lists five types of semantics in HTML markup, ordered from most to least meaningful:

  • Standards bodies: elements, attributes
  • Communities: microdata and microformats vocabularies, POSH formats
  • Common sense: functional ID and class names
  • Generic names
  • Obfuscated, random, or presentational names

Demoting datetimes from spec-specified to vocabulary-specified has several effects. For one, it’s more complex. Compare these two examples using <time> and <data> respectively:

Using <time>:

<article>
 …
 <footer>Published <time pubdate datetime="2011-11-03">today</time>.</footer>
</article>

Using <data> plus the BlogPosting schema.org microdata vocabulary:

<article itemscope itemtype="http://schema.org/BlogPosting">
 …
 <footer>Published <data itemprop="datePublished" value="2011-11-03">today</data>.</footer>
</article>

While Google would no doubt love everyone to start using schema.org vocabularies, it’s a big increase in complexity. Adding <time datetime="…" pubdate> is fairly straightforward — learning and implementing microdata plus an appropriate schema.org vocabulary … not so much. Because of this fewer people will implement machine-readable article published dates.

To make matters worse, Google’s Rich Snippets Testing Tool (so presumably Google Search too) understandably does not yet know about <data>. This means if you use <data> to replace <time> now, Google will only see the human-readable text. <data itemprop="datePublished" value="2011-11-03">today</data> is interpreted as datePublished = today, not datePublished = 2011-11-03.
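The effect can be sketched with a toy microdata consumer. This is not how Google's tool is implemented; it is a simplified Python model, using the standard html.parser, in which the flag understands_data (a hypothetical name) decides whether the consumer looks at the value attribute of <data>. Void elements such as <img> are not handled.

```python
from html.parser import HTMLParser


class ItempropParser(HTMLParser):
    """Toy microdata reader: maps each itemprop name to a single value."""

    def __init__(self, understands_data):
        super().__init__()
        self.understands_data = understands_data
        self.properties = {}
        self._name = None        # itemprop currently being read
        self._tag = None         # tag that carries it
        self._text = []          # text content collected so far
        self._attr_value = None  # value attribute, if this consumer uses it

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if "itemprop" in attr and self._name is None:
            self._name = attr["itemprop"]
            self._tag = tag
            self._text = []
            # Only a consumer that knows <data> reads its value attribute.
            knows = tag == "data" and self.understands_data
            self._attr_value = attr.get("value") if knows else None

    def handle_data(self, data):
        if self._name is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._name is not None and tag == self._tag:
            text = "".join(self._text)
            value = self._attr_value if self._attr_value is not None else text
            self.properties[self._name] = value
            self._name = None


def extract(html, understands_data):
    parser = ItempropParser(understands_data)
    parser.feed(html)
    parser.close()
    return parser.properties


markup = (
    '<article itemscope itemtype="http://schema.org/BlogPosting">'
    '<footer>Published <data itemprop="datePublished" value="2011-11-03">today</data>.</footer>'
    '</article>'
)

print(extract(markup, understands_data=False))  # {'datePublished': 'today'}
print(extract(markup, understands_data=True))   # {'datePublished': '2011-11-03'}
```

The same markup yields two different dates depending only on whether the consumer has been updated, which is the risk of switching before the tools catch up.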

Also, now that specifying a datetime is not part of HTML5, we (presumably) can no longer validate datetime values using the HTML5 validator. Instead, our only option is currently doing the two-step with Google’s Rich Snippets Testing Tool. Ironically, as schema.org defines dates using ISO 8601, the imprecise dates and durations requested for <time> are now valid for datePublished, even though pubdate is the one usage of datetime everyone agreed on.
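For a sense of what a validator's check amounts to, here is a deliberately narrow sketch in Python. It accepts only the date-only form YYYY-MM-DD and is an assumption-laden simplification: the real HTML5 definition covers more forms (times, time zones), and the function name is_valid_date_string is made up for this example.

```python
import re
from datetime import date

# Four-digit year, two-digit month, two-digit day, nothing else.
_DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def is_valid_date_string(value):
    """Return True only for a real calendar date written as YYYY-MM-DD."""
    match = _DATE_RE.fullmatch(value)
    if match is None:
        return False
    year, month, day = map(int, match.groups())
    try:
        date(year, month, day)  # raises ValueError for e.g. November 31
    except ValueError:
        return False
    return True


print(is_valid_date_string("2011-11-03"))  # True
print(is_valid_date_string("2011-11-31"))  # False: no such day
print(is_valid_date_string("today"))       # False: human-readable text
print(is_valid_date_string("P3D"))         # False: an ISO 8601 duration
```

A strict check like this rejects the duration P3D, whereas a vocabulary that accepts ISO 8601 more broadly would let it through.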

Pros & Cons

Ian Hickson and, it seems, browser makers in general are for this change, whereas authors are in general against it (ahem, priority of constituencies). Some options would be:

  1. Scrap <data> and allow the value attribute on any element
  2. Add a type attribute to <data> to make it more semantic
  3. Bring back <time>

Unfortunately there are pros and cons for each of these options:

Allow value on any element

This conflicts with microdata, where the values of some elements are their URLs rather than their content. For example, currently <img itemprop="photo" src="http://oli.jp/photo.jpg"> gives the microdata output of photo = "http://oli.jp/photo.jpg". Adding value to the mix means there’d be two machine-readable values, so authors would need to know which elements couldn’t accept value.
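The ambiguity can be shown in a few lines of Python. This is a hypothetical model, not the microdata algorithm: candidate_values is an invented helper, and the value attribute on <img> is exactly the thing that does not exist today.

```python
def candidate_values(tag, attrs):
    """Return every machine-readable value an element could contribute."""
    candidates = {}
    if tag == "img" and "src" in attrs:
        candidates["src"] = attrs["src"]      # current microdata rule for <img>
    if "value" in attrs:
        candidates["value"] = attrs["value"]  # hypothetical "value on any element"
    return candidates


# Today: one unambiguous value.
print(candidate_values("img", {"itemprop": "photo", "src": "http://oli.jp/photo.jpg"}))

# With value allowed everywhere: two competing values for the same property.
print(candidate_values("img", {
    "itemprop": "photo",
    "src": "http://oli.jp/photo.jpg",
    "value": "hypothetical-value",
}))
```

Either the spec would have to pick a winner for every such element, or authors would have to memorise which elements must not carry value.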

Add a type attribute to <data>

This means the HTML5 specification has to specify each approved type value. While not as much work for implementers as a new element for each type of data, it’s still a bunch of work if the browsers actually do anything with that knowledge (like auto-converting type="money" into your currency). If type is required it also limits <data> to the types that are defined.

Bring back <time>

The easiest way to make everyone happy is to keep <time> in addition to adding <data>. The downside is that we’d have two confusingly similar but not always interchangeable ways to mark up datetimes, potentially with different rules on what’s a valid datetime. For example, we’d need to mark up an article’s published date and updated date using different syntax. Special cases and exceptions make things harder to teach and learn.

Conclusion

While our private conversation between doctors about this has tended towards the WTF end of the spectrum, I’m personally up in the air about it. Despite the easy response (“bring back <time>!”), this is one of those thorny problems where there’s no simple right answer. WHATWG is performing a delicate balancing act: pragmatically adding only features that have a lot of value, and removing any that don’t make the grade. In this case Hixie decided the cons of <time> (and of removing it from HTML5) outweighed the pros, and <data> is the result.

The one thing that bothers me about Hixie’s argument is that while datetimes are similar to other types of data that <data> now lets us mark up, they’re orders of magnitude more common on the web. Regardless of how it’s marked up, almost all weblog posts have the published date, and the majority of sites have a copyright date in the footer. In these use cases <time> was perfect, and definitely covered the 80%.

In my ideal world I’d like <time> to return, with the addition of a “pubupdate” attribute, and for all dates and times that fall inside HTML5’s definition to use <time>. For datetimes that <time> currently doesn’t cover, and for general use, we’d have <data>. Then again, I’m not sure I’d want to try teaching such intricacies to someone.

What do you think about this? While you’re welcome to just jump on the hogpile, I’d be interested to hear people consider all the pros and cons, and try to come up with a better (or less problematic) solution.

Goodbye time, datetime, and pubdate. Hello data and value. originally appeared on HTML5 Doctor on November 2, 2011.

