HTML5 semantic markup for blog post tags and categories

Question

Many blogs use the concept of "tags" and "categories" to add metadata to a post. What is the best practice for semantic markup for this information, such that a machine reading the blog post could easily identify the tags?

Currently I add "tag" to the rel attribute on the link, e.g.

<a rel="tag" class="tag" href="/tags.html#site-configuration">#site-configuration</a>

I suppose one could use Dublin Core's HTML encoding for keywords:

<meta name="DC.Subject" content="site-configuration">

and add this to the page's head. Or can meta tags go in the body? Is one or the other preferable, or is there some entirely different option?

Is there a better strategy in terms of providing precise and standardized definitions for content?

Is HTML5 a reasonable choice if I want to be so picky about metadata, or should I be using an XML doctype?

What are the pros and cons of the different approaches?

Answer 1:

The first step is to get the plain HTML semantics right. In (X)HTML5, build an appropriate outline with the sectioning content elements section, article, aside and nav, and use header and footer to separate metadata from the main content. Also consider text-level semantics such as time (publication date), dfn (definitions) and abbr (abbreviations/acronyms). And make use of the meta name and rel values that are defined in the spec.
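As a rough illustration of this first step, a post might be marked up along the following lines. This is only a sketch: the title, date and body text are placeholders, and the tag link is the one from the question.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Example post</title>
</head>
<body>
  <!-- article: a self-contained piece of content (the blog post) -->
  <article>
    <header>
      <h1>Example post</h1>
      <!-- time: machine-readable publication date (placeholder value) -->
      <p>Published <time datetime="2012-10-12">12 October 2012</time></p>
    </header>

    <p>Post body, using text-level elements such as
       <abbr title="HyperText Markup Language">HTML</abbr> where they apply.</p>

    <footer>
      <!-- rel="tag" is a link type defined in the HTML spec -->
      <p>Tags: <a rel="tag" class="tag" href="/tags.html#site-configuration">#site-configuration</a></p>
    </footer>
  </article>
</body>
</html>
```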

The second step is to use metadata values that are not defined in the specification itself but are registered at the places it designates (so they are valid to use), such as name values for meta elements and rel values for a/area/link elements.
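For example, a head section combining a spec-defined name value with a Dublin Core one might look like the sketch below. The DCTERMS naming follows the Dublin Core convention for HTML meta/link elements as I understand it; whether a given name or rel value is currently registered is an assumption here and should be checked against the registries before relying on it.

```html
<head>
  <meta charset="utf-8">
  <title>Example post</title>

  <!-- "keywords" is a metadata name defined in the HTML spec -->
  <meta name="keywords" content="site-configuration">

  <!-- Dublin Core: the link declares the prefix, the meta carries the term.
       Assumed to be registered extension values; verify before use. -->
  <link rel="schema.DCTERMS" href="http://purl.org/dc/terms/">
  <meta name="DCTERMS.subject" content="site-configuration">
</head>
```

In HTML5, meta elements with a name attribute belong in the head, which is why this metadata lives there rather than next to the visible tag links.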

The third step would be to enhance the markup with semantic, machine-readable annotations. There are three common ways to do this:

Microformats (using pre-defined class and rel values)
RDFa (using attributes and URIs)
Microdata (using attributes and URIs)

RDFa and Microdata are similar (both extensible and rather complex), while Microformats are simpler (but less expressive and extensible). I wrote a short answer over at Programmers about the differences, and a more detailed answer about the differences between Microdata and RDFa.

In the case of RDFa or Microdata, your main job would be to find vocabularies/ontologies that are able to describe/classify your content. Such vocabularies can be created by anyone (you could even create one yourself), but it's often advisable to use well-known/popular ones, for example so that search engines can make use of your annotations (popular example: Schema.org).
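To make this concrete, here is a sketch of the same post annotated once with Microdata and once with RDFa Lite, both using Schema.org. The choice of the BlogPosting type and of the keywords and articleSection properties for tags and categories is my assumption about a reasonable mapping, not something the vocabulary mandates; the title, date and category name are placeholders.

```html
<!-- Microdata: itemscope/itemtype open an item, itemprop names its properties -->
<article itemscope itemtype="https://schema.org/BlogPosting">
  <h1 itemprop="headline">Example post</h1>
  <time itemprop="datePublished" datetime="2012-10-12">12 October 2012</time>
  <div itemprop="articleBody"><p>Post body goes here.</p></div>
  <p>Category: <span itemprop="articleSection">Configuration</span></p>
  <p>Tags:
    <a rel="tag" href="/tags.html#site-configuration">
      <!-- itemprop sits on the span so the value is the text, not the link URL -->
      <span itemprop="keywords">site-configuration</span>
    </a>
  </p>
</article>

<!-- RDFa Lite: vocab/typeof open a typed resource, property names its properties -->
<article vocab="https://schema.org/" typeof="BlogPosting">
  <h1 property="headline">Example post</h1>
  <time property="datePublished" datetime="2012-10-12">12 October 2012</time>
  <div property="articleBody"><p>Post body goes here.</p></div>
  <p>Category: <span property="articleSection">Configuration</span></p>
  <p>Tags:
    <a rel="tag" href="/tags.html#site-configuration">
      <span property="keywords">site-configuration</span>
    </a>
  </p>
</article>
```

In practice you would pick one of the two syntaxes rather than publish both.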

In the case of Microformats, you'd have to find a Microformat (on the wiki at microformats.org) that suits your needs. If there is none for your case, you could propose a new Microformat (but it would take some time before it gets "accepted", if it is accepted at all).
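For a blog post, a sketch using the microformats2 h-entry vocabulary could look like this. I am assuming h-entry is the format that fits; the title, date and the /tags/site-configuration URL are hypothetical. The URL differs from the one in the question because, as I understand the rel-tag microformat, the tag is taken from the last path segment of the link, so a fragment such as #site-configuration would not be picked up.

```html
<!-- h-entry: root class for a blog post in microformats2 -->
<article class="h-entry">
  <h1 class="p-name">Example post</h1>
  <!-- dt-published: publication date (placeholder value) -->
  <time class="dt-published" datetime="2012-10-12">12 October 2012</time>
  <div class="e-content"><p>Post body goes here.</p></div>
  <!-- p-category marks a tag/category; rel="tag" is kept for rel-tag consumers -->
  <p>Tags: <a class="p-category" rel="tag" href="/tags/site-configuration">site-configuration</a></p>
</article>
```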

Quoting the question: "Is HTML5 a reasonable choice if I want to be so picky about metadata, or should I be using an XML doctype?"

You could also use XHTML5, if you need/want XML support. If you "only" use the (X)HTML defined in the specification and no additional XML schemas/vocabularies, it won't matter from a semantic perspective if you use HTML(5) or XHTML(5).

Source: https://stackoverflow.com/questions/12866008/html5-semantic-markup-for-blog-post-tags-and-categories

Tags

xml

html5

metadata

schema.org