Many blogs use the concept of "tags" and "categories" to add metadata to a post. What is the best practice for semantic markup for this information, such that a machine
The first step would be to get the plain HTML semantically right. In the case of (X)HTML5, you should build an appropriate outline using the sectioning content elements section, article, aside and nav, and use header and footer to separate the metadata content from the main content. Also think of inline-level semantics like time (publication date), dfn (definitions), abbr (abbreviations/acronyms), etc. And make use of the meta name and rel values that are defined in the spec.
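As a rough sketch of this first step, a single-post page could look something like the following. The title, date, URLs and tag names are made up for illustration; the only metadata hooks used are the keywords meta name and the tag link type, both of which are defined in the HTML5 spec.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Marking up tags and categories</title>
  <!-- "keywords" is one of the meta names defined in the spec -->
  <meta name="keywords" content="html5, semantics, metadata">
</head>
<body>
  <article>
    <header>
      <h1>Marking up tags and categories</h1>
      <!-- hypothetical publication date -->
      <p>Published <time datetime="2013-05-01">May 1, 2013</time></p>
    </header>

    <p>Main content, mentioning
      <abbr title="Resource Description Framework in Attributes">RDFa</abbr>.</p>

    <footer>
      <!-- rel="tag" is a link type defined in the spec -->
      <p>Tags:
        <a href="/tags/html5" rel="tag">html5</a>,
        <a href="/tags/semantics" rel="tag">semantics</a>
      </p>
    </footer>
  </article>
</body>
</html>
```

Plain HTML5 has no dedicated link type for "categories", which is one reason the later steps are worth considering.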
The second step would be to make use of metadata attribute values that are not defined in the specification, but are registered at specified places (so they are valid to use), like name keywords for meta elements and rel values for a/area/link elements.
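For illustration only, here is what this second step might look like with the Dublin Core terms convention. This sketch assumes that the dcterms.subject meta name and the schema.dcterms link type are still listed in the respective registries; check the registries before relying on either.

```html
<head>
  <!-- Assumption: "schema.dcterms" is a registered rel value; it declares
       which vocabulary the "dcterms." prefix refers to -->
  <link rel="schema.dcterms" href="http://purl.org/dc/terms/">
  <!-- Assumption: "dcterms.subject" is a registered meta name;
       the content values are made up -->
  <meta name="dcterms.subject" content="HTML5">
  <meta name="dcterms.subject" content="Semantic markup">
</head>
```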
The third step would be to enhance the markup with semantic, machine-readable annotations. There are three common ways to do this: RDFa, Microdata, and Microformats (using class and rel values).
RDFa and Microdata are similar (both extensible and rather complex), while Microformats is simpler (but less expressive/extensible). I wrote a short answer over at Programmers about the differences, and a more detailed answer about the differences between Microdata and RDFa.
In the case of RDFa or Microdata, your main job would be to find vocabularies/ontologies that are able to describe/classify your content. Such vocabularies can be created by anyone (you could even create one yourself), but it's often advisable to use well-known/popular ones, for example so that search engines can make use of your annotations (popular example: Schema.org).
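A minimal sketch of how this could look with Schema.org, first in Microdata and then in RDFa. It assumes that BlogPosting with its articleSection and keywords properties is a reasonable fit for categories and tags; the URLs, date and values are hypothetical.

```html
<!-- Microdata -->
<article itemscope itemtype="https://schema.org/BlogPosting">
  <h1 itemprop="headline">Marking up tags and categories</h1>
  <time itemprop="datePublished" datetime="2013-05-01">May 1, 2013</time>
  <div itemprop="articleBody"><p>Main content.</p></div>
  <footer>
    <!-- itemprop sits on the span so the value is the text, not the link URL -->
    <p>Category:
      <a href="/categories/web"><span itemprop="articleSection">Web</span></a></p>
    <p>Tags:
      <a href="/tags/html5"><span itemprop="keywords">html5</span></a>,
      <a href="/tags/semantics"><span itemprop="keywords">semantics</span></a></p>
  </footer>
</article>

<!-- The same data expressed in RDFa -->
<article vocab="https://schema.org/" typeof="BlogPosting">
  <h1 property="headline">Marking up tags and categories</h1>
  <time property="datePublished" datetime="2013-05-01">May 1, 2013</time>
  <div property="articleBody"><p>Main content.</p></div>
  <footer>
    <p>Category:
      <a href="/categories/web"><span property="articleSection">Web</span></a></p>
    <p>Tags:
      <a href="/tags/html5"><span property="keywords">html5</span></a>,
      <a href="/tags/semantics"><span property="keywords">semantics</span></a></p>
  </footer>
</article>
```

In practice you would pick one of the two syntaxes rather than publish both for the same post.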
In the case of Microformats, you'd have to find a Microformat (on the wiki at microformats.org) that suits your needs. If there is none for your case, you could propose a new Microformat (but that would take some time until it gets "accepted", if at all).
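As one possible illustration, the h-entry format from microformats2 has a p-category property that can carry tags. This is a sketch that assumes h-entry suits your posts; the URLs, date and tag names are made up.

```html
<article class="h-entry">
  <h1 class="p-name">Marking up tags and categories</h1>
  <time class="dt-published" datetime="2013-05-01">May 1, 2013</time>
  <div class="e-content"><p>Main content.</p></div>
  <footer>
    <p>Tags:
      <!-- p-category marks the tag for microformats2 parsers;
           rel="tag" keeps the plain HTML link type as well -->
      <a class="p-category" href="/tags/html5" rel="tag">html5</a>,
      <a class="p-category" href="/tags/semantics" rel="tag">semantics</a>
    </p>
  </footer>
</article>
```

Note that p-category does not distinguish between "tags" and "categories"; if you need that distinction in machine-readable form, RDFa or Microdata with a suitable vocabulary may be the better fit.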
Is HTML5 a reasonable choice if I want to be so picky about metadata, or should I be using an XML doctype?
You could also use XHTML5, if you need/want XML support. If you "only" use the (X)HTML defined in the specification and no additional XML schemas/vocabularies, it won't matter from a semantic perspective if you use HTML(5) or XHTML(5).
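To make that concrete, here is a sketch of a post in the XML syntax (XHTML5), assuming it is served as application/xhtml+xml. The elements and their semantics are the same as in the HTML syntax; only the serialization rules differ (namespace declaration, explicitly closed empty elements). The title and tag are hypothetical.

```html
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
  <head>
    <meta charset="utf-8"/>
    <title>Marking up tags and categories</title>
  </head>
  <body>
    <article>
      <h1>Marking up tags and categories</h1>
      <p>Main content.</p>
      <footer>
        <p>Tags: <a href="/tags/html5" rel="tag">html5</a></p>
      </footer>
    </article>
  </body>
</html>
```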