Success! Thanks to Ambush Commander and mcgrailm in another question, I am now using a hilariously simple solution:
// a bit of context
$htmlDef = $this->configuration->getHTMLDefinition(true);
$anchor = $htmlDef->addBlankElement('a');
// HTMLPurifier_AttrTransform_RemoveLoneHttp strips 'href="http:/"' from
// all anchor tags (see first post for class detail)
$anchor->attr_transform_post[] = new HTMLPurifier_AttrTransform_RemoveLoneHttp();
// this is the magic! We're making 'href' a required attribute (note the
// asterisk) - now HTML Purifier removes <a></a>, as well as
// <a href="http:/"></a> after HTMLPurifier_AttrTransform_RemoveLoneHttp
// is through with it!
$htmlDef->addAttribute('a', 'href*', new HTMLPurifier_AttrDef_URI());
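Since the transform class itself lives in the first post and isn't repeated here, this is a rough sketch of what such a class might look like. It is a hypothetical reconstruction, not the original code: it assumes HTML Purifier is loaded through Composer's autoloader (the `vendor/autoload.php` path is an assumption) and that the goal is simply to drop an `href` whose value is exactly `http:/`.

```php
<?php
// Assumption: HTML Purifier is installed via Composer; adjust the path otherwise.
require_once 'vendor/autoload.php';

/**
 * Hypothetical reconstruction of the transform named above:
 * removes the href attribute when its value is exactly "http:/".
 */
class HTMLPurifier_AttrTransform_RemoveLoneHttp extends HTMLPurifier_AttrTransform
{
    public function transform($attr, $config, $context)
    {
        // Only touch the attribute when it is the lone "http:/" stub.
        if (isset($attr['href']) && $attr['href'] === 'http:/') {
            unset($attr['href']);
        }
        // A transform always returns the (possibly modified) attribute array.
        return $attr;
    }
}
```

With `href` marked as required, an anchor that loses its `href` this way ends up in the same position as a plain `<a></a>`.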
It works, it works, bahahahaHAHAHAHAnhͥͤͫ̀ğͮ͑̆ͦó̓̉ͬ͋h́ͧ̆̈́̉ğ̈́͐̈a̾̈́̑ͨô̔̄̑̇g̀̄h̘̝͊̐ͩͥ̋ͤ͛g̦̣̙̙̒̀ͥ̐̔ͅo̤̣hg͓̈́͋̇̓́̆a͖̩̯̥͕͂̈̐ͮ̒o̶ͬ̽̀̍ͮ̾ͮ͢҉̩͉̘͓̙̦̩̹͍̹̠̕g̵̡͔̙͉̱̠̙̩͚͑ͥ̎̓͛̋͗̍̽͋͑̈́̚...! * manic laughter, gurgling noises, keels over with a smile on her face *
The fact that you can't remove elements with a TagTransform appears to be an implementation detail. The classic mechanism for removing nodes (a smidge higher-level than just tags) is an Injector, though.
Anyway, the particular piece of functionality you're looking for is already implemented as %AutoFormat.RemoveEmpty.
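For reference, switching that directive on might look roughly like this. It is a minimal sketch, assuming HTML Purifier is loaded through Composer's autoloader (the path is an assumption) and using a made-up `$dirty` string as a stand-in for real input:

```php
<?php
// Assumption: HTML Purifier is installed via Composer; adjust the path otherwise.
require_once 'vendor/autoload.php';

$config = HTMLPurifier_Config::createDefault();
// Ask HTML Purifier to drop elements that contribute nothing to the document.
$config->set('AutoFormat.RemoveEmpty', true);

$purifier = new HTMLPurifier($config);

// Placeholder input, purely for illustration.
$dirty = '<p>Some text <a></a> and more text.</p>';
$clean = $purifier->purify($dirty);

echo $clean, "\n";
```

As far as I understand the directive, it targets elements that have no content, so an anchor that lacks an `href` but still wraps text would probably not be touched by it.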
For perusal, this is my current solution. It works, but bypasses HTML Purifier entirely.
/**
 * Removes <a></a> and <a href="http:/"></a> tags from the purified
 * HTML, keeping their contents.
 * @todo solve this with an injector?
 * @param string $purified The purified HTML
 * @return string The purified HTML, sans pointless anchors.
 */
private function anchorCull($purified)
{
    if (empty($purified)) return '';
    // re-parse HTML
    $domTree = new DOMDocument();
    $domTree->loadHTML($purified);
    // find all anchors (even good ones)
    $anchors = $domTree->getElementsByTagName('a');
    // collect bad anchors (destroying them in this loop breaks the DOM)
    $destroyNodes = array();
    for ($i = 0; $i < $anchors->length; $i++) {
        $anchor = $anchors->item($i);
        $href = $anchor->attributes->getNamedItem('href');
        // <a></a>
        if (is_null($href)) {
            $destroyNodes[] = $anchor;
        // <a href="http:/"></a>
        } else if ($href->nodeValue == 'http:/') {
            $destroyNodes[] = $anchor;
        }
    }
    // destroy the collected nodes
    foreach ($destroyNodes as $node) {
        // preserve content: insertBefore() moves each child out of $node,
        // so keep taking the first child until none are left (indexing
        // into the live childNodes list would skip every other child)
        while ($node->firstChild) {
            $node->parentNode->insertBefore($node->firstChild, $node);
        }
        // actually destroy the node
        $node->parentNode->removeChild($node);
    }
    // extract the body's inner HTML from the serialized document
    $html = $domTree->saveHTML();
    $begin = strpos($html, '<body>') + strlen('<body>');
    $end = strpos($html, '</body>');
    return substr($html, $begin, $end - $begin);
}
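One detail of the content-preserving step is worth trying in isolation: `childNodes` is a live list, so moving children out of a node while indexing into that list can skip nodes. Here is a minimal standalone sketch that sidesteps this by repeatedly taking `firstChild`. The helper name `unwrapElement` and the sample markup are made up for illustration and are not part of the method above.

```php
<?php
// Hypothetical helper: replace an element with its own children.
function unwrapElement(DOMNode $node)
{
    // insertBefore() moves the child out of $node, so firstChild
    // advances by itself until the element is empty.
    while ($node->firstChild) {
        $node->parentNode->insertBefore($node->firstChild, $node);
    }
    // The element is now empty and can be dropped.
    $node->parentNode->removeChild($node);
}

$doc = new DOMDocument();
$doc->loadHTML('<p><a>one <b>two</b> three</a></p>');

// Unwrap the href-less anchor, keeping its three child nodes.
unwrapElement($doc->getElementsByTagName('a')->item(0));

// Serialize only the paragraph to inspect the result.
$paragraph = $doc->getElementsByTagName('p')->item(0);
echo $doc->saveHTML($paragraph), "\n";
```

Serializing just the paragraph also avoids having to cut the body out of the full document string by hand.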
I'd still much rather have a good HTML Purifier solution to this, so, just as a heads-up, this answer won't end up self-accepted. But in case no better answer ends up coming around, at least it might help those with similar issues. :)