Question
I'm converting an XML document into HTML. One of the things that needs to happen is the removal of namespaces, which cannot be legally declared in HTML (unless it's the XHTML namespace in the root tag). I have found posts from 5-10 years ago about how difficult this is to do with XML::LibXML and LibXML2, but not as much recently. Here's an example:
use XML::LibXML;
use XML::LibXML::XPathContext;
use feature 'say';
my $xml = <<'__EOI__';
<myDoc>
<par xmlns:bar="www.bar.com">
<bar:foo/>
</par>
</myDoc>
__EOI__
my $parser = XML::LibXML->new();
my $doc = $parser->parse_string($xml);
my $bar_foo = do {
    my $xpc = XML::LibXML::XPathContext->new($doc);
    $xpc->registerNs('bar', 'www.bar.com');
    ${ $xpc->findnodes('//bar:foo') }[0];
};
$bar_foo->setNodeName('foo');
$bar_foo->setNamespace('', '');
say $bar_foo->nodeName; # prints 'bar:foo'. Dang!
my @namespaces = $doc->findnodes('//namespace::*');
for my $ns (@namespaces) {
    # $ns->delete; # can't find any such method for namespaces
}
say $doc->toStringHTML;
In this code I tried a few things that didn't work. First I tried setting the name of the bar:foo element to an unprefixed foo (the documentation says that that method is aware of namespaces, but apparently not). Then I tried setting the element namespace to null, and that didn't work either. Finally, I looked through the docs for a method for deleting namespaces. No such luck. The final output string still has everything I want to remove (namespace declarations and prefixes).
Does anyone have a way to remove namespaces, setting elements and attributes to the null namespace?
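To see what the parser actually stored for the element, it can help to print the parts of its name separately. The following is only a minimal diagnostic sketch using the same sample document; it relies on the prefix, localname and namespaceURI accessors of XML::LibXML::Node and does not change the document:

```perl
use strict;
use warnings;
use feature 'say';
use XML::LibXML;
use XML::LibXML::XPathContext;

my $doc = XML::LibXML->load_xml(string => <<'__EOI__');
<myDoc>
<par xmlns:bar="www.bar.com">
<bar:foo/>
</par>
</myDoc>
__EOI__

# The prefix registered here only has to map to the same URI as in the document
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs('bar', 'www.bar.com');
my ($bar_foo) = $xpc->findnodes('//bar:foo');

# nodeName is the qualified name; the other three are its components
say 'nodeName:     ', $bar_foo->nodeName;
say 'prefix:       ', $bar_foo->prefix;
say 'localname:    ', $bar_foo->localname;
say 'namespaceURI: ', $bar_foo->namespaceURI;
```

The local name is already the unprefixed name wanted in the HTML output, which is why both answers below build new elements from localname (or local-name() in XSLT) rather than renaming in place.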
Answer 1:
Here's my own gymnasticsy answer. If there is no better way, it will do. I sure wish there were a better way...
The replace_without_ns function replaces an element with a copy that has no namespace. Any child elements that still need the namespace get the declaration on themselves instead. The code below moves the entire document into the null namespace:
use strict;
use warnings;
use XML::LibXML;
my $xml = <<'__EOI__';
<myDoc xmlns="foo">
<par xmlns:bar="www.bar.com" foo="bar">
<bar:foo stuff="junk">
<baz bar:thing="stuff"/>
fooey
<boof/>
</bar:foo>
</par>
</myDoc>
__EOI__
my $parser = XML::LibXML->new();
my $doc = $parser->parse_string($xml);
# remove namespaces for the whole document
for my $el ($doc->findnodes('//*')) {
    if ($el->getNamespaces) {
        replace_without_ns($el);
    }
}
# replaces the given element with an identical one without the namespace
# also does this with attributes
sub replace_without_ns {
    my ($el) = @_;
    # new element has same name, minus namespace
    my $new = XML::LibXML::Element->new( $el->localname );
    # copy attributes (minus namespace declarations)
    for my $att ($el->attributes) {
        # anchored so that only xmlns and xmlns:prefix are skipped
        if ($att->nodeName !~ /^xmlns(?::|$)/) {
            $new->setAttribute($att->localname, $att->value);
        }
    }
    # move children
    for my $child ($el->childNodes) {
        $new->appendChild($child);
    }
    # if working with the root element, we have to set the new element
    # to be the new root
    my $doc = $el->ownerDocument;
    if ( $el->isSameNode($doc->documentElement) ) {
        $doc->setDocumentElement($new);
        return;
    }
    # otherwise just paste the new element in place of the old element
    $el->parentNode->insertAfter($new, $el);
    $el->unbindNode;
    return;
}
print $doc->toStringHTML;
Answer 2:
Here's a simple solution using an XSLT stylesheet:
use strict;
use warnings;
use XML::LibXML;
use XML::LibXSLT;
my $xml = <<'__EOI__';
<myDoc xmlns="foo">
<par xmlns:bar="www.bar.com" foo="bar">
<bar:foo stuff="junk">
<baz bar:thing="stuff"/>
fooey
<boof/>
</bar:foo>
</par>
</myDoc>
__EOI__
my $parser = XML::LibXML->new();
my $doc = $parser->parse_string($xml);
my $xslt = XML::LibXSLT->new();
my $xsl_doc = $parser->parse_string(<<'XSL');
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="*">
    <xsl:element name="{local-name()}">
      <xsl:apply-templates select="node()|@*"/>
    </xsl:element>
  </xsl:template>
  <xsl:template match="@*">
    <xsl:attribute name="{local-name()}">
      <xsl:value-of select="."/>
    </xsl:attribute>
  </xsl:template>
</xsl:stylesheet>
XSL
my $stylesheet = $xslt->parse_stylesheet($xsl_doc);
my $result = $stylesheet->transform($doc);
print $stylesheet->output_as_bytes($result);
Note that if you want to copy comments or processing instructions, further adjustments are needed.
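One possible adjustment is sketched below. XSLT 1.0's built-in rules produce nothing for comments and processing instructions, so the sketch adds a template that copies them as they are. The sample document and the "keep" processing instruction are made up for illustration, and output_as_chars is used here only so the result is available as a Perl string:

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXSLT;

# Hypothetical sample input with a comment and a processing instruction
my $doc = XML::LibXML->load_xml(string => <<'__EOI__');
<myDoc xmlns="foo">
  <!-- keep me -->
  <?keep me too?>
  <bar:foo xmlns:bar="www.bar.com" bar:thing="stuff"/>
</myDoc>
__EOI__

my $xsl_doc = XML::LibXML->load_xml(string => <<'XSL');
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- recreate every element under its local name only -->
  <xsl:template match="*">
    <xsl:element name="{local-name()}">
      <xsl:apply-templates select="@*|node()"/>
    </xsl:element>
  </xsl:template>
  <!-- same for attributes -->
  <xsl:template match="@*">
    <xsl:attribute name="{local-name()}">
      <xsl:value-of select="."/>
    </xsl:attribute>
  </xsl:template>
  <!-- comments and processing instructions carry no namespace: copy as-is -->
  <xsl:template match="comment()|processing-instruction()">
    <xsl:copy/>
  </xsl:template>
</xsl:stylesheet>
XSL

my $stylesheet = XML::LibXSLT->new->parse_stylesheet($xsl_doc);
my $result     = $stylesheet->transform($doc);
my $output     = $stylesheet->output_as_chars($result);
print $output;
```

One caveat with this approach in general: if an element has two attributes with the same local name in different namespaces, they collapse into a single attribute after the transformation.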
Source: https://stackoverflow.com/questions/17756926/remove-xml-namespaces-with-xmllibxml