Trouble Parsing XML File Using Perl

盖世英雄少女心 2021-01-24 21:13

I'm trying to parse an XML manifest file for an Articulate eLearning course (imsmanifest.xml).

An excerpt of the XML structure is provided below (I'm trying to drill d

3 Answers
  • 2021-01-24 21:40

    Your XPath asks for elements named manifest in the null namespace, but the elements you want are named manifest in the http://www.imsproject.org/xsd/imscp_rootv1p1p2 namespace, which the document declares as its default namespace.

    Fixes:

    use strict;
    use warnings;
    
    use XML::LibXML               qw( );
    use XML::LibXML::XPathContext qw( );
    
    my $xml_qfn = 'imsmanifest.xml';
    
    my $parser = XML::LibXML->new( no_network => 1 );
    my $doc = $parser->parse_file($xml_qfn);
    
    my $xpc = XML::LibXML::XPathContext->new();
    $xpc->registerNs( a => "http://www.adlnet.org/xsd/adlcp_rootv1p2" );
    $xpc->registerNs( i => "http://www.imsproject.org/xsd/imscp_rootv1p1p2" );
    
    for my $item ($xpc->findnodes('/i:manifest/i:organizations/i:organization/i:item', $doc)) {
        my $title   = $xpc->find('i:title/text()', $item);
        my $mastery = $xpc->find('a:masteryscore/text()', $item);
        print "$title: $mastery\n"; 
    }
    

    Note: The prefixes used in the XPath expressions (a and i) are arbitrary. You can pick whatever you want, just as you can when composing an XML document; only the namespace URI has to match.

    Note: I added no_network => 1 to prevent libxml from fetching the DTDs every time you parse the XML doc.
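
    To make the namespace point concrete, here is a minimal sketch that runs the same query with and without prefixes. It assumes a trimmed inline excerpt modelled on the manifest in the question rather than the real file, and it deliberately registers different prefixes (ims and adl, chosen only for illustration) to show that the prefix names themselves do not matter.

    ```perl
    use strict;
    use warnings;
    use XML::LibXML;

    # Assumption: a trimmed inline excerpt standing in for imsmanifest.xml.
    my $xml = q{<manifest xmlns="http://www.imsproject.org/xsd/imscp_rootv1p1p2"
              xmlns:adlcp="http://www.adlnet.org/xsd/adlcp_rootv1p2">
      <organizations>
        <organization>
          <item identifier="Electrical_Design_Part_3_SCO">
            <title>Electrical Design - Part 3</title>
            <adlcp:masteryscore>65</adlcp:masteryscore>
          </item>
        </organization>
      </organizations>
    </manifest>};

    my $doc = XML::LibXML->load_xml( string => $xml, no_network => 1 );

    # The prefixes are local to this XPath context; only the URIs must match.
    my $xpc = XML::LibXML::XPathContext->new($doc);
    $xpc->registerNs( ims => 'http://www.imsproject.org/xsd/imscp_rootv1p1p2' );
    $xpc->registerNs( adl => 'http://www.adlnet.org/xsd/adlcp_rootv1p2' );

    # Unprefixed names are looked up in the null namespace, so nothing matches.
    my @unprefixed = $xpc->findnodes('/manifest/organizations/organization/item');

    # Prefixed names resolve through the registered URIs, so the item is found.
    my @prefixed = $xpc->findnodes('/ims:manifest/ims:organizations/ims:organization/ims:item');

    printf "unprefixed: %d, prefixed: %d\n", scalar @unprefixed, scalar @prefixed;

    for my $item (@prefixed) {
        my $title   = $xpc->findvalue( 'ims:title',        $item );
        my $mastery = $xpc->findvalue( 'adl:masteryscore', $item );
        print "$title: $mastery\n";
    }
    ```

    With this excerpt the unprefixed query should report 0 matches and the prefixed one 1, followed by the title and mastery score of that item.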

  • 2021-01-24 21:50

    Step one: fix your example so that it is well-formed XML.

    <?xml version="1.0" encoding="UTF-8"?>
    <manifest xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:adlcp="http://www.adlnet.org/xsd/adlcp_rootv1p2" xmlns="http://www.imsproject.org/xsd/imscp_rootv1p1p2" xsi:schemaLocation="http://www.imsproject.org/xsd/imscp_rootv1p1p2 imscp_rootv1p1p2.xsd http://www.imsglobal.org/xsd/imsmd_rootv1p2p1 imsmd_rootv1p2p1.xsd http://www.adlnet.org/xsd/adlcp_rootv1p2 adlcp_rootv1p2.xsd" version="1.0" identifier="Electrical_Design_Part_3">
        <metadata>
        <organizations default="Electrical_Design_Part_3_ORG">
          <organization identifier="Electrical_Design_Part_3_ORG">
            <title>Electrical Design - Part 3</title>
            <item identifier="Electrical_Design_Part_3_SCO" identifierref="Articulate_Presenter_RES" isvisible="true">
              <title>Electrical Design - Part 3</title>
              <adlcp:masteryscore>65</adlcp:masteryscore>
            </item>
          </organization>
        </organizations>
        <resources/>
    </metadata>
    </manifest>
    

    Then fire up the Perl debugger and explore the parsed structure:

    DB<2> use XML::Simple
    
      DB<3> $x=XMLin("example.xml")
    
      DB<4> x $x
    0  HASH(0x2733c48)
       'identifier' => 'Electrical_Design_Part_3'
       'metadata' => HASH(0x2733828)
          'organizations' => HASH(0x2733288)
             'default' => 'Electrical_Design_Part_3_ORG'
             'organization' => HASH(0x272d7e8)
                'identifier' => 'Electrical_Design_Part_3_ORG'
                'item' => HASH(0x27285f8)
                   'adlcp:masteryscore' => 65
                   'identifier' => 'Electrical_Design_Part_3_SCO'
                   'identifierref' => 'Articulate_Presenter_RES'
                   'isvisible' => 'true'
                   'title' => 'Electrical Design - Part 3'
                'title' => 'Electrical Design - Part 3'
          'resources' => HASH(0x27333d8)
               empty hash
       'version' => 1.0
       'xmlns' => 'http://www.imsproject.org/xsd/imscp_rootv1p1p2'
       'xmlns:adlcp' => 'http://www.adlnet.org/xsd/adlcp_rootv1p2'
       'xmlns:xsi' => 'http://www.w3.org/2001/XMLSchema-instance'
       'xsi:schemaLocation' => 'http://www.imsproject.org/xsd/imscp_rootv1p1p2 imscp_rootv1p1p2.xsd http://www.imsglobal.org/xsd/imsmd_rootv1p2p1 imsmd_rootv1p2p1.xsd http://www.adlnet.org/xsd/adlcp_rootv1p2 adlcp_rootv1p2.xsd'
    
      DB<6> x keys %$x
    0  'xmlns'
    1  'xmlns:xsi'
    2  'identifier'
    3  'version'
    4  'metadata'
    5  'xsi:schemaLocation'
    6  'xmlns:adlcp'
      DB<9> x keys %{$x->{metadata}}
    0  'resources'
    1  'organizations'
      DB<10> x keys %{$x->{metadata}{organizations}}
    0  'default'
    1  'organization'
      DB<11> x keys %{$x->{metadata}{organizations}{organizations}
    Missing right curly or square bracket at (eval 22)[/usr/share/perl/5.14/perl5db.pl:640] line 4, at end of line
    syntax error at (eval 22)[/usr/share/perl/5.14/perl5db.pl:640] line 4, at EOF
      DB<12> x keys %{$x->{metadata}{organizations}{organizations}}
      empty array
      DB<13> x keys %{$x->{metadata}{organizations}{organization}}
    0  'identifier'
    1  'item'
    2  'title'
      DB<14> x keys %{$x->{metadata}{organizations}{organization}{item}}
    0  'identifier'
    1  'identifierref'
    2  'isvisible'
    3  'title'
    4  'adlcp:masteryscore'
      DB<19> x $x->{metadata}{organizations}{organization}{item}{'adlcp:masteryscore'}
    0  65
      DB<20> 
    

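    So all you have to do is:

    use strict;
    use warnings;
    use XML::Simple;
    # Parse the file into nested hashes, then walk down to the score.
    my $x = XMLin("example.xml");
    print $x->{metadata}{organizations}{organization}{item}{'adlcp:masteryscore'}, "\n";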
    

    Hope this helps

  • 2021-01-24 21:59

    The XML is not well-formed: you need to close the metadata and resources tags.

    After that, XML::Simple will work with this code:

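    #!/usr/bin/env perl

    use strict;
    use warnings;
    use XML::Simple qw(:strict);   # :strict requires ForceArray and KeyAttr to be set
    use Data::Dumper;

    # ForceArray => [] forces no element into an array; KeyAttr => {} disables attribute folding.
    my $ref = XMLin('test.xml', ForceArray => [], KeyAttr => {});
    print STDERR Dumper $ref;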
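
    One thing to watch with this approach: by default, XML::Simple returns a hash for a single item element but an array reference when there are several, so code written against one manifest can break on another. The sketch below is only an illustration under assumptions: the two-item XML is a hypothetical excerpt, not the real manifest, and naming item in ForceArray is one possible way to get the same shape in both cases.

    ```perl
    use strict;
    use warnings;
    use XML::Simple qw(:strict);

    # Hypothetical two-item excerpt; the real manifest may be laid out differently.
    my $xml = q{<manifest xmlns:adlcp="http://www.adlnet.org/xsd/adlcp_rootv1p2">
      <organizations>
        <organization identifier="ORG">
          <item identifier="SCO_1">
            <title>Part 1</title>
            <adlcp:masteryscore>65</adlcp:masteryscore>
          </item>
          <item identifier="SCO_2">
            <title>Part 2</title>
            <adlcp:masteryscore>80</adlcp:masteryscore>
          </item>
        </organization>
      </organizations>
    </manifest>};

    # ForceArray => ['item'] makes {item} an array reference for any item count;
    # KeyAttr => {} stops XML::Simple from folding items into a hash keyed by an attribute.
    my $ref = XMLin( $xml, ForceArray => ['item'], KeyAttr => {} );

    my @scores;
    for my $item ( @{ $ref->{organizations}{organization}{item} } ) {
        # XML::Simple is not namespace-aware here, so the key is the literal prefixed name.
        push @scores, $item->{'adlcp:masteryscore'};
        print "$item->{title}: $item->{'adlcp:masteryscore'}\n";
    }
    ```

    Because the key is the literal string adlcp:masteryscore, this lookup depends on the prefix the document happens to use, which is the main trade-off compared with a namespace-aware XPath query.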
    