Parsing XML in Perl using SAX

感情迁移 提交于 2019-12-24 19:01:43

问题


I need to parse an XML file into Perl using SAX - for performing the following email validation checks.

  • If the 'Id' contains only alphanumeric characters and its length is between 5 and 10
  • If the 'LastLoginDate' is not older than 'CreationDate'
  • If 'SubscriptionMontlyFee' = 0 && 'SubscriptionType'!= free
  • If 'PaymentMode' is undefined && 'SubscriptionType'!= free
  • If Provision < 0
  • Internal Mail exists or not
  • External Mail exists or not
  • If InternalMail = External Mail

Otherwise, return an alert (print a message to notify).

accounts.xml

<?xml version="1.0" encoding="utf-8"?>
<Accounts locale="en_US">
  <Account>
    <Id>abcd</Id>
    <OwnerLastName>asd</OwnerLastName>
    <OwnerFirstName>zxc</OwnerFirstName>
    <Locked>false</Locked>
    <Database>mail</Database>
    <Customer>mail</Customer>
    <CreationDate year="2011" month="8" month-name="fevrier" day-of-month="19" hour-of-day="15" minute="23" day-name="dimanche"/>
    <LastLoginDate year="2015" month="04" month-name="avril" day-of-month="22" hour-of-day="11" minute="13" day-name="macredi"/>
    <LoginsCount>10405</LoginsCount>
    <Locale>nl</Locale>
    <Country>NL</Country>
    <SubscriptionType>free</SubscriptionType>
    <ActiveSubscriptionType>free</ActiveSubscriptionType>
    <SubscriptionExpiration year="1980" month="1" month-name="janvier" day-of-month="1" hour-of-day="0" minute="0" day-name="jeudi"/>
    <SubscriptionMonthlyFee>0</SubscriptionMonthlyFee>
    <PaymentMode>Undefined</PaymentMode>
    <Provision>0</Provision>
    <InternalMail>asdf@asdf.com</InternalMail>
    <ExternalMail>fdsa@zxczxc.com</ExternalMail>
    <GroupMemberships>
      <Group>werkgroep X.Y.Z.</Group>
    </GroupMemberships>
    <SynchroCount>6</SynchroCount>
    <LastSynchroDate year="2003" month="12" month-name="decembre" day-of-month="5" hour-of-day="12" minute="48" day-name="mardi"/>
    <HasActiveSync>false</HasActiveSync>
    <Company/>
  </Account>
  <Account>
    <Id>mnbv</Id>
    <OwnerLastName>cvbb</OwnerLastName>
    <OwnerFirstName>bvcc</OwnerFirstName>
    <Locked>true</Locked>
    <Database>mail</Database>
    <Customer>mail</Customer>
    <CreationDate year="2012" month="10" month-name="octobre" day-of-month="10" hour-of-day="10" minute="18" day-name="jeudi"/>
    <LastLoginDate/>
    <LoginsCount>0</LoginsCount>
    <Locale>fr</Locale>
    <Country>BE</Country>
    <SubscriptionType>free</SubscriptionType>
    <ActiveSubscriptionType>free</ActiveSubscriptionType>
    <SubscriptionExpiration year="1970" month="1" month-name="janvier" day-of-month="1" hour-of-day="1" minute="0" day-name="jeudi"/>
    <SubscriptionMonthlyFee>0</SubscriptionMonthlyFee>
    <PaymentMode>Undefined</PaymentMode>
    <Provision>0</Provision>
    <InternalMail/>
    <ExternalMail>qweqwe@qwe.com</ExternalMail>
    <GroupMemberships/>
    <SynchroCount>0</SynchroCount>
    <LastSynchroDate year="1970" month="1" month-name="janvier" day-of-month="1" hour-of-day="1" minute="0" day-name="jeudi"/>
    <HasActiveSync>false</HasActiveSync>
    <Company/>
  </Account>
</Accounts>

I have tried several unsuccessful attempts (follows) - and will highly appreciate your help in this regard.

Attempt for doing the parsing (but unable to retrieve values from the inner-hash).

use warnings;
use strict;
use XML::SAX;
my $parser = XML::SAX::ParserFactory->parser(Handler => MySAXHandler->new);
$parser->parse_uri("accounts.xml");

package MySAXHandler;
use base qw(XML::SAX::Base);

  sub start_element {
    my ($self, $el) = @_;

    my $ElementName = $el->{Name};
    my $attr = %{$el->{Attributes}};
    my $attr_value = %{$el->{Attributes}->{'LocalName'}};

    print my $loginID, "\n";      
    print $ElementName, "\n";
    print $attr_value, "\n";
  }

For performing the validation checks.

    print "Currently looking into ".(scalar @account)."elements";

    #Checking If Login only includes Alphanumeric characters and has acceptable length
    print "ALERT - ID contains invalid characters" unless ($login =~ m/[a-zA-Z@.]+$/);
    # Or print "ALERT - ID contains invalid characters" unless ($accountRef->{"Login"} =~ /^[a-zA-Z]$/);
    print "ALERT - ID length is greater than 8 characters" unless (length.$account[0] > 20);

    #print "Suspicious ALERT - Account Creation and Login time is same" unless ($account[4] != $account[5]);
    print "Suspicious ALERT - Last Login was before the account creation" unless ($account[5] > $account[4]);
    print "Suspicious ALERT - Incorrect Login Counts" unless ($account[6] > 0 && $account[5] > $account[4]);

    #Checking if Subscription Type & Active Subscription Type is same - DISCARDED
    #print "ALERT - Preferred Subscription & Current Subscription Type is not same" unless ($account[9] eq $account[10]);

    #Checking if Subscription Fee matches the Subscription Type
    if( $account[9] eq "free" && account[12] = 0) {
        #print "The user subscription is on free subscription and there are no charges" 
        return 0;
    } elsif((account[9] eq "light" || account[9] eq "regular" || account[9] eq "advanced") && account[12] <= 0) {
        print "ALERT - The user subscription is" account[9] "and he/she is not getting charged";
    } else {
        #print "The user subscription is " $account[9] "and he/she is getting charged" account[12];
        return 0;
    }

    #Checking if the Payment Mode is undefined and the subscription type is not free
    if($account[9] ne "free" && account[13] eq 'undefined') {print "ALERT - Payment mode is not being defined and the subscription type is not free"};

    #Checking if Provision is less than 0
    print "ALERT - The user balance is in negative" unless ($account[14] >= 0 );

    #Checking if Internal Email Exists or not
    print "ALERT - The user doesn't have an internal email address" unless ($account[15] != "" );

    #Checking if External Email Exists or not
    print "ALERT - The user doesn't have an external email address" unless ($account[16] != "" );

    #Checking if External Email Exists or not
    print "ALERT - The user doesn't have an external and internal email addresses are same" unless ($account[15] ne $account[16]);

    }

回答1:


XML::LibXML::Reader provides a SAX-like interface, but you can inflate the element into a full XML::LibXML object when needed. The first two conditions would be tested as follows:

#!/usr/bin/perl
use warnings;
use strict;

use XML::LibXML::Reader;

my $r = 'XML::LibXML::Reader'->new(location => 'file.xml') or die;
while ($r->nextElement('Account')) {
    my $xml = $r->copyCurrentNode(1);

    my $id = $xml->findvalue('Id');
    if ($id !~ /^[[:alnum:]]+$/ || 5 > length $id || 10 < length $id) {
        print "Invalid Id: $id.\n";
        next
    }

    my @dates = map $xml->findnodes($_), qw( CreationDate LastLoginDate );
    my @date_strings = map sprintf('%4d%02d%02d%02d%02d',
                                   @$_{qw{ year month day-of-month hour-of-day minute }}),
                           @dates;
    if ($date_strings[0] gt $date_strings[1]) {
        print "Invalid dates for $id: @date_strings.\n";
    }

    ...
}

Note that LastLoginDate for "robynsa" is empty, so it can't be compared to CreationDate.




回答2:


Well, with XML::Twig I'd approach it something like this:

#!/usr/bin/env perl
use strict;
use warnings;

use XML::Twig; 
use Time::Piece; 

sub process_account {
    my ( $twig, $account ) = @_;
    print $account -> first_child_text('Id'),"\n";
    if ( $account -> first_child_text('Id') =~ m/^\w{5,10}$/ ) { 
        print "Id OK\n";
    }

    my $logindate = join ( "-", map { $account -> first_child('LastLoginDate')->att($_) // 0 } qw ( year month day-of-month ) );

    my $createdate = join ( "-", map { $account -> first_child('CreationDate')->att($_) // 0 } qw ( year month day-of-month ) );
    if ( eval{ Time::Piece -> strptime ( $logindate, "%Y-%m-%d" )} > eval{Time::Piece -> strptime ( $createdate, "%Y-%m-%d" )} ) {
         print "Create Date OK\n"; 
    }

    if (    $account -> first_child_text('SubscriptionMonthlyFee') eq '0'
        and $account -> first_child_text('SubscriptionType') eq 'free' ) {
       print "Subscription Type fee OK\n"; 

    }
    $twig -> purge; #empty data processed so far. 
}

my $twig = XML::Twig -> new ( twig_handlers => { 'Account' => \&process_account } );
   $twig -> parsefile ( 'your_xml_file'); 

We set a handler on Account elements, which gets passed each as we go. We extract fields from the record, and validate them (this just prints, you should perhaps invert the logic and abort/warn etc.).

We also use purge because one of the major reasons to use a SAX style parser is memory footprint.



来源:https://stackoverflow.com/questions/34017316/parsing-xml-in-perl-using-sax

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!