问题
I need to parse an XML file into Perl using SAX - for performing the following email validation checks.
- If the 'Id' contains only alphanumeric characters and its length is between 5 and 10
- If the 'LastLoginDate' is not older than 'CreationDate'
- If 'SubscriptionMontlyFee' = 0 && 'SubscriptionType'!= free
- If 'PaymentMode' is undefined && 'SubscriptionType'!= free
- If Provision < 0
- Internal Mail exists or not
- External Mail exists or not
- If InternalMail = External Mail
Otherwise, return an alert (print a message to notify).
accounts.xml
<?xml version="1.0" encoding="utf-8"?>
<Accounts locale="en_US">
<Account>
<Id>abcd</Id>
<OwnerLastName>asd</OwnerLastName>
<OwnerFirstName>zxc</OwnerFirstName>
<Locked>false</Locked>
<Database>mail</Database>
<Customer>mail</Customer>
<CreationDate year="2011" month="8" month-name="fevrier" day-of-month="19" hour-of-day="15" minute="23" day-name="dimanche"/>
<LastLoginDate year="2015" month="04" month-name="avril" day-of-month="22" hour-of-day="11" minute="13" day-name="macredi"/>
<LoginsCount>10405</LoginsCount>
<Locale>nl</Locale>
<Country>NL</Country>
<SubscriptionType>free</SubscriptionType>
<ActiveSubscriptionType>free</ActiveSubscriptionType>
<SubscriptionExpiration year="1980" month="1" month-name="janvier" day-of-month="1" hour-of-day="0" minute="0" day-name="jeudi"/>
<SubscriptionMonthlyFee>0</SubscriptionMonthlyFee>
<PaymentMode>Undefined</PaymentMode>
<Provision>0</Provision>
<InternalMail>asdf@asdf.com</InternalMail>
<ExternalMail>fdsa@zxczxc.com</ExternalMail>
<GroupMemberships>
<Group>werkgroep X.Y.Z.</Group>
</GroupMemberships>
<SynchroCount>6</SynchroCount>
<LastSynchroDate year="2003" month="12" month-name="decembre" day-of-month="5" hour-of-day="12" minute="48" day-name="mardi"/>
<HasActiveSync>false</HasActiveSync>
<Company/>
</Account>
<Account>
<Id>mnbv</Id>
<OwnerLastName>cvbb</OwnerLastName>
<OwnerFirstName>bvcc</OwnerFirstName>
<Locked>true</Locked>
<Database>mail</Database>
<Customer>mail</Customer>
<CreationDate year="2012" month="10" month-name="octobre" day-of-month="10" hour-of-day="10" minute="18" day-name="jeudi"/>
<LastLoginDate/>
<LoginsCount>0</LoginsCount>
<Locale>fr</Locale>
<Country>BE</Country>
<SubscriptionType>free</SubscriptionType>
<ActiveSubscriptionType>free</ActiveSubscriptionType>
<SubscriptionExpiration year="1970" month="1" month-name="janvier" day-of-month="1" hour-of-day="1" minute="0" day-name="jeudi"/>
<SubscriptionMonthlyFee>0</SubscriptionMonthlyFee>
<PaymentMode>Undefined</PaymentMode>
<Provision>0</Provision>
<InternalMail/>
<ExternalMail>qweqwe@qwe.com</ExternalMail>
<GroupMemberships/>
<SynchroCount>0</SynchroCount>
<LastSynchroDate year="1970" month="1" month-name="janvier" day-of-month="1" hour-of-day="1" minute="0" day-name="jeudi"/>
<HasActiveSync>false</HasActiveSync>
<Company/>
</Account>
</Accounts>
I have tried several unsuccessful attempts (follows) - and will highly appreciate your help in this regard.
Attempt for doing the parsing (but unable to retrieve values from the inner-hash).
use warnings;
use strict;
use XML::SAX;
my $parser = XML::SAX::ParserFactory->parser(Handler => MySAXHandler->new);
$parser->parse_uri("accounts.xml");
package MySAXHandler;
use base qw(XML::SAX::Base);
sub start_element {
my ($self, $el) = @_;
my $ElementName = $el->{Name};
my $attr = %{$el->{Attributes}};
my $attr_value = %{$el->{Attributes}->{'LocalName'}};
print my $loginID, "\n";
print $ElementName, "\n";
print $attr_value, "\n";
}
For performing the validation checks.
print "Currently looking into ".(scalar @account)."elements";
#Checking If Login only includes Alphanumeric characters and has acceptable length
print "ALERT - ID contains invalid characters" unless ($login =~ m/[a-zA-Z@.]+$/);
# Or print "ALERT - ID contains invalid characters" unless ($accountRef->{"Login"} =~ /^[a-zA-Z]$/);
print "ALERT - ID length is greater than 8 characters" unless (length.$account[0] > 20);
#print "Suspicious ALERT - Account Creation and Login time is same" unless ($account[4] != $account[5]);
print "Suspicious ALERT - Last Login was before the account creation" unless ($account[5] > $account[4]);
print "Suspicious ALERT - Incorrect Login Counts" unless ($account[6] > 0 && $account[5] > $account[4]);
#Checking if Subscription Type & Active Subscription Type is same - DISCARDED
#print "ALERT - Preferred Subscription & Current Subscription Type is not same" unless ($account[9] eq $account[10]);
#Checking if Subscription Fee matches the Subscription Type
if( $account[9] eq "free" && account[12] = 0) {
#print "The user subscription is on free subscription and there are no charges"
return 0;
} elsif((account[9] eq "light" || account[9] eq "regular" || account[9] eq "advanced") && account[12] <= 0) {
print "ALERT - The user subscription is" account[9] "and he/she is not getting charged";
} else {
#print "The user subscription is " $account[9] "and he/she is getting charged" account[12];
return 0;
}
#Checking if the Payment Mode is undefined and the subscription type is not free
if($account[9] ne "free" && account[13] eq 'undefined') {print "ALERT - Payment mode is not being defined and the subscription type is not free"};
#Checking if Provision is less than 0
print "ALERT - The user balance is in negative" unless ($account[14] >= 0 );
#Checking if Internal Email Exists or not
print "ALERT - The user doesn't have an internal email address" unless ($account[15] != "" );
#Checking if External Email Exists or not
print "ALERT - The user doesn't have an external email address" unless ($account[16] != "" );
#Checking if External Email Exists or not
print "ALERT - The user doesn't have an external and internal email addresses are same" unless ($account[15] ne $account[16]);
}
回答1:
XML::LibXML::Reader provides a SAX-like interface, but you can inflate the element into a full XML::LibXML object when needed. The first two conditions would be tested as follows:
#!/usr/bin/perl
use warnings;
use strict;
use XML::LibXML::Reader;
my $r = 'XML::LibXML::Reader'->new(location => 'file.xml') or die;
while ($r->nextElement('Account')) {
my $xml = $r->copyCurrentNode(1);
my $id = $xml->findvalue('Id');
if ($id !~ /^[[:alnum:]]+$/ || 5 > length $id || 10 < length $id) {
print "Invalid Id: $id.\n";
next
}
my @dates = map $xml->findnodes($_), qw( CreationDate LastLoginDate );
my @date_strings = map sprintf('%4d%02d%02d%02d%02d',
@$_{qw{ year month day-of-month hour-of-day minute }}),
@dates;
if ($date_strings[0] gt $date_strings[1]) {
print "Invalid dates for $id: @date_strings.\n";
}
...
}
Note that LastLoginDate
for "robynsa" is empty, so it can't be compared to CreationDate
.
回答2:
Well, with XML::Twig
I'd approach it something like this:
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;
use Time::Piece;
sub process_account {
my ( $twig, $account ) = @_;
print $account -> first_child_text('Id'),"\n";
if ( $account -> first_child_text('Id') =~ m/^\w{5,10}$/ ) {
print "Id OK\n";
}
my $logindate = join ( "-", map { $account -> first_child('LastLoginDate')->att($_) // 0 } qw ( year month day-of-month ) );
my $createdate = join ( "-", map { $account -> first_child('CreationDate')->att($_) // 0 } qw ( year month day-of-month ) );
if ( eval{ Time::Piece -> strptime ( $logindate, "%Y-%m-%d" )} > eval{Time::Piece -> strptime ( $createdate, "%Y-%m-%d" )} ) {
print "Create Date OK\n";
}
if ( $account -> first_child_text('SubscriptionMonthlyFee') eq '0'
and $account -> first_child_text('SubscriptionType') eq 'free' ) {
print "Subscription Type fee OK\n";
}
$twig -> purge; #empty data processed so far.
}
my $twig = XML::Twig -> new ( twig_handlers => { 'Account' => \&process_account } );
$twig -> parsefile ( 'your_xml_file');
We set a handler on Account
elements, which gets passed each as we go. We extract fields from the record, and validate them (this just prints, you should perhaps invert the logic and abort/warn etc.).
We also use purge
because one of the major reasons to use a SAX
style parser is memory footprint.
来源:https://stackoverflow.com/questions/34017316/parsing-xml-in-perl-using-sax