Populate array from XML end tags

遇见更好的自我 2021-01-17 06:18

I am trying to create an array of field names that I can use later in my script. Regular expressions are kicking my butt. I haven't written code in a long time. The fiel

1 answer
  • 2021-01-17 06:51

    Your sample data isn't XML. Your slashes are backwards. Assuming it is XML you're trying to parse, the answer is 'don't use regular expressions'.

    They're simply not able to cope with the recursion and nesting to the degree necessary.

    So with that in mind, assuming your sample data is actually well-formed XML and the backslashes are a typo, something like XML::Twig will do it quite handily:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    
    use XML::Twig;
    
    my $twig = XML::Twig -> parse ( \*DATA );
    
    #extract a single field value
    print $twig -> root -> first_child_text('title'),"\n";
    #get a field name
    print $twig -> root -> first_child -> tag,"\n";
    #can also use att() if you have attributes
    
    
    print "Field names:\n";
    #children() returns all the children of the current (in this case root) node
    #We use map to access all, and tag to read their 'name'. 
    #att or trimmed_text would do other parts of the XML. 
    print join ( "\n", map { $_ -> tag } $twig -> root -> children );
    
    __DATA__
    <XML>
    <record>DEFECT000179</record><state>Approved</state><title>Something is broken</title>
    </XML>
    

    This prints:

    Something is broken
    record
    Field names:
    record
    state
    title
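
    The question asks for an array to use later rather than printed output. A minimal sketch, assuming the same well-formed sample data; the variable name @field_names is made up for illustration:

    ```perl
    #!/usr/bin/env perl
    use strict;
    use warnings;

    use XML::Twig;

    # Same sample document as above, held in a string instead of __DATA__.
    my $xml = '<XML><record>DEFECT000179</record><state>Approved</state>'
            . '<title>Something is broken</title></XML>';

    my $twig = XML::Twig->new;
    $twig->parse($xml);

    # Store the tag of each child of the root element for later use.
    my @field_names = map { $_->tag } $twig->root->children;

    print "$_\n" for @field_names;
    ```

    The array can then be used anywhere later in the script, for example as hash keys or column headers.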
    

    You also have a variety of other really useful tools: pretty_print for formatting your output XML, twig_handlers that let you manipulate XML as you parse (particularly handy with purge, which frees already-processed elements from memory), cut and paste to move nodes around, and get_xpath to find elements with an XPath expression based on path and attributes.
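
    As a rough illustration of twig_handlers combined with purge, here is a sketch under stated assumptions: the <defect> wrapper element and its id attribute are hypothetical and not part of the original sample data.

    ```perl
    #!/usr/bin/env perl
    use strict;
    use warnings;

    use XML::Twig;

    # Hypothetical input: several <defect> records under one root element.
    my $xml = '<XML>'
        . '<defect id="1"><state>Approved</state><title>Something is broken</title></defect>'
        . '<defect id="2"><state>Open</state><title>Something else</title></defect>'
        . '</XML>';

    my @titles;
    my $twig = XML::Twig->new(
        twig_handlers => {
            # Called each time a <defect> element has been fully parsed.
            defect => sub {
                my ( $t, $elt ) = @_;
                push @titles, $elt->first_child_text('title');
                $t->purge;    # discard what has been parsed so far
            },
        },
    );
    $twig->parse($xml);

    print "$_\n" for @titles;
    ```

    Because each record is handled and then purged, the whole document never has to sit in memory at once, which matters mainly for large files.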

    Edit: Based on comments, if you really want to extract data from:

    </something>
    

    What is going wrong in your pattern is that .* is greedy: it matches as much as it can, so it runs on to the last > on the line instead of stopping at the first. You need either a negated character class, like:

    m,</([^>]+)>,g
    

    Or a nongreedy match:

    m,</(.*?)>,g
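
    To see the difference concretely, here is a small sketch comparing a greedy, a non-greedy, and a negated-class pattern on a line with well-formed closing tags. The sample string and variable names are made up for illustration.

    ```perl
    #!/usr/bin/env perl
    use strict;
    use warnings;

    # Sample line with forward-slash closing tags.
    my $line = '<record>DEFECT000179</record><state>Approved</state>';

    # Greedy: .* runs to the end of the string, then backtracks to the last '>'.
    my @greedy = $line =~ m,</(.*)>,g;

    # Non-greedy: .*? stops at the first '>' after each '</'.
    my @nongreedy = $line =~ m,</(.*?)>,g;

    # Negated class: the name cannot contain '>', so it cannot overrun.
    my @negated = $line =~ m,</([^>]+)>,g;

    print "greedy:    @greedy\n";       # one long, wrong capture
    print "nongreedy: @nongreedy\n";    # record state
    print "negated:   @negated\n";      # record state
    ```

    The negated class is usually the safer habit of the two, since it states exactly which characters a tag name may not contain.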
    

    Oh, and since your data uses a backslash instead of a forward slash, you need to escape it in the pattern:

    my $firstLineOfXMLFile = '<record>DEFECT000179<\record><state>Approved<\state><title>Something is broken<\title>';
    my @fieldNames = $firstLineOfXMLFile =~ m(<\\(.*?)>)g;
    print join( "\n", @fieldNames ), "\n";    # one field name per line
    

    That will do the trick. (But seriously: deliberately creating something that looks like XML but isn't is a really bad thing to do.)
