Perl: parse links from an HTML table

花落未央 2021-01-25 12:51

I'm trying to get links from a table in HTML. Using HTML::TableExtract, I'm able to parse the table and get the text (i.e. Ability, Abnormal in the example below) but cannot get the link tha

2 answers
  • 2021-01-25 13:12

    Use the keep_html option in the HTML::TableExtract constructor. The module documentation describes it as follows:

    keep_html

    Return the raw HTML contained in the cell, rather than just the visible text. Embedded tables are not retained in the HTML extracted from a cell. Patterns for header matches must take into account HTML in the string if this option is enabled. This option has no effect if extracting into an element tree structure.

    my $te = HTML::TableExtract->new( keep_html => 1, headers => [qw(field1 ... fieldN)] );
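
    To make the effect of keep_html concrete, here is a minimal sketch. The sample table, its "Word" header, and the URLs are invented for illustration and are not from the question. With keep_html enabled, each cell should come back with its anchor tag intact instead of only the visible text.

    ```perl
    use strict;
    use warnings;
    use HTML::TableExtract;

    # Hypothetical sample input; header names and URLs are assumptions.
    my $html = q{
    <table>
      <tr><th>Word</th><th>Count</th></tr>
      <tr><td><a href="/words/ability">Ability</a></td><td>3</td></tr>
      <tr><td><a href="/words/abnormal">Abnormal</a></td><td>5</td></tr>
    </table>
    };

    # keep_html => 1 returns the raw HTML of each cell, not just its text.
    # headers => ['Word'] restricts extraction to the "Word" column.
    my $te = HTML::TableExtract->new( keep_html => 1, headers => ['Word'] );
    $te->parse($html);
    $te->eof;

    # Collect the raw HTML of every non-empty cell in the selected column.
    my @cells;
    for my $ts ( $te->tables ) {
        for my $row ( $ts->rows ) {
            next unless defined $row->[0];
            push @cells, $row->[0];
        }
    }
    print "$_\n" for @cells;
    ```

    Each printed cell still contains its href attribute, so it can be handed to a link parser or matched directly.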
    
  • 2021-01-25 13:31

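    Use HTML::LinkExtor, passing the extracted cell content to its parse method. This only works if the HTML::TableExtract object ($te below) was created with keep_html => 1, since otherwise the cells contain plain text with no tags left to extract links from.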

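    use HTML::LinkExtor;

    # $te is the HTML::TableExtract object, already parsed with keep_html => 1
    my $le = HTML::LinkExtor->new();

    foreach my $ts ($te->tables) {
        foreach my $row ($ts->rows) {
            next unless defined $row->[0];    # skip empty cells
            $le->parse($row->[0]);            # first column holds the links
            $le->eof;                         # flush so each cell is parsed on its own
            # links() returns [tag, attr => url, ...] entries and clears the list
            for my $link_tag ( $le->links ) {
                my ($tag, %links) = @$link_tag;
                # next if $tag ne 'a'; # exclude other kinds of links?
                print "$_\n" for values %links;
            }
        }
    }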
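
    If only anchor links are wanted, the filtering hinted at in the commented-out line can be wrapped in a small helper. This is a sketch under assumptions: links_in_fragment is a hypothetical name, not part of either module, and the sample cell content is invented. It creates a fresh HTML::LinkExtor per fragment so no parser state carries over between cells.

    ```perl
    use strict;
    use warnings;
    use HTML::LinkExtor;

    # Hypothetical helper: returns the href values of all <a> tags
    # found in one HTML fragment (for example, one table cell).
    sub links_in_fragment {
        my ($fragment) = @_;
        my $le = HTML::LinkExtor->new;    # fresh parser for every fragment
        $le->parse($fragment);
        $le->eof;

        my @hrefs;
        for my $link_tag ( $le->links ) {
            my ( $tag, %attrs ) = @$link_tag;
            # Keep only anchors; this skips img src and other link attributes.
            next unless $tag eq 'a' && defined $attrs{href};
            push @hrefs, $attrs{href};
        }
        return @hrefs;
    }

    # Invented cell content, shaped like what keep_html => 1 hands back.
    my $cell  = q{<a href="/words/ability">Ability</a> <img src="/icons/new.png">};
    my @hrefs = links_in_fragment($cell);
    print "$_\n" for @hrefs;
    ```

    Here only the anchor's href is returned and the img src is skipped. Inside the loop above, the helper would be called as links_in_fragment($row->[0]).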
    