Question
I need to parse tab/space-delimited files with many columns in Perl. Some of the values are long strings enclosed in double quotes, and these strings can contain any characters, including tabs and spaces.
When I try to parse the lines with the split function, it splits these strings as well. How can I make Perl understand that a string within " " is a single column entry?
A simple example:
12 345546.67677 "Hello World!!!" -567.55656 0.5465767 "Hello_Again; "
Answer 1:
Use the Text::CSV library, which handles all the edge cases for you. It lets you set the delimiter:
my $csv = Text::CSV->new({sep_char => "\t"});
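To show how that constructor might be used end to end, here is a minimal sketch. It assumes Text::CSV is installed from CPAN and that the input is strictly tab-delimited; the helper name parse_tab_line and the sample line are made up for illustration, and binary => 1 is an extra option not mentioned in the answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

# binary => 1 (an assumption, not part of the answer) lets quoted
# fields contain arbitrary characters.
my $csv = Text::CSV->new({ sep_char => "\t", binary => 1 })
    or die "Cannot create Text::CSV object: " . Text::CSV->error_diag;

# Hypothetical helper: split one tab-delimited line into its fields.
sub parse_tab_line {
    my ($line) = @_;
    chomp $line;
    $csv->parse($line)
        or die "Parse failed: " . $csv->error_diag;
    return $csv->fields;    # quotes are already stripped
}

# Sample line with a tab inside the quoted field.
my $sample = join "\t", '12', '345546.67677', qq{"Hello\tWorld!!!"}, '-567.55656';
print "[$_]\n" for parse_tab_line($sample);
```

The tab inside the quoted field stays part of that field. For whole files, $csv->getline($fh) returns one array reference per record and avoids reading lines by hand.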
Answer 2:
Note that you say tab/space delimited. If delimiters are mixed and/or you have to treat consecutive spaces as one, using Text::ParseWords might be easier:
#!/usr/bin/perl
use strict;
use warnings;
use Text::ParseWords qw( quotewords );
use YAML;
while ( my $line = <DATA> ) {
    # Split on runs of whitespace; keep = 0 strips the surrounding quotes
    print Dump [ quotewords('\s+', 0, $line) ];
}
__DATA__
12 345546.67677 "Hello World!!!" -567.55656 0.5465767 "Hello_Again; "
Output:
---
- 12
- 345546.67677
- Hello World!!!
- -567.55656
- 0.5465767
- 'Hello_Again; '
Answer 3:
Other possibilities are Regexp::Common::balanced and Text::Balanced.
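The answer only names the modules, so the following is one possible sketch of the Text::Balanced route, not a method taken from the answer. It uses extract_delimited to pull out double-quoted fields and a plain regex for everything else; the helper name split_quoted is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::Balanced qw( extract_delimited );

# Hypothetical helper: split a line on whitespace, keeping "..." runs whole.
sub split_quoted {
    my ($line) = @_;
    my @fields;
    my $rest = $line;
    while ( $rest =~ /\S/ ) {
        # List context returns (extracted text incl. quotes, remainder, prefix)
        my ( $quoted, $remainder ) = extract_delimited( $rest, '"' );
        if ( defined $quoted && length $quoted ) {
            push @fields, substr( $quoted, 1, -1 );    # drop the surrounding quotes
            $rest = $remainder;
        }
        elsif ( $rest =~ s/^\s*(\S+)// ) {
            push @fields, $1;                          # plain unquoted token
        }
    }
    return @fields;
}

my $line = '12 345546.67677 "Hello World!!!" -567.55656 0.5465767 "Hello_Again; "';
print "[$_]\n" for split_quoted($line);
```

This is more code than the quotewords version for the same result, so it mainly makes sense when the quoting rules need finer control than Text::ParseWords offers.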
Source: https://stackoverflow.com/questions/4500407/in-perl-how-can-i-correctly-parse-tab-space-delimited-files-with-quoted-strings