In Perl, how can I correctly parse tab/space delimited files with quoted strings?

筅森魡賤 提交于 2019-12-07 08:01:10

问题


I need to parse tab/space delimited files that have a lot of columns in Perl. The values are such that the there are large strings enclosed within double quotes. These strings can have any characters such as tabs and spaces or anything else.

When I try to parse them with the split function it splits these strings as well. Now how can I make perl understand that the strings within the " " are a single column entry?

A simple example is,

12  345546.67677   "Hello World!!!" -567.55656 0.5465767 "Hello_Again;   "

回答1:


Use the Text::CSV library, which handles all the edge cases for you. It lets you set the delimiter:

my $csv = Text::CSV->new({sep_char => "\t"});



回答2:


Note that you say tab/space delimited. If delimiters are mixed and/or you have to treat consecutive spaces as one, using Text::ParseWords might be easier:

#!/usr/bin/perl

use Text::ParseWords qw( quotewords );
use YAML;

while ( my $line = <DATA> ) {
    print Dump [ quotewords('\s+', 0, $line) ];
}

__DATA__
12  345546.67677   "Hello World!!!" -567.55656 0.5465767 "Hello_Again;   "

Output:

---
- 12
- 345546.67677
- Hello World!!!
- -567.55656
- 0.5465767
- 'Hello_Again;   '



回答3:


Other possibilities are Regexp::Common::balanced and Text::Balanced.



来源:https://stackoverflow.com/questions/4500407/in-perl-how-can-i-correctly-parse-tab-space-delimited-files-with-quoted-strings

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!