Can field separator in awk encompass multiple characters?

Submitted by ∥☆過路亽.° on 2019-11-29 17:10:39

Question


Can I use a field separator consisting of multiple characters? For example, I want to split a line whose words are wrapped in quotes and separated by commas, such as:

"School","College","City"

So here I want to set my FS to "," (quote, comma, quote), but I get odd results when I define FS like that. Here is a snippet of my code:

awk -F\",\" '
{
for(i=1;i<=NF;i++)
  {
    if($i~"[a-z0-9],[a-z0-9]") 
    print $i
  }
}' OFS=\",\"  $* 

Answer 1:


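Yes, FS can be more than one character. See the test below, using your example. The separator is a regex with three alternatives: "," between fields, a leading " and a trailing ". The last two produce empty first and last fields, which the if test skips: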

kent$  echo '"School","College","City"'|awk -F'","|^"|"$' '{for(i=1;i<=NF;i++){if($i!="")print $i}}'
School
College
City
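
As a variation on the command above, here is a minimal sketch that keeps FS as the plain three-character string "," and strips the outer quotes from the first and last field instead, so no empty fields are created. The function name split_quoted is hypothetical and only serves the illustration:

```shell
# split_quoted: hypothetical helper name, not part of the answer above.
# FS is the three characters quote-comma-quote; none of them is special in a regex.
split_quoted() {
  awk -F'","' '{
    sub(/^"/, "", $1)    # drop the opening quote of the first field
    sub(/"$/, "", $NF)   # drop the closing quote of the last field
    for (i = 1; i <= NF; i++) print $i
  }'
}

printf '%s\n' '"School","College","City"' | split_quoted
# School
# College
# City
```

Unlike a truthiness test on the field, this also keeps fields whose value is 0 or genuinely empty.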



Answer 2:


The point the other answers circle around is that the field separator is not merely allowed to be several characters long: it can be a full-blown regular expression.

For example, the following strips the header and the surrounding tags from an XML fragment. Note that the tags are well-formed, but differ from line to line.

bash-3.2$ more xml_example 
<?xml version="1.0" encoding="UTF-8"?>
<urlset
xmlns="http://www.google.com/schemas/sitemap/0.84"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.google.com/schemas/sitemap/0.84
                  http://www.google.com/schemas/sitemap/0.84/sitemap.xsd">
<url>
<loc>http://www.foo.com/about.html</loc>
<lastmod>2006-05-15T13:43:37Z</lastmod>
<priority>0.5000</priority>
</url>
<url>
<loc>http://www.foo.com/articles/articles.html</loc>
<lastmod>2006-06-20T23:03:36Z</lastmod>
<priority>0.5000</priority>
</url>

Now we apply the awk script to print out the middle field, using a regex as the field separator:

bash-3.2$ awk -F"<(/?)[a-z]+>" '{print $2}' <xml_example




http://www.foo.com/about.html
2006-05-15T13:43:37Z
0.5000


http://www.foo.com/articles/articles.html
2006-06-20T23:03:36Z
0.5000

bash-3.2$

The blank lines come from input lines that have nothing to print as $2: either the line holds only a tag, so the second field is empty, or (as in the header) no tag matches and there is no second field at all. This is really powerful, because the field separator can be not only a fixed multi-character string but any regular expression.
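
If the blank lines are unwanted, one possible refinement is to print $2 only when it is non-empty. The sketch below assumes the same kind of input as the fragment above. The function name extract_values is hypothetical, and the separator is written as </?[a-z]+>, which matches the same tags as the original pattern:

```shell
# extract_values: hypothetical helper; prints the text between a pair of tags
# and skips every line whose second field is empty or missing.
extract_values() {
  awk -F'</?[a-z]+>' '$2 != "" { print $2 }'
}

extract_values <<'EOF'
<url>
<loc>http://www.foo.com/about.html</loc>
<lastmod>2006-05-15T13:43:37Z</lastmod>
<priority>0.5000</priority>
</url>
EOF
# http://www.foo.com/about.html
# 2006-05-15T13:43:37Z
# 0.5000
```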




Answer 3:


Try a bracket expression, which splits on any single |, comma or colon:

awk 'BEGIN{FS="[|,:]"}{print $1}' yourFile
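
To see what that separator does, here is a small sketch with a made-up input line (the sample data is an assumption, not from the question). Each of the three characters ends a field on its own, so the line splits into four fields:

```shell
# Made-up sample line: one |, one comma and one colon, hence four fields.
printf '%s\n' 'a|b,c:d' | awk 'BEGIN { FS = "[|,:]" } { print NF; print $3 }'
# 4
# c
```

Note that this is a set of alternative single-character separators, not one multi-character separator.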



Answer 4:


With GNU awk 4 or later you can easily parse even CSV files with embedded separators and quotes, because FPAT describes what a field looks like rather than what separates the fields:

% cat infile 
"School",College: "My College","City, I"

% awk '{    
  for (i = 0; ++i <= NF;)
    print i, substr($i, 1, 1) == "\042" ?
      substr($i, 2, length($i) - 2) : $i
  }' FPAT='([^,]+)|(\"[^\"]+\")' infile  
1 School
2 College: "My College"
3 City, I
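
FPAT is a GNU awk extension. For other awk implementations, the same idea can be approximated by consuming one field at a time with match(). This is only a rough sketch under simplifying assumptions: the function name csv_fields is hypothetical, a field counts as quoted only if it starts with a quote, doubled or escaped quotes are not handled, and a trailing empty field is dropped:

```shell
# csv_fields: hypothetical helper that approximates the FPAT example
# without relying on gawk-only features.
csv_fields() {
  awk '{
    rest = $0; n = 0
    while (rest != "") {
      if (match(rest, /^"[^"]*"/)) {
        # Quoted field: keep the text between the quotes, commas included.
        field = substr(rest, 2, RLENGTH - 2)
      } else {
        # Unquoted field: everything up to the next comma (may be empty).
        match(rest, /^[^,]*/)
        field = substr(rest, 1, RLENGTH)
      }
      print ++n, field
      rest = substr(rest, RLENGTH + 1)   # drop the consumed field
      sub(/^,/, "", rest)                # drop the comma that followed it
    }
  }'
}

printf '%s\n' '"School",College: "My College","City, I"' | csv_fields
# 1 School
# 2 College: "My College"
# 3 City, I
```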



Answer 5:


Yes, you can use multiple characters for the -F argument because that value can be a regular expression. For example you can do things like:

echo "hello:::my:::friend" | gawk -F':::' '{print $3}'

which prints friend.

Support for a regexp as the argument to -F exists in nawk and gawk (GNU awk); the original awk does not support it. On Solaris this distinction matters, because the default awk there has traditionally been the old one. On Linux it matters little, because awk is usually a link to gawk or to another implementation (such as mawk) that also accepts a regexp separator. If you depend on this behavior, invoking gawk explicitly, wherever it is installed, gives the same result on every platform.
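
One consequence of the separator being a regexp is that metacharacters need care. In POSIX awk a single-character separator other than a space is taken literally, while a longer one is treated as an extended regular expression. The sketch below uses made-up input to show the difference, with bracket expressions to make each pipe literal:

```shell
# Single character: the pipe is used literally.
printf '%s\n' 'a|b|c' | awk -F'|' '{ print $2 }'
# b

# Two characters: the value is now a regexp, in which | means alternation,
# so each pipe is wrapped in a bracket expression to keep it literal.
printf '%s\n' 'a||b||c' | awk -F'[|][|]' '{ print $2 }'
# b
```

Bracket expressions are used here instead of backslashes because they sidestep the question of how many backslashes survive the shell and awk's own string processing.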



Source: https://stackoverflow.com/questions/8257865/can-field-separator-in-awk-encompass-multiple-characters
