character “|” in R

Anonymous (unverified), submitted 2019-12-03 01:31:01

Question:

I would like to split a character string at the pattern "|", but

unlist(strsplit("I am | very smart", " | "))
[1] "I"     "am"    "|"     "very"  "smart"

splits at every space, and

gsub(pattern="|", replacement="*", x="I am | very smart")
[1] "*I* *a*m* *|* *v*e*r*y* *s*m*a*r*t*"

inserts "*" around every character.

Answer 1:

Use the fixed argument:

unlist(strsplit("I am | very smart", " | ", fixed=TRUE)) # [1] "I am"       "very smart" 

A side effect is faster computation, because the pattern is matched literally instead of being interpreted as a regular expression.
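As a rough illustration of what fixed = TRUE changes, the sketch below splits on ".", another metacharacter (this is the same case used in the examples of ?strsplit). As a regular expression "." matches every character, so every piece before a match is empty; with fixed = TRUE it is matched literally. The variable names are only for illustration.

```r
x <- "a.b.c"

# As a regular expression, "." matches any single character,
# so every split produces an empty string.
regex_split <- unlist(strsplit(x, "."))
print(regex_split)    # [1] "" "" "" "" ""

# With fixed = TRUE the pattern is taken literally.
literal_split <- unlist(strsplit(x, ".", fixed = TRUE))
print(literal_split)  # [1] "a" "b" "c"

# The same switch fixes the original "|" example.
print(unlist(strsplit("I am | very smart", " | ", fixed = TRUE)))
```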

A stringr alternative (note that fixed() also has to be taken from stringr when the package is not attached):

unlist(stringr::str_split("I am | very smart", stringr::fixed(" | ")))


Answer 2:

| is a metacharacter. You need to escape it (by putting \\ before it).

> unlist(strsplit("I am | very smart", " \\| "))
[1] "I am"       "very smart"
> sub(pattern="\\|", replacement="*", x="I am | very smart")
[1] "I am * very smart"
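Escaping is not the only way to make | literal. A minimal sketch, assuming R's default regex engine: inside a bracket expression such as [|] the pipe loses its special meaning, so the three patterns below should all produce the same split. The variable names are illustrative.

```r
x <- "I am | very smart"

escaped <- unlist(strsplit(x, " \\| "))              # backslash-escaped pipe
bracket <- unlist(strsplit(x, " [|] "))              # pipe inside a character class
literal <- unlist(strsplit(x, " | ", fixed = TRUE))  # no regular expression at all

# All three approaches give c("I am", "very smart").
stopifnot(identical(escaped, bracket), identical(bracket, literal))
print(escaped)
```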

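Edit: The reason you need two backslashes is that in an R string literal the backslash is itself an escape character, used for special symbols such as \n (newline) and \t (tab). Writing "\\|" therefore passes the two characters \| to the regex engine, which then treats the pipe literally. For more information, see the help page ?regex. The full set of metacharacters is . \ | ( ) [ { ^ $ * + ?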
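The metacharacter list above can be turned into a small helper that escapes an arbitrary string before using it as a pattern. This is a sketch, not a base R function: the name escape_regex is hypothetical, and it uses perl = TRUE so that the backslash escapes inside the character class are interpreted by PCRE.

```r
# Hypothetical helper (not part of base R): put a backslash in front of each
# metacharacter listed in ?regex (. \ | ( ) [ { ^ $ * + ?) so that the
# string can be used as a literal pattern.
escape_regex <- function(x) {
  # The class matches any one metacharacter; "\\\\\\1" in the replacement
  # writes a literal backslash followed by the matched character.
  gsub("([.\\\\|()\\[{^$*+?])", "\\\\\\1", x, perl = TRUE)
}

pattern <- escape_regex(" | ")
writeLines(pattern)  # the pipe is now preceded by a backslash

print(unlist(strsplit("I am | very smart", pattern)))
```

For a one-off split, fixed = TRUE is simpler; a helper like this is mainly useful when the literal text has to be combined with real regex syntax in a larger pattern.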



Answer 3:

If you are parsing a table, then calling read.table might be a better option. A tiny example:

> txt <- textConnection("I am | very smart")
> read.table(txt, sep='|')
     V1          V2
1 I am   very smart

So I would suggest fetching the wiki page with RCurl, extracting the interesting part of the page with the XML package (which also has a really neat function for parsing HTML tables, readHTMLTable), and, if the HTML format is not available, calling read.table with the appropriate sep. Good luck!
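The tiny read.table example can also be written with the text argument, which takes the input as a string. In this sketch strip.white = TRUE is an added assumption that removes the spaces around each field (without it, V1 keeps a trailing space and V2 a leading one, as in the output above); sep is a single separator character, not a regular expression, so "|" needs no escaping.

```r
# Read "|"-separated text directly; header = FALSE is the default,
# so the columns are named V1 and V2.
df <- read.table(text = "I am | very smart",
                 sep = "|",
                 strip.white = TRUE,       # drop spaces around each field
                 stringsAsFactors = FALSE) # keep plain character columns
print(df)
```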



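Answer 4:

The pipe '|' is a metacharacter, used as the 'OR' (alternation) operator in regular expressions.

Try (note that the backslashes have to be doubled inside an R string; "\s" on its own is an error):

unlist(strsplit("I am | very smart", "\\s+\\|\\s+"))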
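A short sketch of both points in this answer: unescaped, | separates alternatives, while the escaped pattern with \\s+ on each side tolerates any amount of whitespace around the pipe. The example strings other than the original one are made up for illustration.

```r
# Unescaped, "|" means OR: "cat|dog" matches either word.
print(grepl("cat|dog", c("cat", "dog", "cow")))  # [1]  TRUE  TRUE FALSE

# Escaped, it matches a literal pipe; \\s+ allows one or more
# whitespace characters on each side.
x <- c("I am | very smart", "I am   |  very smart")
parts <- strsplit(x, "\\s+\\|\\s+")
print(parts)
```

Using \\s* instead of \\s+ would also accept a pipe with no surrounding spaces.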


