stringr

Detect multiple strings with dplyr and stringr

ε祈祈猫儿з submitted on 2019-12-03 02:47:56
I'm trying to combine dplyr and stringr to detect multiple patterns in a data frame. I want to use dplyr because I need to test a number of different columns. Here's some sample data:

    test.data <- data.frame(item = c("Apple", "Bear", "Orange", "Pear", "Two Apples"))
    fruit <- c("Apple", "Orange", "Pear")
    test.data
            item
    1      Apple
    2       Bear
    3     Orange
    4       Pear
    5 Two Apples

What I would like to use is something like:

    test.data <- test.data %>% mutate(is.fruit = str_detect(item, fruit))

and receive:

            item is.fruit
    1      Apple        1
    2       Bear        0
    3     Orange        1
    4       Pear        1
    5 Two Apples        1

A very simple test works:

    > str_detect("Apple",
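A minimal sketch of one common way to get the desired column, assuming the goal is "does the item contain any of the fruit names". `str_detect` pairs strings and patterns element-wise rather than testing every pattern against every string, so the sketch collapses `fruit` into a single alternation pattern first. The names `fruit_pattern` and `result` are introduced here for illustration only.

```r
library(dplyr)
library(stringr)

test.data <- data.frame(item = c("Apple", "Bear", "Orange", "Pear", "Two Apples"))
fruit <- c("Apple", "Orange", "Pear")

# Collapse the vector into one regex: "Apple|Orange|Pear"
fruit_pattern <- paste(fruit, collapse = "|")

# TRUE/FALSE from str_detect is converted to 1/0 to match the desired output
result <- test.data %>%
  mutate(is.fruit = as.integer(str_detect(item, fruit_pattern)))
```

If the fruit names could contain regex metacharacters, they would need escaping before being collapsed; that is not handled in this sketch.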

Extract character before and after “/”

你。 submitted on 2019-12-02 23:34:56
Question: I'm trying to extract the characters before and after "/", with no success. The sentences look like:

    XXXX YYY ZZZ - AV HAHEHRS, 3061 - SDDW ASDA DDSF - SAO JOSE DOS CAMPOS / SP - CEP: 00000-000

The output should be:

    SAO JOSE DOS CAMPOS / SP

I'm trying str_extract(str, "- [a-zA-Z]{1,} / [a-zA-Z]{1,}"), but it only returns:

    CAMPOS / SP

Answer 1: Your regex is missing a space in the character class. Try:

    str_extract(str, "- [a-zA-Z ]+ / [a-zA-Z ]+")

Note the space in the character class. Also, {1,} is the long form of + . The
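Because the literal "- " is part of the pattern above, it is also part of whatever `str_extract` returns. The following is a hedged variant, not taken from the answer: it assumes the city is always preceded by "- " and uses a lookbehind so that only the city and state are kept. The variable name `city_state` is hypothetical.

```r
library(stringr)

str <- "XXXX YYY ZZZ - AV HAHEHRS, 3061 - SDDW ASDA DDSF - SAO JOSE DOS CAMPOS / SP - CEP: 00000-000"

# (?<=- )      : match must start right after "- ", without including it
# [a-zA-Z ]+   : letters and spaces (the city name)
# " / "        : the literal separator
# [a-zA-Z]+    : letters only (the state), so no trailing space is captured
city_state <- str_extract(str, "(?<=- )[a-zA-Z ]+ / [a-zA-Z]+")
```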

Search for multiple values in a column in R

。_饼干妹妹 submitted on 2019-12-02 19:11:49
Question: I have a data frame with two columns:

    df = data.frame(animals = c("cat; dog; bird", "dog; bird", "bird"), sentences = c("the cat is brown; the dog is barking; the bird is green and blue","the dog is black; the bird is yellow and blue", "the bird is blue"), stringsAsFactors = F)

I need the sum of the occurrences of all the "animals" on each row in the entire "sentences" column. For example: "animals" first row c("cat; dog; bird") = sum_occurrences_sentences_column (cat = 1) + (dog = 2) +

Remove a list of whole words that may contain special chars from a character vector without matching parts of words

烈酒焚心 submitted on 2019-12-02 14:01:31
Question: I have a list of words in R as shown below:

    myList <- c("at","ax","CL","OZ","Gm","Kg","C100","-1.00")

I want to remove the words found in the above list from the text below:

    myText <- "This is at Sample ax Text, which CL is OZ better and cleaned Gm, where C100 is not equal to -1.00. This is messy text Kg."

After removing the unwanted myList words, myText should look like:

    This is at Sample Text, which is better and cleaned, where is not equal to. This is messy text.

I was
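One possible approach, sketched in base R under stated assumptions: each word is quoted literally with `\Q...\E` so that characters such as "." and "-" lose their regex meaning, and lookarounds stand in for `\b`, which does not behave well next to non-word characters like the leading "-" in "-1.00". The names `pattern` and `cleaned` are introduced here for illustration; this is not the truncated answer from the thread.

```r
myList <- c("at", "ax", "CL", "OZ", "Gm", "Kg", "C100", "-1.00")
myText <- "This is at Sample ax Text, which CL is OZ better and cleaned Gm, where C100 is not equal to -1.00. This is messy text Kg."

# \Q...\E quotes each word literally (PCRE syntax, hence perl = TRUE below)
alternatives <- paste0("\\Q", myList, "\\E", collapse = "|")

# \s*      : also swallow the whitespace in front of the removed word
# (?<!\w)  : the word must not be preceded by a word character
# (?!\w)   : ... nor followed by one, so parts of longer words are left alone
pattern <- paste0("\\s*(?<!\\w)(?:", alternatives, ")(?!\\w)")

cleaned <- gsub(pattern, "", myText, perl = TRUE)
```

Since "at" is itself in myList, this sketch removes it along with the other listed words.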

Extracting a string between other two strings in R

大憨熊 submitted on 2019-12-02 13:36:23
I am trying to find a simple way to extract an unknown substring (it could be anything) that appears between two known substrings. For example, I have a string:

    a<-" anything goes here, STR1 GET_ME STR2, anything goes here"

I need to extract the string GET_ME, which is between STR1 and STR2 (without the white spaces). I am trying str_extract(a, "STR1 (.+) STR2"), but I am getting the entire match:

    [1] "STR1 GET_ME STR2"

I can of course strip the known strings to isolate the substring I need, but I think there should be a cleaner way to do it with the correct regular expression.

You may use str
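A hedged sketch of one way to return only the captured part: `str_extract` returns the whole match, whereas `str_match` returns a matrix whose first column is the full match and whose later columns are the capture groups. The variable names `m` and `between` are hypothetical, and the lazy `.*?` is an assumption made so that the match stops at the first STR2.

```r
library(stringr)

a <- " anything goes here, STR1 GET_ME STR2, anything goes here"

# Column 1 holds the full match, column 2 the first capture group
m <- str_match(a, "STR1\\s*(.*?)\\s*STR2")
between <- m[, 2]
```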

Search for multiple values in a column in R

ⅰ亾dé卋堺 submitted on 2019-12-02 07:47:38
I have a data frame with two columns:

    df = data.frame(animals = c("cat; dog; bird", "dog; bird", "bird"), sentences = c("the cat is brown; the dog is barking; the bird is green and blue","the dog is black; the bird is yellow and blue", "the bird is blue"), stringsAsFactors = F)

I need the sum of the occurrences of all the "animals" on each row in the entire "sentences" column. For example: "animals" first row c("cat; dog; bird") = sum_occurrences_sentences_column (cat = 1) + (dog = 2) + (bird = 3) = 6. The result will be a third column like this:

    df <- cbind( sum_accurrences_sentences
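The counting described above can be sketched as follows. This is an illustration under assumptions, not the thread's answer: animal names are treated as fixed substrings (so "cat" would also count inside a word like "category"), and the column name `sum_occurrences` is hypothetical.

```r
library(stringr)

df <- data.frame(
  animals = c("cat; dog; bird", "dog; bird", "bird"),
  sentences = c("the cat is brown; the dog is barking; the bird is green and blue",
                "the dog is black; the bird is yellow and blue",
                "the bird is blue"),
  stringsAsFactors = FALSE
)

# Split each row's animal list on ";" followed by optional whitespace
animal_lists <- str_split(df$animals, ";\\s*")

# For every row: count each animal across the whole sentences column, then add up
df$sum_occurrences <- vapply(animal_lists, function(animals) {
  sum(vapply(animals,
             function(a) sum(str_count(df$sentences, fixed(a))),
             integer(1)))
}, integer(1))
```

For the first row this adds cat (1), dog (2) and bird (3), matching the total of 6 given in the question.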

Removing characters after a EURO symbol in R

拥有回忆 submitted on 2019-12-02 06:31:00
Question: I have a euro symbol saved in the variable "euro":

    euro <- "\u20AC"
    euro
    #[1] "€"

The variable "eurosearch" contains "services as defined in this SOW at a price of € 15,896.80 (if executed fro":

    eurosearch
    [1] "services as defined in this SOW at a price of € 15,896.80 (if executed fro"

I want the characters after the euro symbol, which is "15,896.80 (if executed fro". I am using this code:

    gsub("^.*[euro]","",eurosearch)

But I'm getting an empty result. How can I obtain the expected output?

Answer 1: You
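In the attempt above, `[euro]` is a character class matching any one of the letters e, u, r or o; it does not refer to the variable `euro`. Because `.*` is greedy and the text ends in "o", the whole string is matched and removed. A minimal sketch of one fix, assuming the value of the variable should be pasted into the pattern (the name `after_euro` is hypothetical):

```r
euro <- "\u20AC"
eurosearch <- "services as defined in this SOW at a price of \u20AC 15,896.80 (if executed fro"

# Build the pattern from the variable's value instead of the literal letters,
# and drop any whitespace that follows the symbol
after_euro <- sub(paste0("^.*", euro, "\\s*"), "", eurosearch)
```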

stringr function to concatenate vector of words separated by comma with “and” before last word

橙三吉。 submitted on 2019-12-02 05:46:39
Question: I know I can easily write one, but does anyone know if stringr (or stringi) already has a function that concatenates a vector of one or more words separated by commas, but with an "and" before the last word?

Answer 1: You can use the knitr::combine_words function:

    knitr::combine_words(letters[1:2])
    # [1] "a and b"
    knitr::combine_words(letters[1:3])
    # [1] "a, b, and c"
    knitr::combine_words(letters[1:4])
    # [1] "a, b, c, and d"

Answer 2: Here's another solution:

    enum <- function(x) paste(c(head(x,-2),
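For readers who prefer no extra dependency, here is a small base R sketch that produces the same style of output as the `knitr::combine_words` calls shown above. The function name `join_with_and` is hypothetical, and this is not a reconstruction of the truncated `enum` function.

```r
# Join words with commas and an "and" before the last one (Oxford comma style)
join_with_and <- function(x) {
  n <- length(x)
  if (n <= 1) return(paste(x, collapse = ""))       # zero or one word: nothing to join
  if (n == 2) return(paste(x, collapse = " and "))  # two words: no comma
  paste0(paste(x[-n], collapse = ", "), ", and ", x[n])
}

two   <- join_with_and(letters[1:2])
three <- join_with_and(letters[1:3])
four  <- join_with_and(letters[1:4])
```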

strsplit by spaces greater than one in R

﹥>﹥吖頭↗ submitted on 2019-12-02 04:55:11
Given a string,

    mystr = "Average student score    88"

I wish to split it wherever there is more than one space, to obtain the following:

    "Average student score" "88"

I found that "\s+" splits on any number of spaces:

    strsplit(mystr, "\\s+")

But this is not what I want. Is there any option within strsplit that can split strings based on a certain number of spaces (say space = k) or a rule on spaces (say space > 1)?

Answer (Avinash Raj): You may specify it through a repetition quantifier:

    strsplit(mystr, "\\s{2,}")

The regex \\s{2,} matches two or more whitespace characters.

Source: https://stackoverflow.com/questions
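The "space = k" variant asked about can be sketched by building the quantifier from a parameter. The helper name `split_on_spaces` is hypothetical, and the input string is assumed to contain a run of four spaces before "88".

```r
# Split a single string on runs of at least k whitespace characters
split_on_spaces <- function(x, k = 2) {
  strsplit(x, sprintf("\\s{%d,}", k))[[1]]
}

mystr <- "Average student score    88"
parts <- split_on_spaces(mystr)
```

Single spaces between words are left alone because they do not satisfy the `{2,}` quantifier.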

Substring extraction from vector in R

老子叫甜甜 submitted on 2019-12-02 03:49:39
Question: I am trying to extract substrings from unstructured text. For example, assume a vector of country names:

    countries <- c("United States", "Israel", "Canada")

How do I go about passing this vector of character values to extract exact matches from unstructured text?

    text.df <- data.frame(ID = c(1:5), text = c("United States is a match", "Not a match", "Not a match", "Israel is a match", "Canada is a match"))

In this example, the desired output would be:

    ID text
    1 United States
    4 Israel
    5
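A minimal sketch of one way to do this with stringr, assuming each text contains at most one country and that the names contain no regex metacharacters. The column name `match` and the object `matched` are introduced here for illustration.

```r
library(stringr)

countries <- c("United States", "Israel", "Canada")
text.df <- data.frame(
  ID = 1:5,
  text = c("United States is a match", "Not a match", "Not a match",
           "Israel is a match", "Canada is a match"),
  stringsAsFactors = FALSE
)

# One alternation pattern covering every country name
pattern <- paste(countries, collapse = "|")

# str_extract returns NA where nothing matches; keep only the rows with a hit
text.df$match <- str_extract(text.df$text, pattern)
matched <- text.df[!is.na(text.df$match), c("ID", "match")]
```

If a text could mention several countries, `str_extract_all` would be the closer fit.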