stringr | 易学教程

Detect multiple strings with dplyr and stringr

I'm trying to combine dplyr and stringr to detect multiple patterns in a data frame. I want to use dplyr because I need to test a number of different columns. Here's some sample data:

    test.data <- data.frame(item = c("Apple", "Bear", "Orange", "Pear", "Two Apples"))
    fruit <- c("Apple", "Orange", "Pear")
    test.data
            item
    1      Apple
    2       Bear
    3     Orange
    4       Pear
    5 Two Apples

What I would like to use is something like:

    test.data <- test.data %>% mutate(is.fruit = str_detect(item, fruit))

and receive:

            item is.fruit
    1      Apple        1
    2       Bear        0
    3     Orange        1
    4       Pear        1
    5 Two Apples        1

A very simple test works:

    > str_detect("Apple",
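One common way to get the table above is to collapse the patterns into a single alternation, because `str_detect()` is vectorised over both the string and the pattern and so pairs them element by element rather than testing every item against every fruit. The sketch below uses base R's `grepl()` so that it runs without extra packages; the stringr/dplyr form is shown in a comment. It assumes the fruit names contain no regex metacharacters.

```r
test.data <- data.frame(
  item = c("Apple", "Bear", "Orange", "Pear", "Two Apples"),
  stringsAsFactors = FALSE
)
fruit <- c("Apple", "Orange", "Pear")

# Collapse the vector into one regex: "Apple|Orange|Pear"
pattern <- paste(fruit, collapse = "|")

# TRUE/FALSE per row, converted to 1/0 to match the desired output
test.data$is.fruit <- as.integer(grepl(pattern, test.data$item))

# Equivalent with dplyr and stringr loaded:
# test.data %>% mutate(is.fruit = as.integer(str_detect(item, pattern)))

print(test.data)
```

The same `pattern` can be reused for each column that needs testing.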

Extract character before and after “/”

Question: I'm trying to extract the characters before and after "/", with no success. The sentences look like this:

    XXXX YYY ZZZ - AV HAHEHRS, 3061 - SDDW ASDA DDSF - SAO JOSE DOS CAMPOS / SP - CEP: 00000-000

The output should be:

    SAO JOSE DOS CAMPOS / SP

I'm trying

    str_extract(str, "- [a-zA-Z]{1,} / [a-zA-Z]{1,}")

but it only returns CAMPOS / SP.

Answer 1: Your regex is missing the space. Try:

    str_extract(str, "- [a-zA-Z ]+ / [a-zA-Z ]+")

Note the space in the character class. Also, {1,} is the long form of +. The
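As a hedged illustration of the character-class idea, the sketch below allows spaces before the slash but not after it, and leaves the leading "- " out of the pattern so that only the city and state are returned. It uses base R's `regexpr()` and `regmatches()`; the variable names `x` and `city_state` are placeholders, not names from the question.

```r
x <- "XXXX YYY ZZZ - AV HAHEHRS, 3061 - SDDW ASDA DDSF - SAO JOSE DOS CAMPOS / SP - CEP: 00000-000"

# A letter, then any run of letters and spaces, then "/ ", then letters only.
# Earlier candidate matches fail because a "-" or "," interrupts them
# before a "/" is reached.
pattern <- "[A-Za-z][A-Za-z ]*/ [A-Za-z]+"

city_state <- regmatches(x, regexpr(pattern, x))
print(city_state)
```

With stringr loaded, `str_extract(x, pattern)` should accept the same pattern.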

Remove a list of whole words that may contain special chars from a character vector without matching parts of words

Question: I have a list of words in R as shown below:

    myList <- c("at","ax","CL","OZ","Gm","Kg","C100","-1.00")

I want to remove the words found in this list from the text below:

    myText <- "This is at Sample ax Text, which CL is OZ better and cleaned Gm, where C100 is not equal to -1.00. This is messy text Kg."

After removing the unwanted myList words, myText should look like:

    This is Sample Text, which is better and cleaned, where is not equal to. This is messy text.

I was
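One possible approach, sketched here with base R and PCRE (`perl = TRUE`), is to quote each word literally with `\Q...\E` so that characters such as "." and "-" lose their regex meaning, and to use lookarounds instead of `\b` so that a word starting with "-" is still treated as a whole word. The name `cleaned` is illustrative, and the sketch assumes no list entry contains the sequence `\E`.

```r
myList <- c("at", "ax", "CL", "OZ", "Gm", "Kg", "C100", "-1.00")
myText <- "This is at Sample ax Text, which CL is OZ better and cleaned Gm, where C100 is not equal to -1.00. This is messy text Kg."

# Quote every word literally, then join the words into one alternation
alternatives <- paste0("\\Q", myList, "\\E", collapse = "|")

# \s* also removes the spaces in front of the word;
# (?<!\w) and (?!\w) require that no word character touches the match
pattern <- paste0("\\s*(?<!\\w)(?:", alternatives, ")(?!\\w)")

cleaned <- gsub(pattern, "", myText, perl = TRUE)
print(cleaned)
```

If one list entry is a prefix of another, sorting the list by decreasing length before building the alternation is a reasonable precaution.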

Extracting a string between other two strings in R

I am trying to find a simple way to extract an unknown substring (it could be anything) that appears between two known substrings. For example, I have a string:

    a <- " anything goes here, STR1 GET_ME STR2, anything goes here"

I need to extract the string GET_ME, which sits between STR1 and STR2 (without the white space). I am trying str_extract(a, "STR1 (.+) STR2"), but I get the entire match:

    [1] "STR1 GET_ME STR2"

I can of course strip the known strings to isolate the substring I need, but I think there should be a cleaner way to do it with the right regular expression.

You may use str
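Two common ways to return only the inner text are a capture group and a pair of lookarounds. The sketch below shows both in base R; it is not necessarily the approach the truncated answer goes on to describe. The names `via_group` and `via_lookaround` are illustrative.

```r
a <- " anything goes here, STR1 GET_ME STR2, anything goes here"

# Option 1: capture group; replace the whole string with the first group
via_group <- sub(".*STR1 (.+?) STR2.*", "\\1", a, perl = TRUE)

# Option 2: lookarounds; the delimiters are asserted but not part of the match
via_lookaround <- regmatches(
  a,
  regexpr("(?<=STR1 ).+?(?= STR2)", a, perl = TRUE)
)

# With stringr loaded, a capture group can be read from str_match():
# str_match(a, "STR1 (.+?) STR2")[, 2]

print(via_group)
print(via_lookaround)
```

Note that option 1 returns the input unchanged when nothing matches, whereas option 2 returns an empty vector.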

Search for multiple values in a column in R

I have a data frame with two columns:

    df = data.frame(animals = c("cat; dog; bird", "dog; bird", "bird"),
                    sentences = c("the cat is brown; the dog is barking; the bird is green and blue",
                                  "the dog is black; the bird is yellow and blue",
                                  "the bird is blue"),
                    stringsAsFactors = F)

I need the sum of the occurrences of all the "animals" on each row in the entire "sentences" column. For example, for the first row of "animals", c("cat; dog; bird"):

    sum_occurrences_sentences_column = (cat = 1) + (dog = 2) + (bird = 3) = 6

The result will be a third column like this:

    df <- cbind( sum_accurrences_sentences
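A minimal base R sketch of this count follows. It joins the whole "sentences" column into one string, counts whole-word matches of each animal with `gregexpr()`, and sums the counts per row. The helper `count_word` and the column name `sum_occurrences` are hypothetical, and the animal names are assumed to contain no regex metacharacters.

```r
df <- data.frame(
  animals = c("cat; dog; bird", "dog; bird", "bird"),
  sentences = c(
    "the cat is brown; the dog is barking; the bird is green and blue",
    "the dog is black; the bird is yellow and blue",
    "the bird is blue"
  ),
  stringsAsFactors = FALSE
)

# Every sentence in the column, joined into one searchable string
all_text <- paste(df$sentences, collapse = " ")

# Number of whole-word matches of `word` in `text` (gregexpr gives -1 for none)
count_word <- function(word, text) {
  hits <- gregexpr(paste0("\\b", word, "\\b"), text, perl = TRUE)[[1]]
  sum(hits > 0)
}

# Split each row's animal list on ";" and add up the per-animal counts
df$sum_occurrences <- vapply(
  strsplit(df$animals, ";\\s*"),
  function(words) sum(vapply(words, count_word, integer(1), text = all_text)),
  integer(1)
)

print(df[, c("animals", "sum_occurrences")])
```

For the first row this reproduces the total of 6 given in the question.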

Removing characters after a EURO symbol in R

Question: I have a euro symbol saved in the "euro" variable:

    euro <- "\u20AC"
    euro
    #[1] "€"

The "eurosearch" variable contains "services as defined in this SOW at a price of € 15,896.80 (if executed fro":

    eurosearch
    [1] "services as defined in this SOW at a price of € 15,896.80 (if executed fro"

I want the characters after the euro symbol, which are "15,896.80 (if executed fro". I am using this code:

    gsub("^.*[euro]","",eurosearch)

But I'm getting an empty result. How can I obtain the expected output?

Answer 1: You
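The empty result is consistent with how `"[euro]"` is read: it is a character class matching any one of the letters e, u, r or o, not the contents of the `euro` variable, so the greedy `^.*` runs up to the final "o" of the string. One way to use the variable, sketched here as an assumption about what the truncated answer may suggest, is to paste it into the pattern:

```r
euro <- "\u20AC"
eurosearch <- "services as defined in this SOW at a price of \u20AC 15,896.80 (if executed fro"

# Build the pattern from the variable; \\s* also drops the space after the symbol
pattern <- paste0("^.*", euro, "\\s*")

after_euro <- sub(pattern, "", eurosearch)
print(after_euro)
# [1] "15,896.80 (if executed fro"
```

Because the euro sign has no special meaning in a regex, it needs no escaping here.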

stringr function to concatenate vector of words separated by comma with “and” before last word

Question: I know I can easily write one, but does anyone know if stringr (or stringi) already has a function that concatenates a vector of one or more words separated by commas, but with an "and" before the last word?

Answer 1: You can use the knitr::combine_words function:

    knitr::combine_words(letters[1:2])
    # [1] "a and b"
    knitr::combine_words(letters[1:3])
    # [1] "a, b, and c"
    knitr::combine_words(letters[1:4])
    # [1] "a, b, c, and d"

Answer 2: Here's another solution:

    enum <- function(x) paste(c(head(x,-2),
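For readers who want the behaviour without a knitr dependency, here is a small base R sketch that reproduces the three outputs shown in Answer 1. The function name `combine_with_and` is hypothetical, and this is not a completion of the truncated `enum` function from Answer 2.

```r
# Join words with commas, placing "and" before the last one
combine_with_and <- function(x) {
  n <- length(x)
  if (n < 2) {
    return(as.character(x))                # zero or one word: nothing to join
  }
  if (n == 2) {
    return(paste(x, collapse = " and "))   # "a and b", no comma
  }
  # Three or more words: comma-separated, with ", and" before the last
  paste0(paste(x[-n], collapse = ", "), ", and ", x[n])
}

print(combine_with_and(letters[1:2]))
print(combine_with_and(letters[1:3]))
print(combine_with_and(letters[1:4]))
```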

strsplit by spaces greater than one in R

Given a string,

    mystr = "Average student score    88"

I wish to split it wherever there is more than one space, to obtain the following:

    "Average student score" "88"

I found that "\\s+" splits on any number of spaces:

    strsplit(mystr, "\\s+")

But this is not what I want. Is there any option within strsplit that can split strings based on a certain number of spaces (say space = k) or a rule on spaces (say space > 1)?

Answer (Avinash Raj): You may specify it through a repetition quantifier:

    strsplit(mystr, "\\s{2,}")

The regex \\s{2,} should match two or more spaces.

Source: https://stackoverflow.com/questions
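The difference between the two quantifiers can be checked directly. This sketch assumes the original string had a run of several spaces before "88", which the question implies.

```r
mystr <- "Average student score    88"

# One or more whitespace characters: every gap is a split point
split_any <- strsplit(mystr, "\\s+")[[1]]

# Two or more whitespace characters: single spaces are left alone
split_two_plus <- strsplit(mystr, "\\s{2,}")[[1]]

print(split_any)
print(split_two_plus)
```

For an exact count of k spaces, a quantifier such as `" {3}"` can be used in the same way, although a longer run would then be split unevenly.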

Substring extraction from vector in R

Question: I am trying to extract substrings from unstructured text. For example, assume a vector of country names:

    countries <- c("United States", "Israel", "Canada")

How do I go about passing this vector of character values to extract exact matches from unstructured text?

    text.df <- data.frame(ID = c(1:5),
                          text = c("United States is a match", "Not a match", "Not a match",
                                   "Israel is a match", "Canada is a match"))

In this example, the desired output would be:

    ID text
    1  United States
    4  Israel
    5
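A minimal base R sketch of one way to do this: collapse the country names into a single alternation, locate the first match in each row with `regexpr()`, and keep only the rows that matched. The name `result` is illustrative, and the country names are assumed to contain no regex metacharacters (otherwise they would need escaping first).

```r
countries <- c("United States", "Israel", "Canada")
text.df <- data.frame(
  ID = 1:5,
  text = c("United States is a match", "Not a match", "Not a match",
           "Israel is a match", "Canada is a match"),
  stringsAsFactors = FALSE
)

# One regex that matches any of the country names
pattern <- paste(countries, collapse = "|")

# Position of the first match per row; -1 where nothing matched
hits <- regexpr(pattern, text.df$text)

# regmatches() returns the matched text for matching rows only
result <- data.frame(
  ID = text.df$ID[hits > 0],
  text = regmatches(text.df$text, hits),
  stringsAsFactors = FALSE
)

print(result)
```

With stringr loaded, `str_extract(text.df$text, pattern)` should give the same matches, with `NA` for the rows that do not match.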