stringr | 易学教程

How to str_extract percentages in R?

Question: From the string border-color:#002449;left:74.4%top;37%; I would like to store the first percentage, 74.4%, in a variable called X and the second percentage, 37%, in a variable called Y. I have tried to play around with the regex "^.*?(\\d+)%.*", but it drops the % sign and extracts only the second 4 from 74.4. Any help would be appreciated. Please let me know if any further information is needed.

Answer 1:
s <- "border-color:#002449;left:74.4%top;37%;"
regmatches(s, gregexpr("\\d+(\\.\\d+){0,1}%", s))[
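
Below is a minimal base R sketch of the approach in Answer 1. It assumes exactly two percentages appear in the string, in the order X, Y. The pattern requires a literal % after the digits, so the colour code #002449 is skipped. The optional decimal part keeps 74.4 whole. The numeric variables X_num and Y_num are illustrative additions, not part of the question.

```r
s <- "border-color:#002449;left:74.4%top;37%;"

# Digits, an optional ".digits" part, then a literal "%"
pcts <- regmatches(s, gregexpr("[0-9]+(\\.[0-9]+)?%", s))[[1]]

X <- pcts[1]  # "74.4%"
Y <- pcts[2]  # "37%"

# Hypothetical numeric versions with the "%" stripped
X_num <- as.numeric(sub("%", "", X, fixed = TRUE))
Y_num <- as.numeric(sub("%", "", Y, fixed = TRUE))
```

With stringr, str_extract_all(s, "\\d+(\\.\\d+)?%")[[1]] should return the same two-element vector.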

Replacing underscore “_” with backslash-underscore “\_” in an R string

Question: How can I replace underscores "_" with backslash-underscores "\_" in an R string? I'd prefer to use the stringr package. Also, can anyone explain why line 5 below fails to produce the desired result? I was almost certain it would work.
library(stringr)
s <- "foo_bar_baz"
str_replace_all(s, "_", 5) # [1] "foo5bar5baz"
str_replace_all(s, "_", "\_") # Error: '\_' is an unrecognized escape in character string starting ""\_"
str_replace_all(s, "_", "\\_") # [1] "foo_bar_baz"
str_replace_all(s, "_"
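
The replacement string is parsed twice: first by R's string reader, then by the regex engine. The regex engine treats a backslash in the replacement as an escape character.

- The literal "\\_" reaches the engine as \_, which it reads as a plain _. That is why line 5 leaves the string unchanged.
- To emit one literal backslash, the engine needs \\, which is "\\\\_" in R source.

Here is a minimal base R sketch. With stringr, str_replace_all(s, "_", "\\\\_") should follow the same escaping rule.

```r
s <- "foo_bar_baz"

# The R literal "\\\\_" holds the three characters \\_ ;
# the regex replacement turns \\ into one literal backslash.
out <- gsub("_", "\\\\_", s)

cat(out, "\n")  # foo\_bar\_baz
print(out)      # [1] "foo\\_bar\\_baz"  (print() re-escapes the backslash)
```

print() shows doubled backslashes, but the string holds only one backslash before each underscore. nchar(out) is 13: the original 11 characters plus two backslashes.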

Regex with non-capturing group using stringr in R

Question: I am trying to use non-capturing groups with the str_extract function from the stringr package. Here is an example:
library(stringr)
txt <- "foo"
str_extract(txt, "(?:f)(o+)")
This returns "foo", while I expect it to return only "oo", as in this post: https://stackoverflow.com/a/14244553/3750030
How do I use non-capturing groups in R so that the content of those groups is used for matching but removed from the returned value?

Answer 1: When you are using the regex (?:f)(o+), this won't capture, but it will
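
A non-capturing group only avoids creating a numbered group. The characters it matches are still part of the overall match, so str_extract returns "foo". There are two common fixes:

- Keep the capturing group and read it back, for example stringr's str_match(txt, "(?:f)(o+)")[, 2].
- Replace the group with a lookbehind, which asserts that "f" precedes the match without consuming it.

Below is a base R sketch contrasting the two patterns. It uses perl = TRUE because lookarounds need PCRE in base R.

```r
txt <- "foo"

# Non-capturing group: "f" is still consumed, so the whole match is "foo"
whole <- regmatches(txt, regexpr("(?:f)(o+)", txt, perl = TRUE))

# Lookbehind: "f" must precede the match but is not part of it
only_o <- regmatches(txt, regexpr("(?<=f)o+", txt, perl = TRUE))

whole   # "foo"
only_o  # "oo"
```

In stringr, str_extract(txt, "(?<=f)o+") should give the same "oo", since ICU also supports lookbehind.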

How to remove + (plus sign) from string in R?

Question: Say I use gsub and want to remove the signs =, + and - from a string and replace each with an underscore. Can someone describe what is going on when I try to use gsub with a plus sign (+)?
test <- "sandwich=bread-mustard+ketchup"
# [1] "sandwich=bread-mustard+ketchup"
test <- gsub("-", "_", test)
# [1] "sandwich=bread_mustard+ketchup"
test <- gsub("=", "_", test)
# [1] "sandwich_bread_mustard+ketchup"
test <- gsub("+", "_", test)
# [1] "_s_a_n_d_w_i_c_h___b_r_e_a_d___m_u_s_t_a_r_d_+_k_e_t_c_h_u_p_"
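
In a regular expression, + is a quantifier ("one or more of the preceding item"), not a literal character. Judging by the output above, the lone + ends up matching the empty string at every position. That is why an underscore appears between every pair of characters while the real + survives.

Below is a minimal base R sketch of three ways to treat + literally. It also shows one character class that handles all three signs in a single call.

```r
test <- "sandwich=bread-mustard+ketchup"

# Escape the metacharacter, wrap it in a character class, or match literally
via_escape <- gsub("\\+", "_", test)
via_class  <- gsub("[+]", "_", test)
via_fixed  <- gsub("+", "_", test, fixed = TRUE)

# All three signs at once; "-" is first in the class so it is literal
all3 <- gsub("[-=+]", "_", test)
all3  # "sandwich_bread_mustard_ketchup"
```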

Perl regular expressions in the stringr package

Question: The perl() function is deprecated in the latest version of stringr in favor of regex(). However, I can't seem to replicate the earlier behavior. To capitalize the first letter of each string in a vector, this used to work:
name <- c("jim", "john", "bill")
str_replace(name, perl("^(.)"), "\\U\\1")
However, this no longer works:
str_replace(name, regex("^(.)"), "\\U\\1")
But using base R works:
gsub("^(.)", "\\U\\1", name, perl = TRUE)
Is there still a way to do this with the stringr
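
\\U is a PCRE replacement feature, which is why gsub(..., perl = TRUE) honours it. stringr's regex() runs on the ICU engine, whose replacement syntax does not appear to offer case conversion.

One workaround avoids case-converting replacements altogether: upper-case the first character with substring assignment. Below is a base R sketch. The stringr analogue would be str_sub(x, 1, 1) <- str_to_upper(str_sub(x, 1, 1)).

```r
name <- c("jim", "john", "bill")

# Upper-case the first character in place; the rest of each string is kept
cap <- name
substr(cap, 1, 1) <- toupper(substr(cap, 1, 1))

cap  # "Jim" "John" "Bill"
```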

Count the maximum of consecutive letters in a string

Question: I have this vector:
vector <- c("XXXX-X-X", "---X-X-X", "--X---XX", "--X-X--X", "-X---XX-", "-X--X--X", "X-----XX", "X----X-X", "X---XX--", "XX--X---", "---X-XXX", "--X-XX-X")
I want to find the maximum number of consecutive times X appears in each string. So my expected vector would be: 4, 1, 2, 1, 2, 1, 2, 1, 2, 2, 3, 2

Answer 1: In base R, we can split each string into separate characters and then use rle to find the maximum consecutive run length of "X".
sapply(strsplit(vector, ""), function(x) {
  inds = rle(x)
  max
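
As an alternative to rle, the runs of X can be extracted directly with a regex and their lengths compared. This is a different technique from Answer 1, not a completion of its truncated code.

Below is a base R sketch. With stringr, str_extract_all(vector, "X+") would produce the same list of runs. Strings with no X are assigned 0; that is an assumption, since the example vector has none.

```r
vector <- c("XXXX-X-X", "---X-X-X", "--X---XX", "--X-X--X",
            "-X---XX-", "-X--X--X", "X-----XX", "X----X-X",
            "X---XX--", "XX--X---", "---X-XXX", "--X-XX-X")

# All maximal runs of consecutive X in each string
runs <- regmatches(vector, gregexpr("X+", vector))

# Longest run per string; 0 when a string contains no X
max_x <- vapply(runs, function(r) if (length(r)) max(nchar(r)) else 0L, integer(1))

max_x  # 4 1 2 1 2 1 2 1 2 2 3 2
```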

Create new variables based upon specific values

Question: I read up on regular expressions and on Hadley Wickham's stringr and dplyr packages, but I can't figure out how to get this to work. I have library circulation data in a data frame, with the call number as a character variable. I'd like to take the initial capital letters and make them a new variable, and put the digits between the letters and the period into a second new variable.
Call_Num
HV5822.H4 C47 Circulating Collection, 3rd Floor
QE511.4 .G53 1982 Circulating Collection, 3rd Floor
TL515 .M63
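
Below is a base R sketch based on two assumptions:

- The Call_Num column holds only the call number; the location text above looks like a separate column.
- Each value starts with capital letters followed by digits.

The new column names Class and Number are hypothetical. In dplyr, the same sub() calls could sit inside mutate(). sub() returns a value unchanged when the pattern does not match, so malformed call numbers would pass through untouched.

```r
df <- data.frame(
  Call_Num = c("HV5822.H4 C47", "QE511.4 .G53 1982", "TL515 .M63"),
  stringsAsFactors = FALSE
)

# Leading run of capital letters
df$Class <- sub("^([A-Z]+).*", "\\1", df$Call_Num)

# Digits directly after the letters (the run ends at the first non-digit)
df$Number <- sub("^[A-Z]+([0-9]+).*", "\\1", df$Call_Num)

df
```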

Extract a sample of words around a particular word using stringr in R

Question: I've seen a couple of similar questions posted on SO regarding this topic, but they seem to be worded improperly (example) or in a different language (example). In my scenario, I consider everything that is surrounded by white space to be a word. Emoticons, numbers, strings of letters that aren't really words: I don't care. I just want to get some context around the string that was found without having to read the entire file to figure out whether it's a valid match. I tried using the following,
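
Below is a minimal sketch that treats any run of non-whitespace as a word, as the question does. It grabs up to n words on either side of a target. The text, the target "lazy" and n = 2 are hypothetical, and the target is assumed to contain no regex metacharacters.

The lookarounds (?<!\\S) and (?!\\S) keep matches aligned to whitespace boundaries, so base R needs perl = TRUE. stringr's str_extract_all() with the same pattern should behave similarly under ICU.

```r
text   <- "the quick brown fox jumps over the lazy dog near the river"
target <- "lazy"
n      <- 2L

# Up to n "word + space" pairs, the target as a whole word,
# then up to n "space + word" pairs
pattern <- sprintf("(?<!\\S)(?:\\S+\\s+){0,%d}%s(?!\\S)(?:\\s+\\S+){0,%d}",
                   n, target, n)

ctx <- regmatches(text, gregexpr(pattern, text, perl = TRUE))[[1]]
ctx  # "over the lazy dog near"
```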

How do I extract appearances of a vector of strings in another vector of strings using R?

Question: I have a vector of strings like this:
strings <- tibble(string = c("apple, orange, plum, tomato", "plum, beat, pear, cactus", "centipede, toothpick, pear, fruit"))
And I have a vector of fruit:
fruits <- tibble(fruit = c("apple", "orange", "plum", "pear"))
What I'd like is a data.frame/tibble that contains the original strings column plus a second list or character column with all the fruit found in that original column. Something like this:
strings <- tibble(string = c("apple, orange, plum, tomato", "plum, beat, pear, cactus", "centipede, toothpick, pear, fruit"), match = c("apple, orange, plum"
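
Below is a base R sketch with three steps:

- Build one alternation from the fruit vector.
- Wrap it in word boundaries so that, for example, "pear" would not match inside "spear".
- Collapse each row's matches into a comma-separated string.

It uses data.frame instead of tibble to stay dependency-free, and fruit names are assumed to contain no regex metacharacters. With stringr, str_extract_all(strings$string, pattern) should give the same list of matches. Keeping that list instead of collapsing it gives the list-column variant.

```r
strings <- data.frame(
  string = c("apple, orange, plum, tomato",
             "plum, beat, pear, cactus",
             "centipede, toothpick, pear, fruit"),
  stringsAsFactors = FALSE
)
fruits <- c("apple", "orange", "plum", "pear")

# One pattern matching any fruit as a whole word: \b(apple|orange|plum|pear)\b
pattern <- paste0("\\b(", paste(fruits, collapse = "|"), ")\\b")

# List of matched fruit per row
matches <- regmatches(strings$string,
                      gregexpr(pattern, strings$string, perl = TRUE))

# Character column: matches joined with ", " ("" when a row has none)
strings$match <- vapply(matches, paste, character(1), collapse = ", ")

strings$match  # "apple, orange, plum" "plum, pear" "pear"
```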

How to refer to multiple column names held in a variable inside a function

Question: This follows from "Extract rows with duplicate values in two or more fields but different values in another field". As suggested, I'm posting my additional request separately. First the code, then the question.
library(data.table)
# load the data
customers <- structure(list(
NAME = c("B V RAMANA ", "K KRISHNA", "B SUDARSHAN", "B ANNAPURNA ", "BIKASH BAHADUR CHITRE", "KOTLA CHENNAMMA ", "K KRISHNA", " B V RAMANA", "B ANNAPURNA", "ZAITOON BEE", "BIMAN BALAIAH", " KOTLA CHENNAMMA ", "B V RAMANA"),
DOB = c("15
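
The question's own code is truncated, so this is only a base R sketch of the general pattern, not a fix for that code. A character vector of column names held in a variable can index a data frame directly (df[cols]). duplicated() then works across all of those columns at once.

The function name find_dups and the customers_demo data are hypothetical. In data.table, a character vector can likewise be passed to by = or .SDcols =. Since the question's NAME values carry stray leading and trailing spaces, applying trimws() first would probably be needed for duplicates to be detected.

```r
# Hypothetical helper: rows whose values in all `cols` occur more than once
find_dups <- function(df, cols) {
  key <- df[cols]  # select columns by a character vector held in a variable
  dup <- duplicated(key) | duplicated(key, fromLast = TRUE)
  df[dup, , drop = FALSE]
}

# Hypothetical demo data (not the question's data)
customers_demo <- data.frame(
  NAME = c("B V RAMANA", "K KRISHNA", "B V RAMANA", "K KRISHNA"),
  DOB  = c("1970-01-01", "1980-02-02", "1970-01-01", "1981-03-03"),
  stringsAsFactors = FALSE
)

res <- find_dups(customers_demo, c("NAME", "DOB"))
res  # rows 1 and 3
```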