stringr

How to str_extract percentages in R?

让人想犯罪 __ submitted on 2019-12-10 23:20:00
Question: From the string border-color:#002449;left:74.4%top;37%; I would like to store the first percentage, 74.4%, in a variable called X and the second percentage, 37%, in a variable called Y. I have tried this regex, "^.*?(\\d+)%.*", but it drops the % sign and extracts only the second 4 from 74.4. Any help will be appreciated. Please let me know if any further information is needed. Answer 1: s <- "border-color:#002449;left:74.4%top;37%;" regmatches(s, gregexpr("\\d+(\\.\\d+){0,1}%", s))[
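A minimal base R sketch of the approach the answer starts to show, assuming both percentages should be kept as strings that include the % sign. The `?` quantifier here is equivalent to the `{0,1}` in the answer; `pcts`, `X_num`, and the final conversion step are illustrative additions, not part of the original answer.

```r
s <- "border-color:#002449;left:74.4%top;37%;"

# Match one or more digits, an optional decimal part, then a literal "%".
pcts <- regmatches(s, gregexpr("\\d+(\\.\\d+)?%", s))[[1]]

X <- pcts[1]  # first percentage found in the string
Y <- pcts[2]  # second percentage found in the string

# Drop the "%" and convert when a numeric value is needed (illustrative).
X_num <- as.numeric(sub("%", "", X, fixed = TRUE))
```

Because `gregexpr()` returns every match, the same call would also pick up a third or fourth percentage if the string contained one.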

Replacing underscore “_” with backslash-underscore “\_” in an R string

谁说我不能喝 submitted on 2019-12-10 21:13:05
Question: How can I replace underscores "_" with backslash-underscores "\_" in an R string? I'd prefer to use the stringr package. Also, can anyone explain why line 5 below fails to get the desired result? I was almost certain that would work. library(stringr) s <- "foo_bar_baz" str_replace_all(s, "_", 5) # [1] "foo5bar5baz" str_replace_all(s, "_", "\_") # Error: '\_' is an unrecognized escape in character string starting ""\_" str_replace_all(s, "_", "\\_") # [1] "foo_bar_baz" str_replace_all(s, "_"
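One way to see what is going on, sketched in base R rather than stringr: the backslash is an escape character twice over, once in the R string literal and once more in a regex replacement. The variable names `out_fixed` and `out_regex` are illustrative. With stringr, the same doubling (four backslashes in the replacement) should be needed, since its replacement string also treats a backslash as an escape; that part is an assumption and is not run here.

```r
s <- "foo_bar_baz"

# fixed = TRUE: pattern and replacement are literal strings, so the R literal
# "\\_" (one backslash followed by an underscore) is inserted unchanged.
out_fixed <- gsub("_", "\\_", s, fixed = TRUE)

# Regex mode: the replacement itself needs an escaped backslash, hence four
# backslashes in the R source.
out_regex <- gsub("_", "\\\\_", s)

# print() would show the backslashes doubled; cat() shows the real characters.
cat(out_fixed, "\n")
```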

Regex with non-capturing group using stringr in R

风流意气都作罢 submitted on 2019-12-10 20:44:34
Question: I am trying to use non-capturing groups with the str_extract function from the stringr package. Here is an example: library(stringr) txt <- "foo" str_extract(txt,"(?:f)(o+)") This returns "foo", while I expect it to return only "oo", as in this post: https://stackoverflow.com/a/14244553/3750030 How do I use non-capturing groups in R to remove the content of the groups from the returned value while still using it for matching? Answer 1: When you are using regex (?:f)(o+) this won't Capture but it will
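A non-capturing group still takes part in the overall match, so extracting the whole match returns "foo". Two common workarounds are sketched below in base R, assuming the goal is just the run of o's that follows an f; `only_o`, `groups`, and `captured` are illustrative names. In stringr, `str_extract()` with the same lookbehind pattern, or `str_match()` and its second column, would be the analogous calls, though that is not run here.

```r
txt <- "foo"

# Option 1: a lookbehind asserts the "f" without making it part of the match.
m <- regexpr("(?<=f)o+", txt, perl = TRUE)
only_o <- regmatches(txt, m)

# Option 2: keep the capturing group and pull it out by position.
# regexec() returns the full match first, then each capturing group.
groups <- regmatches(txt, regexec("(?:f)(o+)", txt, perl = TRUE))[[1]]
captured <- groups[2]
```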

How to remove + (plus sign) from string in R?

别来无恙 submitted on 2019-12-10 17:31:37
Question: Say I use gsub and want to replace each of the following signs (=, +, -) in a string with an underscore. Can someone describe what is going on when I try to use gsub with a plus sign (+)? test<- "sandwich=bread-mustard+ketchup" # [1] "sandwich=bread-mustard+ketchup" test<-gsub("-","_",test) # [1] "sandwich=bread_mustard+ketchup" test<-gsub("=","_",test) # [1] "sandwich_bread_mustard+ketchup" test<-gsub("+","_",test) #[1] "_s_a_n_d_w_i_c_h___b_r_e_a_d___m_u_s_t_a_r_d_+_k_e_t_c_h_u_p_"
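In a regular expression, + is a quantifier rather than a literal character, which is why the bare "+" pattern does not behave like "-" or "=" in the question. A short base R sketch of the usual fixes follows; `plus_escaped`, `plus_fixed`, and `cleaned` are illustrative names.

```r
test <- "sandwich=bread-mustard+ketchup"

# Escape the plus sign so the regex engine treats it literally ...
plus_escaped <- gsub("\\+", "_", test)

# ... or turn off regex interpretation altogether.
plus_fixed <- gsub("+", "_", test, fixed = TRUE)

# A bracket expression handles all three signs in one call;
# "-" is placed first so it is not read as a range.
cleaned <- gsub("[-=+]", "_", test)
```

The single bracket-expression call replaces the three separate `gsub()` calls from the question.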

Perl regular expressions in the stringr package

断了今生、忘了曾经 submitted on 2019-12-10 17:17:31
Question: The perl() function is deprecated in the latest version of stringr in favor of regex(). However, I don't seem to be able to replicate the earlier behavior. To capitalize the first letter of a vector of strings, this used to work: name <- c("jim", "john", "bill") str_replace(name, perl("^(.)"), "\\U\\1") However, this no longer works: str_replace(name, regex("^(.)"), "\\U\\1") But using base R works: gsub("^(.)", "\\U\\1", name, perl=TRUE) Is there still a way to do this with the stringr
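Since the question reports that the `\\U` case-conversion escape is not honored by `regex()`, one fallback is to avoid it entirely. The sketch below stays in base R and is not a stringr solution; `capitalized` and `capitalized_perl` are illustrative names.

```r
name <- c("jim", "john", "bill")

# Upper-case the first character and append the rest of each string unchanged.
capitalized <- paste0(toupper(substr(name, 1, 1)), substring(name, 2))

# The base R call from the question, using sub() since only the first
# match per string is needed.
capitalized_perl <- sub("^(.)", "\\U\\1", name, perl = TRUE)
```

The `toupper()` variant does not depend on any regex engine, so it behaves the same regardless of which string package is loaded.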

Count the maximum of consecutive letters in a string

那年仲夏 submitted on 2019-12-10 15:16:21
Question: I have this vector: vector <- c("XXXX-X-X", "---X-X-X", "--X---XX", "--X-X--X", "-X---XX-", "-X--X--X", "X-----XX", "X----X-X", "X---XX--", "XX--X---", "---X-XXX", "--X-XX-X") I want to find the maximum number of consecutive occurrences of X in each string. So, my expected vector would be: 4, 1, 2, 1, 2, 1, 2, 1, 2, 2, 3, 2 Answer 1: In base R, we can split each string into separate characters and then use rle to find the max consecutive length for "X". sapply(strsplit(vector, ""), function(x) { inds = rle(x) max
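The answer's code is cut off, so here is a self-contained sketch of the same `rle()` idea. The helper name `max_run`, its `ch` argument, the variable name `vec`, and the choice to return 0 for strings without any X are assumptions made for illustration.

```r
vec <- c("XXXX-X-X", "---X-X-X", "--X---XX", "--X-X--X", "-X---XX-",
         "-X--X--X", "X-----XX", "X----X-X", "X---XX--", "XX--X---",
         "---X-XXX", "--X-XX-X")

# Longest run of the character `ch` in a single string `s`.
max_run <- function(s, ch = "X") {
  r <- rle(strsplit(s, "")[[1]])     # run-length encode the characters
  runs <- r$lengths[r$values == ch]  # keep only the runs made of `ch`
  if (length(runs) == 0) 0L else max(runs)
}

result <- vapply(vec, max_run, integer(1), USE.NAMES = FALSE)
```

`result` should line up with the expected vector given in the question.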

Create new variables based upon specific values

旧城冷巷雨未停 submitted on 2019-12-10 05:44:25
Question: I read up on regular expressions and Hadley Wickham's stringr and dplyr packages but can't figure out how to get this to work. I have library circulation data in a data frame, with the call number as a character variable. I'd like to take the initial capital letters and make them a new variable, and make the digits between the letters and the period a second new variable. Call_Num HV5822.H4 C47 Circulating Collection, 3rd Floor QE511.4 .G53 1982 Circulating Collection, 3rd Floor TL515 .M63
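A rough base R sketch of one way to split the call numbers, assuming every value starts with capital letters followed immediately by digits. The sample values are shortened from the visible part of the question, and the new column names `Class` and `Number` are hypothetical.

```r
call_num <- c("HV5822.H4 C47", "QE511.4 .G53 1982", "TL515 .M63")

# Group 1: leading capital letters. Group 2: the digits right after them.
m <- regmatches(call_num, regexec("^([A-Z]+)([0-9]+)", call_num))

# Each element of m is c(full match, group 1, group 2); stack them as rows.
# This assumes every call number matches; a non-matching value would yield
# an empty element and would need separate handling.
parts <- do.call(rbind, m)

circ <- data.frame(
  Call_Num = call_num,
  Class    = parts[, 2],
  Number   = parts[, 3],
  stringsAsFactors = FALSE
)
```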

Extract a sample of words around a particular word using stringr in R

℡╲_俬逩灬. submitted on 2019-12-09 16:55:27
Question: I've seen a couple of similar questions posted on SO regarding this topic, but they seem to be worded improperly (example) or in a different language (example). In my scenario, I consider everything that is surrounded by white space to be a word. Emoticons, numbers, strings of letters that aren't really words, I don't care. I just want to get some context around the string that was found without having to read the entire file to figure out if it's a valid match. I tried using the following,
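The attempted code is not shown, so the following is only one possible sketch that uses the question's definition of a word (anything delimited by white space). It uses base R rather than stringr; the function name `context_words`, its parameters, and the sample sentence are all made up for illustration.

```r
# Return up to `n` words on each side of every exact occurrence of `target`.
context_words <- function(text, target, n = 2) {
  words <- strsplit(trimws(text), "\\s+")[[1]]  # split on runs of white space
  hits <- which(words == target)                # positions of exact matches
  lapply(hits, function(i) {
    words[max(1, i - n):min(length(words), i + n)]
  })
}

res <- context_words("the quick brown fox jumps over the lazy dog", "fox", n = 2)
snippet <- paste(res[[1]], collapse = " ")
```

The result is a list with one window per occurrence, so a target that appears several times yields several snippets; windows are clipped at the start and end of the text.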

How do I extract appearances of a vector of strings in another vector of strings using R?

左心房为你撑大大i submitted on 2019-12-08 11:22:29
I have a vector of strings like this : strings <- tibble(string = c("apple, orange, plum, tomato", "plum, beat, pear, cactus", "centipede, toothpick, pear, fruit")) And I have a vector of fruit: fruits <- tibble(fruit =c("apple", "orange", "plum", "pear")) What I'd like is a data.frame/tibble with the original strings data.frame with a second list or character column of all the fruit contained in that original column. Something like this. strings <- tibble(string = c("apple, orange, plum, tomato", "plum, beat, pear, cactus", "centipede, toothpick, pear, fruit"), match = c("apple, orange, plum"
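One possible approach, sketched in base R with plain vectors and a data.frame instead of tibbles: split each string on commas and keep the items that appear in the fruit vector. This assumes whole-item matches are wanted (so "fruit" itself is not a hit); `found`, `match_col`, and `result` are illustrative names.

```r
strings <- c("apple, orange, plum, tomato",
             "plum, beat, pear, cactus",
             "centipede, toothpick, pear, fruit")
fruits <- c("apple", "orange", "plum", "pear")

# Split on commas (plus optional spaces) and keep only the known fruits,
# preserving the order in which they appear in each string.
found <- lapply(strsplit(strings, ",\\s*"), function(x) x[x %in% fruits])

# Collapse each set of matches into a single character value.
match_col <- vapply(found, paste, character(1), collapse = ", ")

result <- data.frame(string = strings, match = match_col,
                     stringsAsFactors = FALSE)
```

Keeping `found` as a list instead of collapsing it would give the list-column variant the question also mentions.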

How to refer to multiple column names held in a variable inside a function

≯℡__Kan透↙ submitted on 2019-12-08 10:25:09
Question: This follows from "Extract rows with duplicate values in two or more fields but different values in another field". As suggested, I'm posting the additional request separately. First the code, then the question. library(data.table) # load the data customers <- structure(list( NAME = c("B V RAMANA ", "K KRISHNA", "B SUDARSHAN", "B ANNAPURNA ", "BIKASH BAHADUR CHITRE", "KOTLA CHENNAMMA ", "K KRISHNA", " B V RAMANA", "B ANNAPURNA", "ZAITOON BEE", "BIMAN BALAIAH", " KOTLA CHENNAMMA ", "B V RAMANA"), DOB = c("15