stringr | 易学教程

Separate a String using Tidyr's “separate” into Multiple Columns and then Create a New Column with Counts

Question: I have the basic data frame below, which contains long strings separated by commas. I used tidyr's "separate" to create new columns. How do I add another column that counts, for each person, how many of the new columns contain an answer (that is, are not NA)? I suppose the columns could be counted after separating, or the comma-separated elements could be counted beforehand. Any help would be appreciated. I would like to stay within the tidyverse and dplyr.
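The data frame itself is not included in this excerpt, so the following is only a minimal sketch on made-up data (the column names `person` and `answers` and the `a1`..`a3` targets are assumptions). It counts the elements before splitting, as the question suggests, by counting commas and adding one.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical data: the original data frame is not shown in the excerpt.
df <- tibble(
  person  = c("A", "B", "C"),
  answers = c("red,green,blue", "red", NA)
)

result <- df %>%
  # Number of elements = number of commas + 1; a missing string counts as 0.
  mutate(n_answers = if_else(is.na(answers), 0L, str_count(answers, ",") + 1L)) %>%
  # Split into columns; fill = "right" pads short rows with NA.
  separate(answers, into = c("a1", "a2", "a3"), sep = ",", fill = "right")

result
```

Counting after the split would also work, for example with `rowSums(!is.na(...))` over the new columns; counting beforehand avoids having to know the column names.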

str_count with overlapping substrings

Question: I am trying to count the number of appearances of a substring within a character vector. For example:

lookin <- c("babababa", "bellow", "ra;baba")
searchfor <- "aba"
str_count(lookin, searchfor)

returns: 2 0 1

However, I want it to return '3 0 1', but it isn't picking up the middle 'aba' in the first item, since it is partially used in the first match (I think). I found this question but couldn't figure out how to use that with a vector having multiple items.

Answer 1: Try: str_count(lookin,
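The answer is cut off in this excerpt. One commonly used approach, which may or may not be what that answer went on to suggest, is to wrap the search term in a zero-width lookahead so that a match consumes no characters and overlapping occurrences are all counted. A minimal sketch, assuming the search term contains no regex metacharacters:

```r
library(stringr)

lookin <- c("babababa", "bellow", "ra;baba")
searchfor <- "aba"

# "(?=aba)" matches at every position where "aba" starts,
# without consuming it, so overlapping hits are counted separately.
overlapping <- str_count(lookin, paste0("(?=", searchfor, ")"))

overlapping
```

If the search term could contain characters such as `.` or `(`, it would need to be escaped before being pasted into the pattern.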

stringr str_extract capture group capturing everything

Question: I'm looking to extract the year from a string. It always comes after an 'X' and before a ".", followed by a string of other characters. Using stringr's str_extract I'm trying the following:

year = str_extract(string = 'X2015.XML.Outgoing.pounds..millions.', pattern = 'X(\\d{4})\\.')

I thought the brackets would define the capture group, returning 2015, but I actually get the complete match X2015. Am I doing this correctly? Why am I not trimming the "X" and "."?

Answer 1: The capture group is irrelevant in
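The answer is truncated here, but the behavior it starts to describe can be illustrated with two standard stringr options: `str_match()`, which returns capture groups as extra matrix columns, and `str_extract()` with lookarounds, which leaves the "X" and "." out of the match. This is a sketch of those two options, not necessarily the rest of the original answer.

```r
library(stringr)

x <- "X2015.XML.Outgoing.pounds..millions."

# str_match() returns a matrix: column 1 is the full match,
# column 2 is the first capture group.
year_match <- str_match(x, "X(\\d{4})\\.")[, 2]

# Lookbehind/lookahead assert the surrounding characters without matching them.
year_look <- str_extract(x, "(?<=X)\\d{4}(?=\\.)")

year_match
year_look
```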

Extract last word in a string after comma if there are multiple words else the first word

Question: I have data where the words are as follows:

location <- c("xyz, sss, New Zealand", "USA", "Pris,France")
id <- c(1,2,3)
df <- data.frame(location,id)

I would like to extract the country name from the data. The tricky part is that if I extract just the last word, I will have only one record (France).

library(stringr)
df$country <- word(df$location,-1)

Any ideas on how to extract the country from this data?

id  location               country
1   xyz, sss, New Zealand  New Zealand
2   USA                    USA
3   Pris,France            France

Answer 1: You
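The answer is cut off in this excerpt. As a hedged sketch of one way to get the output shown in the question: take everything after the last comma (or the whole string if there is no comma) and trim the surrounding whitespace. The variable name `country` is chosen here for illustration.

```r
library(stringr)

location <- c("xyz, sss, New Zealand", "USA", "Pris,France")

# "[^,]+$" matches the run of non-comma characters at the end of the string,
# which is the whole string when no comma is present.
country <- str_trim(str_extract(location, "[^,]+$"))

country
```

Unlike `word(location, -1)`, this treats the comma rather than the space as the delimiter, so multi-word names such as "New Zealand" stay intact.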

removing everything after first 'backslash' in a string

I have a vector like the one below:

vec <- c("abc\edw\www", "nmn\ggg", "rer\qqq\fdf"......)

I want to remove everything from the first backslash onward, like below:

newvec <- c("abc","nmn","rer")

Thank you. My original vector is as below (only the head):

[1] "peoria ave\nste \npeoria" [2] "wood dr\nphoenix" "central ave\nphoenix" [4] "southern ave\nphoenix" [5] "happy valley rd\nste \nglendaleaz " "the americana at brand\n americana way\nglendale"

The problem is that my original CSV file does not contain backslashes, but when I read it, backslashes appear. The original CSV file is as below: [1]
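The "\n" sequences in the printed head are most likely how R displays embedded newline characters rather than literal backslashes, which would explain why the CSV file itself shows none. Under that assumption, a minimal sketch keeps everything before the first newline; a second line covers the case where the strings really do hold literal backslashes. The sample vectors are hypothetical and only mirror the excerpt.

```r
library(stringr)

# Hypothetical sample: "\n" here is a single newline character.
addr <- c("peoria ave\nste \npeoria", "wood dr\nphoenix")

# Keep the leading run of characters that are not a newline.
first_line <- str_extract(addr, "^[^\n]*")

# Literal backslashes must be written as "\\" in R source,
# and as "\\\\" inside a regular expression string.
vec <- c("abc\\edw\\www", "nmn\\ggg", "rer\\qqq\\fdf")
newvec <- str_extract(vec, "^[^\\\\]*")

first_line
newvec
```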

What is the difference between paste/paste0 and str_c?

Question: I don't see a difference between paste / paste0 and str_c for combining a single vector into a single string, multiple strings into one string, or multiple vectors into a single string. While I was writing the question I found this: https://www.rdocumentation.org/packages/stringr/versions/1.3.1/topics/str_c. The community example from richie@datacamp.com says the difference is that str_c treats blanks as blanks (not as NAs) and recycles more appropriately. Are there any other differences?
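Two differences that are easy to check directly are how missing values and zero-length inputs are handled. The sketch below shows only those two cases and is not a complete comparison.

```r
library(stringr)

# Missing values: paste() converts NA to the text "NA", str_c() propagates NA.
paste_na <- paste("a", NA)
strc_na  <- str_c("a", NA)

# Zero-length input: paste0() treats it as an empty string,
# str_c() returns a zero-length result.
paste_empty <- paste0("a", character(0))
strc_empty  <- str_c("a", character(0))

paste_na
strc_na
paste_empty
strc_empty
```

Note also that `paste()` separates with a space by default, while `str_c()` and `paste0()` use no separator.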

best way to manipulate strings in big data.table

I have a 67MM-row data.table with people's names and surnames separated by spaces. I just need to create a new column for each word. Here is a small subset of the data:

n <- structure(list(Subscription_Id = c("13.855.231.846.091.000", "11.156.048.529.090.800", "24.940.584.090.830", "242.753.039.111.124", "27.843.782.090.830", "13.773.513.145.090.800", "25.691.374.090.830", "12.236.174.155.090.900", "252.027.904.121.210", "11.136.991.054.110.100" ), Account_Desc = c("AGUAYO CARLA", "LEIVA LILIANA", "FULLANA MARIA LAURA", "PETREL SERGIO", "IPTICKET SRL", "LEDESMA ORLANDO", "CATTANEO LUIS RAUL",
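The sample data is truncated in this excerpt, so the sketch below uses three of the visible names in a hypothetical table `dt`. It relies on data.table's `tstrsplit()`, which splits each string and transposes the result into one vector per word position; whether this is the fastest option for 67 million rows is not something the excerpt establishes.

```r
library(data.table)

# Hypothetical subset built from names visible in the excerpt.
dt <- data.table(Account_Desc = c("AGUAYO CARLA", "LEIVA LILIANA", "FULLANA MARIA LAURA"))

# One list element per word position; shorter names are padded with NA.
parts <- tstrsplit(dt$Account_Desc, " ", fixed = TRUE)

# Column names word1, word2, ... are illustrative.
new_cols <- paste0("word", seq_along(parts))
dt[, (new_cols) := parts]

dt
```

Using `fixed = TRUE` avoids regex matching on the separator, which usually helps on large inputs.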

Match character vector in a dataframe with another character vector and trim character

Here is a data frame and a vector:

df1 <- tibble(var1 = c("abcd", "efgh", "ijkl", "qrst"))
vec <- c("abcd", "mnop", "ijkl")

Now, for all the values in var1 that match values in vec, keep only the first 3 characters of var1, so that the desired result is:

df2 <- tibble(var1 = c("abc", "efgh", "ijk", "qrst"))

Since "abcd" matches, we keep only 3 characters, i.e. "abc" in df2, but "efgh" doesn't exist in vec, so we keep it as is, i.e. "efgh" in df2. How can I use dplyr and/or stringr to accomplish this?

You can just use %in% to check whether the strings are in the vector, and substr to
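The answer is cut off, but the two pieces it names, `%in%` and `substr`, can be combined as in the following sketch. The use of `mutate()` with `if_else()` is an assumption about how the answer continued, not a quotation of it.

```r
library(dplyr)

df1 <- tibble(var1 = c("abcd", "efgh", "ijkl", "qrst"))
vec <- c("abcd", "mnop", "ijkl")

# Where var1 appears in vec, keep its first 3 characters; otherwise keep it unchanged.
df2 <- df1 %>%
  mutate(var1 = if_else(var1 %in% vec, substr(var1, 1, 3), var1))

df2
```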