stringr

Separate a String using Tidyr's “separate” into Multiple Columns and then Create a New Column with Counts

十年热恋 submitted on 2019-12-08 10:06:55

Question: I have a basic data frame with a column of long strings separated by commas. I used tidyr's "separate" to split them into new columns. How do I add another column that counts, for each person, how many of the new columns contain an answer (i.e. are not NA)? I suppose the columns could be counted after separating, or before, by counting how many comma-separated elements each string has. Any help would be appreciated. I would like to stay within the tidyverse and dplyr.
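The question's data frame is not shown, so the sketch below uses made-up data. The names person, answers, a1 to a3 and n_answers are hypothetical. It shows both ideas from the question. The first counts commas plus one and treats a missing string as zero answers. The second separates first and then counts the non-NA columns in each row.

```r
library(dplyr)
library(stringr)
library(tidyr)

# Hypothetical data standing in for the asker's data frame
df <- tibble(
  person  = c("Ann", "Bob", "Cy"),
  answers = c("red,green,blue", "green", NA)
)

# Option 1: count before separating (number of commas + 1; NA counts as 0)
counted <- df %>%
  mutate(n_answers = if_else(is.na(answers), 0L,
                             str_count(answers, ",") + 1L))

# Option 2: separate first; fill = "right" pads short rows with NA quietly
wide <- df %>%
  separate(answers, into = c("a1", "a2", "a3"), sep = ",", fill = "right")

# Then count the non-NA answer columns in each row
wide$n_answers <- rowSums(!is.na(wide[c("a1", "a2", "a3")]))
```

With option 2, the into vector must be at least as long as the longest string's number of elements. Option 1 does not need to know that number in advance.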

removing everything after first 'backslash' in a string

被刻印的时光 ゝ submitted on 2019-12-08 02:53:19

Question: I have a vector like the one below:
vec <- c("abc\edw\www", "nmn\ggg", "rer\qqq\fdf"......)
I want to remove everything from the first backslash onward, so that the result is:
newvec <- c("abc","nmn","rer")
Thank you. My original vector is as below (only the head):
[1] "peoria ave\nste \npeoria"
[2] "wood dr\nphoenix"                   "central ave\nphoenix"
[4] "southern ave\nphoenix"
[5] "happy valley rd\nste \nglendaleaz " "the americana at brand\n americana way\nglendale"
Here the problem is my original csv file
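Suppose the strings really do contain literal backslashes. Then one option is str_remove() with a pattern that matches a backslash and everything after it. The vector exactly as typed in the question would not parse, because \e and \w are not valid escapes in R strings. In the sketch each backslash is therefore written as \\ in the R source.

```r
library(stringr)

# Each "\\" below is a single literal backslash, e.g. abc\edw\www
vec <- c("abc\\edw\\www", "nmn\\ggg", "rer\\qqq\\fdf")

# The regex \\.* means a literal backslash followed by everything after it.
# str_remove() deletes the first match, i.e. from the first backslash on.
newvec <- str_remove(vec, "\\\\.*")
# newvec: "abc" "nmn" "rer"
```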

Match character vector in a dataframe with another character vector and trim character

不羁的心 submitted on 2019-12-07 12:42:22

Question: Here is a data frame and a vector:
df1 <- tibble(var1 = c("abcd", "efgh", "ijkl", "qrst"))
vec <- c("abcd", "mnop", "ijkl")
Now, for all values in var1 that match a value in vec, I want to keep only the first 3 characters of var1, so that the desired result is:
df2 <- tibble(var1 = c("abc", "efgh", "ijk", "qrst"))
Since "abcd" matches, only its first 3 characters, "abc", are kept in df2. "efgh" does not exist in vec, so it is kept as is in df2. How can I use dplyr and/or stringr to

str_count with overlapping substrings

假如想象 submitted on 2019-12-07 04:56:16

Question: I am trying to count the number of appearances of a substring within a character vector. For example:
lookin <- c("babababa", "bellow", "ra;baba")
searchfor <- "aba"
str_count(lookin, searchfor)
returns: 2 0 1
However, I want it to return 3 0 1. It isn't picking up the middle "aba" in the first item, I think because that occurrence partly overlaps the first match. I found this question but couldn't figure out how to apply it to a vector with multiple items.
Answer 1: Try: str_count(lookin,
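A common trick for counting overlapping matches is a zero-width lookahead. The sketch below uses it; it is not necessarily what the truncated answer goes on to suggest. Because "(?=aba)" consumes no characters, the regex engine can report a match at every position where "aba" starts. If searchfor could contain regex metacharacters, it would need escaping first.

```r
library(stringr)

lookin    <- c("babababa", "bellow", "ra;baba")
searchfor <- "aba"

# Wrap the search string in a lookahead so matches may overlap:
# the engine records each start position without consuming "aba".
overlapping <- str_count(lookin, paste0("(?=", searchfor, ")"))
# overlapping: 3 0 1
```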

stringr str_extract capture group capturing everything

半腔热情 submitted on 2019-12-07 02:47:07

Question: I'm looking to extract the year from a string. It always comes after an "X" and before a ".", followed by other characters. Using stringr's str_extract, I'm trying the following:
year = str_extract(string = 'X2015.XML.Outgoing.pounds..millions.', pattern = 'X(\\d{4})\\.')
I thought the parentheses would define a capture group and return 2015, but I actually get the complete match, "X2015.". Am I doing this correctly? Why are the "X" and the "." not trimmed?
Answer 1: The capture group is irrelevant in
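str_extract() returns the whole match, so the parentheses do not change its output. The sketch below shows two ways to get only the digits. str_match() returns the capture groups as extra matrix columns. Lookarounds keep the "X" and "." outside the match.

```r
library(stringr)

s <- "X2015.XML.Outgoing.pounds..millions."

# str_extract() gives the full match, capture group or not
full <- str_extract(s, "X(\\d{4})\\.")          # "X2015."

# str_match(): column 1 is the full match, column 2 the first group
year <- str_match(s, "X(\\d{4})\\.")[, 2]       # "2015"

# Lookbehind/lookahead: "X" and "." must be present but are not returned
year2 <- str_extract(s, "(?<=X)\\d{4}(?=\\.)")  # "2015"
```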

Extract last word in a string after comma if there are multiple words else the first word

狂风中的少年 submitted on 2019-12-07 02:27:02

Question: I have data where the words are as follows:
location <- c("xyz, sss, New Zealand", "USA", "Pris,France")
id <- c(1,2,3)
df <- data.frame(location, id)
I would like to extract the country name from the data. The tricky part is that if I extract just the last word, I will have only one record (France):
library(stringr)
df$country <- word(df$location, -1)
Any ideas on how to extract the country from this data? The desired output is:
id location              country
1  xyz, sss, New Zealand New Zealand
2  USA                   USA
3  Pris,France           France
Answer 1: You
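One way to take the text after the last comma, or the whole string when there is no comma, is to match the run of non-comma characters at the end of the string. Then trim the leftover space. The sketch below does this with the question's data. It adds stringsAsFactors = FALSE so that location stays a character column on R versions before 4.0.

```r
library(stringr)

location <- c("xyz, sss, New Zealand", "USA", "Pris,France")
id <- c(1, 2, 3)
df <- data.frame(location, id, stringsAsFactors = FALSE)

# "[^,]+$" = the non-comma characters at the end of the string;
# str_trim() drops the space left over from ", ".
df$country <- str_trim(str_extract(df$location, "[^,]+$"))
# df$country: "New Zealand" "USA" "France"
```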

removing everything after first 'backslash' in a string

限于喜欢 submitted on 2019-12-06 13:32:46

I have a vector like the one below:
vec <- c("abc\edw\www", "nmn\ggg", "rer\qqq\fdf"......)
I want to remove everything from the first backslash onward, so that the result is:
newvec <- c("abc","nmn","rer")
Thank you. My original vector is as below (only the head):
[1] "peoria ave\nste \npeoria"
[2] "wood dr\nphoenix"                   "central ave\nphoenix"
[4] "southern ave\nphoenix"
[5] "happy valley rd\nste \nglendaleaz " "the americana at brand\n americana way\nglendale"
Here the problem is that my original csv file does not contain backslashes, but when I read it in, backslashes appear. The original csv file is as below: [1]
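The print-out suggests that the "backslashes" are really newline characters. print() displays a newline as \n, while cat() or writeLines() would show an actual line break. Assuming that is the case, the sketch below keeps everything before the first newline, using the head of the vector shown in the question.

```r
library(stringr)

# First values from the question's printed head
addr <- c("peoria ave\nste \npeoria",
          "wood dr\nphoenix",
          "central ave\nphoenix",
          "southern ave\nphoenix")

# "^[^\n]*" = every character from the start up to (not including)
# the first newline; strings without a newline come back whole.
first_line <- str_extract(addr, "^[^\n]*")
# first_line: "peoria ave" "wood dr" "central ave" "southern ave"
```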

What is the difference between paste/paste0 and str_c?

橙三吉。 submitted on 2019-12-06 04:13:14

Question: I don't see a difference between paste/paste0 and str_c for combining a single vector into a single string, multiple strings into one string, or multiple vectors into a single string. While writing the question I found this: https://www.rdocumentation.org/packages/stringr/versions/1.3.1/topics/str_c. The community example from richie@datacamp.com says the difference is that str_c treats blanks as blanks (not as NAs) and recycles more appropriately. Are there any other differences?
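One difference that is easy to check is how missing values are handled. The sketch below uses a made-up vector x. When there are no NAs, both functions collapse a vector the same way. How they recycle zero-length or mismatched-length inputs may differ between stringr versions, so check the documentation for the installed release.

```r
library(stringr)

x <- c("a", NA)

# paste0() turns a missing value into the literal text "NA"
pasted <- paste0(x, "-suffix")   # "a-suffix" "NA-suffix"

# str_c() propagates the missing value instead
joined <- str_c(x, "-suffix")    # "a-suffix" NA

# Without NAs, collapsing a vector gives the same result
one_a <- paste0(c("a", "b", "c"), collapse = ",")  # "a,b,c"
one_b <- str_c(c("a", "b", "c"), collapse = ",")   # "a,b,c"
```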

best way to manipulate strings in big data.table

喜欢而已 submitted on 2019-12-05 20:50:08

I have a 67MM-row data.table with people's names and surnames separated by spaces. I just need to create a new column for each word. Here is a small subset of the data:
n <- structure(list(Subscription_Id = c("13.855.231.846.091.000", "11.156.048.529.090.800", "24.940.584.090.830", "242.753.039.111.124", "27.843.782.090.830", "13.773.513.145.090.800", "25.691.374.090.830", "12.236.174.155.090.900", "252.027.904.121.210", "11.136.991.054.110.100" ), Account_Desc = c("AGUAYO CARLA", "LEIVA LILIANA", "FULLANA MARIA LAURA", "PETREL SERGIO", "IPTICKET SRL", "LEDESMA ORLANDO", "CATTANEO LUIS RAUL",
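One common data.table idiom for this is tstrsplit(). It splits each string once and returns one vector per word position, padding shorter names with NA. The sketch below uses a few Account_Desc values from the truncated sample. The column names word1, word2, ... are hypothetical, and the approach has not been timed here on 67MM rows.

```r
library(data.table)

# A few Account_Desc values from the question's sample
dt <- data.table(Account_Desc = c("AGUAYO CARLA", "LEIVA LILIANA",
                                  "FULLANA MARIA LAURA", "CATTANEO LUIS RAUL"))

# fixed = TRUE splits on a literal space without regex matching;
# the result is a list with one element per word position.
parts <- tstrsplit(dt$Account_Desc, " ", fixed = TRUE)

# Add all word columns by reference in one step
dt[, paste0("word", seq_along(parts)) := parts]
```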

Match character vector in a dataframe with another character vector and trim character

一笑奈何 submitted on 2019-12-05 18:47:35

Here is a data frame and a vector:
df1 <- tibble(var1 = c("abcd", "efgh", "ijkl", "qrst"))
vec <- c("abcd", "mnop", "ijkl")
Now, for all values in var1 that match a value in vec, I want to keep only the first 3 characters of var1, so that the desired result is:
df2 <- tibble(var1 = c("abc", "efgh", "ijk", "qrst"))
Since "abcd" matches, only its first 3 characters, "abc", are kept in df2. "efgh" does not exist in vec, so it is kept as is in df2. How can I use dplyr and/or stringr to accomplish this?
You can just use %in% to check whether the strings are in the vector, and substr to
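The answer starts to describe an approach: %in% to test membership and substr to shorten the string. It could be sketched with dplyr's if_else() as below. stringr's str_sub(var1, 1, 3) would work in place of substr(var1, 1, 3).

```r
library(dplyr)

df1 <- tibble(var1 = c("abcd", "efgh", "ijkl", "qrst"))
vec <- c("abcd", "mnop", "ijkl")

# Shorten only the values that appear in vec; leave the rest unchanged
df2 <- df1 %>%
  mutate(var1 = if_else(var1 %in% vec, substr(var1, 1, 3), var1))
# df2$var1: "abc" "efgh" "ijk" "qrst"
```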