grepl | 易学教程

Count the frequency of strings in a dataframe R

Question: I want to count the frequencies of certain strings within a data frame.

    strings <- c("pi","pie","piece","pin","pinned","post")
    df <- as.data.frame(strings)

I would then like to count the frequency of these strings:

    counts <- c("pi", "in", "pie", "ie")

to give me something like:

    string freq
    pi     5
    in     2
    pie    2
    ie     2

I have experimented with grepl and table, but I don't see how to specify which strings I want to search for.

Answer 1: You can use sapply() to go over the counts and match every item in
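A minimal sketch of the sapply() plus grepl() idea the answer starts to describe. It assumes that "frequency" means the number of rows containing the pattern as a literal substring, which is why `fixed = TRUE` is used; the names `freq` and `result` are illustrative and not from the original answer.

```r
strings <- c("pi", "pie", "piece", "pin", "pinned", "post")
df <- data.frame(strings, stringsAsFactors = FALSE)
counts <- c("pi", "in", "pie", "ie")

# For each pattern, count the rows whose string contains it.
# fixed = TRUE treats the pattern as plain text, not as a regular expression.
freq <- sapply(counts, function(p) sum(grepl(p, df$strings, fixed = TRUE)))

# Assemble the string/freq table described in the question.
result <- data.frame(string = names(freq), freq = unname(freq))
print(result)
```

With the example data this reproduces the counts listed in the question (5, 2, 2, 2).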

Assign colors to a data frame based on shared values with a character string in R

Question: I'm working in R. I have many different data frames that contain sample names, and I'm trying to assign a color to each row in each data frame based on the sample name. Many rows share the same sample name, but my output data is messy, so I can't sort by sample name. Here's a small example of what I have:

    names <- c( "TC3", "102", "172", "136", "142", "143", "AC2G" )
    colors <- c( "darkorange", "forestgreen", "darkolivegreen", "darkgreen", "darksalmon",
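The excerpt is cut off before the data frames are shown, so the following is only a sketch under stated assumptions: the data frame `dat`, its `sample` column, and the idea that a sample name may appear inside a longer string are all hypothetical. Only the first three name and color pairs from the question are used.

```r
# Lookup vectors: first three name/color pairs from the question.
sample_names  <- c("TC3", "102", "172")
sample_colors <- c("darkorange", "forestgreen", "darkolivegreen")

# Hypothetical messy data: sample names embedded in longer strings.
dat <- data.frame(sample = c("TC3_rep1", "x_102", "172", "TC3_rep2", "unknown"),
                  stringsAsFactors = FALSE)

# Start with NA so rows that match no sample name are easy to spot.
dat$color <- NA_character_
for (i in seq_along(sample_names)) {
  hit <- grepl(sample_names[i], dat$sample, fixed = TRUE)
  dat$color[hit] <- sample_colors[i]
}
print(dat)
```

If one sample name can be a substring of another, later names overwrite earlier ones in this loop, so the order of `sample_names` matters.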

Spliting the character into parts

Question: I have the following character string:

    l <- "mod, range1 = seq(-m, n, 0.1), range2 = seq(-2, 2, 0.1), range3 = seq(-2, 2, 0.1)"

Using regular expressions in R, I would like to split l into the following structure:

    [1] "mod"                      "range1 = seq(-m, n, 0.1)"
    [3] "range2 = seq(-2, 2, 0.1)" "range3 = seq(-2, 2, 0.1)"

Unfortunately, I haven't found a proper way to solve the problem yet. Does anyone have an idea how to achieve such an elegant split?

Answer 1: I really doubt you can do it with regular
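One way to get the requested split without a regular expression is to track parenthesis depth and cut only at commas that sit outside any parentheses. This is a sketch, not the truncated answer's method; `split_top_level` is a hypothetical helper name, and it assumes parentheses in the input are balanced.

```r
split_top_level <- function(x) {
  chars <- strsplit(x, "", fixed = TRUE)[[1]]
  # Nesting depth at each character: opening minus closing parentheses so far.
  depth <- cumsum(chars == "(") - cumsum(chars == ")")
  # Split positions are commas that are not inside any parentheses.
  cut <- which(chars == "," & depth == 0)
  starts <- c(1, cut + 1)
  ends <- c(cut - 1, length(chars))
  trimws(substring(x, starts, ends))
}

l <- "mod, range1 = seq(-m, n, 0.1), range2 = seq(-2, 2, 0.1), range3 = seq(-2, 2, 0.1)"
parts <- split_top_level(l)
print(parts)
```

Because it counts depth rather than matching a pattern, the same helper also handles nested calls such as `f(g(1, 2), 3)`.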

Find rows where one column string is in another column using dplyr in R

Question: I'm looking to pull back rows where the value in one column exists as a string in another column (within the same row). I have a df:

    A <- c("cat", "dog", "boy")
    B <- c("cat in the cradle", "meet the parents", "boy meets world")
    df <- data.frame(A, B)

    A   B
    cat cat in the cradle
    dog meet the parents
    boy boy meets world

I'm trying things like:

    df2 <- df %>% filter(grepl(A, B)) # doesn't work because it thinks A is the whole column vector
    df2 <- df %>% filter(B %in% A) # which doesn't work because
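grepl() is not vectorized over its pattern argument, which is why passing the whole column A fails. A base R sketch of a row-wise match uses mapply() to pair each pattern with the string from the same row. It assumes the values in A should be matched as literal text (`fixed = TRUE`); the name `keep` is illustrative.

```r
A <- c("cat", "dog", "boy")
B <- c("cat in the cradle", "meet the parents", "boy meets world")
df <- data.frame(A, B, stringsAsFactors = FALSE)

# Call grepl(A[i], B[i], fixed = TRUE) once per row.
keep <- mapply(grepl, df$A, df$B,
               MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)

# Keep only the rows where A occurs inside B.
df2 <- df[keep, ]
print(df2)
```

The same logical vector can be passed to dplyr's filter() in place of the base R subsetting if a pipeline is preferred.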

grep one pattern over multiple columns

Question: I'm trying to find a way to use grepl() with a single partial pattern over multiple columns inside mutate(). I want a new column that is TRUE or FALSE depending on whether ANY of a set of columns contains a certain string.

    df <- structure(list(ID = c("A1.1234567_10", "A1.1234567_20"), var1 = c("NORMAL", "NORMAL"), var2 = c("NORMAL", "NORMAL"), var3 = c("NORMAL", "NORMAL"), var4 = c("NORMAL", "NORMAL"), var5 = c("NORMAL", "NORMAL"), var6 = c("NORMAL", "NORMAL"), var7 = c("NORMAL", "ABNORMAL"), var8 = c("NORMAL", "NORMAL")), .Names = c("ID", "var1", "var2", "var3", "var4", "var5", "var6",
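The data definition above is cut off, so this sketch uses a reduced, hypothetical data frame with only `var1` and `var7`, and assumes the partial pattern is "ABN". It applies grepl() to each chosen column and combines the per-column results with a logical OR; it uses base R rather than mutate().

```r
# Reduced stand-in for the truncated example data (only two of the var columns).
df <- data.frame(ID = c("A1.1234567_10", "A1.1234567_20"),
                 var1 = c("NORMAL", "NORMAL"),
                 var7 = c("NORMAL", "ABNORMAL"),
                 stringsAsFactors = FALSE)

cols <- c("var1", "var7")  # columns to search

# One logical vector per column, then OR them together element-wise.
per_col <- lapply(df[cols], grepl, pattern = "ABN", fixed = TRUE)
df$any_abnormal <- Reduce(`|`, per_col)
print(df)
```

Reduce() with `|` keeps working when the data frame has a single row, where matrix-based approaches such as rowSums() over sapply() output would fail.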

grepl across multiple, specified columns

Question: I want to create a new column in my data frame that is either TRUE or FALSE depending on whether a term occurs in two specified columns. This is some example data:

    AB <- c('CHINAS PARTY CONGRESS','JAPAN-US RELATIONS','JAPAN TRIES TO')
    TI <- c('AMERICAN FOREIGN POLICY', 'CHINESE ATTEMPTS TO', 'BRITAIN HAS TEA')
    AU <- c('AUTHOR 1', 'AUTHOR 2','AUTHOR 3')
    M <- data.frame(AB,TI,AU)

I can do it for one column or the other, but I cannot figure out how to do it for both. In other words, I don't
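Since grepl() returns one logical vector per column, the two results can be combined with `|` (term in either column) or `&` (term in both). The excerpt does not say which is wanted or what the search term is, so the term "CHIN" and the column names `in_either` and `in_both` are assumptions for illustration.

```r
AB <- c("CHINAS PARTY CONGRESS", "JAPAN-US RELATIONS", "JAPAN TRIES TO")
TI <- c("AMERICAN FOREIGN POLICY", "CHINESE ATTEMPTS TO", "BRITAIN HAS TEA")
AU <- c("AUTHOR 1", "AUTHOR 2", "AUTHOR 3")
M <- data.frame(AB, TI, AU, stringsAsFactors = FALSE)

term <- "CHIN"  # hypothetical search term

# TRUE if the term occurs in AB or in TI.
M$in_either <- grepl(term, M$AB) | grepl(term, M$TI)
# TRUE only if the term occurs in both AB and TI.
M$in_both <- grepl(term, M$AB) & grepl(term, M$TI)
print(M)
```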

Making a character string with column names with zero values

Question: The 4th column is my desired column. Video, Webinar, Meeting, and Conference are the four types of activities that the different customers (names) can engage in. In a given row, all the column names with a zero value appear in the final column (NextStep); the value there (a character string separated by commas) excludes the column names with non-zero values. The column names in the final column usually appear in column order, with two exceptions: Webinar always appears first if it has a zero value, and Video always appears last if it has a zero value.

    library(data.table) dt <-
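The example data is cut off after `dt <-`, so the sketch below invents a small data frame (`acts`, with made-up values) purely to illustrate the ordering rule. It uses base R instead of data.table, and it assumes a plain comma as the separator. The trick is to fix the output order once (Webinar first, Video last) and then pick the zero-valued columns in that order.

```r
# Hypothetical activity counts; names and values are illustrative only.
acts <- data.frame(name       = c("a", "b", "c"),
                   Video      = c(0, 1, 0),
                   Webinar    = c(0, 0, 2),
                   Meeting    = c(3, 0, 0),
                   Conference = c(0, 0, 0),
                   stringsAsFactors = FALSE)

# Output order: Webinar first, Video last, the rest in column order.
ord <- c("Webinar", "Meeting", "Conference", "Video")

# For each row, keep the names of the zero-valued columns in that order.
is_zero <- acts[ord] == 0
acts$NextStep <- unname(apply(is_zero, 1, function(z) paste(ord[z], collapse = ",")))
print(acts)
```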

Duplicating observations of a dataframe, but also replacing specific variable values in R

Question: I am looking for some advice on data restructuring. I am collecting data using Google Forms, which I download as a csv file that looks something like the following:

    # alpha        beta          option
    # 6            8, 9, 10, 11  apple
    # 9            6             pear
    # 1            6             apple
    # 3            8, 9          pear
    # 3            6, 8          lime
    # 3            1             apple
    # 2, 4, 7, 11  9             lime

The data has two variables (alpha and beta) that each list numbers. For the majority of my data there is only one number in each variable. However, for some observations there can be two, three or even up to ten numbers. This is because these are responses gathered using the 'checkbox' option in
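The desired output is not visible in this excerpt, so the following is only one plausible reading: each observation is duplicated once per listed number, first for alpha and then for beta, giving every alpha and beta combination per original row. `expand_rows` is a hypothetical helper, and the data uses three of the rows shown above.

```r
dat <- data.frame(alpha  = c("6", "9", "2, 4, 7, 11"),
                  beta   = c("8, 9, 10, 11", "6", "9"),
                  option = c("apple", "pear", "lime"),
                  stringsAsFactors = FALSE)

# Duplicate each row once per comma-separated value in `col`,
# then replace `col` with the individual values as integers.
expand_rows <- function(d, col) {
  parts <- strsplit(d[[col]], ",\\s*")
  out <- d[rep(seq_len(nrow(d)), lengths(parts)), , drop = FALSE]
  out[[col]] <- as.integer(unlist(parts))
  rownames(out) <- NULL
  out
}

long <- expand_rows(expand_rows(dat, "alpha"), "beta")
print(long)
```

If only one of the two variables should be expanded, a single call such as `expand_rows(dat, "beta")` is enough.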