stringr

How to find the first most frequent, second most frequent, …, last frequent in text?

荒凉一梦 posted on 2021-02-08 11:15:25
Question: I'm trying to find the most frequent, the second most frequent, ..., and the least frequent words/categories in the following vector cat.

    library(stringr)
    cat <- c("AA","AA","AA","Ee","Dd","Ee","Bb","Cc","Cc","Cc")

The output that I need:

    most1 AA Cc
    most2 Ee
    most3 Bb Dd

Can anyone help me with this? Thanks!

Answer 1: You can use table like this:

    sort(table(cat), TRUE)
    #cat
    #AA Cc Ee Bb Dd
    # 3  3  2  1  1

And as a character vector:

    x <- table(cat)
    x <- rev(do.call(rbind, lapply(split(names(x), x), paste
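
A minimal base R sketch of one way to produce the grouped output the question asks for. It is not a completion of the truncated answer. The variable names (words, counts, groups, most) are made up for illustration, and the approach assumes that ties should share one rank.

```r
# Hypothetical variable names; the question calls this vector `cat`.
words <- c("AA", "AA", "AA", "Ee", "Dd", "Ee", "Bb", "Cc", "Cc", "Cc")

# Count each distinct word.
counts <- table(words)

# Gather the words that share a count; split() orders groups by increasing count.
groups <- split(names(counts), as.integer(counts))

# Reverse so that the highest count comes first.
groups <- rev(groups)

# Collapse each group into one space-separated string and label the ranks.
most <- vapply(groups, paste, character(1), collapse = " ")
names(most) <- paste0("most", seq_along(most))

print(most)
```

Words with the same count end up in the same string, so "AA" and "Cc" share the first rank here.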

r regex Lookbehind Lookahead issue

故事扮演 posted on 2021-02-08 10:06:42
Question: I am trying to extract passages like 44.11.36.00-1 (precisely, nn.nn.nn.nn-n, where n stands for any digit from 0-9) from text in R. I want to extract a passage only if it is adjacent to non-digit characters:

    44.11.36.00-1 extracted from nsfghstighsl44.11.36.00-1vsdfgh is OK
    44.11.36.00-1 extracted from fa0044.11.36.00-1000 is NOT

I have read that str_extract_all does not work with lookbehind and lookahead expressions, so I reluctantly went back to grep, but I cannot get it to work:

    > pattern1 <- "(?<![0-9]{1}
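
A hedged sketch of how negative lookbehind and lookahead can express the "not adjacent to a digit" rule. The pattern below is an assumption written for this illustration, not the asker's truncated pattern1. It uses base R with perl = TRUE. stringr's regex engine (ICU) also accepts lookarounds, so the optional last step tries the same pattern with str_extract when the package is installed.

```r
x <- c("nsfghstighsl44.11.36.00-1vsdfgh", "fa0044.11.36.00-1000")

# nn.nn.nn.nn-n that is neither preceded nor followed by a digit.
pattern <- "(?<![0-9])[0-9]{2}\\.[0-9]{2}\\.[0-9]{2}\\.[0-9]{2}-[0-9](?![0-9])"

# regexpr() returns -1 where there is no match; perl = TRUE enables lookarounds.
m <- regexpr(pattern, x, perl = TRUE)

# Keep one result per input, with NA where nothing matched.
hits <- rep(NA_character_, length(x))
hits[m > 0] <- regmatches(x, m)
print(hits)

# Optional: the same pattern through stringr, if it is available.
if (requireNamespace("stringr", quietly = TRUE)) {
  print(stringr::str_extract(x, pattern))
}
```

The first string yields 44.11.36.00-1, while the second yields NA because the candidate is surrounded by digits.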

Tidyverse: Replacing entire strings based on partial matches

為{幸葍}努か posted on 2021-02-08 08:33:27
Question: I'm looking to replace entire string entries within data based on partial matches, using functions in the stringr package. The only method I've tried is replacing exact matches with str_replace_all(), but this becomes tedious and unwieldy when there are dozens of variations to correct. In my reprex below, I replace variants of "Spaniard" and "Colombian" by direct specification. However, I would love to perform those replacements based
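
A small sketch of one possible approach, assuming the stringr package is installed. The question's reprex is cut off, so the nationality vector and the fixes lookup below are hypothetical. The idea is to detect a partial pattern with str_detect and then overwrite the whole entry with the canonical label.

```r
library(stringr)

# Hypothetical data standing in for the truncated reprex.
nationality <- c("Spanish", "Spaniard", "from Spain", "Colombian", "colombia", "French")

# Hypothetical lookup: partial pattern (name) -> canonical label (value).
fixes <- c("spa" = "Spaniard", "colomb" = "Colombian")

cleaned <- nationality
for (p in names(fixes)) {
  # TRUE wherever the partial pattern occurs, ignoring case.
  hit <- str_detect(nationality, regex(p, ignore_case = TRUE))
  # Replace the entire entry, not just the matched part.
  cleaned[hit] <- fixes[[p]]
}

print(cleaned)
```

Short patterns such as "spa" can match unrelated entries, so each pattern is worth checking against the real data before use.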

Stop text labels from overlapping in ggplot2

和自甴很熟 posted on 2021-02-08 03:54:27
Question: So I have a dataframe like so:

    df <- structure(list(Reportable = c("A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B"),
      Location1_Description = c("MAIN/BRANCH", "MAIN/BRANCH", "YARD", "YARD", "PART", "PART", "SHOP", "SHOP", "LOT", "LOT", "HIGHWAY/ROADWAY", "HIGHWAY/ROADWAY", "OFFICE", "OFFICE"),
      count = c(146L, 447L, 83L, 241L, 44L, 89L, 38L, 83L, 16L, 28L, 4L, 30L, 11L, 21L),
      pct = c("25%", "75%", "26%", "74%", "33%", "67%", "31%", "69%", "36%", "64%", "12%", "88%", "33%",

Sequentially replace multiple places matching single pattern in a string with different replacements

不问归期 posted on 2021-02-07 07:12:16
Question: Using the stringr package, it is easy to perform regex replacement in a vectorized manner. How can I replace every word in

    hello,world??your,make|[]world,hello,pos

with a different replacement, e.g. increasing numbers?

    1,2??3,4|[]5,6,7

Note that simple separators cannot be assumed; the practical use case is more complicated. stringr::str_replace_all does not seem to work, because str_replace_all(x, "(\\w+)", 1:7) produces a vector for each replacement applied to all
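
One way to sketch this without stringr is base R's replacement form of regmatches, which assigns a separate replacement to each match in order. This swaps in a base R technique and is not the answer from the original thread. The names m, n_words and result are illustrative.

```r
x <- "hello,world??your,make|[]world,hello,pos"

# Locate every run of word characters.
m <- gregexpr("\\w+", x)

# Number of matches in the single input string.
n_words <- lengths(regmatches(x, m))

# Assign one replacement per match, left to right.
result <- x
regmatches(result, m) <- list(as.character(seq_len(n_words)))

print(result)
```

Because the matches are replaced in place, the separators between them are left untouched, whatever they are.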

Split Strings into values in long dataframe format [duplicate]

|▌冷眼眸甩不掉的悲伤 posted on 2021-02-05 09:42:02
Question: (Closed as a duplicate of "Split comma-separated strings in a column into separate rows" and "Split delimited strings in a column and insert as new rows".) I have a dataframe like the following example df, which consists of a character variable VAR.

    df <- data.frame(ID = 1:2, VAR = c("VAL1\r\nVAL2\r\nVAL8","VAL2\r\nVAL5"), stringsAsFactors = FALSE)
    #   ID                  VAR
    # 1  1 VAL1\r\nVAL2\r\nVAL8
    # 2  2         VAL2\r\nVAL5

I would like to split
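
The question is cut off, but going by its title, a plausible reading is that each value separated by \r\n should become its own row. A minimal base R sketch under that assumption follows. The names parts and long are illustrative.

```r
df <- data.frame(ID = 1:2,
                 VAR = c("VAL1\r\nVAL2\r\nVAL8", "VAL2\r\nVAL5"),
                 stringsAsFactors = FALSE)

# Split each string on the literal "\r\n" separator.
parts <- strsplit(df$VAR, "\r\n", fixed = TRUE)

# Repeat each ID once per piece and flatten the pieces into one column.
long <- data.frame(ID = rep(df$ID, lengths(parts)),
                   VAR = unlist(parts),
                   stringsAsFactors = FALSE)

print(long)
```

This gives one row per ID and value pair: three rows for ID 1 and two rows for ID 2.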