stringr | 易学教程

How to find the first most frequent, second most frequent, …, last frequent in text?

Question: I'm trying to find the most frequent, the second most frequent, ..., and the least frequent words/categories in the following vector cat.

    library(stringr)
    cat <- c("AA","AA","AA","Ee","Dd","Ee","Bb","Cc","Cc","Cc")

The output that I need:

    most1 AA Cc
    most2 Ee
    most3 Bb Dd

Can anyone help me with this? Thanks!

Answer 1: You can use table:

    sort(table(cat), TRUE)
    #cat
    #AA Cc Ee Bb Dd
    # 3  3  2  1  1

And as a character vector:

    x <- table(cat)
    x <- rev(do.call(rbind, lapply(split(names(x), x), paste
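
The grouped ranking requested above can be sketched in base R by grouping the names by their count. This is a minimal illustration, not the truncated answer's exact code; the names `words`, `counts`, `groups`, and `ranked` are assumptions made for the example, and the vector is renamed from `cat` so that it does not mask `base::cat()`.

```r
# Input from the question, renamed to avoid masking base::cat()
words <- c("AA", "AA", "AA", "Ee", "Dd", "Ee", "Bb", "Cc", "Cc", "Cc")

# Frequency of each distinct word
counts <- table(words)

# Group the word names by their count; split() orders groups by
# ascending count, so rev() puts the most frequent group first
groups <- rev(split(names(counts), as.integer(counts)))

# Collapse each group into one space-separated string
ranked <- vapply(groups, paste, character(1), collapse = " ")
names(ranked) <- paste0("most", seq_along(ranked))

print(ranked)
# most1 is "AA Cc", most2 is "Ee", most3 is "Bb Dd"
```

Words that share a count land in the same rank, which matches the tie handling shown in the requested output.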

r regex Lookbehind Lookahead issue

Question: I am trying to extract passages like 44.11.36.00-1 (precisely, nn.nn.nn.nn-n, where n stands for any digit from 0 to 9) from text in R. I want to extract a passage only if it is directly adjacent to non-digit characters:

    44.11.36.00-1 extracted from nsfghstighsl44.11.36.00-1vsdfgh is OK
    44.11.36.00-1 extracted from fa0044.11.36.00-1000 is NOT

I have read that str_extract_all does not work with lookbehind and lookahead expressions, so I reluctantly went back to grep, but I cannot get it to work:

    > pattern1 <- "(?<![0-9]{1}
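
A minimal sketch of the lookaround approach in base R follows. The pattern here is an assumption written for illustration (the asker's own `pattern1` is truncated above), and `texts` and `hits` are illustrative names. It relies on `perl = TRUE`, which switches `gregexpr()` to the PCRE engine, where fixed-width negative lookbehind and negative lookahead are available.

```r
# Illustrative pattern: nn.nn.nn.nn-n, not preceded and not followed by a digit
pattern <- "(?<![0-9])[0-9]{2}\\.[0-9]{2}\\.[0-9]{2}\\.[0-9]{2}-[0-9](?![0-9])"

# The two example strings from the question
texts <- c("nsfghstighsl44.11.36.00-1vsdfgh", "fa0044.11.36.00-1000")

# perl = TRUE enables PCRE, which supports lookbehind and lookahead
hits <- regmatches(texts, gregexpr(pattern, texts, perl = TRUE))

print(hits[[1]])  # "44.11.36.00-1"
print(hits[[2]])  # character(0): surrounded by digits, so nothing is extracted
```

`regmatches()` returns one character vector per input string, so an input without a valid match yields an empty vector rather than an error.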

Tidyverse: Replacing entire strings based on partial matches

Question: I'm looking to replace entire string entries within data based on partial matches, using functions in the stringr package. The only method I've tried is replacing exact matches with str_replace_all(), but this becomes tedious and unwieldy when there are dozens of variations to correct. In my reprex below, I replace variants of "Spaniard" and "Colombian" by direct specification. However, I would love to perform those replacements based
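
One possible way to overwrite whole entries on a partial match is to test each entry against a short pattern and assign the canonical label where it matches. The sketch below uses base R `grepl()` rather than stringr, and everything in it is hypothetical: the asker's reprex is truncated, so the `nationality` vector and the `lookup` table of partial patterns are invented for illustration.

```r
# Hypothetical messy entries (the original reprex is not available)
nationality <- c("Spanish", "spaniard ", "Spain", "Colombia", "colombiano", "French")

# Hypothetical lookup: partial pattern -> canonical label
lookup <- c("spa" = "Spaniard", "colomb" = "Colombian")

cleaned <- nationality
for (p in names(lookup)) {
  # Replace the ENTIRE entry wherever the partial pattern matches
  cleaned[grepl(p, cleaned, ignore.case = TRUE)] <- lookup[[p]]
}

print(cleaned)
# "Spaniard" "Spaniard" "Spaniard" "Colombian" "Colombian" "French"
```

Entries that match no pattern are left untouched. With stringr, the same test could be written with `str_detect()` and `regex(p, ignore_case = TRUE)` in place of `grepl()`.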

Stop text labels from overlapping in ggplot2

Question: I have a data frame like this:

    df <- structure(list(Reportable = c("A", "B", "A", "B", "A", "B", "A",
    "B", "A", "B", "A", "B", "A", "B"),
    Location1_Description = c("MAIN/BRANCH", "MAIN/BRANCH", "YARD", "YARD",
    "PART", "PART", "SHOP", "SHOP", "LOT", "LOT", "HIGHWAY/ROADWAY",
    "HIGHWAY/ROADWAY", "OFFICE", "OFFICE"),
    count = c(146L, 447L, 83L, 241L, 44L, 89L, 38L, 83L, 16L, 28L, 4L, 30L,
    11L, 21L),
    pct = c("25%", "75%", "26%", "74%", "33%", "67%", "31%", "69%", "36%",
    "64%", "12%", "88%", "33%",

Sequentially replace multiple places matching single pattern in a string with different replacements

Question: Using the stringr package, it is easy to perform regex replacement in a vectorized manner. How can I replace every word in hello,world??your,make|[]world,hello,pos with a different replacement, e.g. increasing numbers, to get 1,2??3,4|[]5,6,7? Note that simple separators cannot be assumed; the practical use case is more complicated. stringr::str_replace_all does not seem to work, because str_replace_all(x, "(\\w+)", 1:7) produces a vector for each replacement applied to all
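
One way to give each match its own replacement is base R's replacement form of `regmatches()`, which assigns a vector of values to the matches in order. This is a sketch of that swapped-in technique, not a stringr solution; the names `m` and `replacements` are illustrative.

```r
x <- "hello,world??your,make|[]world,hello,pos"

# Locate every run of word characters in the string
m <- gregexpr("\\w+", x)

# One replacement per match, in order of appearance: "1", "2", ..., "7"
replacements <- as.character(seq_along(m[[1]]))

# Assign the replacements to the matched positions; everything between
# the matches (the separators) is left exactly as it was
regmatches(x, m) <- list(replacements)

print(x)
# "1,2??3,4|[]5,6,7"
```

Because only the matched spans are rewritten, this makes no assumption about what the separators look like.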

Split Strings into values in long dataframe format [duplicate]

Question: This question already has answers here: Split comma-separated strings in a column into separate rows (6 answers); Split delimited strings in a column and insert as new rows [duplicate] (6 answers). Closed 3 years ago.

I have a data frame that looks like the following example df, which consists of a character variable VAR.

    df <- data.frame(ID = 1:2, VAR = c("VAL1\r\nVAL2\r\nVAL8","VAL2\r\nVAL5"), stringsAsFactors = FALSE)
    #   ID VAR
    # 1  1 VAL1\r\nVAL2\r\nVAL8
    # 2  2 VAL2\r\nVAL5

I would like to split
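
Going by the title, the goal appears to be one row per value. A minimal base R sketch under that assumption splits `VAR` on the literal `\r\n` separator and repeats each `ID` once per resulting piece; the names `parts` and `long` are illustrative.

```r
df <- data.frame(ID = 1:2,
                 VAR = c("VAL1\r\nVAL2\r\nVAL8", "VAL2\r\nVAL5"),
                 stringsAsFactors = FALSE)

# Split each string on the literal CR+LF separator (fixed = TRUE: no regex)
parts <- strsplit(df$VAR, "\r\n", fixed = TRUE)

# Repeat each ID once per piece and flatten the pieces into one column
long <- data.frame(ID = rep(df$ID, lengths(parts)),
                   VAR = unlist(parts),
                   stringsAsFactors = FALSE)

print(long)
# five rows: ID 1 with VAL1, VAL2, VAL8 and ID 2 with VAL2, VAL5
```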