stringr | 易学教程

How do I extract appearances of a vector of strings in another vector of strings using R?

Question: I have a vector of strings like this:
strings <- tibble(string = c("apple, orange, plum, tomato", "plum, beat, pear, cactus", "centipede, toothpick, pear, fruit"))
And I have a vector of fruit:
fruits <- tibble(fruit = c("apple", "orange", "plum", "pear"))
What I'd like is a data.frame/tibble holding the original strings plus a second list or character column of all the fruit contained in that original column. Something like this:
strings <- tibble(string = c("apple, orange, plum,
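
One possible approach to the question above, sketched in base R. Plain character vectors stand in for the tibble columns (an assumption made to keep the example dependency-free); stringr::str_extract_all() with the same pattern would play the same role.

```r
# Plain character vectors stand in for the tibble columns in the question.
strings <- c("apple, orange, plum, tomato",
             "plum, beat, pear, cactus",
             "centipede, toothpick, pear, fruit")
fruits <- c("apple", "orange", "plum", "pear")

# Build one alternation pattern; \\b restricts matches to whole words.
pattern <- paste0("\\b(", paste(fruits, collapse = "|"), ")\\b")

# gregexpr() locates every match; regmatches() extracts them per string.
found <- regmatches(strings, gregexpr(pattern, strings, perl = TRUE))

# Attach the matches as a list column, one character vector per row.
result <- data.frame(string = strings, stringsAsFactors = FALSE)
result$fruits <- found
```

The list column keeps a variable number of matches per row; collapsing each element with paste(collapse = ", ") would give a character column instead.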

R - Recoding column with multiple text values associated with one code

Question: I'm trying to recode a column to determine the shift of an employee. The data is messy and the word I am looking for must be extracted from the text. I've been trying various routes with if statements and the stringr and dplyr packages, but can't figure out how to get them to work together. I have this line of code, but str_match doesn't produce a true/false value.
Data$Shift <- if(str_match(Data$Unit, regex(first, ignore_case = TRUE))) { print("First Shift") } else { print("Lame") }
recode is
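
A minimal sketch of a vectorised alternative, assuming the goal is one label per row. The sample values in Data$Unit and the fallback label "Other" are hypothetical; grepl() is used here, and stringr::str_detect() with regex("first", ignore_case = TRUE) would serve the same purpose.

```r
# Hypothetical data; only the column name Unit comes from the question.
Data <- data.frame(Unit = c("FIRST shift - packing", "2nd shift", "first/assembly"),
                   stringsAsFactors = FALSE)

# grepl() returns one TRUE/FALSE per element. if() accepts only a single
# logical value, so ifelse() is used to label every row at once.
is_first <- grepl("first", Data$Unit, ignore.case = TRUE)
Data$Shift <- ifelse(is_first, "First Shift", "Other")
```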

In regex, mystery Error: assertion 'tree->num_tags == num_tags' failed in executing regexp: file 'tre-compile.c', line 634

Question: Assume 900+ company names pasted together with the pipe separator to form a regex pattern, "firm.pat":
firm.pat <- str_c(firms$firm, collapse = "|")
With a data frame called "bio" that has a large character variable named "comment" (250 rows, each with 100+ words), I would like to replace all the company names with blanks. Both a gsub call and a str_replace_all call return the same mysterious error.
bio$comment <- gsub(pattern = firm.pat, x = bio$comment, replacement = "")
Error in gsub

Installation of packages ‘stringr’ and ‘stringi’ had non-zero exit status

Question: Please help me install the stringr and stringi packages in R. The result is:
install.packages("stringi")
Installing package into ‘C:/Users/kozlovpy/Documents/R/win-library/3.2’ (as ‘lib’ is unspecified)
trying URL 'https://mran.revolutionanalytics.com/snapshot/2015-08-27/bin/windows/contrib/3.2/stringi_0.5-5.zip'
Error in download.file(url, destfile, method, mode = "wb", ...) : cannot open URL 'https://mran.revolutionanalytics.com/snapshot/2015-08-27/bin/windows/contrib/3.2/stringi_0.5-5

strsplit by spaces greater than one in R

Question: Given a string,
mystr = "Average student score 88"
I wish to split it only where there is more than one space, to obtain the following:
"Average student score" "88"
I found that "\s+" splits on any number of spaces:
strsplit(mystr, "\\s+")
But this is not what I want. Is there any option within strsplit that can split strings based on a certain number of spaces (say space = k) or a rule on spaces (say space > 1)?
Answer 1: You may specify it through a repetition quantifier. strsplit(mystr, "\\s
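
As an illustration of the repetition-quantifier idea, here is a small sketch that splits only on runs of two or more whitespace characters. The run of spaces before "88" in the sample string is an assumption about the original input, and the pattern shown is one reasonable choice, not necessarily the one the truncated answer goes on to give.

```r
# The four spaces before "88" are assumed; the original spacing is not known.
mystr <- "Average student score    88"

# {2,} means "two or more", so single spaces between words are left alone.
parts <- strsplit(mystr, "\\s{2,}")[[1]]
print(parts)
```

For an exact count of k whitespace characters, the quantifier {k} can be used in place of {2,}.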

Counting whole word/number occurrences with str_count in R

Question: Similar to this case, I would like to count the number of occurrences of multiple words and numbers in a vector of sentences with str_count from the stringr package. But I noticed that partial numbers are counted as well as whole numbers. For example:
df <- c("honda civic 1988 with new lights","toyota auris 4x4 140000 km","nissan skyline 2.0 159000 km")
keywords <- c("honda","civic","toyota","auris","nissan","skyline","1988","1400","159")
library(stringr)
number_of_keywords_df
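
One way to count only whole-word matches is to wrap the keyword alternation in \\b word boundaries. The sketch below uses base R (gregexpr() and regmatches()) as a stand-in for str_count(), so that it runs without stringr; the same pattern string could be passed to str_count().

```r
df <- c("honda civic 1988 with new lights",
        "toyota auris 4x4 140000 km",
        "nissan skyline 2.0 159000 km")
keywords <- c("honda", "civic", "toyota", "auris", "nissan", "skyline",
              "1988", "1400", "159")

# \\b on both sides means "1400" no longer matches inside "140000".
pattern <- paste0("\\b(", paste(keywords, collapse = "|"), ")\\b")

# Count the matches found in each sentence.
counts <- lengths(regmatches(df, gregexpr(pattern, df, perl = TRUE)))
print(counts)
```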

Why is this end of line (\\b) not recognised as a word boundary in stringr/ICU and Perl

Question: Using stringr, I tried to detect a € sign at the end of a string as follows:
str_detect("my text €", "€\\b") # FALSE
Why is this not working? It works in the following cases:
str_detect("my text a", "a\\b") # TRUE - letter instead of €
grepl("€\\b", "2009in €") # TRUE - base R solution
But it also fails in perl mode:
grepl("€\\b", "2009in €", perl=TRUE) # FALSE
So what is wrong with the €\\b regex? The regex €$ works in all cases...
Answer 1: When you use base R regex functions without

parsing html containing &nbsp; (non-breaking space)

Question: I am using rvest to parse a website. I'm hitting a wall with these little non-breaking spaces. How does one remove the whitespace that is created by the &nbsp; element in a parsed html document?
library("rvest")
library("stringr")
minimal <- html("<!doctype html><title>blah</title> <p>&nbsp;foo")
bodytext <- minimal %>% html_node("body") %>% html_text
Now I have extracted the body text:
bodytext
[1] " foo"
However, I can't remove that pesky bit of whitespace!
str_trim(bodytext)
gsub(pattern = " ", "",

Delete characters before regular expression (R)

Question: I have a character vector of stock tickers where the ticker name is concatenated to the country in which that ticker is based, in the following form: country_name/ticker_name. I am trying to split each string and delete everything from the '/' back, returning a character vector of only the ticker names. Here is an example vector:
sample_string <- c('US/SPY', 'US/AOL', 'US/MTC', 'US/PHA', 'US/PZI', 'US/AOL', 'US/BRCM')
My initial thought would be to use the stringr library. I don't have really
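
A short sketch of one way to do this in base R: anchor a greedy match at the start of the string and remove everything up to and including the last '/'. The variable name tickers is introduced here for illustration; stringr::str_replace() accepts the same pattern.

```r
sample_string <- c('US/SPY', 'US/AOL', 'US/MTC', 'US/PHA', 'US/PZI', 'US/AOL', 'US/BRCM')

# "^.*/" matches from the start of the string through the final slash;
# replacing it with "" leaves only the ticker name.
tickers <- sub("^.*/", "", sample_string)
print(tickers)
```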

Replace 'from nth to the last' occurrence of word in string/text

Question: This question has been asked previously but hasn't been answered to the asker's satisfaction. Given the following string:
mystring <- "one fish two fish red fish blue fish"
The following function replaces the nth occurrence of a word in it:
replacerFn <- function(String, word, rword, n){ stopifnot(n >0) pat <- sprintf(paste0("^((.*?\\b", word, "\\b.*?){%d})\\b", word,"\\b"), n-1) rpat <- paste0("\\1", rword) if(n >1) { stringr::str_replace(String, pat, rpat) } else { stringr::str