stringr

How do I extract appearances of a vector of strings in another vector of strings using R?

╄→尐↘猪︶ㄣ submitted on 2019-12-23 03:25:16
Question: I have a vector of strings like this: strings <- tibble(string = c("apple, orange, plum, tomato", "plum, beat, pear, cactus", "centipede, toothpick, pear, fruit")) And I have a vector of fruit: fruits <- tibble(fruit = c("apple", "orange", "plum", "pear")) What I'd like is a data.frame/tibble containing the original strings plus a second list or character column of all the fruit contained in that original column. Something like this: strings <- tibble(string = c("apple, orange, plum,
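One possible approach to the question above, as a minimal sketch rather than the accepted answer: split each string on commas and keep only the pieces that appear in the fruit vector. Plain character vectors and a base data.frame stand in for the tibbles, and the column name `fruits_found` is made up for illustration.

```r
strings <- c("apple, orange, plum, tomato",
             "plum, beat, pear, cactus",
             "centipede, toothpick, pear, fruit")
fruits <- c("apple", "orange", "plum", "pear")

# Split on a comma followed by optional whitespace, one character vector per string
pieces <- strsplit(strings, ",\\s*")

# Keep only the pieces that are known fruits (order of appearance is preserved)
found <- lapply(pieces, intersect, fruits)

# Collapse to a character column; `found` itself can serve as a list column instead
result <- data.frame(string = strings,
                     fruits_found = vapply(found, paste, character(1), collapse = ", "),
                     stringsAsFactors = FALSE)
```

Because the match is on whole comma-separated items, "fruit" in the third string is not mistaken for one of the listed fruits.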

R - Recoding column with multiple text values associated with one code

自作多情 submitted on 2019-12-22 14:05:28
Question: I'm trying to recode a column to determine the shift of an employee. The data is messy and the word I am looking for must be extracted from the text. I've been trying various routes with if statements and the stringr and dplyr packages, but can't figure out how to get them to work together. I have this line of code, but str_match doesn't produce a true/false value. Data$Shift <- if(str_match(Data$Unit, regex(first, ignore_case = TRUE))) { print("First Shift") } else { print("Lame") } recode is
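A hedged sketch of one way around the problem described above: `str_detect` returns a logical vector where `str_match` returns a matrix of matches, and `ifelse` is vectorised where `if` is not. The sample `Unit` values and the fallback label "Other" are invented for illustration; the question's real data is not shown.

```r
library(stringr)

# Hypothetical messy data standing in for the asker's Data$Unit column
Data <- data.frame(Unit = c("ICU First shift", "2nd floor - second", "FIRST/ER", "unknown"),
                   stringsAsFactors = FALSE)

# TRUE wherever "first" appears, ignoring case
is_first <- str_detect(Data$Unit, regex("first", ignore_case = TRUE))

# Vectorised recode: one label per row
Data$Shift <- ifelse(is_first, "First Shift", "Other")
```

With more than two shifts, the same logical tests could be chained inside nested `ifelse` calls or `dplyr::case_when`.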

In regex, mystery Error: assertion 'tree->num_tags == num_tags' failed in executing regexp: file 'tre-compile.c', line 634

余生颓废 submitted on 2019-12-22 08:44:00
Question: Assume 900+ company names pasted together to form a regex pattern using the pipe separator -- "firm.pat". firm.pat <- str_c(firms$firm, collapse = "|") With a data frame called "bio" that has a large character variable (250 rows each with 100+ words) named "comment", I would like to replace all the company names with blanks. Both a gsub call and a str_replace_all call return the same mysterious error. bio$comment <- gsub(pattern = firm.pat, x = bio$comment, replacement = "") Error in gsub
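One workaround that sidesteps the single huge alternation, shown only as a sketch: remove the names one at a time as literal strings with `fixed = TRUE`, so no large pattern is ever compiled. The `firms` and `bio` contents below are invented stand-ins for the asker's data, and this is not claimed to be the fix the asker adopted.

```r
# Hypothetical stand-ins for the asker's data frames
firms <- data.frame(firm = c("Acme Corp.", "Smith & Sons"), stringsAsFactors = FALSE)
bio <- data.frame(comment = c("Worked at Acme Corp. then Smith & Sons", "No firms here"),
                  stringsAsFactors = FALSE)

# Replace each firm name literally; fixed = TRUE means "." and "&" need no escaping
for (f in firms$firm) {
  bio$comment <- gsub(f, "", bio$comment, fixed = TRUE)
}
```

Literal matching also means a name is removed wherever it occurs, including inside a longer word, so short names may need extra care.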

Installation of packages ‘stringr’ and ‘stringi’ had non-zero exit status

╄→гoц情女王★ submitted on 2019-12-21 09:14:00
Question: Please help me to install the stringr and stringi packages in R. The result is: install.packages("stringi") Installing package into ‘C:/Users/kozlovpy/Documents/R/win-library/3.2’ (as ‘lib’ is unspecified) trying URL 'https://mran.revolutionanalytics.com/snapshot/2015-08-27/bin/windows/contrib/3.2/stringi_0.5-5.zip' Error in download.file(url, destfile, method, mode = "wb", ...) : cannot open URL 'https://mran.revolutionanalytics.com/snapshot/2015-08-27/bin/windows/contrib/3.2/stringi_0.5-5

strsplit by spaces greater than one in R

天涯浪子 submitted on 2019-12-20 02:58:17
Question: Given a string, mystr = "Average student score    88" I wish to split only where there is more than one space. I wish to obtain the following: "Average student score" "88" I found that "\s+" will split on any number of spaces. strsplit(mystr, "\\s+") But this is not what I want. Is there any option within strsplit that can split strings based on a certain number of spaces (say space = k) or a rule on spaces (say space > 1)? Answer 1: You may specify it through a repetition quantifier. strsplit(mystr, "\\s
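The repetition-quantifier idea mentioned in the answer can be sketched as follows. The quantifier `{2,}` means "two or more", and `{k}` would mean "exactly k". The four-space gap in `mystr` is an assumption about the original input.

```r
# Single spaces between words, several spaces before the number (assumed input)
mystr <- "Average student score    88"

# Split only on runs of two or more whitespace characters
parts <- strsplit(mystr, "\\s{2,}")[[1]]
```

Single spaces inside "Average student score" never satisfy `{2,}`, so the phrase stays in one piece.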

Counting whole word/number occurrences with str_count in R

我的梦境 submitted on 2019-12-20 02:56:38
Question: Similar to this case, I would like to count the number of occurrences of multiple words and numbers that occur in a vector of sentences with str_count of the stringr package. But I noticed that not only whole numbers are counted but also partial numbers. For example: df <- c("honda civic 1988 with new lights","toyota auris 4x4 140000 km","nissan skyline 2.0 159000 km") keywords <- c("honda","civic","toyota","auris","nissan","skyline","1988","1400","159") library(stringr) number_of_keywords_df
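A minimal sketch of one way to count whole words and whole numbers only: wrap the alternation of keywords in `\\b` word boundaries so that "1400" no longer matches inside "140000". The name `pattern` is introduced here for illustration.

```r
library(stringr)

df <- c("honda civic 1988 with new lights",
        "toyota auris 4x4 140000 km",
        "nissan skyline 2.0 159000 km")
keywords <- c("honda", "civic", "toyota", "auris", "nissan", "skyline",
              "1988", "1400", "159")

# \\b( k1|k2|... )\\b : each keyword must start and end at a word boundary
pattern <- str_c("\\b(", str_c(keywords, collapse = "|"), ")\\b")

# One count per sentence
number_of_keywords_df <- str_count(df, pattern)
```

Here the partial hits "1400" in "140000" and "159" in "159000" are excluded because digits follow them, so no boundary exists at that point.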

Why is this end of line (\\b) not recognised as a word boundary in stringr/ICU and Perl?

醉酒当歌 submitted on 2019-12-19 17:45:50
Question: Using stringr I tried to detect a € sign at the end of a string as follows: str_detect("my text €", "€\\b") # FALSE Why is this not working? It works in the following cases: str_detect("my text a", "a\\b") # TRUE - letter instead of € grepl("€\\b", "2009in €") # TRUE - base R solution But it also fails in perl mode: grepl("€\\b", "2009in €", perl=TRUE) # FALSE So what is wrong with the €\\b regex? The regex €$ works in all cases... Answer 1: When you use base R regex functions without
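For context, in ICU and PCRE `\b` matches only between a word character and a non-word character. The € sign is not a word character and neither is the end of the string, so there is no boundary after it. A small sketch of two alternatives follows; the euro sign is written as the escape `\u20ac` purely for portability, and the variable names are illustrative.

```r
library(stringr)

euro <- "\u20ac"                       # the euro sign
x <- str_c("my text ", euro)

# The pattern from the question: no word boundary exists after a non-word character
with_boundary <- str_detect(x, str_c(euro, "\\b"))

# Alternative 1: anchor to the end of the string
ends_with_euro <- str_detect(x, str_c(euro, "$"))

# Alternative 2: require that no word character follows the euro sign
not_followed <- str_detect(c(x, str_c("my text ", euro, "uro")),
                           str_c(euro, "(?!\\w)"))
```

The negative lookahead form also matches a euro sign in the middle of a string, as long as a space or punctuation follows it.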

parsing html containing &nbsp; (non-breaking space)

假装没事ソ submitted on 2019-12-18 04:16:27
Question: I am using rvest to parse a website. I'm hitting a wall with these little non-breaking spaces. How does one remove the whitespace that is created by the &nbsp; element in a parsed html document? library("rvest") library("stringr") minimal <- html("<!doctype html><title>blah</title> <p>&nbsp;foo") bodytext <- minimal %>% html_node("body") %>% html_text Now I have extracted the body text: bodytext [1] " foo" However, I can't remove that pesky bit of whitespace! str_trim(bodytext) gsub(pattern = " ", "",
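A hedged sketch of one way to deal with this, assuming the leading character in the extracted text is U+00A0 (the code point that `&nbsp;` decodes to) rather than an ordinary space. The string below is typed in directly instead of being parsed with rvest, so the example runs without a network or HTML parser.

```r
# Stand-in for the text extracted by html_text(): a non-breaking space, then "foo"
bodytext <- "\u00a0foo"

# An ordinary space in the pattern does not match U+00A0, so the text is unchanged
unchanged <- gsub(" ", "", bodytext, fixed = TRUE)

# Convert non-breaking spaces to ordinary spaces, then trim as usual
cleaned <- trimws(gsub("\u00a0", " ", bodytext, fixed = TRUE))
```

Converting rather than deleting keeps word separation intact when a non-breaking space sits between two words.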

Delete characters before regular expression (R)

南楼画角 submitted on 2019-12-14 03:18:22
Question: I have a character vector of stock tickers where the ticker name is concatenated to the country in which that ticker is based, in the following form: country_name/ticker_name. I am trying to split each string and delete everything from the '/' back, returning a character vector of only the ticker names. Here is an example vector: sample_string <- c('US/SPY', 'US/AOL', 'US/MTC', 'US/PHA', 'US/PZI', 'US/AOL', 'US/BRCM') My initial thought would be to use the stringr library. I don't have really
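A short sketch of one possible solution: match everything from the start of the string up to and including the slash, and replace it with nothing. Base R `sub` is used here; the name `tickers` is introduced for illustration.

```r
sample_string <- c('US/SPY', 'US/AOL', 'US/MTC', 'US/PHA', 'US/PZI', 'US/AOL', 'US/BRCM')

# "^.*/" is greedy, so it consumes up to the last slash in each string
tickers <- sub("^.*/", "", sample_string)
```

The same pattern can be passed to `stringr::str_replace(sample_string, "^.*/", "")` if staying within stringr is preferred.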

Replace 'from nth to the last' occurrence of word in string/text

大城市里の小女人 submitted on 2019-12-13 14:42:19
Question: This question has been asked previously but hasn't been answered to the asker's satisfaction. Given the following string: mystring <- "one fish two fish red fish blue fish" The following function replaces the nth occurrence of a word in it: replacerFn <- function(String, word, rword, n){ stopifnot(n >0) pat <- sprintf(paste0("^((.*?\\b", word, "\\b.*?){%d})\\b", word,"\\b"), n-1) rpat <- paste0("\\1", rword) if(n >1) { stringr::str_replace(String, pat, rpat) } else { stringr::str
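One way to replace every occurrence from the nth to the last, sketched with base R match data instead of the single regex used in the question. The function name `replace_from_nth` is hypothetical, and the sketch assumes a single input string and a `word` with no regex metacharacters.

```r
replace_from_nth <- function(string, word, rword, n) {
  stopifnot(n > 0, length(string) == 1)
  pat <- paste0("\\b", word, "\\b")

  # Locate every whole-word match in the string
  m <- gregexpr(pat, string, perl = TRUE)
  matches <- regmatches(string, m)[[1]]

  # Fewer than n occurrences: nothing to replace
  if (length(matches) < n) return(string)

  # Overwrite the nth through last matches, then write them back in place
  matches[n:length(matches)] <- rword
  regmatches(string, m) <- list(matches)
  string
}

mystring <- "one fish two fish red fish blue fish"
from_second <- replace_from_nth(mystring, "fish", "dog", 2)
```

Working on the list of matches keeps the "from n onward" rule as a plain index range, which is easier to adjust than a nested quantifier.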