stringi | 易学教程

r ngram extraction with regex

Question: Karl Broman's post (https://kbroman.wordpress.com/2015/06/22/randomized-hobbit-2/) got me playing with regex and n-grams just for fun. I attempted to use regex to extract 2-grams. I know there are parsers to do this, but I am interested in the regex logic (i.e., it was a self-challenge that I failed to meet). Below I give a minimal example and the desired output. The problem in my attempt is twofold: the grams (words) get eaten up and aren't available for the next pass. How can I make them
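One way to keep words from being "eaten up" is to wrap the whole pair in a zero-width lookahead, so that the regex engine never consumes any text and the second word of one pair can start the next. The sketch below is only an illustration in base R: the helper name `extract_bigrams` and the sample sentence are made up here and do not come from the original question.

```r
# Hypothetical helper: returns overlapping 2-grams from a single string.
extract_bigrams <- function(x) {
  # (?=...) is a zero-width lookahead: it matches at a position without
  # consuming characters. The capture group inside it records the word pair.
  pattern <- "(?=\\b(\\w+\\s+\\w+))"
  m <- gregexpr(pattern, x, perl = TRUE)[[1]]

  # gregexpr() returns -1 when nothing matched.
  if (m[1] == -1) {
    return(character(0))
  }

  # The matches themselves have length 0, so read the capture-group
  # positions instead of the match positions.
  starts <- attr(m, "capture.start")[, 1]
  lens <- attr(m, "capture.length")[, 1]
  substring(x, starts, starts + lens - 1)
}

bigrams <- extract_bigrams("the quick brown fox")
print(bigrams)
```

Because the matches are zero-length, `regmatches()` would return empty strings here, which is why the sketch reads the `capture.start` and `capture.length` attributes directly.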

Difference between `paste`, `str_c`, `str_join`, `stri_join`, `stri_c`, `stri_paste`?

Question: What are the differences between all of these functions that seem very similar? Answer 1: `stri_join`, `stri_c`, and `stri_paste` come from the package stringi and are pure aliases. `str_c` comes from stringr and is just `stringi::stri_join` with the parameter `ignore_null` hardcoded to TRUE, while `stringi::stri_join` has it set to FALSE by default. `stringr::str_join` is a deprecated alias for `str_c`. See: library(stringi) identical(stri_join, stri_c) # [1] TRUE identical(stri_join, stri_paste) # [1] TRUE library
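The practical effect of `ignore_null` can be seen by passing a zero-length argument. The following is a minimal sketch that assumes the stringi package is installed; the variable names are illustrative only, and stringr is deliberately left out because its behaviour may differ between versions.

```r
library(stringi)

# With the default ignore_null = FALSE, a zero-length argument (here NULL)
# makes the whole result zero-length.
empty_result <- stri_join("a", NULL, "b")

# With ignore_null = TRUE, the zero-length argument is silently skipped.
joined <- stri_join("a", NULL, "b", ignore_null = TRUE)

# Base R paste0() treats a zero-length argument as an empty string.
base_joined <- paste0("a", NULL, "b")

print(length(empty_result))
print(joined)
print(base_joined)
```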

Convert HTML Entity to proper character R

Question: Does anyone know of a generic function in R that can convert &auml; to its unicode character â? I have seen some functions that take in â, and convert it to a normal character. Any help would be appreciated. Thanks. Edit: Below is a record of data, of which I probably have over 1 million records. Is there an easier solution other than reading the data into a massive vector, and for each element, changing the records? wine/name: 1999 Domaine Robert Chevillon Nuits St. Georges 1er Cru Les Vaucrains

Regular expression to search and replace a string in a file

Question: Hi friends, I am trying to search for particular keywords (given in txt) in a list of files. I am using a regular expression to detect and replace each occurrence of a keyword in a file. Below are the comma-separated keywords that I am passing to be searched. library(stringi) txt <- "automatically got activated,may be we download,network services,food quality is excellent" Ex "automatically got activated" should be searched and replaced by automatically_got_activated..."may be we download" replaced
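The replacement described above can be sketched in base R as follows. This is an assumption-laden illustration rather than the asker's solution: it uses literal (`fixed = TRUE`) matching instead of a regular expression, the helper `replace_keywords` and the `sample_lines` vector are hypothetical, and a character vector stands in for the contents of a file.

```r
txt <- "automatically got activated,may be we download,network services,food quality is excellent"

# Split the comma-separated keyword list and build the underscore versions.
keywords <- strsplit(txt, ",", fixed = TRUE)[[1]]
replacements <- gsub(" ", "_", keywords, fixed = TRUE)

# Hypothetical helper: replace every keyword in a character vector of lines.
replace_keywords <- function(lines) {
  for (i in seq_along(keywords)) {
    # fixed = TRUE treats the keyword as a literal string, not a regex.
    lines <- gsub(keywords[i], replacements[i], lines, fixed = TRUE)
  }
  lines
}

# Illustrative input standing in for lines read from a file.
sample_lines <- c(
  "the plan automatically got activated today",
  "network services were down"
)
print(replace_keywords(sample_lines))
```

For real files, the lines could be read with `readLines()` and written back with `writeLines()`.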

pkgdown builds in Ubuntu but not Windows: argument `str` should be a character vector

Question: I've asked a similar question before. I've done more digging and made this question as minimal and reproducible as possible: First I created a new package as described here and built a site with pkgdown. This builds a site as expected: pkgdown::build_site() Initialising site ------------------------------------------------------------------ Copying 'C:/Users/name/Documents/R/win-library/3.3/pkgdown/assets/jquery.sticky-kit.min.js' Copying 'C:/Users/name/Documents/R/win-library/3.3/pkgdown

How do I extract appearances of a vector of strings in another vector of strings using R?

Question: I have a vector of strings like this: strings <- tibble(string = c("apple, orange, plum, tomato", "plum, beat, pear, cactus", "centipede, toothpick, pear, fruit")) And I have a vector of fruit: fruits <- tibble(fruit = c("apple", "orange", "plum", "pear")) What I'd like is a data.frame/tibble with the original strings data.frame plus a second list or character column of all the fruit contained in that original column. Something like this. strings <- tibble(string = c("apple, orange, plum,
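A possible approach to the task described above, sketched in base R so that it runs without extra packages: split each string on commas and keep the tokens that appear in the fruit vector. Plain character vectors and a `data.frame` replace the question's tibbles here, and the names `tokens`, `found`, and `result` are illustrative.

```r
strings <- c(
  "apple, orange, plum, tomato",
  "plum, beat, pear, cactus",
  "centipede, toothpick, pear, fruit"
)
fruits <- c("apple", "orange", "plum", "pear")

# Split every string into its comma-separated tokens.
tokens <- strsplit(strings, ",\\s*")

# For each string, keep only the tokens that are known fruits.
found <- lapply(tokens, function(tok) tok[tok %in% fruits])

# Store the matches as a list column next to the original strings.
result <- data.frame(string = strings, stringsAsFactors = FALSE)
result$fruits <- found

print(result$fruits)
```

Matching whole tokens rather than substrings means that "fruit" in the third string is not mistaken for an entry in `fruits`.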

Installation of packages ‘stringr’ and ‘stringi’ had non-zero exit status

Question: Please help me to install the stringr and stringi packages in R. The result is: install.packages("stringi") Installing package into ‘C:/Users/kozlovpy/Documents/R/win-library/3.2’ (as ‘lib’ is unspecified) trying URL 'https://mran.revolutionanalytics.com/snapshot/2015-08-27/bin/windows/contrib/3.2/stringi_0.5-5.zip' Error in download.file(url, destfile, method, mode = "wb", ...) : cannot open URL 'https://mran.revolutionanalytics.com/snapshot/2015-08-27/bin/windows/contrib/3.2/stringi_0.5-5

gsub speed vs pattern length

Question: I've been using gsub extensively lately, and I noticed that short patterns run faster than long ones, which is not surprising. Here's fully reproducible code: library(microbenchmark) set.seed(12345) n = 0 rpt = seq(20, 1461, 20) msecFF = numeric(length(rpt)) msecFT = numeric(length(rpt)) inp = rep("aaaaaaaaaa",15000) for (i in rpt) { n = n + 1 print(n) patt = paste(rep("a", rpt[n]), collapse = "") #time = microbenchmark(func(count[1:10000,12], patt, "b"), times = 10) timeFF = microbenchmark

Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) : there is no package called 'stringi' [duplicate]

Question: This question already has answers here: lib unspecified & Error in loadNamespace (4 answers); package 'stringi' does not work after updating to R3.2.1 (6 answers). Closed 2 years ago. When I use library(Hmisc) I get the following error: Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) : there is no package called 'stringi' Error: package 'ggplot2' could not be loaded As well, if I use library(ggplot2) I get the following error: Error in loadNamespace(i, c(lib.loc,