stringr

obtaining first word in the string [duplicate]

丶灬走出姿态 submitted on 2019-12-04 06:56:19
Question: This question already has answers here: Extract first word from a column and insert into new column [duplicate]. I would like to extract the first string from each element of a vector. For example, given y <- c('london/hilss', 'newyork/hills', 'paris/jjk'), I want the string before the "/" symbol, i.e. the locations london, newyork, paris.
Answer 1: A very simple approach with gsub: gsub("/.*", '', y) returns [1] "london" "newyork" "paris"
Answer 2: Your example is simple, for a more general case like y<
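A minimal sketch of the first answer's idea, using the vector from the question. The variable names `location_base` and `location_stringr` are illustrative and do not come from the original thread; the stringr variant is one possible equivalent, not necessarily what the truncated second answer proposed.

```r
library(stringr)

# Vector from the question (spelling kept as posted).
y <- c("london/hilss", "newyork/hills", "paris/jjk")

# Base R: delete the first "/" and everything after it.
location_base <- sub("/.*", "", y)

# stringr: keep the leading run of characters that are not "/".
location_stringr <- str_extract(y, "^[^/]+")

print(location_base)
```

Since only one replacement per string is needed, `sub` is enough here; `gsub` gives the same result because the pattern consumes the rest of the string.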

Extract a sample of words around a particular word using stringr in R

*爱你&永不变心* submitted on 2019-12-04 06:22:12
I've seen a couple of similar questions posted on SO regarding this topic, but they seem to be worded improperly (example) or in a different language (example). In my scenario, I consider everything that is surrounded by white space to be a word: emoticons, numbers, strings of letters that aren't really words, I don't care. I just want to get some context around the string that was found without having to read the entire file to figure out if it's a valid match. I tried using the following, but it takes a while to run if you've got a long text file: text <- "He served both as Attorney
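One way to get a few whitespace-delimited words of context around a match is a single regular expression with bounded repetition. The sketch below uses a hypothetical sample sentence and target word (the question's own text is cut off above), and the names `target`, `n_words`, and `context` are illustrative.

```r
library(stringr)

# Hypothetical sample text and target word, used only for illustration.
text <- "the quick brown fox jumps over the lazy dog near the river"
target <- "jumps"
n_words <- 2  # words of context to keep on each side

# \\S+ is a run of non-whitespace (a "word"), \\s+ is the gap between words.
# Assumes `target` contains no regex metacharacters; escape it otherwise.
pattern <- paste0(
  "(\\S+\\s+){0,", n_words, "}", target, "(\\s+\\S+){0,", n_words, "}"
)

# One element per match, each with up to n_words words on either side.
context <- str_extract_all(text, pattern)[[1]]
print(context)
```

Whether this is faster than the asker's attempt on a long file has not been measured here; it only illustrates the pattern.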

Extracting a number following specific text in R

主宰稳场 submitted on 2019-12-04 04:12:59
Question: I have a data frame which contains a column full of text. I need to capture the number (any number of digits, most likely 1 to 4) that follows a certain phrase, namely 'Floor Area' or 'floor area'. My data will look something like the following: "A beautiful flat on the 3rd floor with floor area: 50 sqm and a lift" "Newbuild flat. Floor Area: 30 sq.m" "6 bed house with floor area 50 sqm, lot area 25 sqm" If I try to extract just the number or if I look
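A possible approach, sketched on the three sample strings from the question: anchor the match on the phrase, capture the digits in a group, and ignore case. The names `ads` and `floor_area` are illustrative, and the optional colon and 1 to 4 digit limit are assumptions read off the sample data.

```r
library(stringr)

# Sample strings from the question.
ads <- c(
  "A beautiful flat on the 3rd floor with floor area: 50 sqm and a lift",
  "Newbuild flat. Floor Area: 30 sq.m",
  "6 bed house with floor area 50 sqm, lot area 25 sqm"
)

# Match the phrase, an optional colon, optional whitespace, then capture
# 1 to 4 digits. regex(ignore_case = TRUE) covers "Floor Area" and "floor area".
m <- str_match(ads, regex("floor area:?\\s*(\\d{1,4})", ignore_case = TRUE))

# Column 1 is the whole match, column 2 is the first capture group.
floor_area <- as.integer(m[, 2])
print(floor_area)
```

Because the digits must directly follow the phrase, "3rd floor" and "lot area 25" are not picked up.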

Installation of packages ‘stringr’ and ‘stringi’ had non-zero exit status

感情迁移 submitted on 2019-12-04 02:07:27
Please help me to install stringr and stringi packages in R. The result is: install.packages("stringi") Installing package into ‘C:/Users/kozlovpy/Documents/R/win-library/3.2’ (as ‘lib’ is unspecified) trying URL 'https://mran.revolutionanalytics.com/snapshot/2015-08-27/bin/windows/contrib/3.2/stringi_0.5-5.zip' Error in download.file(url, destfile, method, mode = "wb", ...) : cannot open URL 'https://mran.revolutionanalytics.com/snapshot/2015-08-27/bin/windows/contrib/3.2/stringi_0.5-5.zip' In addition: Warning message: In download.file(url, destfile, method, mode = "wb", ...) :

How to extract everything until first occurrence of pattern

旧时模样 submitted on 2019-12-04 01:51:05
I'm trying to use the stringr package in R to extract everything from a string up until the first occurrence of an underscore. What I've tried: str_extract("L0_123_abc", ".+?(?<=_)") > "L0_" Close but no cigar. How do I get this one? Also, ideally I'd like something that's easy to extend so that I can get the information in between the 1st and 2nd underscore and get the information after the 3rd underscore.
To get L0, you may use > library(stringr) > str_extract("L0_123_abc", "[^_]+") [1] "L0" The [^_]+ matches 1 or more chars other than _ . Also, you may split the string with _ : x <- str
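The two ideas in the answer above can be sketched as follows. The splitting step assumes `str_split` is the function the truncated line was heading toward; the names `first_piece`, `parts`, and `second_piece` are illustrative.

```r
library(stringr)

x <- "L0_123_abc"

# Everything before the first underscore: the leading run of non-"_" chars.
first_piece <- str_extract(x, "[^_]+")

# Splitting on "_" gives every piece, so later pieces are simple indexing.
parts <- str_split(x, "_")[[1]]
second_piece <- parts[2]  # between the 1st and 2nd underscore

print(parts)
```

Splitting is the easier form to extend, since the piece after the n-th underscore is just `parts[n + 1]` when it exists.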

Extract the last word between | |

☆樱花仙子☆ submitted on 2019-12-04 01:09:20
Question: I have the following dataset > head(names$SAMPLE_ID) [1] "Bacteria|Proteobacteria|Gammaproteobacteria|Pseudomonadales|Moraxellaceae|Acinetobacter|" [2] "Bacteria|Firmicutes|Bacilli|Bacillales|Bacillaceae|Bacillus|" [3] "Bacteria|Proteobacteria|Gammaproteobacteria|Pasteurellales|Pasteurellaceae|Haemophilus|" [4] "Bacteria|Firmicutes|Bacilli|Lactobacillales|Streptococcaceae|Streptococcus|" [5] "Bacteria|Firmicutes|Bacilli|Lactobacillales|Streptococcaceae|Streptococcus|" [6] "Bacteria|Firmicutes
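A minimal sketch for pulling out the last field before the trailing pipe, using the first two sample values from the question. It assumes every value ends in "|" as in the sample; the names `ids` and `last_field` are illustrative.

```r
library(stringr)

# First two sample values from the question.
ids <- c(
  "Bacteria|Proteobacteria|Gammaproteobacteria|Pseudomonadales|Moraxellaceae|Acinetobacter|",
  "Bacteria|Firmicutes|Bacilli|Bacillales|Bacillaceae|Bacillus|"
)

# A run of non-pipe characters that is followed by a pipe at the very end.
# Inside [...] the pipe is literal; outside it must be escaped as \\|.
last_field <- str_extract(ids, "[^|]+(?=\\|$)")
print(last_field)
```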

Regular Expression in Base R Regex to identify email address

不羁的心 submitted on 2019-12-03 18:08:09
Question: I am trying to use the stringr library to extract emails from a big, messy file. str_match doesn't allow perl=TRUE, and I can't figure out the escape characters to get it to work. Can someone recommend a relatively robust regex that would work in the context below? c("larry@gmail.com", "larry-sally@sally.com", "larry@sally.larry.com")->emails "SomeRegex"->regex str_match(emails, regex)
Answer 1: > "^[[:alnum:]._-]+@[[:alnum:].-]+$"->regex > str_match(emails, regex) [,1] [1,] "larry@gmail.com" [2,]
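A hedged sketch of how a pattern of this shape can be used, both to validate whole strings and to pull addresses out of free text. The hyphen is placed last inside each character class so that it is read as a literal rather than as a range. The fourth element of `emails` and the `messy` string are hypothetical additions for illustration, and the pattern is deliberately simple: it is not a full email validator.

```r
library(stringr)

# The three addresses from the question plus one hypothetical non-address.
emails <- c("larry@gmail.com", "larry-sally@sally.com",
            "larry@sally.larry.com", "not an email")

# Local part: letters, digits, ".", "_", "-"; domain: letters, digits, ".", "-".
pattern <- "[[:alnum:]._-]+@[[:alnum:].-]+"

# Anchored version: the whole string must be an address.
is_email <- str_detect(emails, paste0("^", pattern, "$"))

# Unanchored version: find addresses inside a hypothetical messy line.
messy <- "write to larry@gmail.com, or larry-sally@sally.com today"
found <- str_extract_all(messy, pattern)[[1]]

print(is_email)
print(found)
```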

R regex gsub separate letters and numbers

倾然丶 夕夏残阳落幕 submitted on 2019-12-03 13:54:30
I have a string that's mixed letters and numbers: "The sample is 22mg" I'd like to split strings where a number is immediately followed by letter like this: "The sample is 22 mg" I've tried this: gsub('[0-9]+[[aA-zZ]]', '[0-9]+ [[aA-zZ]]', 'This is a test 22mg') but am not getting the desired results. Any suggestions? You need to use capturing parentheses in the regular expression and group references in the replacement. For example: gsub('([0-9])([[:alpha:]])', '\\1 \\2', 'This is a test 22mg') There's nothing R-specific here; the R help for regex and gsub should be of some use. You need
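The capture-group approach from the answer carries over to stringr unchanged. The sketch below is an equivalent using `str_replace_all`, not something taken from the thread; the names `x` and `spaced` are illustrative.

```r
library(stringr)

x <- c("The sample is 22mg", "This is a test 22mg")

# Group 1 is the last digit, group 2 the first letter after it;
# the replacement puts a space between the two groups.
spaced <- str_replace_all(x, "([0-9])([[:alpha:]])", "\\1 \\2")
print(spaced)
```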

Detect multiple strings with dplyr and stringr

ε祈祈猫儿з submitted on 2019-12-03 12:16:23
Question: I'm trying to combine dplyr and stringr to detect multiple patterns in a dataframe. I want to use dplyr as I want to test a number of different columns. Here's some sample data: test.data <- data.frame(item = c("Apple", "Bear", "Orange", "Pear", "Two Apples")) fruit <- c("Apple", "Orange", "Pear") test.data item 1 Apple 2 Bear 3 Orange 4 Pear 5 Two Apples What I would like to use is something like: test.data <- test.data %>% mutate(is.fruit = str_detect(item, fruit)) and receive item is.fruit
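`str_detect(item, fruit)` pairs each item with one pattern by recycling, which is not the same as asking whether any pattern matches. A common workaround, sketched here under the assumption that the fruit names contain no regex metacharacters, is to collapse the patterns into one alternation. The name `fruit_pattern` is illustrative.

```r
library(dplyr)
library(stringr)

# Sample data from the question.
test.data <- data.frame(
  item = c("Apple", "Bear", "Orange", "Pear", "Two Apples"),
  stringsAsFactors = FALSE
)
fruit <- c("Apple", "Orange", "Pear")

# "Apple|Orange|Pear": TRUE when any of the names occurs in the item.
fruit_pattern <- paste(fruit, collapse = "|")

test.data <- test.data %>%
  mutate(is.fruit = str_detect(item, fruit_pattern))

print(test.data)
```

The match is case-sensitive, so "Bear" is not flagged even though it contains "ear".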

R: How to ignore case when using str_detect?

匆匆过客 submitted on 2019-12-03 11:34:51
The stringr package provides good string functions. To search for a string (ignoring case) one could use stringr::str_detect('TOYOTA subaru',ignore.case('toyota')) This works but gives the warning: Please use (fixed|coll|regex)(x, ignore_case = TRUE) instead of ignore.case(x) What is the right way of rewriting it?
You can use the regex function (or fixed, as in @lmo's comment, depending on what you need) to build the pattern, as detailed in ?modifiers or ?str_detect (see the description of the pattern parameter): library(stringr) str_detect('TOYOTA subaru', regex('toyota', ignore_case = TRUE)) # [1] TRUE the search
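To illustrate the difference between the two modifiers mentioned above: `regex()` interprets the pattern as a regular expression, while `fixed()` matches it literally, which matters when the search string contains characters such as parentheses. The second element of `x` is a hypothetical string added for illustration.

```r
library(stringr)

# First string from the question, second one hypothetical.
x <- c("TOYOTA subaru", "Honda (USD)")

# Case-insensitive regular-expression match.
has_toyota <- str_detect(x, regex("toyota", ignore_case = TRUE))

# Case-insensitive literal match: the parentheses are not regex groups here.
has_usd <- str_detect(x, fixed("(usd)", ignore_case = TRUE))

print(has_toyota)
print(has_usd)
```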