grepl

POSIX character class does not work in base R regex

怎甘沉沦 submitted on 2019-12-17 06:18:11
Question: I'm having some problems matching a pattern with a string of text in R. I'm trying to get TRUE with grepl when the text is something like "lettersornumbersorspaces y lettersornumbersorspaces". I'm using the following regex:
    ([:alnum:]|[:blank:])+[:blank:][yY][:blank:]([:alnum:]|[:blank:])+
When I use the regex as follows to obtain the "address", it works as expected.
    regex <- "([:alnum:]|[:blank:])+[:blank:][yY][:blank:]([:alnum:]|[:blank:])+"
    address <- str_extract(fulltext, regex)
I see
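One likely cause, sketched here rather than taken from the truncated question: base R's regex documentation states that POSIX classes such as [:alnum:] must appear inside a bracket expression, i.e. [[:alnum:]]. The sample text in `fulltext` below is hypothetical.

```r
# Hypothetical sample text; the original `fulltext` is not shown in the excerpt.
fulltext <- c("Calle 12 y Avenida 5", "no match here")

# POSIX classes placed inside an outer bracket expression.
regex <- "[[:alnum:][:blank:]]+[[:blank:]][yY][[:blank:]][[:alnum:][:blank:]]+"

# TRUE where the text has the "... y ..." shape described in the question.
has_address <- grepl(regex, fulltext)
```

The alternation ([:alnum:]|[:blank:]) is folded into a single bracket expression, which matches the same set of characters.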

Finding all string matches from another dataframe in R

走远了吗. submitted on 2019-12-13 18:23:01
Question: I am relatively new to R. I have a dataframe locs that has one variable, V1, and looks like:
    V1
    edmonton general hospital
    cardiovascular institute, hospital san carlos, madrid spain
    hospital of santa maria, lisbon, portugal
and another dataframe cities that has two variables and looks like this:
    city         country
    edmonton     canada
    san carlos   spain
    los angeles  united states
    santa maria  united states
    tokyo        japan
    madrid       spain
    santa maria  portugal
    lisbon       portugal
I want to create two new variables in locs

R: extract and paste keyword matches

会有一股神秘感。 submitted on 2019-12-13 07:38:06
Question: I am new to R and have been struggling with this one. I want to create a new column that checks whether any of a set of words ("foo", "x", "y") exists in column 'text', then writes the matching words to the new column. I have a data frame that looks like this:
    a ->
    id  text         time  username
    1   "hello x"    10    "me"
    2   "foo and y"  5     "you"
    3   "nothing"    15    "everyone"
    4   "x,y,foo"    0     "know"
The correct output should be:
    a2 ->
    id  text         time  username  keywordtag
    1   "hello x"    10    "me"      x
    2   "foo and y"  5     "you"     foo,y
    3   "nothing"    15
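A minimal sketch of one way to build such a column with grepl. Two things are assumptions, because the expected output is cut off: rows with no keyword get an empty string, and keywords are listed in the order of the `keywords` vector.

```r
a <- data.frame(
  id = 1:4,
  text = c("hello x", "foo and y", "nothing", "x,y,foo"),
  time = c(10, 5, 15, 0),
  username = c("me", "you", "everyone", "know"),
  stringsAsFactors = FALSE
)
keywords <- c("foo", "x", "y")

# For each row, keep the keywords that occur as whole words in `text`
# and paste them together with commas.
a$keywordtag <- vapply(a$text, function(txt) {
  found <- vapply(
    keywords,
    function(k) grepl(paste0("\\b", k, "\\b"), txt, perl = TRUE),
    logical(1)
  )
  paste(keywords[found], collapse = ",")
}, character(1), USE.NAMES = FALSE)
```

The \\b word boundaries stop "y" from matching inside words such as "everyone"; drop them if substring matches are wanted.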

Removing rows containing special characters

北慕城南 submitted on 2019-12-13 06:12:27
Question: I am working on filtering a massive dataset that reads in as a list. I need to filter out special markings and am getting stuck on some of them. Here is what I currently have:
    library(R.utils)
    library(stringr)
    gunzip("movies.list.gz") #open file
    movies <- readLines("movies.list") #read lines in
    movies <- gsub("[\t]", '', movies) #remove tabs (\t)
    #movies <- gsub(, '', movies)
    a <- movies[!grepl("\\{", movies)] # removed any line that contained special character {
    b <- a[!grepl("\\(V)", a)
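When the markers to drop are literal strings such as { or (V), one option is grepl with fixed = TRUE, which avoids regex escaping altogether. This is a sketch on hypothetical lines, not the contents of movies.list.

```r
# Hypothetical lines standing in for the contents of movies.list.
movies <- c("Alpha (2001)", "Beta (2002) {Episode 1}", "Gamma (2003) (V)")

# fixed = TRUE treats the pattern as a literal string, so "{" and "(V)"
# need no backslashes.
has_brace <- grepl("{", movies, fixed = TRUE)
has_video <- grepl("(V)", movies, fixed = TRUE)

# Keep only the lines that contain neither marker.
kept <- movies[!(has_brace | has_video)]
```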

How to take a word and create an indicator variable based on the word's presence in comments?

喜你入骨 submitted on 2019-12-12 18:01:43
Question: I have a vector of words and a vector of comments:
    word.list <- c("very", "experience", "glad")
    comments <- c("very good experience. first time I have been and I would definitely come back.", "glad I scheduled an appointment.", "the staff have become more cordial.", "the experience i had was not good at all.", "i am very glad")
I would like to create a data frame that looks like
    df <- data.frame(comments = c("very good experience. first time I have been and I would definitely come back.",
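A possible sketch using the vectors from the question. The desired layout is cut off, so the choice of one logical column per word is an assumption.

```r
word.list <- c("very", "experience", "glad")
comments <- c(
  "very good experience. first time I have been and I would definitely come back.",
  "glad I scheduled an appointment.",
  "the staff have become more cordial.",
  "the experience i had was not good at all.",
  "i am very glad"
)

# One column per word: TRUE where the word occurs in the comment.
# sapply names the columns after the words in word.list.
indicators <- sapply(word.list, function(w) grepl(w, comments, fixed = TRUE))

df <- data.frame(comments = comments, indicators, stringsAsFactors = FALSE)
```

Because this is a substring match, "very" would also match "every"; a pattern with word boundaries would be needed to rule that out.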

Extract/subset minute values from each hour

て烟熏妆下的殇ゞ submitted on 2019-12-12 09:59:45
Question: My data frame contains date values in the format YYYY-MM-DD HH:MM:SS across 125000+ rows, broken down by the minute (each row represents a single minute).
    1      2018-01-01 00:04:00
    2      2018-01-01 00:05:00
    3      2018-01-01 00:06:00
    4      2018-01-01 00:07:00
    5      2018-01-01 00:08:00
    6      2018-01-01 00:09:00
    ...
    124998 2018-03-29 05:07:00
    124999 2018-03-29 05:08:00
    125000 2018-03-29 05:09:00
I want to subset the data by extracting all of the minute values within any given hour and saving the results into individual
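One way this could be sketched is to split the rows on a date-plus-hour key. The column name `timestamp` is hypothetical, and grouping by calendar date and hour (rather than hour of day alone) is an assumption, since the end of the question is cut off.

```r
# Hypothetical column name; the original data frame's names are not shown.
df <- data.frame(
  timestamp = as.POSIXct(
    c("2018-01-01 00:04:00", "2018-01-01 00:05:00",
      "2018-01-01 01:00:00", "2018-03-29 05:07:00"),
    tz = "UTC"
  )
)

# Key each row by calendar date and hour, e.g. "2018-01-01 00".
hour_key <- format(df$timestamp, "%Y-%m-%d %H")

# One data frame per distinct hour, collected in a named list.
by_hour <- split(df, hour_key)
```

Using "%H" alone as the key would instead pool the same hour of day across all dates.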

Using grepl in R to match string

十年热恋 submitted on 2019-12-12 09:49:18
Question: I have a data frame "testData" as follows:
    id  content
    1   I came from China
    2   I came from America
    3   I came from Canada
    4   I came from Japan
    5   I came from Mars
I also have another data frame "addr" as follows:
    id  addr
    1   America
    2   Canada
    3   China
    4   Japan
How can I use grepl, sapply, or any other useful function in R to generate the following:
    id  content              addr
    1   I came from China    China
    2   I came from America  America
    3   I came from Canada   Canada
    4   I came from Japan    Japan
    5   I came from
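A minimal sketch with grepl and vapply, using the two data frames from the question. Filling the unmatched "Mars" row with NA is an assumption, because the expected value for row 5 is cut off.

```r
testData <- data.frame(
  id = 1:5,
  content = c("I came from China", "I came from America", "I came from Canada",
              "I came from Japan", "I came from Mars"),
  stringsAsFactors = FALSE
)
addr <- data.frame(
  id = 1:4,
  addr = c("America", "Canada", "China", "Japan"),
  stringsAsFactors = FALSE
)

# For each sentence, take the first address it contains, or NA if none matches.
testData$addr <- vapply(testData$content, function(txt) {
  found <- vapply(addr$addr, grepl, logical(1), x = txt, fixed = TRUE)
  hit <- addr$addr[found]
  if (length(hit) > 0) hit[1] else NA_character_
}, character(1), USE.NAMES = FALSE)
```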

R — Logical grep on multiple variables within data frame

。_饼干妹妹 submitted on 2019-12-12 02:49:07
Question: I am interested in performing a string search using logical grep (grepl) in R with multiple string patterns, and would like to apply this function to several variables (columns) in my data frame. I believe that one of the apply functions is well suited to this task, but I am not entirely sure how to get it to work correctly. Please find a toy example below:
    v.grepl <- Vectorize(grepl)
    pattern <- "^330|^334|^335|^343|^359|^740|^741|^742"
    data <- structure(list(recnum =
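A sketch of one approach: grepl is already vectorised over its x argument, so sapply over the columns of interest is enough and Vectorize is not required. The data below is hypothetical, including the column names dx1 and dx2, because the question's data is cut off.

```r
pattern <- "^330|^334|^335|^343|^359|^740|^741|^742"

# Hypothetical data; only `recnum` appears in the original excerpt.
data <- data.frame(
  recnum = 1:3,
  dx1 = c("3301", "4280", "7420"),
  dx2 = c("2500", "4019", "3350"),
  stringsAsFactors = FALSE
)

cols <- c("dx1", "dx2")

# Logical matrix: one row per record, one column per searched variable.
hits <- sapply(data[cols], function(col) grepl(pattern, col))

# Flag records where at least one column matches.
data$any_match <- rowSums(hits) > 0
```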

Regular expression parsed with grepl replacement

孤街醉人 submitted on 2019-12-12 02:25:46
Question: The objective is to parse a regular expression and replace the matched pattern. Consider this example:
    data <- c("cat 6kg","cat g250", "cat dog","cat 10 kg")
I have to locate all occurrences of cat followed by a number [0-9]. To do this:
    found <- data[grepl("(^cat.[a-z][0-9])|(^cat.[0-9])",data)]
    found
    [1] "cat 6kg" "cat g250" "cat 10 kg"
The next step is to replace each element of found with the string cat. I have attempted gsub, sub, and gsubfn() from package (gsubfn) according to Stack question
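Reading "replace each element of found with the string cat" literally, one sketch is to reuse the logical index from grepl for assignment. That reading is an assumption, since the rest of the question is cut off.

```r
data <- c("cat 6kg", "cat g250", "cat dog", "cat 10 kg")
pattern <- "(^cat.[a-z][0-9])|(^cat.[0-9])"

# TRUE for the elements the question calls `found`.
hit <- grepl(pattern, data)

# Overwrite the matching elements in place; non-matching ones are untouched.
data[hit] <- "cat"
```

If only the matched part of each string should change, sub(pattern, "cat", data) would be the analogous call.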

How can I extract the title from a name in a column?

ぃ、小莉子 submitted on 2019-12-11 20:29:03
Question: I have a column of names of the form "Hobs, Mr. jack", i.e. "lastname, title. firstname". The title has 4 types: "Mr.", "Mrs.", "Miss.", "Master.". How can I search each item in the column and return the title, which I can store in another column?
    Name <- c("Hobs, Mr. jack","Hobs, Master. John","Hobs, Mrs. Nicole",........)
Desired output: a column "title" with values ("Mr","Master", "Mrs",.....)
I have tried something like this:
    f <- function(d) { if (grep("Mr", d$title)) { gsub("$Mr$", "Mr", d
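Since every name follows the "lastname, title. firstname" shape, a single sub call with a capture group may be enough. This is a sketch using only the three names visible in the question.

```r
Name <- c("Hobs, Mr. jack", "Hobs, Master. John", "Hobs, Mrs. Nicole")

# Capture the letters between the comma and the first following period,
# then replace the whole string with that capture.
title <- sub("^[^,]*, *([A-Za-z]+)\\..*$", "\\1", Name)
```

Names that do not fit the pattern are returned unchanged by sub, so it may be worth checking them with grepl first.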