stringr | 易学教程

RegEx and stringr package

Question: I am an R newbie and am having trouble with my programming homework. The input is a poem: poem <- c( "Am Tag, an dem das L verschwand,", "da war die Luft voll Klagen.", "Den Dichtern, ach, verschlug es glatt", "ihr Singen und ihr Sagen.", "Nun gut. Sie haben sich gefasst.", "Man sieht sie wieder schreiben.", "Jedoch:", "Solang das L nicht wiederkehrt,", "muß alles Flickwerk beiben.") Now I need to extract all the capital letters and combine them into one word. I am doing this with the following
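
The asker's own attempt is cut off above, so the following is only a minimal sketch of one possible approach, not the original code. It assumes the stringr package is installed and that "capital letters" means every uppercase character in the poem; the names `capitals` and `word` are made up for illustration.

```r
library(stringr)

poem <- c(
  "Am Tag, an dem das L verschwand,",
  "da war die Luft voll Klagen.",
  "Den Dichtern, ach, verschlug es glatt",
  "ihr Singen und ihr Sagen.",
  "Nun gut. Sie haben sich gefasst.",
  "Man sieht sie wieder schreiben.",
  "Jedoch:",
  "Solang das L nicht wiederkehrt,",
  "muß alles Flickwerk beiben."
)

# Extract every uppercase letter; the result is a list with one element per line
capitals <- str_extract_all(poem, "[[:upper:]]")

# Flatten the list and join the letters into a single string
word <- str_c(unlist(capitals), collapse = "")
print(word)
```

If only some capitals are wanted (for example, those that do not start a line), the regular expression would need to change accordingly.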

Using str_detect (or some other function) and some way to loop through a list to essentially perform a vlookup

Question: I have been searching for a way to do this, and although some results on here seem similar, nothing seems to be working, nor can I find a method that will loop through a list like a VLOOKUP in Excel. I apologize if I have missed it. I am trying to add a new column to a data set with mutate. It should look at one column using str_replace (or some other function if necessary), and then loop through another list. I want to replace what it finds with the corresponding value in another
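
The question is truncated and shows no data, so the sketch below uses entirely hypothetical values. It illustrates one way to get a lookup-style replacement: `str_replace_all()` accepts a named character vector in which each name is a pattern and each value is its replacement. The names `lookup`, `state_code`, and `state_name` are assumptions for illustration only.

```r
library(stringr)

# Hypothetical lookup table: names are the patterns, values are the replacements
lookup <- c("NY" = "New York", "CA" = "California", "TX" = "Texas")

# Hypothetical input column
state_code <- c("NY office", "CA office", "unknown")

# Each pattern = replacement pair is applied in turn to every element
state_name <- str_replace_all(state_code, lookup)
print(state_name)
```

The same call can be placed inside `dplyr::mutate()` to create the new column. Because the pairs are applied one after another, a replacement value that itself contains a later pattern would be rewritten again, so the lookup keys should be chosen with that in mind.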

Convert HTML Entity to proper character R

Question: Does anyone know of a generic function in R that can convert &auml; to its unicode character â ? I have seen some functions that take in â , and convert it to a normal character. Any help would be appreciated. Thanks. Edit: Below is a record of data, of which I probably have over 1 million records. Is there an easier solution other than reading the data into a massive vector, and for each element, changing the records? wine/name: 1999 Domaine Robert Chevillon Nuits St. Georges 1er Cru Les Vaucrains

Concatenate previous and latter words to a word that match a condition in R

Question: I need to concatenate the previous and the following words of a word that meets a condition. Specifically, those that match the condition of having a comma. vector <- c("Paulsen", "Kehr,", "Diego", "Schalper", "Sepúlveda,", "Diego") #I know how to get which elements meet my condition: grepl(",", vector) #[1] FALSE TRUE FALSE FALSE TRUE FALSE Desired output: print(vector_ok) #[1] "Paulsen Kehr, Diego", "Schalper Sepúlveda, Diego" Thanks in advance! Answer 1: You can use grep() to get the positions of the
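
The answer is cut off after mentioning `grep()`, so what follows is a hedged sketch of how that idea might be completed, not the answerer's actual code. It assumes a matching word is never the first or last element of the vector; the helper name `idx` is introduced here for illustration.

```r
vector <- c("Paulsen", "Kehr,", "Diego", "Schalper", "Sepúlveda,", "Diego")

# Positions of the elements that contain a comma
idx <- grep(",", vector)

# Join each match with its left and right neighbour.
# Assumption: no match sits at position 1 or at the last position.
vector_ok <- vapply(
  idx,
  function(i) paste(vector[(i - 1):(i + 1)], collapse = " "),
  character(1)
)
print(vector_ok)
```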

Extract text in parentheses in R

Question: Two related questions. I have vectors of text data such as "a(b)jk(p)" "ipq" "e(ijkl)" and want to easily separate them into a vector containing the text OUTSIDE the parentheses: "ajk" "ipq" "e" and a vector containing the text INSIDE the parentheses: "bp" "" "ijkl" Is there any easy way to do this? An added difficulty is that these can get quite large and have a large (unlimited) number of parentheses. Thus, I can't simply grab text "pre/post" the parentheses and need a smarter solution. Answer 1:
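
The answer text is missing from this excerpt. As a rough sketch of one way to get both vectors, assuming stringr is installed and that parentheses are never nested, the outside text can be obtained by deleting every parenthesised group and the inside text by extracting the groups with lookarounds. The names `x`, `outside`, and `inside` are illustrative.

```r
library(stringr)

x <- c("a(b)jk(p)", "ipq", "e(ijkl)")

# Outside: remove every "(...)" group, however many there are
outside <- str_remove_all(x, "\\([^()]*\\)")

# Inside: extract the text between each pair of parentheses,
# then collapse the pieces found in each element into one string
inside <- vapply(
  str_extract_all(x, "(?<=\\()[^()]*(?=\\))"),
  paste, character(1), collapse = ""
)

print(outside)
print(inside)
```

Elements without any parentheses yield an empty string in `inside`, which matches the desired output in the question.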

Why is stringr changing encoding when manipulating strings?

Question: There is this strange behavior of stringr that is really annoying me. Without a warning, stringr changes the encoding of some strings that contain exotic characters, in my case ø, å, æ, é and some others... If you str_trim a vector of characters, then those with exotic letters will be converted to a new Encoding. letter1 <- readline('Gimme an ASCII character!') # try q or a letter2 <- readline('Gimme an non-ASCII character!') # try ø or é Letters <- c(letter1, letter2) Encoding(Letters) #

R: How to ignore case when using str_detect?

Question: The stringr package provides good string functions. To search for a string (ignoring case) one could use stringr::str_detect('TOYOTA subaru',ignore.case('toyota')) This works but gives the warning Please use (fixed|coll|regex)(x, ignore_case = TRUE) instead of ignore.case(x) What is the right way of rewriting it? Answer 1: You can use the regex function (or fixed, as in @lmo's comment, depending on what you need) to make the pattern, as detailed in ?modifiers or ?str_detect (see the instruction for pattern parameter
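
As a small illustration of what the warning message asks for, assuming a stringr version that provides the `regex()` and `fixed()` modifiers with an `ignore_case` argument, the call can be rewritten as follows. The variable names are illustrative.

```r
library(stringr)

# Regular-expression matching, ignoring case
found_regex <- str_detect("TOYOTA subaru", regex("toyota", ignore_case = TRUE))

# Literal (non-regex) matching, ignoring case
found_fixed <- str_detect("TOYOTA subaru", fixed("toyota", ignore_case = TRUE))

print(c(found_regex, found_fixed))
```

`regex()` is the usual choice when the pattern contains regular-expression syntax; `fixed()` treats the pattern as a literal string.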

Split a character vector into individual characters? (opposite of paste or stringr::str_c)

Question: An incredibly basic question in R, yet the solution isn't clear. How do you split a character vector into its individual characters, i.e. the opposite of paste(..., sep='') or stringr::str_c() ? Anything less clunky than this: sapply(1:26, function(i) { substr("ABCDEFGHIJKLMNOPQRSTUVWXYZ",i,i) } ) "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S" "T" "U" "V" "W" "X" "Y" "Z" Can it be done otherwise, e.g. with strsplit() , stringr::* or anything else? Answer 1: Yes, strsplit
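
The answer breaks off after naming `strsplit`, so here is a minimal sketch of how that function can be used for this purpose. Splitting on the empty string returns one element per character; `strsplit()` returns a list, so `[[1]]` picks out the character vector for the single input string. The name `chars` is illustrative.

```r
# Split a single string into its individual characters
chars <- strsplit("ABCDEFGHIJKLMNOPQRSTUVWXYZ", "")[[1]]
print(chars)
```

For a vector of several strings, dropping the `[[1]]` keeps one character vector per input element.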

Creating Groups with Dplyr's “group_by” then Using Stringr to Find Differences Between Groups

Question: Using the example below, I want to group the dataframe by CaseWorker, then Client, then determine for each Client group whether the list of tasks in "Task" is the same as the list of tasks in "Task2". I would be happy with a simple true or false, or better yet, if each task that is in "Task2" but not "Task" could be extracted and displayed in a new column or dataframe. So basically I need to make sure "Task" and "Task2" contain the same entries for each individual Client. I would like to stick