Extract text using regex in R

一整个雨季 2021-01-25 02:15

I read a text file containing the data below and am trying to convert it to a data frame:

Id:   1
ASIN: 0827229534
  title: Patterns of Preaching: A Sermon Sampler
  group         


        
5 Answers
  •  佛祖请我去吃肉
    2021-01-25 02:47

    This is just a start. Since I'm not a regex expert, I'll let others do the magic. :)

    Either you define a rule for each field and do something like this:

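    # Each pattern captures the value after the label; non-matching lines are dropped by rbind.
    # Column 1 of each result is the full match, column 2 is the captured value.
    ids <- do.call(rbind, regmatches(text, regexec(pattern = 'Id:\\s+(\\d+)', text = text)))
    ASIN <- do.call(rbind, regmatches(text, regexec(pattern = 'ASIN:\\s+(\\S+)', text = text)))
    title <- do.call(rbind, regmatches(text, regexec(pattern = 'title:\\s+(.*)', text = text)))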
    
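As a rough sketch of how the per-field rules could be assembled into a data frame: the sample `text` vector below is made up for illustration (only the first record comes from the question), `extract_field` is a hypothetical helper, and the code assumes every record has exactly one `Id`, `ASIN`, and `title` line, in the same order.

```r
# Hypothetical sample input: one element per line, as readLines() would return.
# The second record is invented purely for illustration.
text <- c("Id:   1",
          "ASIN: 0827229534",
          "  title: Patterns of Preaching: A Sermon Sampler",
          "Id:   2",
          "ASIN: 0000000000",
          "  title: Example Title")

# Hypothetical helper: return the first capture group of `pattern`
# for every line that matches, skipping lines that do not match.
extract_field <- function(pattern, lines) {
  m <- regmatches(lines, regexec(pattern, lines))
  hits <- m[lengths(m) > 0]
  vapply(hits, function(x) x[2], character(1))
}

# Assumes each field occurs once per record, so the vectors line up.
df <- data.frame(
  Id    = extract_field("^Id:\\s+(\\d+)", text),
  ASIN  = extract_field("^ASIN:\\s+(\\S+)", text),
  title = extract_field("^\\s*title:\\s+(.*)$", text),
  stringsAsFactors = FALSE
)
print(df)
```

If a record can lack a field, the vectors will have different lengths and `data.frame()` will fail, so this approach only fits regular input.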

    Or you define a general rule that should work for every line, something like this:

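    # Key: everything before the first colon.
    sapply(text, FUN = function(x) {
      regmatches(x, regexec(text = x, pattern = "([^:]+)"))
    })

    # Value: everything after the first colon (the capture group excludes the colon).
    sapply(text, FUN = function(x) {
      regmatches(x, regexec(text = x, pattern = ":\\s*(.*)"))
    })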
    
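One possible way to take the general rule all the way to a data frame is sketched below. It is only an illustration: the sample `text` is partly made up, a new record is assumed to start at every `Id` line, and every record is assumed to carry the same set of keys.

```r
# Hypothetical sample input; the second record is invented for illustration.
text <- c("Id:   1",
          "ASIN: 0827229534",
          "  title: Patterns of Preaching: A Sermon Sampler",
          "Id:   2",
          "ASIN: 0000000000",
          "  title: Example Title")

# Split each line at its first colon: group 1 is the key, group 2 the value.
m <- regmatches(text, regexec("^\\s*([^:]+):\\s*(.*)$", text))
m <- m[lengths(m) == 3]  # drop lines that contain no colon

kv <- data.frame(
  key   = trimws(vapply(m, `[`, character(1), 2)),
  value = trimws(vapply(m, `[`, character(1), 3)),
  stringsAsFactors = FALSE
)

# Assumption: each "Id" line opens a new record.
kv$record <- cumsum(kv$key == "Id")

# Turn each record into a one-row data frame, then stack the rows.
df <- do.call(rbind, lapply(split(kv, kv$record), function(r) {
  as.data.frame(as.list(setNames(r$value, r$key)), stringsAsFactors = FALSE)
}))
print(df)
```

Because the key pattern stops at the first colon, a title that itself contains a colon stays intact in the value column.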
