Breaking a paragraph into a vector of sentences in R

Question

I have the following paragraph:

Well, um...such a personal topic. No wonder I am the first to write a review. Suffice to say this stuff does just what they claim and tastes pleasant. And I had, well, major problems in this area and now I don't. 'Nuff said. :-)

To apply the calculate_total_presence_sentiment function from the RSentiment package, I would like to break this paragraph into a vector of sentences, as follows:

[1] "Well, um...such a personal topic."                                       
[2] "No wonder I am the first to write a review."                             
[3] "Suffice to say this stuff does just what they claim and tastes pleasant."
[4] "And I had, well, major problems in this area and now I don't."           
[5] "'Nuff said."                                                             
[6] ":-)"

Would appreciate your help on this.

Answer 1:

qdap has a convenient function for this:

sent_detect_nlp - Detect and split sentences on endmark boundaries using openNLP & NLP utilities, which matches the old version of the openNLP package's now removed sentDetect function.

library(qdap)

txt <- "Well, um...such a personal topic. No wonder I am the first to write a review. Suffice to say this stuff does just what they claim and tastes pleasant. And I had, well, major problems in this area and now I don't. 'Nuff said. :-)"

sent_detect_nlp(txt)
#[1] "Well, um...such a personal topic."                                       
#[2] "No wonder I am the first to write a review."                             
#[3] "Suffice to say this stuff does just what they claim and tastes pleasant."
#[4] "And I had, well, major problems in this area and now I don't."           
#[5] "'Nuff said."                                                             
#[6] ":-)"

Answer 2:

Dirty Solution

    > data <- "Well, um...such a personal topic. No wonder I am the first to write a review. Suffice to say this stuff does just what they claim and tastes pleasant. And I had, well, major problems in this area and now I don't. 'Nuff said. :-)"
    > ?"regular expression"
    > strsplit(data, "(?<=[^.][.][^.])", perl=TRUE)
    [[1]]
    [1] "Well, um...such a personal topic. "
    [2] "No wonder I am the first to write a review. "
    [3] "Suffice to say this stuff does just what they claim and tastes pleasant. "
    [4] "And I had, well, major problems in this area and now I don't. "
    [5] "'Nuff said. "
    [6] ":-)"

For anything more robust, use the tools listed in the CRAN Natural Language Processing task view: https://cran.r-project.org/web/views/NaturalLanguageProcessing.html
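The lookbehind split in this answer leaves a trailing space on every sentence except the last. A possible variation, sketched here in base R only, is to split on the whitespace itself when it follows an end mark. The helper name split_sentences is hypothetical, and the pattern is an assumption tuned to this paragraph, not a general sentence detector:

```r
# Same paragraph as in the question.
txt <- "Well, um...such a personal topic. No wonder I am the first to write a review. Suffice to say this stuff does just what they claim and tastes pleasant. And I had, well, major problems in this area and now I don't. 'Nuff said. :-)"

# Hypothetical helper: split on whitespace that follows an end mark (., ! or ?),
# unless that end mark is itself preceded by a dot, so "..." is not a boundary.
# The whitespace is consumed by the split, so no trailing spaces remain.
split_sentences <- function(x) {
  strsplit(x, "(?<=[^.][.!?])\\s+", perl = TRUE)[[1]]
}

sentences <- split_sentences(txt)
print(sentences)
```

A pattern like this will still break on abbreviations such as "Dr. Smith", which is one reason to prefer a dedicated NLP tool for real text.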

Answer 3:

You can save your text in a .txt file. Make sure that each line of the file contains one sentence that you want as a separate element. Then use the base function readLines('filepath/filename.txt'). The result is a character vector (not a data frame) with one element per line of the original text file.

> mylines <- readLines('text.txt')
Warning message:
In readLines("text.txt") : incomplete final line found on 'text.txt'
> mylines
[1] "Well, um...such a personal topic."                                       
[2] "No wonder I am the first to write a review."                             
[3] "Suffice to say this stuff does just what they claim and tastes pleasant."
[4] "And I had, well, major problems in this area and now I don't."           
[5] "'Nuff said'."                                                            
[6] ":-)"

> mylines[3]
[1] "Suffice to say this stuff does just what they claim and tastes pleasant."
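To try the readLines approach without preparing a file by hand, a self-contained sketch could write a few sentences to a temporary file first. The sentences and the temporary path are illustrative assumptions. Because writeLines ends the file with a newline, the "incomplete final line" warning shown above should not appear here:

```r
# Illustrative sentences, one per line, written to a temporary file.
sentences <- c("Well, um...such a personal topic.",
               "No wonder I am the first to write a review.",
               "'Nuff said.")
path <- tempfile(fileext = ".txt")
writeLines(sentences, path)   # each element becomes one line, newline-terminated

# readLines returns a character vector with one element per line.
mylines <- readLines(path)
print(mylines[2])

unlink(path)                  # remove the temporary file
```

Note that this only reads sentences that are already on separate lines; it does not detect sentence boundaries inside a paragraph.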

Source: https://stackoverflow.com/questions/40479496/breaking-a-paragraph-into-a-vector-of-sentences-in-r

Tags

text-mining