Question
I have the following paragraph:
Well, um...such a personal topic. No wonder I am the first to write a review. Suffice to say this stuff does just what they claim and tastes pleasant. And I had, well, major problems in this area and now I don't. 'Nuff said. :-)
For the purpose of applying the calculate_total_presence_sentiment command from the RSentiment package, I would like to break this paragraph into a vector of sentences as follows:
[1] "Well, um...such a personal topic."
[2] "No wonder I am the first to write a review."
[3] "Suffice to say this stuff does just what they claim and tastes pleasant."
[4] "And I had, well, major problems in this area and now I don't."
[5] "'Nuff said."
[6] ":-)"
Would appreciate your help on this.
Answer 1:
qdap has a convenient function for this:
sent_detect_nlp - Detect and split sentences on endmark boundaries using openNLP & NLP utilities, which matches the old version of the openNLP package's now-removed sentDetect function.
library(qdap)
txt <- "Well, um...such a personal topic. No wonder I am the first to write a review. Suffice to say this stuff does just what they claim and tastes pleasant. And I had, well, major problems in this area and now I don't. 'Nuff said. :-)"
sent_detect_nlp(txt)
#[1] "Well, um...such a personal topic."
#[2] "No wonder I am the first to write a review."
#[3] "Suffice to say this stuff does just what they claim and tastes pleasant."
#[4] "And I had, well, major problems in this area and now I don't."
#[5] "'Nuff said."
#[6] ":-)"
Answer 2:
Dirty Solution
> data <- "Well, um...such a personal topic. No wonder I am the first to write a review. Suffice to say this stuff does just what they claim and tastes pleasant. And I had, well, major problems in this area and now I don't. 'Nuff said. :-)"
> ?"regular expression"
> strsplit(data, "(?<=[^.][.][^.])", perl=TRUE)
[[1]]
[1] "Well, um...such a personal topic. "
[2] "No wonder I am the first to write a review. "
[3] "Suffice to say this stuff does just what they claim and tastes pleasant. "
[4] "And I had, well, major problems in this area and now I don't. "
[5] "'Nuff said. "
[6] ":-)"
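The split above keeps the trailing space on each sentence, because the lookbehind consumes nothing. As a minimal sketch (a variant of the same idea, not part of the original answer), the whitespace can be made the separator itself so that it is dropped. It assumes, like the pattern above, that only a single period ends a sentence, so abbreviations such as "e.g." would still be split incorrectly.

```r
# Same input as above.
data <- "Well, um...such a personal topic. No wonder I am the first to write a review. Suffice to say this stuff does just what they claim and tastes pleasant. And I had, well, major problems in this area and now I don't. 'Nuff said. :-)"

# Split on whitespace that directly follows a single period
# (a non-period character and then a period). The ellipsis in "um...such"
# is not followed by whitespace, so it is left alone.
sentences <- strsplit(data, "(?<=[^.][.])\\s+", perl = TRUE)[[1]]

print(sentences)
```

strsplit returns a list with one element per input string, hence the [[1]] to get a plain character vector.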
For a cleaner solution, use the tools listed in the CRAN Natural Language Processing task view: https://cran.r-project.org/web/views/NaturalLanguageProcessing.html
Answer 3:
You can save your text in a .txt file. Make sure that each line in the .txt file contains one sentence that you would like to be read as an element of the vector.
Use the base function readLines('filepath/filename.txt').
The result is a character vector (not a data frame) in which each line of the original text file is one element.
> mylines <- readLines('text.txt')
Warning message:
In readLines("text.txt") : incomplete final line found on 'text.txt'
> mylines
[1] "Well, um...such a personal topic."
[2] "No wonder I am the first to write a review."
[3] "Suffice to say this stuff does just what they claim and tastes pleasant."
[4] "And I had, well, major problems in this area and now I don't."
[5] "'Nuff said'."
[6] ":-)"
> mylines[3]
[1] "Suffice to say this stuff does just what they claim and tastes pleasant."
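The readLines approach can be tried without creating a file by hand. The sketch below is only an illustration: it writes the sentences to a temporary file (the path from tempfile() stands in for 'text.txt') and reads them back. Because writeLines ends the file with a newline, the "incomplete final line" warning shown above does not appear; that warning only means the last line of the file had no trailing newline, and the line is still read.

```r
# One sentence per line, as the answer requires.
sentences <- c(
  "Well, um...such a personal topic.",
  "No wonder I am the first to write a review.",
  "Suffice to say this stuff does just what they claim and tastes pleasant.",
  "And I had, well, major problems in this area and now I don't.",
  "'Nuff said.",
  ":-)"
)

# Hypothetical stand-in for 'text.txt': a temporary file.
path <- tempfile(fileext = ".txt")
writeLines(sentences, path)

# readLines returns a character vector with one element per line.
mylines <- readLines(path)
print(is.character(mylines))
print(mylines[3])

# Clean up the temporary file.
unlink(path)
```

Note that this only helps when the text is already split one sentence per line; it does not detect sentence boundaries in a paragraph.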
Source: https://stackoverflow.com/questions/40479496/breaking-a-paragraph-into-a-vector-of-sentences-in-r