text-mining

How to convert a TermDocumentMatrix obtained from text mining in R into an Excel or CSV file?

耗尽温柔 posted on 2020-01-06 08:50:31
Question: To be more specific, let's say I have a character vector "names" with the following elements: Names[1]<-"aaron, matt, patrick", Names[2]<-"jiah, ron, melissa, john, patrick", and so on. I have 22956 elements like this. I want to separate all the names and assign each one a separate column in Excel. How do I do this? It requires text mining, but I am not sure how to do it. Thank you.

Answer 1: I assume you have a list of string elements separated by commas, with a different number of elements in each.

Subset a corpus by meta data?

那年仲夏 posted on 2020-01-05 10:09:33
Question: I feel like this should be easier, but I cannot figure it out. How do I filter out documents from a corpus based on metadata? To be more specific, I have a corpus of 576 documents, each of which has the tag 'Section'. Section has a number of different values, such as "News", "Editorial" and "Comment". How do I use tm_filter to filter out documents that have, say, "Editorial" and/or "Comment" in this tag? I'm sorry I haven't provided reproducible data. I don't really know how to go about

Categorize businesses with text analytics in Python

混江龙づ霸主 posted on 2020-01-05 09:04:48
Question: I'm a newbie to AI and want to perform the exercise below. Can you please suggest a way to achieve it using Python?

Scenario: I have a list of businesses of some companies, like:
1. AI
2. Artificial Intelligence
3. VR
4. Virtual reality
5. Mobile application
6. Desktop software

and want to categorize them as below:

Technology ---> Category
1. AI ---> Category Artificial Intelligence
2. Artificial Intelligence ---> Category Artificial Intelligence
3. VR ---> Category Virtual Reality
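
One possible starting point, sketched under stated assumptions: a hand-written alias table that maps each spelling of a technology to one canonical category. The names `ALIASES`, `categorize`, and the `"Uncategorized"` fallback are hypothetical and chosen only for illustration; the table covers just the four mappings visible in the question, and a real solution would likely need a larger table or fuzzy matching.

```python
# Hypothetical alias table: only the mappings visible in the question are
# included; everything else falls through to the default category.
ALIASES = {
    "ai": "Artificial Intelligence",
    "artificial intelligence": "Artificial Intelligence",
    "vr": "Virtual Reality",
    "virtual reality": "Virtual Reality",
}


def categorize(name, aliases=ALIASES, default="Uncategorized"):
    """Return the category for a business name, ignoring case and extra spaces."""
    key = " ".join(name.lower().split())  # normalize case and whitespace
    return aliases.get(key, default)


businesses = ["AI", "Artificial Intelligence", "VR", "Virtual reality"]
for name in businesses:
    print(f"{name} ---> Category {categorize(name)}")
```

Normalizing the key before the lookup is what lets "Virtual reality" and "VR" land in the same category; names missing from the table are reported as "Uncategorized" rather than guessed.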

Treat words separated by space in the same manner

送分小仙女□ posted on 2020-01-03 07:30:21
Question: I am trying to find the words that occur in multiple documents at the same time. Let us take an example.

doc1: "this is a document about milkyway"
doc2: "milky way is huge"

As you can see in the two documents above, the word "milkyway" occurs in both, but in the second document the term is separated by a space and in the first it is not. I am doing the following to get the document term matrix in R.

    library(tm)
    tmp.text <- data.frame(rbind(doc1, doc2))
    tmp.corpus <- Corpus

Extract Dates in any format from Text in R

纵然是瞬间 posted on 2020-01-03 06:35:31
Question: I want to extract dates from a given text. The dates can be in any format, such as April 10 2018, 10-04-2018, 10/04/2018, 2018/04/10, 04.10.2018, or other formats. I have news data and want to extract dates from the text, for example: My Friend is coming on july 10 2018 or 10/07/2018. I want to extract the dates from the given text. Please help. Thanks in advance.

Answer 1: We extract it using str_extract and then get the format with anydate:

    library(anytime)
    library(stringr)
    anydate(str_extract_all(str1, "[

Why is a self trained NER-Model incompatible with the version of OpenNLP?

*爱你&永不变心* posted on 2020-01-03 05:48:28
Question: I trained an OpenNLP NER model to detect a new entity, but when I use this model I get the following exception:

    Exception in thread "main" java.lang.IllegalArgumentException: opennlp.tools.util.InvalidFormatException: Model version 1.6.0 is not supported by this (1.5.3) version of OpenNLP!

I am using OpenNLP version 1.6.0 and my source code is this:

    String [] sentences = Fragmentation.getSentences(Document);
    InputStream modelIn = new FileInputStream("Models/en-ner-cvskill.bin");

Java Executor service concurrency issue

北慕城南 posted on 2020-01-03 04:25:11
Question: I am learning multi-threading in Java. I am using an executor service (Callable) because I need to collect my results at the end and combine them before going further. I implemented multi-threading and it is throwing an error, a type cast error. Just to let you know, the multi-threading worked once and has not worked since. The return type of every thread is TreeMap, and there is no dependency between them. This is my implementation:

    class AbnerCallable implements Callable<TreeMap<String,

How can I process Chinese/ Japanese characters with R [closed]

自闭症网瘾萝莉.ら posted on 2020-01-03 03:02:25
Question: It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. Closed 6 years ago.

I would like to be able to use a tm-like package to split and identify non-English characters (mainly Japanese/Thai/Chinese) with R. What I would like to do is convert it into some sort of matrix like

How to write output from RapidMiner to a txt file?

徘徊边缘 posted on 2020-01-02 10:23:08
Question: I am using RapidMiner 5.3. I took a small document that contains around three English sentences, tokenized it, and filtered it by word length. I want to write the output into a different word document. I tried using the Write Document utility, but it is not working: it simply writes the same original document into the new one. However, when I write the output to the console, it gives me the expected answer. Something is wrong with the Write Document utility. Here is my process READ

Convert from plural to singular using R

╄→尐↘猪︶ㄣ posted on 2020-01-02 09:19:39
Question: How do I convert plural text into singular in a corpus using R? I am trying with the "tm" package, but I am not able to find any function for this. I have tried the function below, but I cannot apply it to the corpus.

    aggregate.plurals <- function (v) {
      # Fold the count of a plural form into its singular and drop the plural
      aggro_fen <- function(v, singular, plural) {
        if (! is.na(v[plural])) {
          v[singular] <- v[singular] + v[plural]
          v <- v[-which(names(v) == plural)]
        }
        return(v)
      }
      for (n in names(v)) {
        n_pl <- paste(n, 's', sep='')
        v <- aggro_fen(v, n, n_pl)
        n_pl <- paste(n, 'es',