text-mining

Are there any R packages or published code on topic models that account for time?

对着背影说爱祢 submitted on 2020-08-03 07:31:28
Question: I am trying to perform topic modeling on a data set of political speeches that spans two centuries, and would ideally like to use a topic model that accounts for time, such as Topics over Time (Wang and McCallum 2006) or the Dynamic Topic Model (Blei and Lafferty 2006). However, given that I am not an experienced coder, the help of an R package or some sample code implementing either of these topic models would really help. Does anyone know if such packages or published code exist for R? …
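One R-based option often suggested for time-varying topics is the stm package; it is not an implementation of Topics over Time or the Dynamic Topic Model, but it lets topic prevalence vary smoothly with a time covariate. A minimal sketch, assuming a data frame speeches with text and year columns (assumed names, not from the question):

# Structural topic model with a smooth effect of year on topic prevalence.
# `speeches$text` and `speeches$year` are hypothetical column names.
library(stm)

processed <- textProcessor(speeches$text, metadata = speeches)
prepped   <- prepDocuments(processed$documents, processed$vocab, processed$meta)

fit <- stm(documents  = prepped$documents,
           vocab      = prepped$vocab,
           K          = 20,
           prevalence = ~ s(year),
           data       = prepped$meta)

# Estimate and plot how each topic's prevalence changes over time.
effects <- estimateEffect(1:20 ~ s(year), fit, metadata = prepped$meta)
plot(effects, covariate = "year", method = "continuous", topics = 1)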

Mapping the topic of the review in R

╄→尐↘猪︶ㄣ submitted on 2020-07-18 07:59:24
Question: I have two data sets, Review Data and Topic Data.

dput of my Review Data:

structure(list(Review = structure(2:1, .Label = c("Canteen Food could be improved", "Sports and physical exercise need to be given importance"), class = "factor")), class = "data.frame", row.names = c(NA, -2L))

dput of my Topic Data:

structure(list(word = structure(2:1, .Label = c("canteen food", "sports and physical"), class = "factor"), Topic = structure(2:1, .Label = c("Canteen", "Sports "), class = "factor")), …
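One straightforward way to map each review to a topic is simple keyword matching: assign a review the Topic whose keyword phrase occurs in its text. A rough sketch, rebuilding the two data frames from the dput output as character columns (review_df and topic_df are assumed names):

# Minimal keyword-matching sketch; returns NA when no topic phrase is found.
review_df <- data.frame(Review = c("Canteen Food could be improved",
                                   "Sports and physical exercise need to be given importance"),
                        stringsAsFactors = FALSE)
topic_df  <- data.frame(word  = c("canteen food", "sports and physical"),
                        Topic = c("Canteen", "Sports"),
                        stringsAsFactors = FALSE)

# For each review, take the Topic of the first keyword phrase that matches (case-insensitive).
review_df$Topic <- sapply(review_df$Review, function(r) {
  hit <- which(sapply(topic_df$word, grepl, x = r, ignore.case = TRUE))
  if (length(hit) > 0) topic_df$Topic[hit[1]] else NA_character_
})
review_df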

How to do fuzzy pattern matching with quanteda and kwic?

放肆的年华 submitted on 2020-06-27 15:08:09
Question: I have texts written by doctors, and I want to be able to highlight specific words in their context (5 words before and 5 words after the word I search for in their text). Say I want to search for the word 'suicidal'. I would then use the kwic function in the quanteda package:

kwic(dataset, pattern = "suicidal", window = 5)

So far, so good, but say I want to allow for the possibility of typos. In this case I want to allow for three deviating characters, with no restriction on where in the word …
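quanteda's kwic() has no built-in edit-distance matching, but one workaround (a sketch, not the only solution) is to use base R's agrep() to collect every word type in the corpus within a given edit distance of the target, and then pass those types to kwic() as fixed patterns. dataset is assumed to be a quanteda corpus, as in the question:

# Approximate fuzzy matching: find all word types within 3 edits of "suicidal",
# then show those types in their 5-word context.
library(quanteda)

toks  <- tokens(dataset)
feats <- featnames(dfm(toks))                    # every distinct (lowercased) word type
fuzzy <- agrep("suicidal", feats,
               max.distance = 3, value = TRUE)   # e.g. misspellings such as "sucidal"
kwic(toks, pattern = fuzzy, window = 5, valuetype = "fixed")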

Dynamic topic models/topic over time in R [closed]

喜欢而已 submitted on 2020-06-25 06:54:11
Question: I have a database of newspaper articles about water policy from 1998 to 2008. I would like to see how the newspaper coverage changes during this period. My question is: should I use the Dynamic Topic Model or the Topics over Time model to handle this task? …

probabilities returned by gensim's get_document_topics method don't add up to one

落爺英雄遲暮 submitted on 2020-06-12 05:14:26
Question: Sometimes it returns probabilities for all topics and all is fine, but sometimes it returns probabilities for just a few topics, and they don't add up to one; it seems to depend on the document. Generally, when it returns few topics the probabilities add up to more or less 80%, so is it returning just the most relevant topics? Is there a way to force it to return all probabilities? Maybe I'm missing something, but I can't find any documentation of the method's parameters. Answer 1: I had the same …
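By default, get_document_topics() drops topics whose probability falls below the model's minimum_probability threshold, which is why the returned values can sum to well under one; passing minimum_probability=0 asks for (almost) every topic. A small illustrative Python sketch, where lda, dictionary, and doc_tokens are assumed names for an already-trained gensim LdaModel, its Dictionary, and a tokenised document:

# Assumed names: `lda` (trained gensim LdaModel), `dictionary` (gensim Dictionary),
# `doc_tokens` (list of tokens for one document).
bow = dictionary.doc2bow(doc_tokens)
topics = lda.get_document_topics(bow, minimum_probability=0)  # keep (nearly) all topics
print(topics)
print(sum(p for _, p in topics))  # should now be close to 1.0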

File extension renaming in R

耗尽温柔 submitted on 2020-05-15 11:12:37
Question: I am just trying to change the filename extensions to .doc. I'm trying the code below, but it does not work. How come? I'm using instructions from here.

startingDir <- "C:/Data/SCRIPTS/R/TextMining/myData"
filez <- list.files(startingDir)
sapply(filez, FUN = function(eachPath) {
  file.rename(from = eachPath, to = sub(pattern = ".LOG", replacement = ".DOC", eachPath))
})

The output I get is:

DD17-01.LOG DD17-02.LOG DD17-03.LOG DD17-4.LOG DD17-5.LOG DD37-01.LOG DD37-02.LOG DD39-01.LOG DD39-02.LOG DD39-03.LOG FALSE …
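A likely cause (my reading of the output, not stated in the post): list.files() returns bare file names, so unless the working directory is startingDir, file.rename() cannot find the files and returns FALSE; the unescaped "." in the pattern also matches any character. A sketch of a fix, assuming the same directory layout:

startingDir <- "C:/Data/SCRIPTS/R/TextMining/myData"
# full.names = TRUE keeps the directory in each path, so file.rename() works
# regardless of the current working directory; the pattern lists only .LOG files.
filez <- list.files(startingDir, pattern = "\\.LOG$", full.names = TRUE)
sapply(filez, function(eachPath) {
  file.rename(from = eachPath,
              to   = sub("\\.LOG$", ".DOC", eachPath))  # escaped dot, anchored at the end
})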

R text mining: grouping similar words using stemDocuments in tm package

半世苍凉 submitted on 2020-04-18 06:10:15
Question: I am doing text mining of around 30,000 tweets. To make the results more reliable, I want to convert variant forms to a single word: for example, some users write "girl", some "girls", some "gal"; similarly "give" and "gave" mean only one thing, and the same goes for "come" and "came". Some users also use short forms like "plz" and "pls". Also, stemDocument from the tm package is not working properly: it converts dance to danc and table to tabl. Is there any other good package for stemming? I want to …
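What is described here is closer to lemmatization than stemming; one commonly suggested option (not from the post itself) is the textstem package, combined with a small hand-made dictionary for chat abbreviations, which no stemmer or lemmatizer will know about. A rough sketch, with tweets as an assumed character vector of tweet texts:

# Lemmatize instead of stem, so "girls" -> "girl" and "gave" -> "give",
# while "dance" stays "dance" rather than becoming "danc".
library(textstem)

tweets <- c("girls gave the gal a table", "plz come, she came to dance")

# Custom replacements for short forms / slang, applied before lemmatization.
shortforms <- c("\\bplz\\b" = "please", "\\bpls\\b" = "please", "\\bgal\\b" = "girl")
for (i in seq_along(shortforms)) {
  tweets <- gsub(names(shortforms)[i], shortforms[[i]], tweets, ignore.case = TRUE)
}

lemmatize_strings(tweets)
# should give roughly: "girl give the girl a table"  "please come, she come to dance"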