mining

HashSet handling to avoid getting stuck in a loop during iteration

折月煮酒 submitted on 2019-12-02 15:08:45
Question: I'm working on an image-mining project, and I used a HashSet instead of an array to avoid adding duplicate URLs while gathering them. I've reached the point in the code where I iterate over the HashSet containing the main URLs; within the iteration I download the page of each main URL, add the URLs found there to the HashSet, and go on. During iteration I should exclude every already-scanned URL, and also exclude (remove) every URL that ends with jpg, until the HashSet's URL count reaches 0. The question is
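The question's language isn't stated, so here is a minimal Python sketch of the worklist pattern being described: rather than iterating a set while mutating it (which is what gets the loop stuck), pop one URL at a time until the set is empty. `fetch_page` and `extract_urls` are hypothetical stand-ins for the downloading and link-extraction steps.

```python
def crawl(seed_urls, fetch_page, extract_urls):
    """Worklist pattern: never mutate a set while iterating over it."""
    pending = set(seed_urls)   # URLs still to process (the HashSet)
    scanned = set()            # URLs already processed, excluded from re-scanning
    images = set()             # URLs ending in .jpg, removed from crawling
    while pending:             # loop until the worklist count reaches 0
        url = pending.pop()    # pop removes the URL, so the set shrinks safely
        if url in scanned:
            continue
        scanned.add(url)
        if url.endswith(".jpg"):   # collect image URLs instead of crawling them
            images.add(url)
            continue
        for found in extract_urls(fetch_page(url)):
            if found not in scanned:   # only enqueue URLs not yet scanned
                pending.add(found)
    return scanned, images
```

Popping from `pending` instead of `for url in pending:` is the key move; it sidesteps the "collection was modified during enumeration" class of errors entirely and guarantees termination once every reachable URL has been seen.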

Converting a Document Term Matrix into a Matrix with lots of data causes overflow

蹲街弑〆低调 submitted on 2019-11-30 05:11:34
Let's do some text mining. Here I stand with a term-document matrix (from the tm package):

    dtm <- TermDocumentMatrix(myCorpus, control = list(
      weight = weightTfIdf, tolower = TRUE, removeNumbers = TRUE,
      minWordLength = 2, removePunctuation = TRUE,
      stopwords = stopwords("german")))

When I do typeof(dtm) I see that it is a "list", and the structure looks like:

          Docs
    Terms   1 2 ...
      lorem 0 0 ...
      ipsum 0 0 ...

So I try:

    wordMatrix = as.data.frame(t(as.matrix(dtm)))

That works for 1000 documents, but when I try to use 40000 it doesn't anymore. I get this error: Fehler in vector(typeof(x
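The overflow happens because as.matrix() materialises a fully dense matrix with one cell per term-document pair, which explodes at 40000 documents. A tm TermDocumentMatrix is stored sparsely as a slam simple_triplet_matrix (slots i, j, v, nrow, ncol, dimnames), so a common workaround (a sketch, not from the question itself) is to build a sparse matrix directly and never make the dense copy:

```r
library(Matrix)  # sparse matrix classes

# dtm's underlying triplet slots feed sparseMatrix() directly,
# so memory use stays proportional to the non-zero entries
sparse_dtm <- sparseMatrix(i = dtm$i, j = dtm$j, x = dtm$v,
                           dims = c(dtm$nrow, dtm$ncol),
                           dimnames = dtm$dimnames)
```

Alternatively, tm's removeSparseTerms(dtm, 0.99) drops rarely occurring terms first, which can shrink the matrix enough that the dense conversion fits in memory.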
