text-mining

How to translate a syntactic parse into a dependency parse tree?

Submitted by ╄→гoц情女王★ on 2019-12-25 04:26:51
Question: Using Link Grammar I can get the syntactic parse of sentences, something like the following: [two flattened Link Grammar link diagrams for the sentence "LEFT-WALL a koala.n is.v a cute.a animal.n .", showing Xp, WV, Ost/Osm, Wd, Ds**x, Ds**c, Ss, PHc and A links; the second diagram is cut off]
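The Link Grammar linkage above is not itself a dependency tree, so one route is simply to re-parse the sentence with a parser that emits dependencies directly. A minimal sketch using the udpipe R package (an assumption on my part; the question itself only mentions Link Grammar) to get head/relation pairs for the same sentence:

# Sketch: obtain a dependency parse with udpipe rather than converting Link Grammar output
library(udpipe)

model_info <- udpipe_download_model(language = "english")  # downloads a UD model once
ud_model   <- udpipe_load_model(model_info$file_model)

ann <- as.data.frame(udpipe_annotate(ud_model, x = "a koala is a cute animal"))

# token_id / head_token_id / dep_rel give the edges of the dependency tree
ann[, c("token_id", "token", "head_token_id", "dep_rel")]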

Find dates in text [duplicate]

Submitted by 丶灬走出姿态 on 2019-12-25 02:59:32
Question: This question already has answers here: Javascript date regex DD/MM/YYYY (11 answers). Closed 4 years ago. I want to find dates in a document and return these dates in an array. Let's suppose I have this text: "On the 03/09/2015 I am swimming in a pool, that was built on the 27-03-1994". Now my code should return ['03/09/2015','27-03-1994'], or simply two Date objects in an array. My idea was to solve this problem with regex, but the method search() only returns one result and with test() I only…
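A minimal sketch of the "return every match" part in R (the question's own methods are JavaScript-flavoured, so treat this purely as an illustration of a global match):

text <- "On the 03/09/2015 I am swimming in a pool, that was built on the 27-03-1994"

# dd/mm/yyyy or dd-mm-yyyy; gregexpr() returns all matches, not just the first
pattern <- "\\b\\d{2}[/-]\\d{2}[/-]\\d{4}\\b"
regmatches(text, gregexpr(pattern, text, perl = TRUE))[[1]]
# [1] "03/09/2015" "27-03-1994"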

determine the temporality of a sentence with POS tagging

Submitted by 风流意气都作罢 on 2019-12-25 00:53:03
Question: I want to find out from a series of sentences whether an action has been carried out or will be carried out. For example: "I will prescribe this medication" versus "I prescribed this medication", or "He had already taken the stuff" versus "he may take the stuff later". I was trying a tidytext approach and decided to simply look for past-participle versus future-tense verbs. However, when I POS tag, the only types of verbs I get are "Verb intransitive", "Verb (usu participle)" and…
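One way to make the past/future distinction explicit is to look at Penn Treebank tags rather than coarse verb classes. A rough sketch with the udpipe package (an assumption; any tagger that emits VBD/VBN/MD tags would do):

library(udpipe)

ud_model <- udpipe_load_model(udpipe_download_model(language = "english")$file_model)

sentences <- c("I will prescribe this medication",
               "I prescribed this medication",
               "He had already taken the stuff",
               "he may take the stuff later")

ann <- as.data.frame(udpipe_annotate(ud_model, x = sentences))

# crude heuristic: VBD/VBN suggests past, a modal (MD: will/may) suggests future or hypothetical
temporality <- function(tags) {
  if (any(tags %in% c("VBD", "VBN"))) "past"
  else if (any(tags == "MD")) "future/modal"
  else "other"
}
tapply(ann$xpos, ann$doc_id, temporality)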

How to install package tm in R-3.3.0

Submitted by 一曲冷凌霜 on 2019-12-25 00:10:56
Question: I'm using R-3.3.3. I tried to install the tm package using the following commands: install.packages('tm', dependencies = TRUE); library('tm'). But I'm getting this error message: Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) : there is no package called ‘slam’. In addition: Warning message: package ‘tm’ was built under R version 3.3.3. Error: package or namespace load failed for ‘tm’. I saw two solutions for the same type of error here & dependency ‘slam’ is not available when…
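The error says the slam dependency is missing, so one common fix (assuming slam is the only missing piece and a compatible build is available for this R version) is to install it explicitly before tm:

install.packages("slam")                      # the missing dependency named in the error
install.packages("tm", dependencies = TRUE)
library(tm)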

Splitting and grouping plain text (grouping text by chapter in dataframe)?

Submitted by 强颜欢笑 on 2019-12-24 20:45:28
Question: I have a data frame/tibble where I've imported a plain-text (txt) file. The text is very consistent and is grouped by chapter. Sometimes the chapter text is only one row, sometimes it's multiple rows. The data is in one column like this: # A tibble: 10,708 x 1, x <chr>: 1 "Chapter 1 ", 2 "Chapter text. ", 3 "Chapter 2 ", 4 "Chapter text. ", 5 "Chapter 3 ", 6 "Chapter text. ", 7 "Chapter text. ", 8 "Chapter 4 ". I'm trying to clean the data to have a new column for Chapter and the text from each chapter in…
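Assuming every chapter heading matches the pattern "Chapter <number>" on its own row (an assumption based on the excerpt above), a cumulative-sum trick with dplyr/stringr collapses the data to one row per chapter:

library(dplyr)
library(stringr)

df <- tibble::tibble(x = c("Chapter 1 ", "Chapter text. ",
                           "Chapter 2 ", "Chapter text. "))

df %>%
  mutate(chapter = cumsum(str_detect(x, "^Chapter \\d+\\s*$"))) %>%  # each heading row starts a new group
  filter(!str_detect(x, "^Chapter \\d+\\s*$")) %>%                   # drop the heading rows themselves
  group_by(chapter) %>%
  summarise(text = str_squish(paste(x, collapse = " ")), .groups = "drop")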

Regular expression not working in R but works on website. Text mining

Submitted by 扶醉桌前 on 2019-12-24 20:32:40
Question: I have a regex which works on the regular-expression website but doesn't work when I copy it into R. Below is the code to recreate my data frame: text <- data.frame(page = c(1,1,2,3), sen = c(1,2,1,1), text = c("Dear Mr case 1", "the value of my property is £500,000.00 and it was built in 1980", "The protected percentage is 0% for 2 years", "The interest rate is fixed for 2 years at 4.8%")). The regex working on the website: https://regex101.com/r/OcVN5r/2. Below is the R code I have tried so far and…
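The pattern itself is not shown in the excerpt, so this is only a guess at the cause: in R every backslash has to be doubled inside a string, and PCRE-only features need perl = TRUE. A sketch with a hypothetical pattern for the amounts and percentages in the sample data:

text <- data.frame(page = c(1, 1, 2, 3),
                   sen  = c(1, 2, 1, 1),
                   text = c("Dear Mr case 1",
                            "the value of my property is £500,000.00 and it was built in 1980",
                            "The protected percentage is 0% for 2 years",
                            "The interest rate is fixed for 2 years at 4.8%"),
                   stringsAsFactors = FALSE)

# hypothetical pattern: pound amounts or percentages; note the doubled backslashes
pattern <- "£[0-9,]+(\\.[0-9]{2})?|[0-9]+(\\.[0-9]+)?%"
regmatches(text$text, gregexpr(pattern, text$text, perl = TRUE))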

Sentiment Analysis Text Analytics in Russian / Cyrillic languages

Submitted by 和自甴很熟 on 2019-12-24 18:33:44
Question: This is an incredible resource. I can't believe how generous contributors to the platform are. I would be grateful for any advice on dealing with text analytics / sentiment analysis in Russian / Cyrillic languages. Syuzhet is my preferred tool - the opportunity to obtain sentiment across 8 emotions as well as negative and positive polarity is outstanding. However, I don't think it supports Cyrillic languages. Is there any alternative? Answer 1: I was just trying to figure out the same thing: how…
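Recent syuzhet releases ship translations of the NRC lexicon, exposed through a language argument on the NRC method; whether this covers a given use case is worth checking in the docs for the installed version. A minimal sketch under that assumption:

library(syuzhet)

txt <- c("Это был прекрасный день")   # hypothetical Russian sentence

# NRC lexicon with the Russian translation (check ?get_nrc_sentiment in your version);
# returns the 8 emotion columns plus negative/positive, same shape as for English input
get_nrc_sentiment(txt, language = "russian")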

Form bigrams without stopwords in R

Submitted by 末鹿安然 on 2019-12-24 01:59:15
Question: I have been having some trouble with bigrams in text mining using R recently. The purpose is to find meaningful keywords in news, for example "smart car" and "data mining". Let's say I have a string as follows: "IBM have a great success in the computer industry for the past decades..." After removing the stopwords ("have", "a", "in", "the", "for"), this becomes "IBM great success computer industry past decades..." As a result, bigrams like "success computer" or "industry past" will occur. But what I really need is…
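One common tidytext approach is to build the bigrams first and only then drop any bigram containing a stopword, so that artificial pairs like "success computer" are never created. A sketch (the example sentence is taken from the question):

library(dplyr)
library(tidyr)
library(tidytext)

df <- tibble::tibble(
  text = "IBM have a great success in the computer industry for the past decades"
)

df %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
  separate(bigram, into = c("w1", "w2"), sep = " ") %>%
  filter(!w1 %in% stop_words$word,        # keep a bigram only if
         !w2 %in% stop_words$word) %>%    # neither word is a stopword
  unite(bigram, w1, w2, sep = " ")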

Python Pandas - How to format and split text in a column?

Submitted by 我是研究僧i on 2019-12-24 00:13:11
Question: I have a set of strings in a dataframe like below: ID, TextColumn: 1, "This is line number one"; 2, "I love pandas, they are so puffy"; 3, "[This $tring is with specia| characters, yes it is!]". A. I want to format these strings to eliminate all the special characters. B. Once formatted, I'd like to get a list of unique words (space being the only split). Here is the code I have written (the get_df_by_id dataframe has one selected frame, say ID 3): #replace all special characters formatted_title = get_df_by_id[…
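The question targets pandas; purely to illustrate the two steps (strip special characters, then split on spaces and keep unique words), here is a sketch of the logic in R, the language used by most of the entries above:

txt <- "[This $tring is with specia| characters, yes it is!]"

clean <- gsub("[^[:alnum:] ]", "", txt)        # A. drop everything except letters, digits and spaces
words <- unique(strsplit(tolower(clean), "\\s+")[[1]])  # B. split on whitespace, keep unique words
words
# "this" "tring" "is" "with" "specia" "characters" "yes" "it"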

Text classification: extract tags from text

Submitted by 送分小仙女□ on 2019-12-24 00:03:32
Question: I have a Lucene index with a lot of text data; each item has a description. I want to extract the most common words from each description and generate tags to classify each item based on its description. Is there a lucene.net library for doing this, or any other library for text classification? Answer 1: No, lucene.net can do search, indexing, text normalization and "find more like this" functionality, but not text classification. What to suggest to you depends on your requirements. So, maybe more…
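If the goal is just "most common words per description as candidate tags" rather than supervised classification, plain term counting outside Lucene is often enough. A sketch in R with tidytext (an assumption about the workflow; the index itself would stay in lucene.net):

library(dplyr)
library(tidytext)

docs <- tibble::tibble(
  id          = c(1, 2),
  description = c("fast red sports car with leather seats",
                  "family car, diesel engine, five seats")
)

docs %>%
  unnest_tokens(word, description) %>%
  anti_join(stop_words, by = "word") %>%      # drop stopwords before counting
  count(id, word, sort = TRUE) %>%
  group_by(id) %>%
  slice_max(n, n = 3, with_ties = FALSE) %>%  # top 3 terms per item as candidate tags
  ungroup()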