data-mining

Difference between classification and clustering in data mining? [closed]

杀马特。学长 韩版系。学妹 submitted on 2019-12-17 15:04:39
Question (closed as needing more focus): Can someone explain the difference between classification and clustering in data mining? If possible, please give examples of both so I can understand the main idea. Answer 1: In general, in classification you have a set of predefined classes and want to know which class a new object
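A minimal Python sketch of the contrast, assuming one numeric feature (a hypothetical animal weight in kg). Classification starts from labelled examples and assigns a known label to a new value; here that is 1-nearest-neighbour. Clustering gets no labels and has to discover the groups itself; here that is a tiny k-means with k=2. The function names and data are illustrative only.

```python
def classify(train, new_value):
    """Classification: classes are predefined; return the label of the nearest labelled example (1-NN)."""
    _, label = min(train, key=lambda pair: abs(pair[0] - new_value))
    return label


def cluster(values, iterations=20):
    """Clustering: no labels; split 1-D values into two groups with k-means (k=2).

    Assumes at least two distinct values, otherwise one group would be empty.
    """
    lo, hi = min(values), max(values)  # start the two centres at the extremes
    for _ in range(iterations):
        left = [v for v in values if abs(v - lo) <= abs(v - hi)]
        right = [v for v in values if abs(v - lo) > abs(v - hi)]
        lo, hi = sum(left) / len(left), sum(right) / len(right)  # move centres to group means
    return left, right


# Hypothetical data: weights with known labels for classification, unlabelled for clustering.
train = [(4, "cat"), (5, "cat"), (20, "dog"), (30, "dog")]
print(classify(train, 25))             # dog
print(cluster([4, 5, 6, 20, 25, 30]))  # ([4, 5, 6], [20, 25, 30])
```

The clustering step never sees the words "cat" or "dog". It only finds that the values fall into two groups, and naming those groups is left to the analyst.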

Clustering values by their proximity in python (machine learning?) [duplicate]

霸气de小男生 submitted on 2019-12-17 10:29:28
Question (marked as a duplicate of "Cluster one-dimensional data optimally?" and "1D Number Array Clustering"): I have an algorithm that runs on a set of objects. It produces a score that reflects the differences between the elements in the set. The sorted output is something like this: [1,1,5,6,1,5,10,22,23,23,50,51,51,52,100,112,130,500,512,600,12000,12230] If you lay these values out in a spreadsheet, you see that
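The question is cut off before the desired grouping is stated. Below is a Python sketch under one assumption: groups are separated by large relative jumps, since the scores span several orders of magnitude. The ratio threshold of 1.5 and the name `split_on_jumps` are hypothetical choices. For sorted 1-D data, cutting at every gap larger than a threshold is equivalent to single-linkage clustering. Here the gap is measured on a log scale.

```python
def split_on_jumps(values, ratio=1.5):
    """Group positive 1-D values, starting a new group whenever a value exceeds
    `ratio` times the previous (sorted) value. Assumes all values are > 0."""
    ordered = sorted(values)
    groups = [[ordered[0]]]
    for v in ordered[1:]:
        if v > ratio * groups[-1][-1]:  # big relative jump: start a new group
            groups.append([v])
        else:                           # close to the previous value: same group
            groups[-1].append(v)
    return groups


scores = [1, 1, 5, 6, 1, 5, 10, 22, 23, 23, 50, 51, 51, 52,
          100, 112, 130, 500, 512, 600, 12000, 12230]
groups = split_on_jumps(scores)
print(len(groups))  # 8
print(groups)
```

The `ratio` parameter controls how coarse the groups are. For absolute rather than relative gaps, compare `v - groups[-1][-1]` against a fixed distance instead.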

How do I extract keywords used in text? [closed]

送分小仙女□ submitted on 2019-12-17 10:07:36
Question (closed as needing more focus; locked as off-topic but of historical significance): How do I data-mine a pile of text to get keywords by usage (e.g. "Jacob Smith" or "fence")? And is there a
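One simple baseline for "keywords by usage" is frequency counting after removing stop words. The Python sketch below also counts two-word phrases, so that names like "Jacob Smith" can surface. The tiny `STOPWORDS` set and the sample text are hypothetical; a real project would use a fuller stop list, and possibly TF-IDF across many documents. Lowercasing means the name comes back as "jacob smith".

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration.
STOPWORDS = {"a", "an", "and", "the", "was", "is", "of", "to", "in"}


def keywords(text, top_n=5):
    """Return the most frequent non-stop-word unigrams and bigrams in `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    unigrams = [w for w in words if w not in STOPWORDS]
    # Bigrams only when neither word is a stop word, so "jacob smith" counts but "the fence" does not.
    bigrams = [f"{a} {b}" for a, b in zip(words, words[1:])
               if a not in STOPWORDS and b not in STOPWORDS]
    return Counter(unigrams + bigrams).most_common(top_n)


text = "Jacob Smith built a fence. The fence was new, and Jacob Smith painted the fence."
print(keywords(text, top_n=4))
```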

How to choose the optimal K in the K-Means algorithm [duplicate]

前提是你 submitted on 2019-12-17 10:00:18
Question (possible duplicate of "How do I determine k when using k-means clustering?"): How can I choose K initially if I know nothing about the data? Can someone help me choose K? Thanks, Navin Answer 1: The basic idea is to evaluate a cluster-quality score on sample data; usually it combines the distance within clusters and the distance between clusters. The higher this score, the better the clustering, so you can use it to select the best clustering parameters. One
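The answer describes scoring a clustering by within-cluster versus between-cluster distance. The mean silhouette coefficient is one concrete score of that kind. The Python sketch below runs k-means for several candidate K values and keeps the K with the highest silhouette. Several details are assumptions for illustration: the farthest-first seeding (chosen so the result is reproducible), the candidate range 2 to 5, and the three-blob sample data.

```python
import math


def kmeans(points, k, iterations=100):
    """Plain k-means on tuples; farthest-first seeding is an assumption for reproducibility."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centroid if a cluster happens to end up empty.
            updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return labels


def silhouette(points, labels):
    """Mean of (b - a) / max(a, b): a = mean distance to own cluster, b = to nearest other cluster."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            continue  # singleton clusters contribute 0 by convention
        a = sum(own) / len(own)
        b = min(
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(points)


def best_k(points, candidates=range(2, 6)):
    """Pick the K whose k-means clustering has the highest mean silhouette."""
    return max(candidates, key=lambda k: silhouette(points, kmeans(points, k)))


# Hypothetical data: three well-separated 2-D blobs.
data = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8), (1, 9), (1.5, 9.5), (2, 9)]
print(best_k(data))  # 3
```

The silhouette is only one option. Plotting the within-cluster sum of squares against K (the "elbow" method) is another common heuristic.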

Finding 2- and 3-word phrases using the R tm package

时光毁灭记忆、已成空白 submitted on 2019-12-17 06:27:53
Question: I am trying to find code that actually works for finding the most frequently used two- and three-word phrases with the R text mining package tm (maybe there is another package for this that I do not know about). I have been trying to use the tokenizer but have had no luck. If you have worked on a similar problem, could you post code that is tested and actually works? Thank you so much! Answer 1: You can pass a custom tokenizing function to tm's DocumentTermMatrix function, so if you have package
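The following is not tm's API; it is a Python sketch, under stated assumptions, of what a bigram or trigram tokenizer computes before the counts go into a document-term matrix. It slides an n-word window over each document and tallies the phrases. Lowercase, whitespace tokenization and the sample documents are hypothetical simplifications.

```python
from collections import Counter


def top_ngrams(docs, n, top=3):
    """Count n-word phrases across `docs` and return the `top` most frequent."""
    counts = Counter()
    for doc in docs:
        tokens = doc.lower().split()  # naive whitespace tokenization (assumption)
        counts.update(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return counts.most_common(top)


docs = ["I love text mining in R", "Text mining in R is fun", "We teach text mining in Python"]
print(top_ngrams(docs, 2))  # most frequent two-word phrases
print(top_ngrams(docs, 3))  # most frequent three-word phrases
```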

Can someone give an example of cosine similarity, in a very simple, graphical way?

99封情书 submitted on 2019-12-17 06:19:23
Question: Regarding the Cosine Similarity article on Wikipedia: can you show the vectors here (in a list or something), do the math, and let us see how it works? I'm a beginner. Answer 1: Here are two very short texts to compare: "Julie loves me more than Linda loves me" and "Jane likes me more than Julie loves me". We want to know how similar these texts are, purely in terms of word counts (ignoring word order). We begin by making a list of the words from both texts: me, Julie, loves, Linda, than, more, likes, Jane. Now we
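A short Python sketch that carries the answer's example through. It counts each word from the shared word list in both texts and then applies cos(θ) = (A · B) / (|A| |B|). The helper names are illustrative. The vocabulary order follows the word list given in the answer.

```python
import math
from collections import Counter

# Word list taken from the answer, in the same order.
VOCABULARY = ["me", "Julie", "loves", "Linda", "than", "more", "likes", "Jane"]


def count_vector(text, vocabulary=VOCABULARY):
    """Turn a text into a vector of word counts over `vocabulary`."""
    counts = Counter(text.split())
    return [counts[word] for word in vocabulary]


def cosine_similarity(u, v):
    """Dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


a = count_vector("Julie loves me more than Linda loves me")
b = count_vector("Jane likes me more than Julie loves me")
print(a)                                  # [2, 1, 2, 1, 1, 1, 0, 0]
print(b)                                  # [2, 1, 1, 0, 1, 1, 1, 1]
print(round(cosine_similarity(a, b), 3))  # 0.822
```

By hand: the dot product is 4 + 1 + 2 + 0 + 1 + 1 + 0 + 0 = 9, the squared lengths are 12 and 10, and 9 / √120 ≈ 0.822.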

Expectation Maximization coin toss examples

醉酒当歌 submitted on 2019-12-13 11:41:41
Question: I've been self-studying Expectation Maximization lately and found some simple examples in the process: http://cs.dartmouth.edu/~cs104/CS104_11.04.22.pdf There are three coins, 0, 1 and 2, with probabilities P0, P1 and P2 of landing heads when tossed. Toss coin 0; if the result is heads, toss coin 1 three times, otherwise toss coin 2 three times. The observed data produced by coins 1 and 2 look like this: HHH, TTT, HHH, TTT, HHH. The hidden data is coin 0's result. Estimate P0, P1 and P2. http://ai
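A minimal Python EM sketch for the three-coin model as described in the question. The E-step computes, for each observed sequence, the posterior probability that coin 0 came up heads, meaning coin 1 produced that sequence. The M-step re-estimates P0, P1 and P2 from those expected counts. The starting values (0.3, 0.3, 0.6) and the fixed 200 iterations are arbitrary assumptions. EM converges to a local optimum, and swapping which coin is "1" and which is "2" gives an equally good solution.

```python
def em_three_coins(sequences, p0=0.3, p1=0.3, p2=0.6, iterations=200):
    """Estimate (P0, P1, P2) for the three-coin model with EM."""
    heads = [s.count("H") for s in sequences]
    tosses = [len(s) for s in sequences]
    for _ in range(iterations):
        # E-step: responsibility of coin 1 (coin 0 showed heads) for each sequence.
        resp = []
        for h, m in zip(heads, tosses):
            like1 = p0 * p1 ** h * (1 - p1) ** (m - h)
            like2 = (1 - p0) * p2 ** h * (1 - p2) ** (m - h)
            resp.append(like1 / (like1 + like2))
        # M-step: weighted maximum-likelihood updates.
        p0 = sum(resp) / len(resp)
        p1 = sum(r * h for r, h in zip(resp, heads)) / sum(r * m for r, m in zip(resp, tosses))
        p2 = (sum((1 - r) * h for r, h in zip(resp, heads))
              / sum((1 - r) * m for r, m in zip(resp, tosses)))
    return p0, p1, p2


p0, p1, p2 = em_three_coins(["HHH", "TTT", "HHH", "TTT", "HHH"])
print(round(p0, 3), round(p1, 3), round(p2, 3))
```

From this starting point the estimates move toward P0 ≈ 0.4, P1 ≈ 0 and P2 ≈ 1. That is, coin 1 explains the two TTT runs and coin 2 explains the three HHH runs.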

Implementing Naive Bayes in C# [closed]

和自甴很熟 submitted on 2019-12-13 10:25:59
Question (closed as unclear: ambiguous, vague, incomplete or overly broad): I am wondering: can I implement the Naive Bayes algorithm in C#? I just want to calculate the precision, TP rate, FP rate, etc. using Naive Bayes in C#. I have calculated the mean and standard deviation for my
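The question mentions per-class means and standard deviations, which is what Gaussian Naive Bayes needs. Below is a sketch in Python rather than C#, to keep the examples in one language; the same steps port directly. For each class it fits a prior, plus a mean and variance for each feature. It predicts by maximising the log posterior and then derives precision, TP rate and FP rate from the confusion counts. The 1e-9 variance floor, the helper names and the toy data are assumptions.

```python
import math


def fit_gaussian_nb(X, y):
    """Per class: prior, per-feature mean and variance."""
    model = {}
    for label in dict.fromkeys(y):  # preserves first-seen class order
        rows = [x for x, t in zip(X, y) if t == label]
        columns = list(zip(*rows))
        means = [sum(col) / len(rows) for col in columns]
        # Small constant keeps a constant feature from producing zero variance.
        variances = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (len(rows) / len(X), means, variances)
    return model


def predict(model, x):
    """Return the class with the largest log prior + sum of Gaussian log-likelihoods."""
    def log_posterior(label):
        prior, means, variances = model[label]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances))
    return max(model, key=log_posterior)


def binary_rates(y_true, y_pred, positive):
    """Precision, TP rate (recall) and FP rate for one chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "tp_rate": tp / (tp + fn) if tp + fn else 0.0,
        "fp_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


# Hypothetical one-feature data set.
model = fit_gaussian_nb([[1.0], [1.2], [0.8], [3.0], [3.2], [2.8]], ["a", "a", "a", "b", "b", "b"])
X_test, y_test = [[0.9], [2.2], [2.5], [3.5]], ["a", "a", "b", "b"]
y_pred = [predict(model, x) for x in X_test]
print(y_pred, binary_rates(y_test, y_pred, positive="b"))
```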

“Zero frequent items” when using eclat to mine frequent itemsets

╄→гoц情女王★ submitted on 2019-12-13 08:03:42
Question: I want to find patterns and "clusters" based on which items are bought together. According to the wiki for eclat: "The Eclat algorithm is used to perform itemset mining. Itemset mining lets us find frequent patterns in data, like: if a consumer buys milk, he also buys bread. This type of pattern is called association rules and is used in many application domains." However, when I use eclat in R, I get "zero frequent items" and "NULL" when retrieving the results through tidLists.
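To show what eclat computes, here is a Python sketch of the algorithm itself. It is not the R arules implementation. Each item is stored with its tid-list, the set of transaction ids that contain it. Longer itemsets are grown by intersecting tid-lists, and an itemset is kept only while its support reaches `min_support`. The sketch also shows one common way to get an empty result: when the minimum support is higher than any single item's support, nothing qualifies. Whether that is the cause in the question is an assumption. The baskets are hypothetical.

```python
from collections import defaultdict


def eclat(transactions, min_support):
    """Return {itemset tuple: support} for all itemsets with support >= min_support."""
    n = len(transactions)
    min_count = min_support * n
    tidsets = defaultdict(set)  # vertical layout: item -> ids of transactions containing it
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)
    frequent = {}

    def extend(prefix, candidates):
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix + (item,)
            frequent[itemset] = len(tids) / n
            # Intersect tid-lists to get the support of each longer itemset.
            longer = []
            for other, other_tids in candidates[i + 1:]:
                common = tids & other_tids
                if len(common) >= min_count:
                    longer.append((other, common))
            extend(itemset, longer)

    singles = sorted(((item, tids) for item, tids in tidsets.items() if len(tids) >= min_count),
                     key=lambda pair: pair[0])
    extend((), singles)
    return frequent


baskets = [{"milk", "bread"}, {"milk", "bread", "eggs"}, {"milk"}, {"bread", "eggs"}]
print(eclat(baskets, 0.5))  # five frequent itemsets
print(eclat(baskets, 0.9))  # {}: no item appears in 90% of baskets
```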

Ranking algorithm with missing values and bias

对着背影说爱祢 submitted on 2019-12-13 05:00:12
Question: The problem is: a set of 5 independent users were asked to rate 50 products given to them. All 50 products would have been used by the users at some point in time. Some users are more biased towards certain products. One user did not genuinely complete the survey and gave random values. Users are not required to rate every product. Now, given a 4-sample dataset, rank the products based on the ratings. Dataset ("-" means no rating):

    product  #user1  #user2  #user3  #user4  #user5
    0        29      -       10      90      12
    1        -       -       -       -       7
    2        -       -
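One common way to handle different rating scales and personal bias is to z-score each user's ratings using only the products that user rated. Each product then gets the average of the normalised scores it actually received, so missing ratings are simply skipped. This approach is not from the original thread, and the Python sketch below does not try to detect the user who answered randomly. It uses a hypothetical three-product data set with `None` for missing ratings.

```python
from statistics import mean, pstdev


def rank_products(ratings):
    """ratings: {product: [score per user, or None if unrated]} -> products, best first."""
    n_users = len(next(iter(ratings.values())))
    # Per-user mean and spread over the products that user actually rated.
    stats = []
    for u in range(n_users):
        given = [row[u] for row in ratings.values() if row[u] is not None]
        mu = mean(given) if given else 0.0
        sd = pstdev(given) if len(given) > 1 else 0.0
        stats.append((mu, sd or 1.0))  # avoid dividing by zero for constant raters
    scores = {}
    for product, row in ratings.items():
        z = [(r - stats[u][0]) / stats[u][1] for u, r in enumerate(row) if r is not None]
        scores[product] = mean(z) if z else float("-inf")  # unrated products go last
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical ratings: user 0 rates out of 100, users 1 and 2 out of 10.
ratings = {"A": [80, 9, None], "B": [60, 7, 3], "C": [40, None, 1]}
print(rank_products(ratings))
```

To down-weight the random rater, one option is to measure how well each user's z-scores correlate with the consensus and drop users with near-zero correlation. That extension is left out here.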