data-mining | 易学教程

Difference between classification and clustering in data mining? [closed]

Question (closed 6 months ago as needing more focus): Can someone explain the difference between classification and clustering in data mining? If you can, please give examples of both to convey the main idea.

Answer 1: In general, in classification you have a set of predefined classes and want to know which class a new object

Clustering values by their proximity in python (machine learning?) [duplicate]

Question (closed 6 years ago as a duplicate of "Cluster one-dimensional data optimally?" and "1D Number Array Clustering"): I have an algorithm that runs on a set of objects. It produces a score value that expresses the differences between the elements in the set. The sorted output looks something like this: [1,1,5,6,1,5,10,22,23,23,50,51,51,52,100,112,130,500,512,600,12000,12230] If you lay these values down on a spreadsheet you see that
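
One simple way to group scores like these is to sort them and start a new group wherever the jump from the previous value is large. The sketch below is only an illustration under assumptions: the function name `split_by_gap` and the rule "start a new group when the gap exceeds `ratio` times the previous value" are hypothetical and are not taken from the question or its answers.

```python
def split_by_gap(values, ratio=0.5):
    """Group numbers so that neighbours within a group are relatively close."""
    groups = []
    for v in sorted(values):
        # Hypothetical rule: stay in the current group while the gap to the
        # previous value is at most `ratio` times that previous value.
        if groups and v - groups[-1][-1] <= ratio * abs(groups[-1][-1]):
            groups[-1].append(v)
        else:
            groups.append([v])
    return groups


scores = [1, 1, 5, 6, 1, 5, 10, 22, 23, 23, 50, 51, 51, 52,
          100, 112, 130, 500, 512, 600, 12000, 12230]
for group in split_by_gap(scores):
    print(group)
```

A relative gap is used here because the values span several orders of magnitude, so a single absolute threshold would either merge the small values or split the large ones.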

How do I extract keywords used in text? [closed]

Question (closed 4 years ago as needing more focus; locked as off-topic but of historical significance): How do I data mine a pile of text to get keywords by usage? ("Jacob Smith" or "fence") And is there a

How to choose the optimal K in the K-Means algorithm [duplicate]

Question (closed 8 years ago as a possible duplicate of "How do I determine k when using k-means clustering?"): How can I choose K initially if I do not know anything about the data? Can someone help me choose K? Thanks, Navin

Answer 1: The basic idea is to evaluate a cluster score on sample data; usually it is based on the distance within clusters and the distance between clusters. The better this measure, the better the clustering, so you can use it to select the best clustering parameters. One
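
The idea of scoring a clustering by within-cluster and between-cluster distance can be sketched as follows. This is a minimal illustration, not the answer's own method: it assumes the silhouette coefficient as the score, a plain k-means with a deterministic farthest-point initialisation, and a small made-up set of 2D points.

```python
import math


def kmeans(points, k, iters=50):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    # Deterministic farthest-point initialisation (an assumption of this sketch).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette: high when clusters are tight and far apart."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1 or len(clusters) < 2:
            continue  # singletons contribute 0 by convention
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)  # within-cluster
        b = min(sum(math.dist(p, q) for q in other) / len(other)  # nearest other cluster
                for m, other in clusters.items() if m != l)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Made-up sample: three well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (20, 0), (20, 1), (21, 0), (21, 1)]
best_k = max(range(2, 6), key=lambda k: silhouette(points, kmeans(points, k)))
print(best_k)
```

On real data the score rarely has such a clear peak, so it is usually read alongside domain knowledge rather than trusted on its own.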

Finding 2 & 3 word Phrases Using R TM Package

Question: I am trying to find code that actually works for finding the most frequently used two- and three-word phrases with the R text mining package (maybe there is another package for this that I do not know of). I have been trying to use the tokenizer, but have had no luck. If you have worked on a similar problem in the past, could you post code that is tested and actually works? Thank you so much!

Answer 1: You can pass in a custom tokenizing function to tm's DocumentTermMatrix function, so if you have package
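
The underlying technique, counting every run of n consecutive words, is independent of R. The following is a small Python sketch of it, not the tm API and not the code from the answer; the function name `ngram_counts`, the simple regular-expression tokenizer, and the sample documents are all assumptions made for illustration.

```python
import re
from collections import Counter


def ngram_counts(texts, n):
    """Count n-word phrases across a list of documents."""
    counts = Counter()
    for text in texts:
        # Naive tokenizer (assumption): lowercase runs of letters, digits, apostrophes.
        words = re.findall(r"[a-z0-9']+", text.lower())
        # Slide a window of n consecutive words over the document.
        counts.update(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return counts


# Made-up sample documents.
docs = ["the quick brown fox", "the quick brown dog", "a quick brown fox"]
print(ngram_counts(docs, 2).most_common(3))
print(ngram_counts(docs, 3).most_common(2))
```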

Can someone give an example of cosine similarity, in a very simple, graphical way?

Question: Cosine Similarity article on Wikipedia. Can you show the vectors here (in a list or something) and then do the math, and let us see how it works? I'm a beginner.

Answer 1: Here are two very short texts to compare:

Julie loves me more than Linda loves me
Jane likes me more than Julie loves me

We want to know how similar these texts are, purely in terms of word counts (and ignoring word order). We begin by making a list of the words from both texts:

me Julie loves Linda than more likes Jane

Now we
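
The excerpt stops before the arithmetic, so here is a minimal Python sketch of the computation for the two texts, using word counts as the vectors. In the order of the word list, the counts are [2, 1, 2, 1, 1, 1, 0, 0] for the first text and [2, 1, 1, 0, 1, 1, 1, 1] for the second. The function name `cosine_similarity` and the whitespace tokenization are assumptions of this sketch.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the word-count vectors of two texts."""
    a, b = Counter(text_a.split()), Counter(text_b.split())
    # Only words present in both texts contribute to the dot product.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b)


t1 = "Julie loves me more than Linda loves me"
t2 = "Jane likes me more than Julie loves me"
print(round(cosine_similarity(t1, t2), 3))  # 9 / sqrt(12 * 10), about 0.822
```

The dot product is 9, and the vector lengths are sqrt(12) and sqrt(10), which gives 9 / sqrt(120).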

Expectation Maximization coin toss examples

Question: I've been self-studying Expectation Maximization lately, and picked up some simple examples in the process: http://cs.dartmouth.edu/~cs104/CS104_11.04.22.pdf There are 3 coins, 0, 1 and 2, with probabilities P0, P1 and P2 of landing heads when tossed. Toss coin 0; if the result is heads, toss coin 1 three times, otherwise toss coin 2 three times. The observed data produced by coins 1 and 2 looks like this: HHH, TTT, HHH, TTT, HHH. The hidden data is coin 0's result. Estimate P0, P1 and P2. http://ai
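
For the three-coin setup described above, the EM iteration can be sketched in Python as follows. This is an illustrative sketch rather than the derivation from either linked document; the function name `em_three_coins` and the starting values 0.5, 0.7 and 0.3 are assumptions. The starting values for P1 and P2 must differ, otherwise the two coins stay indistinguishable, and swapping them would swap the roles of coins 1 and 2 in the result.

```python
def em_three_coins(trials, p0, p1, p2, iters=100):
    """EM for: toss coin 0; on heads toss coin 1, on tails toss coin 2."""
    for _ in range(iters):
        # E-step: posterior probability that each trial was produced by coin 1.
        resp = []
        for t in trials:
            h, n = t.count("H"), len(t)
            like1 = p0 * p1 ** h * (1 - p1) ** (n - h)
            like2 = (1 - p0) * p2 ** h * (1 - p2) ** (n - h)
            resp.append(like1 / (like1 + like2))
        # M-step: re-estimate each probability from the weighted counts.
        p0 = sum(resp) / len(trials)
        p1 = (sum(r * t.count("H") for r, t in zip(resp, trials))
              / sum(r * len(t) for r, t in zip(resp, trials)))
        p2 = (sum((1 - r) * t.count("H") for r, t in zip(resp, trials))
              / sum((1 - r) * len(t) for r, t in zip(resp, trials)))
    return p0, p1, p2


observed = ["HHH", "TTT", "HHH", "TTT", "HHH"]
p0, p1, p2 = em_three_coins(observed, 0.5, 0.7, 0.3)
print(round(p0, 3), round(p1, 3), round(p2, 3))
```

With these starting values the estimates approach P0 = 0.6, P1 = 1 and P2 = 0: three of the five trials are attributed to coin 1, which then always lands heads, and the other two to coin 2, which never does.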

Implementing Naive Bayes in C# [closed]

Question (closed 6 years ago as ambiguous, vague, incomplete, or overly broad): Just wondering, can I implement the Naive Bayes algorithm in C#? I just want to calculate the precision, TP rate, FP rate, etc. using the Naive Bayes algorithm in C#. I just calculate the mean and standard deviation for my
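
Precision, TP rate and FP rate depend only on the classifier's predictions, not on how the classifier is implemented. The Python sketch below shows the arithmetic; it is not C# and not a Naive Bayes implementation, and the function name `binary_metrics` and the sample labels are made up for illustration. It assumes both classes occur in the data, so no denominator is zero.

```python
def binary_metrics(actual, predicted, positive):
    """Precision, TP rate (recall) and FP rate for one positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    return {
        "precision": tp / (tp + fp),  # correct among predicted positives
        "tp_rate": tp / (tp + fn),    # found among actual positives
        "fp_rate": fp / (fp + tn),    # false alarms among actual negatives
    }


# Made-up labels for illustration.
actual = ["spam", "spam", "ham", "ham", "spam", "ham"]
predicted = ["spam", "ham", "ham", "spam", "spam", "ham"]
print(binary_metrics(actual, predicted, positive="spam"))
```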

“Zero frequent items” when using the eclat to mine frequent itemsets

Question: I want to find patterns and "clusters" based on which items are bought together. According to the wiki for eclat: "The Eclat algorithm is used to perform itemset mining. Itemset mining lets us find frequent patterns in data, like if a consumer buys milk, he also buys bread. This type of pattern is called association rules and is used in many application domains." However, when I use eclat in R, I get "zero frequent items" and "NULL" when retrieving the results through tidLists.
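
To make the mechanics concrete, here is a minimal Python sketch of the Eclat idea: store, for each item, the set of transaction ids that contain it, and grow itemsets by intersecting those sets. It is not the R implementation; the function name `eclat`, the absolute `min_support` count, and the sample baskets are assumptions. The last call shows one way an empty result can arise, a support threshold that no item reaches; the excerpt does not say whether that is the cause in the question.

```python
def eclat(transactions, min_support):
    """Return {itemset: support count} for itemsets in >= min_support transactions."""
    # Vertical layout: item -> set of ids of the transactions containing it.
    tidlists = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidlists.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix + (item,)
            frequent[itemset] = len(tids)
            # Intersect tid-lists with the remaining items to grow the itemset.
            suffix = [(other, tids & other_tids)
                      for other, other_tids in candidates[i + 1:]
                      if len(tids & other_tids) >= min_support]
            extend(itemset, suffix)

    start = sorted(((item, tids) for item, tids in tidlists.items()
                    if len(tids) >= min_support), key=lambda pair: pair[0])
    extend((), start)
    return frequent


# Made-up baskets for illustration.
baskets = [{"milk", "bread"}, {"milk", "bread", "eggs"},
           {"bread", "eggs"}, {"milk", "eggs"}]
print(eclat(baskets, min_support=2))
print(eclat(baskets, min_support=5))  # threshold above every item's count: {}
```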

Ranking algorithm with missing values and bias

Question: The problem is: a set of 5 independent users were asked to rate 50 products given to them. All 50 products would have been used by the users at some point in time. Some users are more biased towards certain products. One user did not truly complete the survey and gave random values. The users are not required to rate all the products. Now, given a 4-sample dataset, rank the products based on the ratings. Dataset:

product  #user1  #user2  #user3  #user4  #user5
0        29      -       10      90      12
1        -       -       -       -       7
2        -       -