How to decide group assignments in Dirichlet process clustering

Submitted by 旧巷老猫 on 2019-12-13 01:43:26

Question


In Dirichlet process clustering, the Dirichlet process can be represented in any of the following ways:

  • Chinese Restaurant Process
  • Stick Breaking Process
  • Pólya Urn Model

For instance, the Chinese Restaurant Process works as follows:

  • Initially the restaurant is empty.
  • The first person to enter (Alice) sits down at a table (selects a group).
  • The second person to enter (Bob) sits down at a table.
  • Which table does he sit at?
  • He sits down at a new table with probability α/(1+α).
  • He sits at the existing table with Alice (meaning he joins the existing group) with probability 1/(1+α).
  • The (n+1)-st person sits down at a new table with probability α/(n+α), and at table k with probability n_k/(n+α), where n_k is the number of people currently sitting at table k.
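
The seating rule above can be written down as a small function. This is only a minimal sketch: the name `crp_seating_probabilities` and its arguments are made up for illustration, and `alpha` is assumed to be a positive number.

```python
def crp_seating_probabilities(table_counts, alpha):
    """Return [p_table_1, ..., p_table_K, p_new] for the next customer.

    table_counts[k] is n_k, the number of people already at table k.
    alpha is the concentration parameter (assumed > 0).
    """
    n = sum(table_counts)            # people already seated
    denom = n + alpha
    probs = [n_k / denom for n_k in table_counts]
    probs.append(alpha / denom)      # probability of opening a new table
    return probs


# Bob's choice: Alice sits alone at table 1, and alpha = 1.0 (illustrative value)
print(crp_seating_probabilities([1], alpha=1.0))  # [0.5, 0.5]
```

With `table_counts=[1]` the two entries are exactly 1/(1+α) and α/(1+α), the probabilities listed for Bob.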

The question is:

Initially, the first person joins a group, say G1 (i.e. group 1).
Now the second person will join either:

new group      = G2 with probability α/(1+α) = P(N)  
existing group = G1 with probability 1/(1+α) = P(E)

Now, if I calculate the probabilities for a new entry, I'll have values for both P(N) and P(E). Then:

  • How do I decide which group, G1 or G2, the new entry joins?
  • Is it decided by comparing the values of the two probabilities?

For example:

If P(N) > P(E)  
then  
   the new entry joins G2  
and  
if P(E) > P(N)  
then  
   the new entry joins G1  

Answer 1:


Based on the CRP representation,

  • customer 1 sits at table 1
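  • customer i sits at an already occupied table k with probability p_k, and at a new table with probability p_new, where

    p_k = n_k/(i-1+α),   p_new = α/(i-1+α)

    and n_k is the number of customers already sitting at table k.
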
Note that these probabilities sum to 1. To find the table assignment, you do not compare the probabilities and pick the largest; you draw a random sample from this distribution (in effect, toss a weighted coin) and select the corresponding table.
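
To make the random draw concrete, here is a minimal sketch that seats customers one at a time. The function name `simulate_crp` and its parameters are hypothetical, `alpha` is assumed to be positive, and Python's standard `random.Random.choices` does the weighted draw. Because `choices` normalizes the weights itself, the sketch passes the counts n_k and α directly rather than p_k and p_new.

```python
import random


def simulate_crp(num_customers, alpha, seed=None):
    """Seat customers one by one; return (assignments, table_counts)."""
    rng = random.Random(seed)
    table_counts = []   # table_counts[k] = number of customers at table k
    assignments = []    # assignments[i] = table index chosen by customer i
    for _ in range(num_customers):
        # Unnormalized weights: n_k for each occupied table, alpha for a new one.
        weights = table_counts + [alpha]
        table = rng.choices(range(len(weights)), weights=weights)[0]
        if table == len(table_counts):
            table_counts.append(1)      # the customer opens a new table
        else:
            table_counts[table] += 1    # the customer joins an existing table
        assignments.append(table)
    return assignments, table_counts


assignments, counts = simulate_crp(num_customers=10, alpha=1.0, seed=0)
print(assignments)
print(counts)
```

The first customer always lands at table index 0, because the only available weight is the one for a new table. Later customers can join any table, including a less probable one, which is what makes this sampling rather than a deterministic rule.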

For example, for customer i, assume you have the following probability vector:

    [0.2, 0.4, 0.3, 0.1]

which means the probability of sitting at table 1 is 0.2, table 2 is 0.4, table 3 is 0.3, and a new table is 0.1. By constructing the cumulative probability vector, here [0.2, 0.6, 0.9, 1.0], and drawing a uniform random number between 0 and 1, you can sample the table. Let's say the random number is 0.81: it falls between 0.6 and 0.9, so your customer sits at table 3.
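
The cumulative-vector lookup from this example can be sketched as below. The helper name `sample_table` is made up for illustration, and the uniform draw `u` is passed in explicitly so that the 0.81 case can be reproduced; in practice it would come from something like `random.random()`.

```python
from bisect import bisect_right
from itertools import accumulate


def sample_table(probs, u):
    """Return the index of the first cumulative bin that contains u.

    probs: probabilities for [table 1, ..., table K, new table], summing to 1.
    u:     a uniform random number in [0, 1).
    """
    cumulative = list(accumulate(probs))   # roughly [0.2, 0.6, 0.9, 1.0] here
    index = bisect_right(cumulative, u)
    # Guard against floating-point rounding in the final cumulative value.
    return min(index, len(probs) - 1)


labels = ["table 1", "table 2", "table 3", "new table"]
probs = [0.2, 0.4, 0.3, 0.1]

print(labels[sample_table(probs, 0.81)])  # table 3
```

A draw of 0.95 would land in the last bin and open a new table, while a draw of 0.10 would select table 1.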



Source: https://stackoverflow.com/questions/35859521/how-to-decide-group-assignments-in-dirichlet-process-clustering
