Question
As in Dirichlet process clustering, the Dirichlet process can be represented by any of the following:
- Chinese Restaurant Process
- Stick Breaking Process
- Pólya Urn Model
For instance, consider the Chinese Restaurant Process.
The process is as follows:
- Initially the restaurant is empty
- The first person to enter (Alice) sits down at a table (selects a group).
- The second person to enter (Bob) sits down at a table.
- Which table does he sit at?
- He sits down at a new table with probability α/(1+α).
- He sits at the existing table with Alice (meaning he joins the existing group) with probability 1/(1+α).
- The (n+1)-st person sits down at a new table with probability α/(n+α), and at table k with probability n_k/(n+α), where n_k is the number of people currently sitting at table k.
The question is:
Initially, the first person joins, say, G1 (i.e. group 1).
Now the second person will join either
- a new group, G2, with probability α/(1+α) = P(N), or
- the existing group, G1, with probability 1/(1+α) = P(E).
Now if I calculate the probabilities for the new entry, I'll have values for both P(N) and P(E). Then:
- How do I decide which group, G1 or G2, the new entry joins?
- Is it decided on the basis of the values of both probabilities?
That is:
If P(N) > P(E), then the new entry joins G2,
and
if P(E) > P(N), then the new entry joins G1?
Answer 1:
Based on the CRP representation,
- customer 1 sits at table 1
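- customer i sits at an already occupied table k with probability p_k and at a new table with probability p_new, where p_k = n_k/(i-1+α) and p_new = α/(i-1+α), with n_k the number of customers already sitting at table k.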
Note that these probabilities sum to 1. To find the table assignment, all you have to do is draw a random sample from this distribution (toss a weighted coin) and select the table it lands on, rather than always picking the most probable one.
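As a rough sketch of how the seating probabilities for the next customer could be computed: the function name `crp_probabilities`, the table counts `[2, 4, 3]`, and `alpha = 1.0` are all made up for illustration and are not part of the original question.

```python
def crp_probabilities(table_counts, alpha):
    """Return [p_1, ..., p_K, p_new] for the next customer.

    table_counts[k] is the number of customers already at table k
    (hypothetical input); alpha is the concentration parameter.
    """
    n = sum(table_counts)            # customers already seated
    denom = n + alpha
    probs = [n_k / denom for n_k in table_counts]   # existing tables
    probs.append(alpha / denom)                     # new table
    return probs


# Illustrative values only: 9 customers at 3 tables, alpha = 1.
probs = crp_probabilities([2, 4, 3], alpha=1.0)
print(probs)
```

With an empty restaurant the list has a single entry, so the first customer always opens a new table.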
For example, for customer i, assume you have the probability vector [0.2, 0.4, 0.3, 0.1],
which means the probability of sitting at table 1 is 0.2, table 2 is 0.4, table 3 is 0.3, and a new table is 0.1. By constructing the cumulative probability vector [0.2, 0.6, 0.9, 1.0] and drawing a uniform random number in [0, 1), you can sample the table. Let's say the random number is 0.81: it falls between 0.6 and 0.9, therefore your customer sits at table 3.
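The cumulative-vector sampling described above could look something like this in Python. It is only a sketch: the helper name `sample_table` and its optional `u` argument (which lets you pass the random number explicitly) are illustrative choices, not something from the original post.

```python
import bisect
import itertools
import random


def sample_table(probs, u=None):
    """Sample a 0-based index into probs; the last index means 'new table'."""
    if u is None:
        u = random.random()                      # uniform in [0, 1)
    cumulative = list(itertools.accumulate(probs))
    # First position whose cumulative probability exceeds u.
    idx = bisect.bisect_right(cumulative, u)
    # Guard against floating-point round-off in the last cumulative value.
    return min(idx, len(probs) - 1)


probs = [0.2, 0.4, 0.3, 0.1]         # tables 1, 2, 3 and a new table
choice = sample_table(probs, u=0.81)
print(choice + 1)                    # 1-based table number
```

Calling `sample_table(probs)` without `u` draws a fresh random number each time, so over many draws the tables are picked in proportion to their probabilities.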
Source: https://stackoverflow.com/questions/35859521/how-to-decide-group-assignments-in-dirichlet-process-clustering