Question
For a task, I need to use ConditionalProbDist with LidstoneProbDist as the estimator, adding 0.01 to the sample count for each bin.
I thought the following line of code would achieve this, but it raises a ValueError:
fd = nltk.ConditionalProbDist(fd,nltk.probability.LidstoneProbDist,0.01)
I'm not sure how to format the arguments to ConditionalProbDist, and I haven't had much luck finding out via Python's help feature or Google, so if anyone could set me right, it would be much appreciated!
Answer 1:
I found the probability tutorial on the NLTK website quite helpful as a reference.
As mentioned in the other answer, using a lambda expression is a good idea, since ConditionalProbDist produces a frequency distribution (nltk.FreqDist) for each condition on the fly and passes it through to the estimator.
A more subtle point is that you can't pass the bins parameter through if you don't know how many bins your input sample has. However, a FreqDist makes its number of bins available as FreqDist.B() (see the NLTK docs).
Instead, take the FreqDist as the only parameter to your lambda:
from nltk.probability import *
# ...
# Using the given parameters of one extra bin and a gamma of 0.01
lidstone_estimator = lambda fd: LidstoneProbDist(fd, 0.01, fd.B() + 1)
conditional_pd = ConditionalProbDist(conditional_fd, lidstone_estimator)
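For anyone who wants to try this end to end, here is a minimal sketch on toy data. The (condition, sample) pairs are made up for illustration, and the values in the comments assume NLTK computes the Lidstone estimate as (count + gamma) / (N + gamma * bins).

```python
from nltk.probability import (
    ConditionalFreqDist,
    ConditionalProbDist,
    LidstoneProbDist,
)

# Toy (condition, sample) pairs; hypothetical data for illustration only.
pairs = [
    ("DET", "the"), ("DET", "the"), ("DET", "a"),
    ("NOUN", "dog"), ("NOUN", "cat"),
]
conditional_fd = ConditionalFreqDist(pairs)

# gamma = 0.01, and one bin more than the samples seen for each condition.
lidstone_estimator = lambda fd: LidstoneProbDist(fd, 0.01, fd.B() + 1)
conditional_pd = ConditionalProbDist(conditional_fd, lidstone_estimator)

# For "DET": N = 3, B = 2, so bins = 3 and the divisor is 3 + 0.01 * 3.
p_seen = conditional_pd["DET"].prob("the")    # (2 + 0.01) / 3.03
p_unseen = conditional_pd["DET"].prob("dog")  # (0 + 0.01) / 3.03
print(p_seen, p_unseen)
```

As far as I can tell, the ValueError in the question shows up when LidstoneProbDist receives an empty FreqDist and no bins argument. Passing fd.B() + 1 keeps the bin count at 1 or more, which would avoid that case.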
I know this question is very old now, but I too struggled to find documentation, so I'm documenting it here in case someone else down the line runs into a similar struggle.
Good luck (with fnlp)!
Answer 2:
You probably don't need this anymore, as the question is very old, but you can still pass LidstoneProbDist arguments to ConditionalProbDist with the help of a lambda:
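# bins must already be defined (the number of bins to assume); ConditionalProbDist forwards it to the estimator
estimator = lambda fdist, bins: nltk.LidstoneProbDist(fdist, 0.01, bins)
cpd = nltk.ConditionalProbDist(fd, estimator, bins)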
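To see what the bins argument actually changes, here is a plain-Python sketch of the Lidstone estimate, assuming the usual formula (count + gamma) / (N + gamma * bins). The helper lidstone_prob and the toy counts are hypothetical and are not part of NLTK.

```python
from collections import Counter


def lidstone_prob(counts, sample, gamma, bins):
    """Lidstone estimate: (count + gamma) / (N + gamma * bins)."""
    total = sum(counts.values())  # N, the number of observed outcomes
    return (counts[sample] + gamma) / (total + gamma * bins)


counts = Counter(["the", "the", "a"])  # toy data: N = 3, two distinct samples
gamma = 0.01

# bins = seen samples + 1 leaves room for exactly one unseen sample.
bins = len(counts) + 1
p_the = lidstone_prob(counts, "the", gamma, bins)
p_a = lidstone_prob(counts, "a", gamma, bins)
p_unseen = lidstone_prob(counts, "dog", gamma, bins)

# With one unseen bin, the three probabilities account for all the mass.
total_mass = p_the + p_a + p_unseen

# A larger bins value (e.g. a bigger vocabulary) shrinks every estimate.
p_the_large = lidstone_prob(counts, "the", gamma, 1000)
print(total_mass, p_the, p_the_large)
```

So the bins value you pass decides how much probability mass is set aside for samples that were never observed.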
Source: https://stackoverflow.com/questions/35869561/python-nltk-valueerror-a-lidstone-probability-distribution-must-have-at-least