What is the meaning of “isolated symbol probabilities of English”

Submitted by 我只是一个虾纸丫 on 2019-12-22 12:53:41

Question


In a note I found this phrase:

Using isolated symbol probabilities of English language, you can find out the entropy of the language.

What is actually meant by "isolated symbol probabilities"? This is related to the entropy of an information source.


Answer 1:


It would help to know where the note came from and what the context is, but even without that I am quite sure it simply means that they use the frequencies of individual symbols (e.g. characters) as the basis for the entropy, rather than, for example, the joint probability of character sequences or the conditional probability of one particular character following another.
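To make the distinction concrete, here is one way to write both quantities, assuming a base-2 logarithm (entropy in bits) and a character alphabet 𝒳; the notation is mine, not the note's. The first line uses only isolated symbol probabilities P(x); the second, shown only for contrast, is the conditional entropy of a character given the one before it, which needs the joint probability P(x, y) and the conditional probability P(y | x):

```latex
\begin{aligned}
H(X) &= -\sum_{x \in \mathcal{X}} P(x)\,\log_2 P(x) \\
H(X_n \mid X_{n-1}) &= -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{X}} P(x, y)\,\log_2 P(y \mid x)
\end{aligned}
```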

So if you have an alphabet X = {a, b, c, ..., z} and a probability P(a), P(b), ... for each character appearing in text (e.g. estimated from its frequency in a sample of text), you compute the entropy by evaluating -P(x) * log(P(x)) for each character x individually and summing the results over all characters. In doing so, you have used the probability of each character in isolation, rather than the probability of each character in context.
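A minimal sketch of that computation, assuming the probabilities are estimated as relative frequencies in the input and that the log is base 2 (so the result is in bits). The function name `isolated_symbol_entropy` is hypothetical. It never looks at neighbouring symbols, so reordering the input does not change the result:

```python
import math
from collections import Counter


def isolated_symbol_entropy(symbols):
    """Return -sum(P(x) * log2(P(x))) over the distinct symbols in `symbols`.

    P(x) is estimated as count(x) / total, i.e. each symbol's probability
    in isolation; no context (previous symbol, sequence) is used.
    """
    counts = Counter(symbols)          # frequency of each individual symbol
    total = sum(counts.values())
    entropy = 0.0
    for count in counts.values():
        p = count / total              # isolated symbol probability
        entropy -= p * math.log2(p)    # add -P(x) * log2(P(x))
    return entropy


print(isolated_symbol_entropy("aabb"))  # → 1.0
```

For English text you would typically normalise first, e.g. `isolated_symbol_entropy(c for c in text.lower() if c.isalpha())`. Because it accepts any sequence of hashable items, passing `text.split()` instead treats words as the symbols.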

Note, however, that the term symbol in the note you found does not necessarily refer to characters; it might refer to words or other units of text. Either way, the point they are making is that they apply the classical entropy formula to probabilities of individual events (characters, words, or whatever), not to probabilities of compound or conditional events.



Source: https://stackoverflow.com/questions/9564979/what-is-the-meaning-of-isolated-symbol-probabilities-of-english
