Question
How can you detect the meaning (the expansion) of an acronym using NLP / Information Extraction (IE) methods?
We want to detect in free text whether a term or its acronym is used, and map both to the same entity / token.
Most papers available online are about medical acronyms, and they do not provide a library to accomplish this task.
Any ideas?
Answer 1:
Reading your question and the comments, I understand that you want to create a mapping from an acronym to its expansion.
Assuming you have a collection of text documents in which both the acronym and its expansion occur, you can apply an algorithm to extract (acronym, expansion) pairs.
"A Simple Algorithm for Identifying Abbreviation Definitions in Biomedical Text" by A. S. Schwartz and M. A. Hearst does exactly this by looking at patterns. The Java implementation is available here.
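To give a rough idea of what such pattern-based extraction looks like, here is a minimal Python sketch in the spirit of that algorithm. It is a simplification and not the authors' Java code: it only handles the "long form (SF)" pattern, and the function names (`is_valid_short`, `best_long_form`, `extract_pairs`) as well as the exact thresholds are assumptions made for illustration.

```python
import re

# Text inside a pair of parentheses, e.g. "(NLP)".
PAREN = re.compile(r"\(([^()]+)\)")


def is_valid_short(short):
    """Heuristic filter for a candidate acronym (assumed thresholds)."""
    return (
        2 <= len(short) <= 10
        and len(short.split()) <= 2
        and any(c.isalpha() for c in short)
        and short[0].isalnum()
    )


def best_long_form(short, candidate):
    """Match the acronym's characters right to left inside `candidate`.

    The first acronym character must sit at the start of a word.
    Returns the matched expansion, or None if no match is found.
    """
    s = len(short) - 1
    l = len(candidate) - 1
    while s >= 0:
        ch = short[s].lower()
        if not ch.isalnum():
            s -= 1
            continue
        # Walk left until the character matches; for the first acronym
        # character, also require that it begins a word.
        while (l >= 0 and candidate[l].lower() != ch) or (
            s == 0 and l > 0 and candidate[l - 1].isalnum()
        ):
            l -= 1
        if l < 0:
            return None
        l -= 1
        s -= 1
    start = candidate.rfind(" ", 0, l + 1) + 1
    return candidate[start:]


def extract_pairs(text):
    """Return (acronym, expansion) pairs found as 'expansion (ACRONYM)'."""
    pairs = []
    for m in PAREN.finditer(text):
        short = m.group(1).strip()
        if not is_valid_short(short):
            continue
        words = text[: m.start()].split()
        # Only look at a limited window of words before the parenthesis.
        max_words = min(len(short) + 5, len(short) * 2)
        candidate = " ".join(words[-max_words:])
        long_form = best_long_form(short, candidate)
        if long_form:
            pairs.append((short, long_form))
    return pairs


text = "We use natural language processing (NLP) and information extraction (IE) methods."
print(extract_pairs(text))
```

On the sample sentence this yields the pairs for NLP and IE. A real implementation would add further sanity checks on the matched long form and handle the reversed "SF (long form)" pattern as well.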
I applied this algorithm to the English Wikipedia; you can see the results here. I also applied it to a collection of Portuguese news articles; the results are here.
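Once you have a list of (acronym, expansion) pairs, mapping both forms to the same token in free text can be as simple as a dictionary-driven replacement. The following is only a sketch under that assumption: `build_normalizer` is a hypothetical helper, the pairs are hard-coded sample data, and matching is case-insensitive, which will produce false positives for acronyms that are also ordinary words (for example "IT").

```python
import re


def build_normalizer(pairs):
    """Build a function that rewrites acronyms and expansions to one token.

    `pairs` is an iterable of (acronym, expansion); the acronym is used
    as the canonical token here (an arbitrary choice).
    """
    canonical = {}
    for short, long_form in pairs:
        canonical[short.lower()] = short
        canonical[long_form.lower()] = short

    # Try longer alternatives first so full expansions win over substrings.
    alternatives = sorted(canonical, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(a) for a in alternatives) + r")\b",
        re.IGNORECASE,
    )

    def normalize(text):
        return pattern.sub(lambda m: canonical[m.group(1).lower()], text)

    return normalize


normalize = build_normalizer([
    ("NLP", "natural language processing"),
    ("IE", "information extraction"),
])
print(normalize("Natural language processing and information extraction: NLP vs. IE."))
```

Both surface forms end up as the same token, so downstream counting or entity linking sees a single entity. Expansions that span line breaks or contain irregular whitespace would need extra normalization first.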
Answer 2:
WordNet contains acronyms for a large number of terms, and it can be used from a variety of programming languages: http://wordnet.princeton.edu/wordnet/
Alternatively, you can get them from Freebase. See this question: What is one way to find related names using the web?
Source: https://stackoverflow.com/questions/26716622/how-to-automatically-detect-acronym-meaning-extension