Question
I am using WordNet, accessed through Python's NLTK, to compare the synsets of words from social media. Many of those words aren't in the version of WordNet that NLTK connects to.
When I say words, I mean domain-specific terms, not abbreviations or emoticons.
I've compiled a list of these words and would like to merge that list with WordNet.
Searching for prior efforts turns up only attempts to develop methods for automatically updating WordNet.
The steps I imagine are:
- Clone the WordNet db
- Write an extension of the WordNet module that looks for a local copy
- Update that local copy.
How reasonable does this sound?
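A lightweight alternative to the second and third steps is to leave NLTK's WordNet files untouched and keep the domain terms in a separate overlay that is consulted alongside WordNet. The sketch below assumes the compiled list is a CSV with hypothetical `word,gloss,hypernym` columns; `LocalLexicon`, `add_term`, `load_csv` and `senses` are illustrative names, not part of NLTK. This swaps in an overlay for the "modify a cloned database" approach rather than implementing it.

```python
import csv
from typing import Callable, Dict, Iterable, List, Optional


def _key(word: str) -> str:
    # NLTK's WordNet lemma names are lowercase, with "_" joining multiword terms.
    return word.strip().lower().replace(" ", "_")


class LocalLexicon:
    """Hypothetical overlay of locally added domain terms on top of WordNet.

    `base_lookup` maps a lemma to an iterable of senses. With NLTK this would
    be `nltk.corpus.wordnet.synsets`; it is passed in so the overlay can be
    exercised without the WordNet data installed.
    """

    def __init__(self, base_lookup: Callable[[str], Iterable]):
        self.base_lookup = base_lookup
        self.custom: Dict[str, List[dict]] = {}

    def add_term(self, word: str, gloss: str, hypernym: Optional[str] = None) -> None:
        # Several local senses per word are allowed, mirroring WordNet polysemy.
        self.custom.setdefault(_key(word), []).append(
            {"gloss": gloss, "hypernym": hypernym}
        )

    def load_csv(self, lines: Iterable[str]) -> int:
        """Load rows with assumed columns word,gloss,hypernym; return the row count."""
        count = 0
        for row in csv.DictReader(lines):
            # An empty or missing hypernym column is stored as None.
            self.add_term(row["word"], row["gloss"], row.get("hypernym") or None)
            count += 1
        return count

    def senses(self, word: str) -> Dict[str, list]:
        """Return local entries and base (WordNet) senses side by side."""
        key = _key(word)
        return {
            "local": list(self.custom.get(key, [])),
            "wordnet": list(self.base_lookup(key)),
        }
```

With NLTK installed and the WordNet corpus downloaded, `LocalLexicon(wn.synsets)` (after `from nltk.corpus import wordnet as wn`) would return real `Synset` objects under `"wordnet"`, and a file opened with `open("terms.csv", newline="")` can be passed straight to `load_csv`; `terms.csv` is a placeholder name. Keeping additions in an overlay avoids hand-editing the cloned `data.*` and `index.*` files, which is fragile because WordNet identifies synsets by byte offsets into those data files. The trade-off is that local entries are plain dicts, so NLTK's similarity measures will not see them unless you map each one to an existing synset, for example through its `hypernym` lemma.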
Answer 1:
I haven't changed WordNet myself yet, but I've had good experiences working with the Multilingual Central Repository (MCR), and I believe you should be able to do what you want using it.
It contains the WordNet 3.0 data files for several languages, including English, linked to one another through so-called Inter-Lingual Indexes (ILI). The data files can be loaded into MySQL or PostgreSQL database tables; from there it should be relatively easy not only to query them with SQL but also to insert new items while maintaining the correspondence between tables. You can of course also export the modified database, e.g. to CSV files, if SQL alone is not enough for your purposes.
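The SQL route described above might look roughly like the following sketch. It uses Python's built-in `sqlite3` in place of MySQL or PostgreSQL so it runs without a server, and the two-table schema (`synset`, `variant`), the `local-` prefixed synset IDs, and the helper names `open_db`, `add_word` and `export_csv` are hypothetical simplifications, not the MCR's actual layout. A real schema would also carry the ILI mapping so a new synset can be linked to the other languages. The point it illustrates is that each new word touches both tables inside one transaction, and the result can be dumped to CSV.

```python
import csv
import io
import sqlite3

# Hypothetical, simplified schema; the MCR's real tables are laid out differently.
SCHEMA = """
CREATE TABLE IF NOT EXISTS synset (
    synset_id TEXT PRIMARY KEY,
    pos       TEXT NOT NULL,
    gloss     TEXT
);
CREATE TABLE IF NOT EXISTS variant (
    word      TEXT NOT NULL,
    synset_id TEXT NOT NULL REFERENCES synset(synset_id),
    sense     INTEGER NOT NULL,
    PRIMARY KEY (word, synset_id)
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


def add_word(conn, word, synset_id, pos, gloss=None):
    """Insert a synset (if new) and a variant row in one transaction."""
    with conn:  # commits on success, rolls back both inserts on error
        conn.execute(
            "INSERT OR IGNORE INTO synset (synset_id, pos, gloss) VALUES (?, ?, ?)",
            (synset_id, pos, gloss),
        )
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM variant WHERE word = ?", (word,)
        ).fetchone()
        # Sense numbers continue from the word's existing senses.
        conn.execute(
            "INSERT INTO variant (word, synset_id, sense) VALUES (?, ?, ?)",
            (word, synset_id, count + 1),
        )


def export_csv(conn) -> str:
    """Dump word/synset pairs, joined with their synset data, as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["word", "synset_id", "sense", "pos", "gloss"])
    writer.writerows(conn.execute(
        "SELECT v.word, v.synset_id, v.sense, s.pos, s.gloss "
        "FROM variant AS v JOIN synset AS s ON s.synset_id = v.synset_id "
        "ORDER BY v.word, v.sense"
    ))
    return buf.getvalue()
```

For example, `add_word(open_db(), "hashtag", "local-00000001-n", "n", "a tag prefixed with #")` adds one noun sense; the distinct `local-` prefix is one way to keep new IDs from colliding with existing WordNet offsets. Against a server database the same pattern applies with that driver's connection and placeholder style.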
Source: https://stackoverflow.com/questions/20749730/add-words-to-a-local-copy-of-wordnet