Question
I would like to download an English dictionary -- not just a word list -- in a structured format such as TXT, XML, or SQL.
Specifically, I need phonetic pronunciation and parts of speech (definition is not required).
Surprisingly, I can't find this online anywhere. Wiktionary is available for download, but it is only the MediaWiki articles themselves. Crawling all articles and extracting the phonetics and parts of speech would be a huge exercise.
Is this available anywhere? I don't mind paying.
Edit: a few people have asked what I would like to do. My immediate need is just curiosity, for example "what are the most common two-syllable verbs?". Eventually my hope is to build a tool that helps you find available domain names by pairing the correct parts of speech, with bonus points for phonetic matches.
Note: cross-posted on English Language and Usage.
Answer 1:
Go to http://www.speech.cs.cmu.edu/cgi-bin/cmudict and you will find the download page for the pronunciation dictionary at https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/trunk/cmudict/
The latest version is currently cmudict.0.7a.
This is what I am currently using to implement the syllable counter for http://www.haikuvillage.com. It's in Ruby and I'd be happy to open source it for you if that helps.
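For reference, here is a minimal Python sketch of a syllable counter over this kind of data (not the Ruby code mentioned above). It assumes the usual cmudict layout: comment lines starting with `;;;`, one entry per line with the word followed by its phonemes, vowel phonemes carrying a stress digit (0, 1, or 2), and alternate pronunciations marked with a suffix such as `(1)`. The names `parse_cmudict`, `count_syllables`, and `SAMPLE` are made up for illustration, and the sample lines are a hand-written excerpt rather than a copy of the file.

```python
import re
from typing import Dict, Iterable, List

# Matches an alternate-pronunciation suffix such as "(1)" or "(2)".
_VARIANT = re.compile(r"\(\d+\)$")


def parse_cmudict(lines: Iterable[str]) -> Dict[str, List[List[str]]]:
    """Map each lowercased word to its list of pronunciations (phoneme lists)."""
    entries: Dict[str, List[List[str]]] = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and ";;;" comment lines.
        if not line or line.startswith(";;;"):
            continue
        head, *phones = line.split()
        word = _VARIANT.sub("", head).lower()
        entries.setdefault(word, []).append(phones)
    return entries


def count_syllables(phones: List[str]) -> int:
    """Count vowel phonemes, i.e. the ones that end in a stress digit."""
    return sum(1 for p in phones if p[-1].isdigit())


# Hypothetical excerpt written in the assumed cmudict format.
SAMPLE = """\
;;; illustrative excerpt only
DICTIONARY  D IH1 K SH AH0 N EH2 R IY0
HAIKU  HH AY1 K UW0
READ  R EH1 D
READ(1)  R IY1 D
""".splitlines()

entries = parse_cmudict(SAMPLE)
# Use the first listed pronunciation of each word.
syllables = {word: count_syllables(prons[0]) for word, prons in entries.items()}
print(syllables)
```

To run this against the real download, pass an open file handle to `parse_cmudict` instead of `SAMPLE`. The file may not be valid UTF-8, so you might need to pick an explicit encoding when opening it.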
Answer 2:
Parts of Speech Dictionary in the public domain with highly structured format: http://icon.shef.ac.uk/Moby/mpos.html
Each line is one entry: the word on the left and its part-of-speech value (verb, etc.) on the right, separated by ×. It is a simple text file.
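A rough Python sketch of reading that format follows. It assumes the layout described above (word, then ×, then a run of single-character part-of-speech codes) and treats the codes as opaque letters. The set `VERB_CODES` is an assumption about which letters mark verb senses and should be checked against the Moby documentation. The sample lines and the names `parse_mpos` and `is_verb` are hypothetical.

```python
from typing import Dict, Iterable, Set

DELIMITER = "\u00d7"  # the multiplication sign used as the field separator

# Assumption: these code letters mark verb senses; verify against the Moby docs.
VERB_CODES = set("Vti")


def parse_mpos(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each lowercased word or phrase to its set of part-of-speech codes."""
    pos: Dict[str, Set[str]] = {}
    for line in lines:
        # Split on the last delimiter so the word itself is kept intact.
        word, sep, codes = line.rstrip("\r\n").rpartition(DELIMITER)
        if not sep:
            continue  # skip lines that have no delimiter
        pos.setdefault(word.lower(), set()).update(codes.strip())
    return pos


def is_verb(pos: Dict[str, Set[str]], word: str) -> bool:
    """True if any code recorded for the word is one of the assumed verb codes."""
    return bool(pos.get(word.lower(), set()) & VERB_CODES)


# Hypothetical entries written in the assumed format, not copied from the file.
SAMPLE = [
    "run\u00d7ViN",
    "quickly\u00d7v",
    "domain name\u00d7h",
    "no delimiter here",
]

pos = parse_mpos(SAMPLE)
print(sorted(pos["run"]), is_verb(pos, "run"), is_verb(pos, "quickly"))
```

Combined with a pronunciation dictionary, a lookup like `is_verb` is enough to start answering questions such as "which two-syllable words are verbs?". When reading the real file, you may need to choose the encoding that decodes the delimiter byte to ×.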
Answer 3:
WordNet is one of the best dictionaries I know. Perhaps you will find something there: http://wordnet.princeton.edu/wordnet/related-projects/
Answer 4:
Portman, when I used the SpellChecker tool from DevExpress I learned that OpenOffice dictionaries exist, and I'm pretty sure they have a well-defined data structure. I recommend using them in combination with any free or paid text-to-speech tool.
Hope that helps.
Answer 5:
This is not a direct answer to your question, but the Double Metaphone algorithm is very good at finding word or phrase matches for search engine application servers (such as Solr and others).
I cannot tell what your intended use of this is, so I can't tell if my suggestion is useful or not. If it is close to your intended use, the Wikipedia page about Double Metaphone has a listing of about a dozen implementations of it which may be worth exploring.
http://en.wikipedia.org/wiki/Double_Metaphone
Source: https://stackoverflow.com/questions/3794454/where-can-i-obtain-an-english-dictionary-with-structured-data