Where can I obtain an English dictionary with structured data? [closed]

本秂侑毒 submitted on 2019-12-20 08:26:09

Question


I would like to download an English dictionary -- not just a word list -- in a structured format such as TXT, XML, or SQL.

Specifically, I need phonetic pronunciation and parts of speech (definition is not required).

Surprisingly, I can't find this online anywhere. Wiktionary is available for download, but it is only the MediaWiki articles themselves. Crawling all articles and extracting the phonetics and parts of speech would be a huge exercise.

Is this available anywhere? I don't mind paying.

Edit: a few people have asked what I would like to do. My immediate need is just curiosity, for example "what are the most common two-syllable verbs?". Eventually my hope would be a tool that helps you find available domain names, and does so by pairing the correct parts of speech, with bonus points for phonetic matches.

Note: cross-posted on English Language and Usage.


Answer 1:


Go to http://www.speech.cs.cmu.edu/cgi-bin/cmudict and you will find the download page for the pronunciation dictionary at https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/trunk/cmudict/

The latest version is currently cmudict.0.7a.

This is what I am currently using to implement the syllable counter for http://www.haikuvillage.com. It's in Ruby and I'd be happy to open source it for you if that helps.
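As a rough illustration of how a syllable counter can be built on this data, here is a minimal Python sketch (not the Ruby code mentioned above). It assumes the plain-text cmudict layout: each line holds a word followed by its phonemes, vowel phonemes end in a stress digit (0, 1, or 2), comment lines start with `;;;`, and alternate pronunciations are marked like `WORD(1)`. The function names and sample lines are made up for the example.

```python
import re

# Alternate pronunciations carry a numeric suffix such as "READ(1)".
_VARIANT = re.compile(r"\(\d+\)$")


def parse_cmudict(lines):
    """Map each word to a list of pronunciations (each a list of phonemes)."""
    entries = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue  # skip blank lines and comment lines
        head, *phonemes = line.split()
        word = _VARIANT.sub("", head).lower()
        entries.setdefault(word, []).append(phonemes)
    return entries


def syllable_count(phonemes):
    """Count vowel phonemes, i.e. the ones that end in a stress digit."""
    return sum(1 for p in phonemes if p[-1].isdigit())


# Hypothetical sample in the assumed format; a real run would read the file.
sample = [
    ";;; comment line",
    "HAIKU  HH AY1 K UW0",
    "DICTIONARY  D IH1 K SH AH0 N EH2 R IY0",
    "READ  R EH1 D",
    "READ(1)  R IY1 D",
]

entries = parse_cmudict(sample)
for word, prons in entries.items():
    print(word, [syllable_count(p) for p in prons])
```

To run this against the downloaded dictionary, pass an open file object to `parse_cmudict` instead of `sample`. The file's text encoding is not stated here, so it may need an explicit `encoding=` argument when opened.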




Answer 2:


A part-of-speech dictionary in the public domain, in a highly structured format: http://icon.shef.ac.uk/Moby/mpos.html

Each line is an entry, separated by ×, with the word value on the left and the part-of-speech value (verb, etc.) on the right. Simple text file.
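A minimal Python sketch of reading that layout follows. It assumes each line looks like `word×codes`, where the right-hand side is a string of single-character part-of-speech codes. The code table is a partial mapping recalled from the Moby documentation and should be checked against the readme shipped with the file; the function name and sample lines are hypothetical.

```python
# Partial code table (an assumption; verify against the Moby readme).
POS_CODES = {
    "N": "noun",
    "V": "verb (usually participle)",
    "t": "verb (transitive)",
    "i": "verb (intransitive)",
    "A": "adjective",
    "v": "adverb",
}


def parse_mpos(lines, delimiter="×"):
    """Map each word to the list of part-of-speech labels on its line."""
    entries = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if delimiter not in line:
            continue  # skip blank or malformed lines
        word, codes = line.rsplit(delimiter, 1)
        # Codes missing from the table are kept as-is rather than dropped.
        entries[word] = [POS_CODES.get(c, c) for c in codes]
    return entries


# Hypothetical sample lines in the assumed format.
sample = ["run×Nti", "quickly×v", "green×AN"]
entries = parse_mpos(sample)

# Keep the words that have at least one verb label.
verbs = [w for w, tags in entries.items()
         if any(t.startswith("verb") for t in tags)]
print(verbs)
```

When reading the real file, the delimiter may be stored as a single byte in a legacy encoding, so the `encoding=` passed to `open` has to decode that byte to `×` for the split to work.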




Answer 3:


WordNet is one of the best dictionaries I know. Perhaps you will find something there: http://wordnet.princeton.edu/wordnet/related-projects/




Answer 4:


Portman, when I used the SpellChecker tool from DevExpress I learned that the OpenOffice dictionaries exist, and I'm pretty sure they have a well-defined data structure. I recommend using them in combination with any free or paid text-to-speech tool.

Hope that helps.




Answer 5:


This is not a direct answer to your question, but the Double Metaphone algorithm is very good at finding word or phrase matches for search engine application servers (such as Solr and others).

I cannot tell what your intended use of this is, so I can't tell if my suggestion is useful or not. If it is close to your intended use, the Wikipedia page about Double Metaphone has a listing of about a dozen implementations of it which may be worth exploring.

http://en.wikipedia.org/wiki/Double_Metaphone



Source: https://stackoverflow.com/questions/3794454/where-can-i-obtain-an-english-dictionary-with-structured-data
