linguistics

words usage database?

旧巷老猫 submitted on 2019-12-04 07:03:04
Is there any free database or other source that lists commonality/usage ratios of English words? (British or U.S. English, it doesn't matter.) I don't care about the exact numbers, only how the words rank relative to each other. Something like:
the | 0.2
car | 0.08
chroma | 0.005
overspread | 0.0000007
Edit: I have found http://en.wiktionary.org/wiki/Wiktionary%3aFrequency_lists which I can scrape for data. However, I would prefer an SQL format, which is easier to work with.
The term you want to google is "word frequency". One of the top hits is http://www.wordfrequency.info/
Source: https://stackoverflow.com/questions/7248834
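
Since the question asks for an SQL-friendly format, here is a minimal sketch of loading word/ratio pairs into SQLite with Python's standard library. The table name word_freq, its columns, and the helpers load_frequencies and more_common are assumptions made for illustration; the sample rows are the figures quoted in the question, not measured data.

```python
import sqlite3

# Sample rows copied from the question; illustrative only, not real measurements.
SAMPLE = [
    ("the", 0.2),
    ("car", 0.08),
    ("chroma", 0.005),
    ("overspread", 0.0000007),
]


def load_frequencies(rows, db_path=":memory:"):
    """Create a (hypothetical) word_freq table and insert (word, ratio) pairs."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS word_freq ("
        "word TEXT PRIMARY KEY, ratio REAL NOT NULL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO word_freq (word, ratio) VALUES (?, ?)", rows
    )
    conn.commit()
    return conn


def more_common(conn, a, b):
    """Return whichever of the two words has the higher stored ratio."""
    row = conn.execute(
        "SELECT word FROM word_freq WHERE word IN (?, ?) "
        "ORDER BY ratio DESC LIMIT 1",
        (a, b),
    ).fetchone()
    return row[0] if row else None  # None if neither word is in the table


conn = load_frequencies(SAMPLE)
print(more_common(conn, "car", "chroma"))
```

Only the relative order of the ratios matters here, which matches what the question asks for; rows scraped from a frequency list could be passed to load_frequencies in the same (word, ratio) shape.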

Word Stemming in iOS - Not working for single word

岁酱吖の submitted on 2019-12-04 03:28:05
I am using NSLinguisticTagger for word stemming. I can get the stems of the words in a sentence, but not the stem of a single word. This is the code I am using:
NSString *stmnt = @"i waited";
NSLinguisticTaggerOptions options = NSLinguisticTaggerOmitWhitespace | NSLinguisticTaggerOmitPunctuation | NSLinguisticTaggerJoinNames;
NSLinguisticTagger *tagger = [[NSLinguisticTagger alloc] initWithTagSchemes:@[NSLinguisticTagSchemeLemma] options:options];
tagger.string = stmnt;
[tagger enumerateTagsInRange:NSMakeRange(0, [stmnt length]) scheme:NSLinguisticTagSchemeLemma

English Language Dictionary api [closed]

无人久伴 submitted on 2019-12-03 16:16:08
Question: Closed. This question is off-topic. It is not currently accepting answers. Closed 7 years ago.
Is there a public API that would let me look up definitions for words? I've been searching for a while, but the results keep getting mixed up with the dictionary data structure. I'm planning on using it in a C# app. Thanks.
Answer 1: If you are language agnostic, you could try Ruby WordNet http://deveiate.org/projects/Ruby

stanford corenlp not working

ε祈祈猫儿з submitted on 2019-12-03 16:16:05
I'm using Windows 8 and running Python in Eclipse with PyDev. I installed Stanford CoreNLP (Python version) from the site: https://github.com/relwell/stanford-corenlp-python
When I try to import corenlp, I get the following error message:
Traceback (most recent call last):
  File "C:\Users\Ghantauke\workspace\PythonTest2\test.py", line 1, in <module>
    import corenlp
  File "C:\Python27\lib\site-packages\corenlp\__init__.py", line 13, in <module>
    from corenlp import StanfordCoreNLP, ParserError, TimeoutError, ProcessError
  File "C:\Python27\lib\site-packages\corenlp\corenlp.py", line 28, in <module>

Linguistic meaning of 'let' variable in programming [duplicate]

拜拜、爱过 submitted on 2019-12-03 04:30:30
Question: This question already has answers here: Why was the name 'let' chosen for block-scoped variable declarations in JavaScript? (7 answers) Closed 2 years ago.
So, I'm a JavaScript programmer, and the new version of JavaScript (ES6) has a new keyword for declaring variables: let, alongside the old one, var. I know the difference between the two, but I was asking myself: what does let stand for? var is obviously an abbreviation of "variable", but let? Is it an abbreviation as well? Where

Spacy custom tokenizer to include only hyphen words as tokens using Infix regex

风流意气都作罢 submitted on 2019-12-03 03:28:30
I want to treat hyphenated words, for example long-term and self-esteem, as single tokens in spaCy. After looking at some similar posts on Stack Overflow, GitHub, its documentation and elsewhere, I wrote a custom tokenizer as below:
import re
from spacy.tokenizer import Tokenizer
prefix_re = re.compile(r'''^[\[\("']''')
suffix_re = re.compile(r'''[\]\)"']$''')
infix_re = re.compile(r'''[.\,\?\:\;\...\‘\’\`\“\”\"\'~]''')
def custom_tokenizer(nlp):
    return Tokenizer(nlp.vocab, prefix_search=prefix_re.search,
                     suffix_search=suffix_re.search,
                     infix_finditer=infix_re.finditer,
                     token_match=None
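
The idea behind an infix pattern that omits the hyphen can be checked without spaCy. The following is a rough sketch using only the standard re module: the pattern is a simplified stand-in (an assumption, not the asker's exact regex), and split_on_infixes is a hypothetical helper that only mimics how infix matches become split points. spaCy's real tokenizer also applies prefix, suffix and special-case rules, which this sketch ignores.

```python
import re

# Simplified infix pattern (an assumption, not the asker's exact regex):
# it lists separator characters but deliberately leaves out the hyphen.
infix_re = re.compile(r'''[.,?:;‘’`“”"'~]''')


def split_on_infixes(text):
    """Cut `text` at every infix match, keeping the separators as pieces."""
    pieces, start = [], 0
    for match in infix_re.finditer(text):
        if match.start() > start:
            pieces.append(text[start:match.start()])
        pieces.append(match.group())
        start = match.end()
    if start < len(text):
        pieces.append(text[start:])
    return pieces


print(split_on_infixes("long-term"))              # hyphen is not a split point
print(split_on_infixes("self-esteem,long-term"))  # comma is a split point
```

Because the hyphen never matches, hyphenated words come through as one piece, while characters that are listed in the class still separate their neighbours.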

Where can I find a list of English phrases? [closed]

心已入冬 submitted on 2019-12-02 23:34:56
I'm tasked with searching for the use of clichés and common phrases in text. The phrases are similar to those you might see in the phrase puzzles on Wheel of Fortune. Here are a few examples:
Easy Come Easy Go
Too Good To Be True
Winning Isn't Everything
However, I cannot find a list of such phrases. Does anybody know of one? Seriously, even a list of all Wheel of Fortune solutions would suffice.
I know an answer has been accepted... but the answer is dated. Currently, Wiktionary is the best place to go (~8000 entries): https://en.wiktionary.org/wiki/Category:English_idioms
atp Here's

Linguistic meaning of 'let' variable in programming [duplicate]

可紊 submitted on 2019-12-02 16:59:06
This question already has an answer here: Why was the name 'let' chosen for block-scoped variable declarations in JavaScript? (7 answers)
So, I'm a JavaScript programmer, and the new version of JavaScript (ES6) has a new keyword for declaring variables: let, alongside the old one, var. I know the difference between the two, but I was asking myself: what does let stand for? var is obviously an abbreviation of "variable", but let? Is it an abbreviation as well? Where does it come from? I googled this and, to my amazement, I couldn't find an answer. I already knew Swift also has a let keyword

NLP: Building (small) corpora, or “Where to get lots of not-too-specialized English-language text files?”

北城余情 submitted on 2019-12-01 05:29:37
Does anyone have a suggestion for where to find archives or collections of everyday English text for use in a small corpus? I have been using Project Gutenberg books for a working prototype and would like to incorporate more contemporary language. A recent answer here pointed indirectly to a great archive of Usenet movie reviews, which hadn't occurred to me and is very good. For this particular program, technical Usenet archives or programming mailing lists would tilt the results and be hard to analyze, but any kind of general blog text, or chat transcripts, or anything that may have been

Generating the plural form of a noun

拈花ヽ惹草 submitted on 2019-11-30 20:28:42
Given a word, which may or may not be a singular-form noun, how would you generate its plural form? Based on this NLTK tutorial and this informal list of pluralization rules, I wrote this simple function:
def plural(word):
    """ Converts a word to its plural form. """
    if word in c.PLURALE_TANTUMS:
        # defective nouns, fish, deer, etc
        return word
    elif word in c.IRREGULAR_NOUNS:
        # foot->feet, person->people, etc
        return c.IRREGULAR_NOUNS[word]
    elif word.endswith('fe'):
        # knife -> knives
        return word[:-2] + 'ves'
    elif word.endswith('f'):
        # wolf -> wolves
        return word[:-1] + 'ves'
    elif word.endswith('o')
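
The excerpt cuts off partway through the rules, so the following is only a self-contained sketch of the same rule-based approach, not the asker's full function. The tables UNCHANGED and IRREGULAR are small hypothetical stand-ins for the asker's c module, and the suffix rules after the 'f' case are common English conventions assumed here for illustration. Known exceptions (for example roof -> roofs) are not handled.

```python
# Hypothetical lookup tables standing in for the asker's `c` module.
UNCHANGED = {"fish", "deer", "sheep", "scissors"}
IRREGULAR = {
    "foot": "feet",
    "person": "people",
    "child": "children",
    "mouse": "mice",
}
VOWELS = "aeiou"


def plural(word):
    """Return a plural form of `word` using lookup tables, then suffix rules."""
    if word in UNCHANGED:
        return word                      # fish -> fish
    if word in IRREGULAR:
        return IRREGULAR[word]           # foot -> feet
    if word.endswith("fe"):
        return word[:-2] + "ves"         # knife -> knives
    if word.endswith("f") and not word.endswith("ff"):
        return word[:-1] + "ves"         # wolf -> wolves (but cliff -> cliffs)
    if word.endswith(("s", "x", "z", "ch", "sh")):
        return word + "es"               # box -> boxes
    if word.endswith("y") and len(word) > 1 and word[-2] not in VOWELS:
        return word[:-1] + "ies"         # city -> cities
    return word + "s"                    # cat -> cats, day -> days


for w in ("knife", "wolf", "box", "city", "day", "foot", "fish"):
    print(w, "->", plural(w))
```

Checking the lookup tables before the suffix rules means an irregular entry always wins over a pattern that happens to match its ending.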