language-detection

Language detection with data in PostgreSQL

末鹿安然 submitted on 2019-11-29 14:39:53
Question: I have a table in PostgreSQL where one column holds text. I need a library or tool that can identify the language of each text, for testing purposes. It does not need to run inside PostgreSQL, because I am having problems installing languages there; anything written in a language that can connect to the database, retrieve the texts, and identify them is welcome. I used Lingua::Identify, suggested in the answers, directly in a Perl script. It worked, but the results are not precise. The texts I want to identify come from the web

Automatically selecting country and language for user in Java Servlet

寵の児 submitted on 2019-11-29 14:05:08
I have to detect the user's country and language automatically in a Java Servlet using request details (IP address, browser information, etc.). Is it possible to detect these settings for most users (~90%)? Paweł Dyda: Detecting the language. Detecting the correct language is easy. Web browsers tend to send an Accept-Language header, and the Java Servlet API is nice enough to convert its contents to Locale object(s). All you have to do is access this information and implement a fall-back mechanism. To do that you need a list of the Locales your application is going to support (you could
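The fall-back mechanism described in the answer can be sketched roughly as follows. This is a minimal Python illustration, not the Servlet API itself: `parse_accept_language` and `pick_locale` are hypothetical helper names, and the matching rule (exact tag first, then primary language subtag, then a default) is an assumption about what a reasonable fall-back might look like.

```python
from typing import List, Tuple


def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept-Language header into (tag, quality) pairs, best first."""
    entries = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0  # a tag without an explicit q-value counts as q=1
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed q-value as "not acceptable"
        entries.append((tag.strip().lower(), quality))
    # sorted() is stable, so tags with equal quality keep their header order
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def pick_locale(header: str, supported: List[str], default: str) -> str:
    """Return the best supported locale for the header, or the default."""
    by_lower = {name.lower(): name for name in supported}
    for tag, quality in parse_accept_language(header):
        if quality <= 0:
            continue
        if tag in by_lower:  # exact match, e.g. "pt-br"
            return by_lower[tag]
        primary = tag.split("-")[0]  # fall back from "fr-ca" to "fr"
        if primary in by_lower:
            return by_lower[primary]
    return default


print(pick_locale("fr-CA,fr;q=0.8,en;q=0.5", ["en", "fr"], "en"))
```

The list of supported locales is the piece the application has to supply; everything else is derived from the request header.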

Testing for Japanese/Chinese Characters in a string

廉价感情. submitted on 2019-11-29 07:17:57
I have a program that reads a bunch of text and analyzes it. The text may be in any language, but I need to test for Japanese and Chinese specifically so I can analyze them differently. I have read that I can test each character's Unicode code point to find out whether it is in the range of CJK characters. This is helpful, but I would like to separate the two if possible, so that I can process the text against different dictionaries. Is there a way to test whether a character is Japanese OR Chinese? You won't be able to test a single character to tell with certainty that it is Japanese or Chinese because of the
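One common heuristic for whole strings (not single characters) is sketched below. It is an assumption added here, not necessarily what the answer goes on to recommend: hiragana and katakana are used for Japanese, so their presence suggests Japanese, while text made only of Han characters stays ambiguous. The function names and the code point ranges chosen (the main kana blocks and the two most common Han blocks) are illustrative.

```python
def is_kana(ch: str) -> bool:
    """True for characters in the Hiragana or Katakana blocks."""
    cp = ord(ch)
    return 0x3040 <= cp <= 0x309F or 0x30A0 <= cp <= 0x30FF


def is_han(ch: str) -> bool:
    """True for CJK Unified Ideographs and Extension A (shared by Chinese and Japanese)."""
    cp = ord(ch)
    return 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF


def guess_cjk_language(text: str) -> str:
    """Classify a string as 'japanese', 'chinese-or-japanese', or 'other'."""
    if any(is_kana(ch) for ch in text):
        return "japanese"  # kana is a strong signal for Japanese
    if any(is_han(ch) for ch in text):
        return "chinese-or-japanese"  # Han alone cannot settle it
    return "other"


print(guess_cjk_language("これは日本語です"))
print(guess_cjk_language("这是中文"))
```

A short Japanese string written entirely in kanji would land in the ambiguous bucket, which is why the result is reported as a three-way answer rather than a yes/no.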

What is the best language detect library or web api available? [even paid] [closed]

妖精的绣舞 submitted on 2019-11-29 04:39:54
First of all, I have a lot of text available. Let's say I have 10000 characters for each try. The script is PHP based, but I can use whatever I want: C++, Java, no problem. The Google language API can't be used: its usage limits are too low. I have spent 6 hours trying to come up with something good, with no luck so far. Can someone point me to my best option? There is Language Detection API, which provides both free and premium service. It accepts text through GET or POST and provides JSON output with scores. lisak: Java based tools are: Apache Tika: not "all" language profiles, but you can add

Detecting whether or not text is English (in bulk)

╄→гoц情女王★ submitted on 2019-11-29 02:08:25
I'm looking for a simple way to detect whether a short excerpt of text, a few sentences, is English or not. It seems to me that this problem is much easier than trying to detect an arbitrary language. Is there any software out there that can do this? I'm writing in Python and would prefer a Python library, but something else would be fine too. I've tried Google, but then realized the TOS didn't allow automated queries. HyLian: I read about a method to detect English by using trigrams. You can go over the text and try to detect the most used trigrams in the words. If the most used ones match
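A minimal sketch of the trigram idea follows. The set of "common English trigrams" is a small illustrative list chosen for this example, not an authoritative frequency table, and the threshold is an arbitrary assumption; a real detector would derive both from a reference corpus.

```python
import re
from collections import Counter

# Illustrative only: a handful of trigrams that are frequent in English text.
COMMON_ENGLISH_TRIGRAMS = {
    "the", "and", "ing", "ion", "tio", "ent", "ati", "for",
    "her", "ter", "hat", "tha", "ere", "ate", "his", "con",
}


def word_trigrams(text: str) -> Counter:
    """Count letter trigrams inside each word (no trigrams across word gaps)."""
    counts = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        for i in range(len(word) - 2):
            counts[word[i:i + 3]] += 1
    return counts


def english_score(text: str, top_n: int = 20) -> float:
    """Fraction of the text's most used trigrams that are common in English."""
    top = [gram for gram, _ in word_trigrams(text).most_common(top_n)]
    if not top:
        return 0.0
    hits = sum(1 for gram in top if gram in COMMON_ENGLISH_TRIGRAMS)
    return hits / len(top)


def looks_english(text: str, threshold: float = 0.2) -> bool:
    """Hypothetical decision rule: enough of the top trigrams look English."""
    return english_score(text) >= threshold


print(english_score("the cat and the dog"))
print(looks_english("el perro come"))
```

Very short inputs yield few trigrams, so the score is coarse; for excerpts of a few sentences the top-trigram list becomes more stable.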

How to detect language of text?

孤人 submitted on 2019-11-28 21:29:41
I have a form which lets users input text snippets. How can I figure out the language of the entered text? Specifically these languages for now: Arabic: هذه هي بعض النصوص العربية Chinese: 这是一些阿拉伯文字 Japanese: これは、いくつかのアラビア語のテキストです [Edit] The detection has to work on text which is retrieved via an API too (no browser involved). You can figure out whether the characters are from the Arabic, Chinese, or Japanese sections of the Unicode map. If you look at the list on Wikipedia, you'll see that each of those languages has many sections of the map. But you're not doing translation, so you don't need
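As a rough illustration of checking which part of the Unicode map a snippet falls in, the sketch below counts characters per script and reports the dominant one. Instead of hard-coding block boundaries it relies on Python's `unicodedata.name`, whose character names start with the script name; the prefix table and the helper names are assumptions made for this example, and "kana" versus "han" are scripts, not languages.

```python
import unicodedata
from collections import Counter
from typing import Optional

# Character-name prefixes mapped to a coarse script label (illustrative subset).
NAME_PREFIXES = {
    "ARABIC": "arabic",
    "CJK UNIFIED IDEOGRAPH": "han",
    "HIRAGANA": "kana",
    "KATAKANA": "kana",
}


def script_of(ch: str) -> Optional[str]:
    """Return the script label for a character, or None if it is not tracked."""
    name = unicodedata.name(ch, "")  # "" for characters without a name
    for prefix, script in NAME_PREFIXES.items():
        if name.startswith(prefix):
            return script
    return None


def script_counts(text: str) -> Counter:
    """Count how many characters of each tracked script the text contains."""
    return Counter(s for s in map(script_of, text) if s is not None)


def dominant_script(text: str) -> Optional[str]:
    """Return the most frequent tracked script, or None if none occurs."""
    counts = script_counts(text)
    return counts.most_common(1)[0][0] if counts else None


for sample in ("هذه هي بعض النصوص العربية",
               "这是一些阿拉伯文字",
               "これは、いくつかのアラビア語のテキストです"):
    print(dominant_script(sample))
```

Because this works on plain strings, it applies equally to form input and to text fetched through an API.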

Detect language of text [duplicate]

|▌冷眼眸甩不掉的悲伤 submitted on 2019-11-28 09:02:46
This question already has an answer here: How to detect the language of a string? (9 answers). Is there any C# library which can detect the language of a particular piece of text? For example, for the input text "This is a sentence", it should detect the language as "English", and for "Esto es una sentencia" it should detect the language as "Spanish". I understand that language detection from text is not a deterministic problem. But both Google Translate and Bing Translator have an "Auto detect" option, which best-guesses the input language. Is there something similar available publicly, preferably in C

Automatically selecting country and language for user in Java Servlet

北慕城南 submitted on 2019-11-28 07:37:24
Question: I have to detect the user's country and language automatically in a Java Servlet using request details (IP address, browser information, etc.). Is it possible to detect these settings for most users (~90%)? Answer 1: Detecting the language. Detecting the correct language is easy. Web browsers tend to send an Accept-Language header, and the Java Servlet API is nice enough to convert its contents to Locale object(s). All you have to do is access this information and implement a fall-back mechanism

PHP: How do I detect if an input string is Arabic

夙愿已清 submitted on 2019-11-28 05:59:09
Is there a way to detect the language of the data being entered via the input field? The Surrican: Hmm, I may offer an improved version of DimaKrasun's function: function is_arabic($string) { if ($string === 'arabic') { return true; } return false; } Okay, enough joking! Pekka's suggestion to use the Google Translate API is a good one, but then you are relying on an external service, which is always more complicated. I think Rushyo's approach is good; it's just not that easy. I wrote the following function for you. It is not tested, but it should work... <? function uniord($u) { // i just copied this

(human) Language of a document

百般思念 submitted on 2019-11-28 04:36:25
Question: Is there a way (a program, a library) to approximately know which language a document is written in? I have a bunch of text documents (~500K) in mixed languages to import into an i18n-enabled CMS (Drupal). I don't need perfect matches, only a reasonable guess. Answer 1: There is a pretty easy way to do this, given that you have corpus data in all the different languages you'll need to identify. It's called n-gram modeling. I think Lingua::Identify does this already, though, so that is your best bet rather
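To make the n-gram modeling idea concrete, here is a small Python sketch under stated assumptions: each language profile is a bag of character trigrams built from corpus text, and a document is assigned to the profile with the highest cosine similarity. The function names, the choice of trigrams, and the similarity measure are illustrative choices, not a description of how Lingua::Identify works, and the two one-sentence corpora stand in for real training data.

```python
import math
from collections import Counter
from typing import Dict


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count character n-grams, padding with spaces so word edges are captured."""
    padded = " " + " ".join(text.lower().split()) + " "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(corpora: Dict[str, str]) -> Dict[str, Counter]:
    """Build one n-gram profile per language from its corpus text."""
    return {lang: char_ngrams(text) for lang, text in corpora.items()}


def identify(text: str, profiles: Dict[str, Counter]) -> str:
    """Return the language whose profile is most similar to the text."""
    sample = char_ngrams(text)
    return max(profiles, key=lambda lang: cosine(sample, profiles[lang]))


# Toy corpora for illustration; real profiles need far more text per language.
profiles = train({
    "english": "the quick brown fox jumps over the lazy dog and the cat",
    "spanish": "el rapido zorro marron salta sobre el perro perezoso y el gato",
})
print(identify("the dog and the fox", profiles))
print(identify("el perro y el gato", profiles))
```

With realistic corpora the same structure scales to many languages, since adding a language only means adding one more profile.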