What is the best language detection library or web API available? [even paid] [closed]

Submitted by 别来无恙 on 2019-12-29 06:26:22

Question


First of all, I have a lot of text available: say, 10,000 characters per attempt. The script is PHP-based, but I can use whatever language is needed. C++ or Java is no problem.

The Google language API can't be used: its usage limits are too low.

I have spent six hours trying to come up with something good, with nothing to show for it so far. Can someone point me to my best option?


Answer 1:


There is the Language Detection API, which offers both a free and a premium service.

It accepts text through GET or POST and provides JSON output with scores.
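For the roughly 10,000 characters per request mentioned in the question, POST is probably the safer choice, since a GET URL of that length may be rejected. Below is a minimal sketch of building such a request in Java 11+. The endpoint URL and the form field name `q` are placeholders I made up for illustration, not the service's documented values, so check the provider's documentation before using it.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

class DetectRequestSketch {
    // Placeholder endpoint and form field name: assumptions, not taken from the answer.
    static final String ENDPOINT = "https://api.example.com/detect";
    static final String TEXT_FIELD = "q";

    // Encodes the text as a form body so spaces, '&' and non-ASCII input survive transport.
    static String formBody(String text) {
        return TEXT_FIELD + "=" + URLEncoder.encode(text, StandardCharsets.UTF_8);
    }

    // Builds (but does not send) a POST request carrying the text to detect.
    static HttpRequest buildPost(String text) {
        return HttpRequest.newBuilder()
                .uri(URI.create(ENDPOINT))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(text)))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildPost("Hello world");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Sending the request would then be a matter of passing it to `java.net.http.HttpClient` and parsing the JSON body with whatever JSON library you already use.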




Answer 2:


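Java-based tools:

Apache Tika: it does not ship profiles for "all" languages, but you can add them yourself.

import org.apache.tika.language.LanguageIdentifier;

// Returns the detected language code, or fails when Tika is not confident enough.
public String detectLangTika(String text) throws SystemException {
    LanguageIdentifier li = new LanguageIdentifier(text);
    if (li.isReasonablyCertain()) {
        return li.getLanguage();
    }
    // Throw the declared SystemException (assumed to have a String constructor);
    // a plain Exception would not compile with this throws clause.
    throw new SystemException("Tika lang detection not reasonably certain");
}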

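language-detection: it ships a lot of language profiles and works great for me.

import java.io.File;
import com.cybozu.labs.langdetect.Detector;
import com.cybozu.labs.langdetect.DetectorFactory;
import com.cybozu.labs.langdetect.LangDetectException;

// Load the language profiles before creating any detector (typically once at startup).
DetectorFactory.loadProfile(new File(LangDetector.class.getClassLoader().getResource("profiles").toURI()));

public String detectLangLD(String text) throws SystemException {
    try {
        Detector detector = DetectorFactory.create();
        detector.append(text);
        return detector.detect();
    } catch (LangDetectException e) {
        // Wrap the library exception so callers only deal with SystemException.
        throw new SystemException("LangDetector Failure", e);
    }
}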

The most precise tool was the Google API lang detection, which was discontinued and replaced with the paid Google Translate API.




Answer 3:


A bit late, but I wrote this library (and I'm implementing a free API service without limits).

https://github.com/crodas/LanguageDetector




Answer 4:


If you are willing to give Python a go, take a look at NLTK. And I hope you did go through this.




Answer 5:


There's another freemium API here: Language Detection API

You can easily test the endpoints from that page.

It accepts both GET and POST requests (POST for longer input) and returns JSON with this structure:

{
  language: "eng",
  isReliable: "true",
  confidence: "0.9979894639898946"
}

Disclaimer: I'm providing that API.
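Note that in the structure shown, `isReliable` and `confidence` come back as strings, so they need converting before you can act on them. Here is a small sketch of that step, assuming the two values have already been pulled out of the response by a JSON parser. The method name `accept` and the 0.9 threshold in the usage are my own illustrative choices, not part of the API.

```java
class DetectionCheckSketch {
    // Accepts a detection only when the API flags it as reliable AND the
    // confidence reaches the caller's threshold. Both inputs are the raw
    // string values from the response; minConfidence is a hypothetical cutoff.
    static boolean accept(String isReliable, String confidence, double minConfidence) {
        return Boolean.parseBoolean(isReliable)
                && Double.parseDouble(confidence) >= minConfidence;
    }

    public static void main(String[] args) {
        // Values copied from the sample response above.
        System.out.println(accept("true", "0.9979894639898946", 0.9));
    }
}
```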




Answer 6:


I'd recommend languagelayer.com. It offers a free RESTful JSON API that can detect around 170 languages. Batch requests are supported as well.

A GET API request (POST encouraged) looks something like this:

https://apilayer.net/api/detect
    ? access_key = YOUR_ACCESS_KEY
    & query = I like apples and oranges
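The URL above is spaced out for readability. In a real request the parameter values have to be URL-encoded. A minimal Java sketch of assembling it, using only the endpoint and parameter names shown above (the class and method names are just for illustration):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class LanguageLayerUrlSketch {
    static final String BASE_URL = "https://apilayer.net/api/detect";

    // Joins the endpoint with URL-encoded access_key and query parameters.
    static String buildDetectUrl(String accessKey, String query) {
        return BASE_URL
                + "?access_key=" + URLEncoder.encode(accessKey, StandardCharsets.UTF_8)
                + "&query=" + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(buildDetectUrl("YOUR_ACCESS_KEY", "I like apples and oranges"));
    }
}
```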

And here's the JSON response:

{
  "success": true,
  "results": [
    {
      "language_code": "en",
      "language_name": "English",
      "probability": 83.896703655741,
      "percentage": 100,
      "reliable_result": true
    }
  ]
}
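Since `results` is an array, a response may presumably carry more than one candidate. One way to pick a winner, sketched below, is to keep only entries with `reliable_result` set and take the highest `probability`. The `Result` class is a hypothetical holder for values you have already parsed out of the JSON, and the selection rule is my assumption, not something the API prescribes.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class BestResultSketch {
    // Hypothetical holder for one parsed entry of the "results" array.
    static final class Result {
        final String languageCode;
        final double probability;
        final boolean reliable;

        Result(String languageCode, double probability, boolean reliable) {
            this.languageCode = languageCode;
            this.probability = probability;
            this.reliable = reliable;
        }
    }

    // Returns the reliable entry with the highest probability, if there is one.
    static Optional<Result> best(List<Result> results) {
        return results.stream()
                .filter(r -> r.reliable)
                .max(Comparator.comparingDouble((Result r) -> r.probability));
    }

    public static void main(String[] args) {
        List<Result> results = List.of(new Result("en", 83.896703655741, true));
        best(results).ifPresent(r -> System.out.println(r.languageCode));
    }
}
```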

5,000 monthly requests are free. If you need more (like I did), the cheapest subscription is $4.99/mo for 50,000 requests. (More info here)




Answer 7:


You can use Rosoka. It detects 230 different languages. You can try it through the AWS Marketplace at Rosoka Cloud.

You pay for the time used.



Source: https://stackoverflow.com/questions/7025915/what-is-the-best-language-detect-library-or-web-api-available-even-paid
