First of all, I have a lot of text available: say, 10,000 characters per attempt. The script is PHP-based, but I can use whatever I want; C++ or Java is no problem.
The Google language API can't be used: its usage limits are too low.
I've spent six hours trying to come up with something good, but have nothing so far. Can someone point me to my best option?
There is the Language Detection API, which provides both a free and a premium service.
It accepts text through GET or POST and provides JSON output with scores.
Java-based tools are:
Apache Tika: it does not ship profiles for every language, but you can add them yourself.
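import org.apache.tika.language.LanguageIdentifier;

// SystemException is the poster's own exception type; a (String) constructor is assumed here.
public String detectLangTika(String text) throws SystemException {
    LanguageIdentifier li = new LanguageIdentifier(text);
    // Only trust the result when Tika itself considers it reasonably certain.
    if (li.isReasonablyCertain()) {
        return li.getLanguage();
    }
    throw new SystemException("Tika lang detection not reasonably certain");
}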
language-detection: it ships with a lot of language profiles and works great for me.
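import java.io.File;
import com.cybozu.labs.langdetect.Detector;
import com.cybozu.labs.langdetect.DetectorFactory;
import com.cybozu.labs.langdetect.LangDetectException;

// Load the profiles once at startup, before the first call to DetectorFactory.create().
DetectorFactory.loadProfile(new File(LangDetector.class.getClassLoader().getResource("profiles").toURI()));

public String detectLangLD(String text) throws SystemException {
    try {
        // A Detector is created per text; append the input, then ask for the best match.
        Detector detector = DetectorFactory.create();
        detector.append(text);
        return detector.detect();
    } catch (LangDetectException e) {
        throw new SystemException("LangDetector Failure", e);
    }
}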
The most precise tool was Google's language detection API, which was discontinued and replaced by the paid Google Translate API.
A bit late, but I wrote this library (and I'm implementing a free API service without limits).
There's another freemium API here: Language Detection API
You can easily test the endpoints from that page.
It accepts both GET and POST requests (POST for longer input) and returns a JSON response with this structure:
{
language: "eng",
isReliable: "true",
confidence: "0.9979894639898946"
}
Disclaimer: I'm providing that API.
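Note that in the response shown above, isReliable and confidence arrive as strings, so they need converting before you can compare them. Here is a minimal Java sketch of that step. It assumes the three fields have already been pulled out of the JSON with whatever parser you use; the DetectionResult class, the accept method and the 0.9 threshold in the usage line are all made up for illustration and are not part of the API.

```java
// Hypothetical holder for the three response fields, kept as the raw strings the API returns.
class DetectionResult {
    final String language;    // e.g. "eng"
    final String isReliable;  // "true" or "false", delivered as text
    final String confidence;  // decimal number, delivered as text

    DetectionResult(String language, String isReliable, String confidence) {
        this.language = language;
        this.isReliable = isReliable;
        this.confidence = confidence;
    }

    // Accept the detection only if the API marks it reliable and the
    // confidence reaches the caller's own threshold (an assumption, not an API rule).
    // Double.parseDouble throws NumberFormatException on malformed input.
    boolean accept(double minConfidence) {
        return Boolean.parseBoolean(isReliable)
                && Double.parseDouble(confidence) >= minConfidence;
    }

    public static void main(String[] args) {
        DetectionResult r = new DetectionResult("eng", "true", "0.9979894639898946");
        System.out.println(r.language + " accepted: " + r.accept(0.9));
    }
}
```

Which threshold makes sense depends on your texts; with 10,000 characters per request you can probably afford a strict one.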
I'd recommend using languagelayer.com. They offer a free RESTful JSON API web service that can detect around 170 languages, and batch requests are supported as well.
A GET API request (POST encouraged) looks something like this:
https://apilayer.net/api/detect
? access_key = YOUR_ACCESS_KEY
& query = I like apples and oranges
And here's the JSON response:
{
"success": true,
"results": [
{
"language_code": "en",
"language_name": "English",
"probability": 83.896703655741,
"percentage": 100,
"reliable_result": true
}
]
}
5,000 monthly requests are free. If you need more (like I did), the cheapest subscription is $4.99/mo for 50,000 requests. (More info here)
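The GET request above is shown with spaces for readability; in a real call the query text has to be URL-encoded. A minimal Java sketch of building that URL follows. The endpoint and the access_key / query parameter names come from the example above, while the LanguageLayerRequest class and buildUrl method are just illustrative names. It assumes Java 10 or later for URLEncoder.encode(String, Charset), and it only builds the string; sending the request is left out.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class LanguageLayerRequest {
    // Endpoint taken from the example request in this answer.
    static final String ENDPOINT = "https://apilayer.net/api/detect";

    // Builds the GET URL, encoding both parameter values.
    static String buildUrl(String accessKey, String text) {
        return ENDPOINT
                + "?access_key=" + encode(accessKey)
                + "&query=" + encode(text);
    }

    // URLEncoder applies form encoding: a space becomes '+',
    // reserved characters such as '&' and '=' become %XX escapes.
    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("YOUR_ACCESS_KEY", "I like apples and oranges"));
    }
}
```

For texts as long as 10,000 characters the encoded URL gets large, which is presumably why POST is encouraged.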
You can use Rosoka. It detects 230 different languages. You can try it through the AWS Marketplace at Rosoka Cloud.
You pay for the time used.
Source: https://stackoverflow.com/questions/7025915/what-is-the-best-language-detect-library-or-web-api-available-even-paid