Question
First of all, I have a lot of text available: say, 10,000 characters per attempt. The script is PHP-based, but I can use whatever I want. C++, Java, no problem.
The Google language API can't be used: its usage limits are too low.
I've been trying for 6 hours to come up with something good, but nothing so far. Can someone point me to my best option?
Answer 1:
There is the Language Detection API, which offers both a free and a premium service.
It accepts text through GET or POST and provides JSON output with scores.
Answer 2:
Java-based tools:
Apache Tika: not "all" language profiles, but you can add them yourself
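    import org.apache.tika.language.LanguageIdentifier;

    // SystemException is the application's own checked exception type
    // (assumed here to also have a message-only constructor).
    public String detectLangTika(String text) throws SystemException {
        LanguageIdentifier li = new LanguageIdentifier(text);
        if (li.isReasonablyCertain()) {
            return li.getLanguage();
        }
        // Throw the declared exception type; a plain Exception would not compile here.
        throw new SystemException("Tika lang detection not reasonably certain");
    }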
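language-detection: a lot of language profiles, works great for me.
    import java.io.File;
    import com.cybozu.labs.langdetect.Detector;
    import com.cybozu.labs.langdetect.DetectorFactory;
    import com.cybozu.labs.langdetect.LangDetectException;

    // Load the language profiles once at startup, before creating any detector.
    DetectorFactory.loadProfile(new File(LangDetector.class.getClassLoader().getResource("profiles").toURI()));

    public String detectLangLD(String text) throws SystemException {
        try {
            Detector detector = DetectorFactory.create();
            detector.append(text);
            // Returns the code of the most probable language.
            return detector.detect();
        } catch (LangDetectException e) {
            throw new SystemException("LangDetector Failure", e);
        }
    }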
The most precise tool was the Google API lang detection, which was discontinued and replaced with the paid Google Translate API.
Answer 3:
A bit late, but I wrote this library (and I'm implementing a free API service without limits).
https://github.com/crodas/LanguageDetector
Answer 4:
If you are willing to give Python a go, take a look at NLTK. And I hope you did go through this.
Answer 5:
There's another freemium API here: Language Detection API
You can easily test the endpoints from that page.
It accepts both GET and POST requests (POST for longer input) and returns a JSON response with this structure:
{
    language: "eng",
    isReliable: "true",
    confidence: "0.9979894639898946"
}
Disclaimer: I'm providing that API.
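As a rough sketch of how a caller might act on a response of that shape: the sample shows all three values as strings, so they need converting before use. The class name, the method name, and the `minConfidence` threshold below are hypothetical and not part of that API; the fields are assumed to have already been extracted from the JSON by whatever parser you use.

```java
import java.util.Optional;

class DetectionResult {
    // Hypothetical helper: takes the three response fields as raw strings
    // (as in the sample response) and returns the language only when the
    // service flagged it as reliable and the confidence meets the threshold.
    static Optional<String> acceptedLanguage(String language, String isReliable,
                                             String confidence, double minConfidence) {
        // Boolean.parseBoolean treats anything other than "true" (any case) as false.
        boolean reliable = Boolean.parseBoolean(isReliable);
        double score;
        try {
            score = Double.parseDouble(confidence);
        } catch (NumberFormatException | NullPointerException e) {
            // Missing or malformed confidence: treat as "no usable answer".
            return Optional.empty();
        }
        if (reliable && score >= minConfidence) {
            return Optional.ofNullable(language);
        }
        return Optional.empty();
    }
}
```

With the sample values, `DetectionResult.acceptedLanguage("eng", "true", "0.9979894639898946", 0.9)` yields `Optional.of("eng")`; an unreliable or low-confidence result yields an empty `Optional`, which the caller can use as a cue to fall back to another detector.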
Answer 6:
I'd recommend languagelayer.com. They offer a free RESTful JSON API web service that can detect around 170 languages. Batch requests are offered as well.
A GET API request (POST encouraged) looks something like this:
https://apilayer.net/api/detect
? access_key = YOUR_ACCESS_KEY
& query = I like apples and oranges
And here's the JSON response:
{
    "success": true,
    "results": [
        {
            "language_code": "en",
            "language_name": "English",
            "probability": 83.896703655741,
            "percentage": 100,
            "reliable_result": true
        }
    ]
}
5,000 monthly requests are free. If you need more (like I did), the cheapest subscription is $4.99/mo for 50,000 requests. (More info here)
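The GET request shown above is written with spaces for readability; an actual request needs the query text URL-encoded. A minimal sketch of assembling that URL in Java follows. The endpoint and the `access_key` and `query` parameter names come from the example above, while the class and method names are made up for illustration. It needs Java 10 or later for the `URLEncoder.encode(String, Charset)` overload.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class DetectUrlBuilder {
    // Endpoint as given in the answer above.
    private static final String ENDPOINT = "https://apilayer.net/api/detect";

    // Hypothetical helper: builds the GET URL with both parameters URL-encoded.
    static String buildDetectUrl(String accessKey, String query) {
        return ENDPOINT
                + "?access_key=" + URLEncoder.encode(accessKey, StandardCharsets.UTF_8)
                + "&query=" + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Prints:
        // https://apilayer.net/api/detect?access_key=YOUR_ACCESS_KEY&query=I+like+apples+and+oranges
        System.out.println(buildDetectUrl("YOUR_ACCESS_KEY", "I like apples and oranges"));
    }
}
```

For inputs around the 10,000 characters mentioned in the question, the encoded URL gets long, which is presumably why the answer notes that POST is encouraged.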
Answer 7:
You can use Rosoka. It detects 230 different languages. You can try it through the AWS Marketplace at Rosoka Cloud.
You pay for the time used.
Source: https://stackoverflow.com/questions/7025915/what-is-the-best-language-detect-library-or-web-api-available-even-paid