Question
I'm looking for a library or technique to detect the language of blocks of text entered by users. Online lookups (like Google Translate) won't work for this task, because the app I'm writing must run offline.
Thanks.
Answer 1:
Here are two more n-gram-based gems you might want to try. They work offline.
- https://github.com/echen/unsupervised-language-identification, optimized for separating English from other languages (has a live demo)
- https://github.com/feedbackmine/language_detector, less specialized, will detect more languages. Some languages may need extra training; I found it not precise enough for German text.
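For anyone wondering what "n-gram-based" means in practice, here is a minimal sketch of the general idea in plain Ruby. It is not the code of either gem above: the class name `TrigramGuesser`, the two tiny training samples, and the use of cosine similarity are all assumptions made for illustration. It builds a character-trigram count profile per language and picks the profile most similar to the input.

```ruby
# Illustrative sketch only: a toy character-trigram language guesser.
# TrigramGuesser and the sample texts are hypothetical, not taken from any gem.
class TrigramGuesser
  def initialize
    @profiles = {}
  end

  # Learn a trigram count profile for a language from sample text.
  def train(language, text)
    @profiles[language] = trigram_counts(text)
  end

  # Return the language whose profile is most similar to the text,
  # or nil when no trained profile shares any trigram with it.
  def guess(text)
    counts = trigram_counts(text)
    best, score = @profiles
      .map { |lang, profile| [lang, cosine(counts, profile)] }
      .max_by { |_, s| s }
    score && score > 0 ? best : nil
  end

  private

  # Lowercase, collapse every run of non-letters to one space, pad with
  # spaces so word boundaries count, then count overlapping 3-char windows.
  def trigram_counts(text)
    normalized = " #{text.downcase.gsub(/[^[:alpha:]]+/, ' ').strip} "
    normalized.chars.each_cons(3).each_with_object(Hash.new(0)) do |gram, counts|
      counts[gram.join] += 1
    end
  end

  # Cosine similarity between two sparse count vectors.
  def cosine(a, b)
    dot = a.sum { |gram, n| n * b.fetch(gram, 0) }
    norm = Math.sqrt(a.values.sum { |n| n * n }) * Math.sqrt(b.values.sum { |n| n * n })
    norm.zero? ? 0.0 : dot / norm
  end
end

# Tiny training samples, far too small for real use.
ENGLISH_SAMPLE = "the quick brown fox jumps over the lazy dog and this is a " \
                 "simple sentence that people would write in the english language"
GERMAN_SAMPLE  = "der schnelle braune fuchs springt über den faulen hund und dies ist " \
                 "ein einfacher satz den man in der deutschen sprache schreiben würde"

guesser = TrigramGuesser.new
guesser.train(:english, ENGLISH_SAMPLE)
guesser.train(:german, GERMAN_SAMPLE)

puts guesser.guess("this is the language that people write").inspect
puts guesser.guess("der hund und der fuchs").inspect
```

With realistic training corpora (thousands of words per language) the same approach gets much more reliable; with samples this small it only works on inputs that resemble the training text. Everything runs locally, which is the point for an offline app.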
Answer 2:
For anyone interested, I've found http://rubygems.org/gems/kenwaln-whatlanguage, which performs excellently.
Answer 3:
I'm using CLD, which I really like: it's succinct and easy to use. Give it a try.
Answer 4:
A quick demo of WhatLanguage in Ruby:
http://www.youtube.com/watch?v=lNqZ2cqOReo&list=UUJ_3fstMOH-g4yBxtvgAWkw&index=0&feature=plcp
Source: https://stackoverflow.com/questions/3285511/how-can-i-detect-a-users-input-language-using-ruby-without-using-an-online-serv