Somebody told me about a class for language recognition in Cocoa. Does anybody know which one it is?
This is not working:
NSSpellCh
Cocoa provides an API for checking the language of a string, and it is best to use Foundation over Core Foundation whenever possible.
NSArray *tagschemes = [NSArray arrayWithObjects:NSLinguisticTagSchemeLanguage, nil];
NSLinguisticTagger *tagger = [[NSLinguisticTagger alloc] initWithTagSchemes:tagschemes options:0];
[tagger setString:@"Das ist ein bisschen deutscher Text. Bitte löschen Sie diesen nicht."];
NSString *language = [tagger tagAtIndex:0 scheme:NSLinguisticTagSchemeLanguage tokenRange:NULL sentenceRange:NULL];
Alternatively, if you have mixed-language text, you can use NSString's enumerateLinguisticTagsInRange:scheme:options:orthography:usingBlock: API to get the language of each word in the text.
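For the mixed-language case, here is a minimal Swift sketch of per-word tagging. It uses NSLinguisticTagger's own enumerateTags(in:unit:scheme:options:using:) (iOS 11 / macOS 10.13 and later) rather than the NSString convenience method, and the helper name languagesByWord(in:) is made up for illustration. Whether a language can be determined for a given word depends on the text, so the tag may be nil.

```swift
import Foundation

// Hypothetical helper: returns each word together with the language tag
// the tagger assigned to it (nil when no language could be determined).
func languagesByWord(in text: String) -> [(word: String, language: String?)] {
    let tagger = NSLinguisticTagger(tagSchemes: [.language], options: 0)
    tagger.string = text

    let nsText = text as NSString
    let fullRange = NSRange(location: 0, length: nsText.length)
    var results: [(word: String, language: String?)] = []

    // Visit word tokens only, skipping whitespace and punctuation.
    tagger.enumerateTags(in: fullRange,
                         unit: .word,
                         scheme: .language,
                         options: [.omitWhitespace, .omitPunctuation]) { tag, tokenRange, _ in
        results.append((word: nsText.substring(with: tokenRange),
                        language: tag?.rawValue))
    }
    return results
}

for entry in languagesByWord(in: "Das ist ein deutscher Text. This part is English.") {
    print(entry.word, entry.language ?? "undetermined")
}
```

Short words are often ambiguous on their own, so per-word results tend to be less reliable than the dominant language of a whole sentence.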
With Swift 5, you can choose one of the following approaches in order to detect the language of a given string.
NSLinguisticTagger's dominantLanguage property
Since iOS 11, NSLinguisticTagger has a property called dominantLanguage. dominantLanguage has the following declaration:
var dominantLanguage: String? { get }
Returns the dominant language of the string set for the linguistic tagger.
The Playground sample code below shows how to use dominantLanguage to get the dominant language of a string:
import Foundation
let text = "あなたはそれを行うべきではありません。"
let tagger = NSLinguisticTagger(tagSchemes: [.language], options: 0)
tagger.string = text
let language = tagger.dominantLanguage
print(language) // Optional("ja")
NSLinguisticTagger's dominantLanguage(for:) method
As an alternative, NSLinguisticTagger has a convenience method called dominantLanguage(for:) that creates a new linguistic tagger, sets its string property and gets the dominantLanguage property. dominantLanguage(for:) has the following declaration:
class func dominantLanguage(for string: String) -> String?
Returns the dominant language for the specified string.
Usage:
import Foundation
let text = "Die Kleinen haben friedlich zusammen gespielt."
let language = NSLinguisticTagger.dominantLanguage(for: text)
print(language) // Optional("de")
NLLanguageRecognizer's dominantLanguage property
Since iOS 12, NLLanguageRecognizer has a property called dominantLanguage. dominantLanguage has the following declaration:
var dominantLanguage: NLLanguage? { get }
The most likely language for the processed text.
Here’s how to use dominantLanguage to guess the dominant language of natural language text:
import NaturalLanguage
let string = "J'ai deux amours. Mon pays et Paris."
let recognizer = NLLanguageRecognizer()
recognizer.processString(string)
let language = recognizer.dominantLanguage
print(language?.rawValue) // Optional("fr")
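Since dominantLanguage is only the most likely guess, it can help to look at the runner-up candidates too. The sketch below uses NLLanguageRecognizer's languageHypotheses(withMaximum:), which returns candidate languages with their probabilities. The helper name rankedLanguages(for:maximum:) is an assumption made for illustration, and the actual probabilities depend on the input and OS version.

```swift
import NaturalLanguage

// Hypothetical helper: returns up to `maximum` candidate languages,
// sorted from most to least likely.
func rankedLanguages(for text: String, maximum: Int = 3) -> [(language: String, confidence: Double)] {
    let recognizer = NLLanguageRecognizer()
    recognizer.processString(text)

    // languageHypotheses(withMaximum:) returns a [NLLanguage: Double] dictionary.
    let hypotheses = recognizer.languageHypotheses(withMaximum: maximum)
    return hypotheses
        .map { (language: $0.key.rawValue, confidence: $0.value) }
        .sorted { $0.confidence > $1.confidence }
}

for candidate in rankedLanguages(for: "J'ai deux amours. Mon pays et Paris.") {
    print(candidate.language, candidate.confidence)
}
```

A low top confidence, or two candidates with similar scores, is a hint that the text is too short or mixes languages.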
You can use -requestCheckingOfString:… instead. NSTextCheckingTypeOrthography attempts to identify the language used in the string, and the completion handler receives an NSOrthography parameter that can be used to get information about the orthography in the string, including its dominant language.
The following example outputs dominant language = de:
NSSpellChecker *spellChecker = [NSSpellChecker sharedSpellChecker];
[spellChecker setAutomaticallyIdentifiesLanguages:YES];
NSString *spellCheckText = @"Guten Herr Mustermann. Dies ist ein deutscher Text. Bitte löschen Sie diesen nicht.";
[spellChecker requestCheckingOfString:spellCheckText
                                range:(NSRange){0, [spellCheckText length]}
                                types:NSTextCheckingTypeOrthography
                              options:nil
               inSpellDocumentWithTag:0
                    completionHandler:^(NSInteger sequenceNumber, NSArray *results, NSOrthography *orthography, NSInteger wordCount) {
    NSLog(@"dominant language = %@", orthography.dominantLanguage);
}];
A Swift String extension for Jennifer's answer:
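import Foundation

extension String {
    func language() -> String? {
        // Nothing to tag in an empty string.
        guard !isEmpty else { return nil }
        let tagger = NSLinguisticTagger(tagSchemes: [.language], options: 0)
        tagger.string = self
        // Swift 4 and later: tag(at:...) returns an NSLinguisticTag?, so unwrap its raw value.
        return tagger.tag(at: 0, scheme: .language, tokenRange: nil, sentenceRange: nil)?.rawValue
    }
}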
Usage:
let language = "What language is this?".language()
As of iOS 11, you can use the dominantLanguage(for:)/dominantLanguageForString: class method of NSLinguisticTagger.
Swift:
extension String {
    var language: String? {
        return NSLinguisticTagger.dominantLanguage(for: self)
    }
}
// Unwrap the optional so the output is "en" rather than Optional("en").
print("Good morning".language ?? "unknown")
print("Buenos días".language ?? "unknown")
Objective-C:
@interface NSString (Tagger)
@property (nonatomic, readonly, nullable) NSString *language;
@end
@implementation NSString (Tagger)
- (NSString *)language {
    return [NSLinguisticTagger dominantLanguageForString:self];
}
@end
NSLog(@"%@", @"Good morning".language);
NSLog(@"%@", @"Buenos días".language);
Output (for both):
en
es
That's the result:
- (NSString *)languageForString:(NSString *)text {
    // Examine at most the first 100 characters of the string.
    CFRange range = CFRangeMake(0, MIN(text.length, (NSUInteger)100));
    // The "Copy" function returns an owned CFStringRef, so transfer ownership
    // with CFBridgingRelease to avoid leaking it (this sketch assumes ARC).
    return CFBridgingRelease(CFStringTokenizerCopyBestStringLanguage((__bridge CFStringRef)text, range));
}