I have a set of documents in two languages: English and German. There is no usable meta information about these documents; a program can look at the content only. Based on t
The stop-word approach for the two languages is quick, and it could be made quicker still by heavily weighting words that do not occur in the other language, for example "das" in German and "the" in English. Using such "exclusive words" would also help extend this approach robustly to a larger group of languages.
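A minimal sketch of the weighted stop-word idea, under some assumptions: the stop-word lists below are tiny illustrative samples rather than complete lists, and the names `STOP_WORDS`, `EXCLUSIVE_WEIGHT`, `SHARED_WEIGHT`, `build_weights` and `detect_language`, as well as the weight values, are made up for this example. A word that appears in only one language's list counts more than a word shared between lists (such as "in").

```python
import re

# Tiny illustrative stop-word samples (assumed, far from complete).
STOP_WORDS = {
    "en": {"the", "and", "of", "to", "in", "is", "that", "it", "was", "for", "with"},
    "de": {"der", "die", "das", "und", "ist", "nicht", "ein", "zu", "in", "mit", "war", "von"},
}

# Assumed weights: "exclusive" words count three times as much as shared ones.
EXCLUSIVE_WEIGHT = 3.0
SHARED_WEIGHT = 1.0


def build_weights(stop_words):
    """Give each stop word a weight, higher if no other language lists it."""
    weights = {}
    for lang, words in stop_words.items():
        # Union of the stop words of every other language.
        others = set().union(
            *(w for other, w in stop_words.items() if other != lang)
        )
        weights[lang] = {
            word: (SHARED_WEIGHT if word in others else EXCLUSIVE_WEIGHT)
            for word in words
        }
    return weights


WEIGHTS = build_weights(STOP_WORDS)


def detect_language(text):
    """Return the best-scoring language code, or None if no stop word matched."""
    # Runs of Unicode letters, lowercased; digits and punctuation are dropped.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    scores = {
        lang: sum(table.get(token, 0.0) for token in tokens)
        for lang, table in WEIGHTS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(detect_language("The cat is on the mat and it was happy"))
print(detect_language("Das ist nicht der Hund von dem Mann"))
```

Because the weights are derived from the lists themselves, adding a third language only means adding another entry to `STOP_WORDS`; words that turn out to be shared are down-weighted automatically. In this sketch a tie goes to whichever language comes first in the dictionary, which a real implementation would probably want to handle explicitly.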