Suppose I have a dataset with each row containing a sentence stemming from an open-ended question in a very large survey (German and French). Most sentences (answers) are lo