Removing duplicate strings using JavaScript

耶瑟儿~ 2021-01-15 00:27

I have an array of 800 sentences. I want to remove all duplicates (sentences that have exactly the same words, but in a different order) from the array. So for example "this is…

6 answers
  •  情话喂你
    2021-01-15 01:13

    Use an Object as a lookup to get a quick hashtable-backed check. That means using strings as keys, so you first need to normalise the case, word order, etc. of each sentence to get a unique key for each combination of words.

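    // Get key for sentence, removing punctuation and normalising case and word order
    // eg 'Hello, a  horse!' -> 'x_a hello horse'
    // the 'x_' prefix is to avoid clashes with any object properties with undesirable
    // special behaviour (like prototype properties in IE) and get a plain lookup
    //
    function getSentenceKey(sentence) {
        var words= sentence.toLowerCase()
            .replace(/[^\w\s]+/g, '')   // strip punctuation first...
            .replace(/^\s+|\s+$/g, '')  // ...then trim, so no empty word is left at either end
            .split(/\s+/);              // split on every run of whitespace, not just the first
        words.sort();
        return 'x_'+words.join(' ');
    }
    
    // Assumes `sentences` is the array of strings from the question.
    // Walking backwards means splice() never shifts an index we have yet to visit;
    // as a side effect, the LAST occurrence of each sentence is the one kept.
    var lookup= {};
    for (var i= sentences.length; i-->0;) {
        var key= getSentenceKey(sentences[i]);
        if (key in lookup)
            sentences.splice(i, 1); // duplicate: remove it in place
        else
            lookup[key]= true;
    }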
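A more modern variant, as a minimal sketch assuming an ES2015+ environment (Set, arrow functions, let/const): a Set replaces the Object lookup, so the 'x_' prefix trick is no longer needed, and filter() builds a new array instead of splicing the original. Because the loop above walks backwards it keeps the last copy of each sentence; this version keeps the first. The names sentenceKey and removeDuplicateSentences are hypothetical.

```javascript
// Sketch assuming ES2015+. Same key idea as getSentenceKey, but a Set does the
// lookup, so no 'x_' prefix is needed to dodge Object.prototype properties.
function sentenceKey(sentence) {
  return sentence
    .toLowerCase()
    .replace(/[^\w\s]+/g, '') // strip punctuation (\w is ASCII-only)
    .trim()
    .split(/\s+/)             // split on any run of whitespace
    .sort()
    .join(' ');
}

// Returns a new array that keeps the FIRST occurrence of each word combination;
// the input array is left untouched.
function removeDuplicateSentences(sentences) {
  const seen = new Set();
  return sentences.filter((sentence) => {
    const key = sentenceKey(sentence);
    if (seen.has(key)) return false; // same words seen earlier: drop it
    seen.add(key);
    return true;
  });
}

const sentences = ['this is a test', 'a test this is', 'Hello, a  horse!', 'horse a hello'];
console.log(removeDuplicateSentences(sentences));
// → [ 'this is a test', 'Hello, a  horse!' ]
```

Building a new array also avoids the element shifting that each splice() call does, which hardly matters for 800 sentences but adds up on much larger arrays.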
    

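    This would need some work to support non-ASCII characters: in JS, \w only matches ASCII letters, digits and underscore, and what constitutes a word in some languages is a difficult question in itself. Also, should "foo bar foo" count as the same sentence as "bar bar foo"? As written, the key keeps repeated words, so the two get different keys ('x_bar foo foo' vs 'x_bar bar foo') and both are kept.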
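For the non-ASCII case, a minimal sketch assuming an ES2018+ engine: with the u flag, the Unicode property escapes \p{L}, \p{M} and \p{N} match letters, combining marks and digits in any script, and normalize('NFC') makes precomposed and decomposed accents compare equal. It still splits on whitespace, so it does not settle what counts as a word in languages written without spaces. The function name and its ignoreRepeats parameter are hypothetical; ignoreRepeats shows one possible answer to the "foo bar foo" question by collapsing repeated words before sorting.

```javascript
// Sketch assuming an ES2018+ engine: Unicode property escapes need the 'u' flag.
// unicodeSentenceKey and its ignoreRepeats parameter are hypothetical names.
function unicodeSentenceKey(sentence, ignoreRepeats = false) {
  let words = sentence
    .normalize('NFC')                       // compose accents so 'e' + U+0301 equals 'é'
    .toLowerCase()
    .replace(/[^\p{L}\p{M}\p{N}\s]+/gu, '') // keep letters, marks, numbers, whitespace
    .trim()
    .split(/\s+/);                          // split on any run of whitespace
  if (ignoreRepeats) {
    words = [...new Set(words)];            // keep one copy of each word
  }
  // Default sort compares UTF-16 code units: not alphabetical for accented
  // words, but deterministic, which is all a lookup key needs.
  return words.sort().join(' ');
}

console.log(unicodeSentenceKey('Ça va, très bien!') === unicodeSentenceKey('très bien, ça va')); // true
console.log(unicodeSentenceKey('foo bar foo') === unicodeSentenceKey('bar bar foo'));             // false
console.log(unicodeSentenceKey('foo bar foo', true) === unicodeSentenceKey('bar bar foo', true)); // true
```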
